In the previous post, we stopped at the point where AFL++ had produced instrumented binaries with PCGUARD and LTO. In this installment, we pick up the story at runtime: tracing the execution of those binaries, observing how coverage signals are collected, and following how afl-fuzz consumes that feedback to discover new paths. This is where the theory and compilation details come together, revealing the full feedback loop in action.
Quick Recap & Setup
To make the exploration more tangible, we will use the LTO binary we compiled in the previous post.
As a reminder, we are using AFL++ v4.33c (commit eadc8a).
Gathering Coverage from the Instrumented Binary
afl-fuzz
It’s time to start afl-fuzz and see how coverage is compared to determine whether an input has discovered new edges.
We start at main in src/afl-fuzz.c. Here, the vast majority of the code deals with initialization options for the different modes AFL++ offers.
Our starting point is the call to get_map_size. This function tries to read the size of the shared memory coverage map from an environment variable; otherwise it returns DEFAULT_SHMEM_SIZE (8 MB), as shown below.
/* Reads the map size from ENV */
u32 get_map_size(void) {

  uint32_t map_size = DEFAULT_SHMEM_SIZE;
  char    *ptr;

  if ((ptr = getenv("AFL_MAP_SIZE")) || (ptr = getenv("AFL_MAPSIZE"))) {

    map_size = atoi(ptr);
    if (!map_size || map_size > (1 << 29)) {
      FATAL("illegal AFL_MAP_SIZE %u, must be between %u and %u", map_size,
            64U, 1U << 29);
    }
    if (map_size % 64) { map_size = (((map_size >> 6) + 1) << 6); }

  } else if (getenv("AFL_SKIP_BIN_CHECK")) {

    map_size = MAP_SIZE;

  }

  return map_size;
}
The next important part comes with the initialization of two key structs: afl_state_t and afl_forkserver_t.
afl_state_t is the structure that manages the fuzzer and many of its parameters like the afl env or the global bitmap (a.k.a virgin_bits). On the other hand, afl_forkserver_t manages the forkserver and is embedded in afl_state_t. This structure contains relevant information related to the status of the fuzzing process like the local bitmap (a.k.a trace_bits).
When initializing the afl_state_t struct, afl records the obtained map size from the previous step.
void afl_state_init(afl_state_t *afl, uint32_t map_size) {

  /* thanks to this memset, growing vars like out_buf
     and out_size are NULL/0 by default. */
  memset(afl, 0, sizeof(afl_state_t));

  afl->shm.map_size = map_size ? map_size : MAP_SIZE;
Just after, the afl_forkserver_t is initialized too.
The next point worth mentioning is where afl->shmem_testcase_mode is set. As the code states, "We always try to perform shmem fuzzing". The function setup_testcase_shmem is responsible for setting up the shared map used for shared-memory testcase delivery.
Our next stop is the function init_count_class16. The core of this code lies in two lookup tables: count_class_lookup8 and count_class_lookup16:
- count_class_lookup8: This 256-entry array serves as the foundation. It maps an 8-bit hit count to a bucket, but the mapping is non-linear. Low hit counts (e.g., 1, 2, 3) are given distinct buckets, as the transition from zero to a small number of hits is a highly significant event in path discovery. As the counts increase, the buckets become progressively wider, grouping larger ranges of values. This design intentionally discards precision where it is less meaningful, focusing only on the order-of-magnitude changes.
- count_class_lookup16: This larger, 65,536-entry table is populated at startup by init_count_class16. The function constructs this table by decomposing each 16-bit index into its high and low bytes, looking up each byte's bucket in count_class_lookup8, and then recombining the results.
u16 count_class_lookup16[65536];
/* Destructively classify execution counts in a trace. This is used as a
preprocessing step for any newly acquired traces. Called on every exec,
must be fast. */
static const u8 count_class_lookup8[256] = {

  [0] = 0,
  [1] = 1,
  [2] = 2,
  [3] = 4,
  [4 ... 7] = 8,
  [8 ... 15] = 16,
  [16 ... 31] = 32,
  [32 ... 127] = 64,
  [128 ... 255] = 128

};

void init_count_class16(void) {

  u32 b1, b2;

  for (b1 = 0; b1 < 256; b1++) {
    for (b2 = 0; b2 < 256; b2++) {
      count_class_lookup16[(b1 << 8) + b2] =
          (count_class_lookup8[b1] << 8) | count_class_lookup8[b2];
    }
  }
}
Executing this function and examining its output map reveals a key characteristic: saturation. Large, contiguous blocks of input indices are mapped to the exact same bucket value. This is the intended and logical result of the design. Because the underlying 8-bit table already groups ranges of numbers, combining two such bucketed values amplifies this effect, leading to a significant loss of precision at higher hit counts. This is the explicit trade-off: the system sacrifices granularity for the ability to classify any 16-bit hit count with a single, branchless memory lookup.
We then stop at afl_shm_init, which sets up the bitmap that will be shared with the child process. This function starts by requesting a shared memory segment via shmget. The result is the id of the segment, which will later be shared with the child process.
shm->shm_id =
shmget(IPC_PRIVATE, map_size == MAP_SIZE ? map_size + 8 : map_size,
IPC_CREAT | IPC_EXCL | permission);
The next step is to attach the newly created map to afl-fuzz.
shm->map = shmat(shm->shm_id, NULL, 0);
It then sets the environment variable SHM_FUZZ_ENV_VAR (__AFL_SHM_FUZZ_ID), which — as we’ll see in the Child Process section — allows the binary to attach to the parent’s shared memory region.
Just after initializing the trace_bits bitmap, the size of this map is passed to the child process via the AFL_MAP_SIZE environment variable.
afl->fsrv.map_size = DEFAULT_SHMEM_SIZE; // dummy temporary value
char vbuf[16];
snprintf(vbuf, sizeof(vbuf), "%u", DEFAULT_SHMEM_SIZE);
setenv("AFL_MAP_SIZE", vbuf, 1);
Finally, the execution reaches the afl_fsrv_get_mapsize function, which internally calls afl_fsrv_start. This function is responsible for starting the forkserver.
fsrv->fsrv_pid = fork();
The server side now receives the real map size of the target, sent by the child process.
if ((status & FS_NEW_OPT_MAPSIZE)) {
u32 tmp_map_size;
rlen = read(fsrv->fsrv_st_fd, &tmp_map_size, 4);
if (!fsrv->map_size) { fsrv->map_size = MAP_SIZE; }
fsrv->real_map_size = tmp_map_size;
if (tmp_map_size % 64) {
tmp_map_size = (((tmp_map_size + 63) >> 6) << 6);
}
if (!be_quiet) { ACTF("Target map size: %u", fsrv->real_map_size); }
The final step of the parent is to initialize the afl->virgin_bits bitmap. Once the correct map size has been received, every byte is set to 0xff.
memset(afl->virgin_bits, 255, map_size);
At this point, the parent process has finished initializing the shared memory for coverage and testcase buffers, set the necessary environment variables (__AFL_SHM_ID and AFL_MAP_SIZE), and completed the forkserver handshake. With the forkserver running and the real map size known, the fuzzer is ready to start its main work. The next step is the main loop in afl-fuzz.c, where the parent process repeatedly selects inputs, sends them to the target through the forkserver, and updates its state based on the coverage feedback. We will now examine how the child process starts, gathers coverage during a run, and notifies the parent; then we will see how the parent merges that coverage into the global bitmap.
Child Process
When the child process starts, it executes the fsrv_exec_child function, which in essence execve's into the instrumented binary.
In the instrumented binary, we start by looking at the __afl_auto_init_globals function. This function sets the __afl_final_loc variable, which is later used to compute the real size of the shared map. As we saw in the previous post, this function is declared in instrumentation/afl-llvm-rt-lto.o.c, but its body is constructed at compile time.
We continue with the __afl_auto_second function. This function ensures that the instrumented binary always has a valid coverage buffer. By default, the runtime provides a small static array called __afl_area_initial. If the binary needs more slots than this default region provides, __afl_auto_second allocates a larger local buffer (via mmap or malloc) and points __afl_area_ptr at it, updating __afl_map_size accordingly.
It’s important to note that this local buffer is only a fallback mechanism. In a normal fuzzing run under afl-fuzz, this buffer will be discarded. Shortly afterward, the function __afl_map_shm attaches the process to the shared memory region created by the parent. When that happens, __afl_area_ptr is updated to point into the shared region, and __afl_map_size is overwritten with the real map size coming from the parent.
__attribute__((constructor(1))) void __afl_auto_second(void) {

  if (__afl_already_initialized_second) return;
  __afl_already_initialized_second = 1;

  if (getenv("AFL_DEBUG")) {
    __afl_debug = 1;
    fprintf(stderr, "DEBUG: debug enabled\n");
    fprintf(stderr, "DEBUG: AFL++ afl-compiler-rt" VERSION "\n");
  }

  if (getenv("AFL_DISABLE_LLVM_INSTRUMENTATION")) return;
  u8 *ptr;

  if (__afl_final_loc > MAP_INITIAL_SIZE) {

    __afl_first_final_loc = __afl_final_loc + 1;

    if (__afl_area_ptr && __afl_area_ptr != __afl_area_initial)
      free(__afl_area_ptr);

    if (__afl_map_addr)
      ptr = (u8 *)mmap((void *)__afl_map_addr, __afl_first_final_loc,
                       PROT_READ | PROT_WRITE,
                       MAP_FIXED_NOREPLACE | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    else
      ptr = (u8 *)malloc(__afl_first_final_loc);

    if (ptr && (ssize_t)ptr != -1) {
      __afl_area_ptr = ptr;
      __afl_area_ptr_dummy = __afl_area_ptr;
      __afl_area_ptr_backup = __afl_area_ptr;
    }
  }
}
The next step is __afl_map_shm, called from __afl_auto_early. This function is responsible for mapping the bitmap into the child process. With this mapping in place, the child can store the coverage information of each run, which the parent will later merge into the global bitmap. The first step queries SHM_ENV_VAR; this variable holds the shm_id returned when the parent called shmget.
char *id_str = getenv(SHM_ENV_VAR);
u32 shm_id = atoi(id_str);
__afl_area_ptr = (u8 *)shmat(shm_id, (void *)__afl_map_addr, 0);
Another task this function performs is to store in __afl_map_size the real size of the shared map. As we saw earlier this value will be sent to the parent process.
__afl_map_size = __afl_final_loc + 1; // as we count starting 0
Finally, it writes the value "1" in the first position of the bitmap. This serves to notify afl-fuzz that the process is alive.
/* Write something into the bitmap so that even with low AFL_INST_RATIO,
our parent doesn't give up on us. */
__afl_area_ptr[0] = 1;
With the shared memory set up in the child process, now it's time to communicate with the parent process. In this dialogue, the child tells the parent the real size of the shared memory. This is done in the function __afl_start_forkserver.
u8 *msg = (u8 *)&status;
// FS_NEW_OPT_MAPSIZE - we always send the map size
status = __afl_map_size;
if (write(FORKSRV_FD + 1, msg, 4) != 4) { _exit(1); }
From now on, the child process will perform fuzz runs, writing coverage info into the shared bitmap. We now return to the parent process to see how this coverage information is merged.
Coverage Merging
This part starts at the afl_fsrv_run_target function. Here the parent clears fsrv->trace_bits so that the child process writes its coverage information into a clean state.
#ifdef __linux__
  if (likely(!fsrv->nyx_mode)) {
    memset(fsrv->trace_bits, 0, fsrv->map_size);
    MEM_BARRIER();
  }
#else
  memset(fsrv->trace_bits, 0, fsrv->map_size);
  MEM_BARRIER();
#endif
Then it lets the child process run and finish the current execution. Afterwards, it inspects the coverage information reported by the child. This inspection happens in the save_if_interesting function, which first invokes calculate_new_bits_if_necessary to decide whether it must apply bucketing or can simply check if coverage has increased.
static inline void calculate_new_bits_if_necessary(afl_state_t *afl,
                                                   u8   *new_bits,
                                                   bool *bits_counted,
                                                   bool *classified) {

  if (*bits_counted) return;

  if (*classified) {
    *new_bits = has_new_bits(afl, afl->virgin_bits);
  } else {
    *new_bits = has_new_bits_unclassified(afl, afl->virgin_bits, classified);
  }

  *bits_counted = true;
}
If bucketing must still be applied, it first calls has_new_bits_unclassified, which eventually calls classify_counts.
static inline u8 has_new_bits_unclassified(afl_state_t *afl, u8 *virgin_map,
                                           bool *classified) {

  /* Handle the hot path first: no new coverage */
  u8 *end = afl->fsrv.trace_bits + afl->fsrv.map_size;

#ifdef WORD_SIZE_64
  if (!skim((u64 *)virgin_map, (u64 *)afl->fsrv.trace_bits, (u64 *)end))
    return 0;
#else
  if (!skim((u32 *)virgin_map, (u32 *)afl->fsrv.trace_bits, (u32 *)end))
    return 0;
#endif /* ^WORD_SIZE_64 */

  classify_counts(&afl->fsrv);
  *classified = true;

  return has_new_bits(afl, virgin_map);
}
The latter function applies the count_class_lookup16 table to bucket raw execution counts into ranges. This bucketing allows AFL++ to detect not only new edges, but also when a known edge has been exercised with a significantly higher iteration count. Such bucket jumps are considered new coverage and can mark an input as interesting.
inline u64 classify_word(u64 word) {

  u16 mem16[4];
  memcpy(mem16, &word, sizeof(mem16));

  mem16[0] = count_class_lookup16[mem16[0]];
  mem16[1] = count_class_lookup16[mem16[1]];
  mem16[2] = count_class_lookup16[mem16[2]];
  mem16[3] = count_class_lookup16[mem16[3]];

  memcpy(&word, mem16, sizeof(mem16));
  return word;
}

inline void classify_counts(afl_forkserver_t *fsrv) {

  u64 *mem = (u64 *)fsrv->trace_bits;
  u32  i = (fsrv->map_size >> 3);

  while (i--) {
    if (unlikely(*mem)) { *mem = classify_word(*mem); }
    mem++;
  }
}
After that process, it's time to check whether the fuzz run reported new coverage. To do so, the function has_new_bits compares the two bitmaps.
inline u8 has_new_bits(afl_state_t *afl, u8 *virgin_map) {

#ifdef WORD_SIZE_64
  u64 *current = (u64 *)afl->fsrv.trace_bits;
  u64 *virgin = (u64 *)virgin_map;
  u32  i = ((afl->fsrv.real_map_size + 7) >> 3);
#else
  u32 *current = (u32 *)afl->fsrv.trace_bits;
  u32 *virgin = (u32 *)virgin_map;
  u32  i = ((afl->fsrv.real_map_size + 3) >> 2);
#endif

  u8 ret = 0;
  while (i--) {
    if (unlikely(*current)) discover_word(&ret, current, virgin);
    current++;
    virgin++;
  }

  return ret;
}
The real classification work happens in discover_word. This function inspects overlapping words in the current trace and virgin map:
inline void discover_word(u8 *ret, u64 *current, u64 *virgin) {

  if (*current & *virgin) {

    if (likely(*ret < 2)) {

      u8 *cur = (u8 *)current;
      u8 *vir = (u8 *)virgin;

      /* Looks like we have not found any new bytes yet; see if any non-zero
         bytes in current[] are pristine in virgin[]. */
      if ((cur[0] && vir[0] == 0xff) || (cur[1] && vir[1] == 0xff) ||
          (cur[2] && vir[2] == 0xff) || (cur[3] && vir[3] == 0xff) ||
          (cur[4] && vir[4] == 0xff) || (cur[5] && vir[5] == 0xff) ||
          (cur[6] && vir[6] == 0xff) || (cur[7] && vir[7] == 0xff))
        *ret = 2;
      else
        *ret = 1;
    }

    *virgin &= ~*current;
  }
}
AFL++'s mechanism doesn't just see coverage as a simple yes/no; it evaluates the quality of the discovery.
- A "Level 1" Discovery: This occurs when a new path is found within a byte-sized region of the map that was already partially explored. In this case, the mechanism has found a new branch off a known area. It flags this by setting its return value to 1.
- A "Level 2" Discovery: This is a much more significant event. The mechanism detects this when a new path appears in a "virgin byte"—a byte in the map where no paths had ever been recorded before. This signals the exploration of a completely new function or code region. This high-value discovery is flagged by setting the return value to 2.
Example: Bitmap Evolution
Imagine the global virgin_bits has a byte set to 0xff (all bits untouched):
virgin: 11111111
current: 00001000
The overlap (AND) reveals a brand new edge in a previously untouched byte → Level 2 discovery. That byte in virgin_bits is then updated:
virgin: 11110111
On a later run:
virgin: 11110111
current: 00000100
Here, the new edge falls inside the same byte, but since it’s no longer completely virgin, AFL++ marks this as a Level 1 discovery.
Updating Knowledge
Finally, *virgin &= ~*current updates the global map by clearing bits corresponding to edges just seen. This ensures discoveries are only counted once and guarantees AFL++ always focuses on genuinely unexplored paths.
Trace Simplification and Crash Deduplication
Another mechanism AFL++ uses to recognize redundant crashes is the simplify_trace function. This function reduces every byte to one of two values: 0x01 for zero (no hits) and 0x80 for non-zero (hit). This coarse simplification ignores execution counts entirely and encodes only presence or absence. The resulting simplified map is then hashed, producing a checksum that serves as a stable fingerprint.
void simplify_trace(afl_state_t *afl, u8 *bytes) {

  u64 *mem = (u64 *)bytes;
  u32  i = (afl->fsrv.map_size >> 3);

  while (i--) {

    if (unlikely(*mem)) {

      u8 *mem8 = (u8 *)mem;

      mem8[0] = simplify_lookup[mem8[0]];
      mem8[1] = simplify_lookup[mem8[1]];
      mem8[2] = simplify_lookup[mem8[2]];
      mem8[3] = simplify_lookup[mem8[3]];
      mem8[4] = simplify_lookup[mem8[4]];
      mem8[5] = simplify_lookup[mem8[5]];
      mem8[6] = simplify_lookup[mem8[6]];
      mem8[7] = simplify_lookup[mem8[7]];

    } else
      *mem = 0x0101010101010101ULL;

    mem++;
  }
}
This simplification is also used in the sanitizer and deduplication path inside save_if_interesting. By collapsing coverage to a binary fingerprint, AFL++ ensures that fuzzing effort isn’t wasted on repeatedly triaging the same crash, allowing the fuzzer to prioritize genuinely distinct failure cases.
Summary
In this second installment, we followed AFL++ from the moment the forkserver handshake completes through to the point where coverage feedback is classified, compared, and acted upon. Along the way we saw several distinct mechanisms at work:
- Shared Memory Setup: The parent and child coordinate via a shared bitmap that records edge hits for each run.
- Count Classification: Raw hit counts are bucketed into ranges using count_class_lookup16, preserving only order-of-magnitude information.
- Coverage Discovery: The has_new_bits and discover_word functions merge coverage into the global virgin_bits map, distinguishing between incremental (Level 1) and entirely new (Level 2) discoveries.
- Knowledge Update: Virgin map entries are cleared as edges are exercised, ensuring discoveries are counted once.
- Crash Deduplication: Independently, simplify_trace reduces coverage to a binary fingerprint, enabling AFL++ to filter out redundant crashes and avoid wasting effort.
Together, these steps complete the feedback loop: the fuzzer executes a test case, the target records coverage in the shared map, AFL++ interprets that signal through classification and comparison, and finally decides whether the input is “interesting” enough to keep. This is the foundation on which its powerful evolutionary search is built.