DatAFLow is a fuzzer built on top of AFL++. However, instead of a control-flow-based feedback mechanism (e.g., based on control-flow edge coverage), datAFLow uses a data-flow-based feedback mechanism; specifically, data flows based on def-use associations.
To enable performant fuzzing, datAFLow uses a flexible and efficient memory object metadata scheme based on the "Padding Area MetaData" (PAMD) approach.
More details are available in our registered report, published at the 1st International Fuzzing Workshop (FUZZING) 2022, and in our TOSEM paper. You can read the report here and the final journal paper here.
datAFLow is built on LLVM v12-14. Python is also required (for the dataflow-cc wrapper).
Z3 is required by SVF (for static analysis). SVF is an optional component. If running datAFLow on Ubuntu 20.04, you can install z3 via apt.
Otherwise, z3 can be built from source:

    git clone https://github.com/z3prover/z3
    git -C z3 checkout z3-4.8.8
    mkdir -p z3/build
    cd z3/build
    cmake .. \
        -DCMAKE_INSTALL_PREFIX=$(realpath ../install) -DZ3_BUILD_LIBZ3_SHARED=False
    make -j
    make install

The FUZZALLOC_SRC variable refers to this directory (i.e., the root source directory). Ensure all submodules are initialized.

    cd $FUZZALLOC_SRC
    git submodule update --init --recursive

Then build.

    cd $FUZZALLOC_SRC
    mkdir build
    cd build
    cmake .. \
        -DCMAKE_C_COMPILER=clang-12 -DCMAKE_CXX_COMPILER=clang++-12 \
        -DLLVM_DIR=$(llvm-config-12 --cmakedir) \
        -DZ3_DIR=/path/to/z3/install
    make -j

To build the SVF-based static analysis, pass the -DUSE_SVF=True option to cmake. As described above, SVF requires z3. If z3 was built from source, the -DZ3_DIR=/path/to/z3/install option is also required.
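As a rough sketch, an SVF-enabled configuration could combine the options above as follows. The `Z3_INSTALL` variable and the separate `build-svf` directory are placeholders introduced for illustration, not names used by datAFLow; the script only runs the build when `FUZZALLOC_SRC` is set and the required tools are on the `PATH`, and otherwise prints the command it would run.

```shell
#!/bin/sh
# Placeholder: where a source-built z3 was installed (assumption, adjust as needed).
Z3_INSTALL=${Z3_INSTALL:-/path/to/z3/install}

# Options taken from the build instructions, plus -DUSE_SVF=True.
CMAKE_ARGS="-DCMAKE_C_COMPILER=clang-12 -DCMAKE_CXX_COMPILER=clang++-12 -DUSE_SVF=True -DZ3_DIR=$Z3_INSTALL"

if [ -n "${FUZZALLOC_SRC:-}" ] \
    && command -v cmake >/dev/null 2>&1 \
    && command -v llvm-config-12 >/dev/null 2>&1; then
    # Hypothetical separate build directory, so an existing build stays untouched.
    mkdir -p "$FUZZALLOC_SRC/build-svf"
    (
        cd "$FUZZALLOC_SRC/build-svf" || exit 1
        cmake .. $CMAKE_ARGS -DLLVM_DIR="$(llvm-config-12 --cmakedir)"
        make -j
    )
else
    echo "would run: cmake .. $CMAKE_ARGS -DLLVM_DIR=<llvm cmake dir>"
fi
```

If z3 was installed via apt rather than built from source, the `-DZ3_DIR` option can presumably be dropped.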
The dataflow-cc (and dataflow-cc++) tools can be used as drop-in replacements for clang (and clang++). These wrappers are configured through the following environment variables:
- FUZZALLOC_DEF_MEM_FUNCS: Path to a special case list (see below) listing custom memory allocation routines.
- FUZZALLOC_DEF_SENSITIVITY: The def sites to instrument. One of array, struct, or array:struct.
- FUZZALLOC_USE_SENSITIVITY: The use sites to instrument. One of read, write, or read:write.
- FUZZALLOC_USE_CAPTURE: What to capture at each use site. One of use, offset, or value.
- FUZZALLOC_INST: Instrumentation. One of: afl (for fuzzing); tracer (for accurate tracing of def-use chains); or none.
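The variables above might be combined as in the following minimal sketch. The source file `target.c`, the output name `target`, and the `-g -O2` flags are assumptions chosen for illustration; the particular sensitivity and capture values are only one possible combination. The compile step is skipped if `dataflow-cc` or the source file is not available.

```shell
#!/bin/sh
# Select what to instrument, using values from the list above.
export FUZZALLOC_DEF_SENSITIVITY=array:struct   # def sites: arrays and structs
export FUZZALLOC_USE_SENSITIVITY=read:write     # use sites: reads and writes
export FUZZALLOC_USE_CAPTURE=use                # capture mode at each use site
export FUZZALLOC_INST=afl                       # instrument for fuzzing

# Hypothetical target: target.c is a placeholder source file.
CC=dataflow-cc
if command -v "$CC" >/dev/null 2>&1 && [ -f target.c ]; then
    # dataflow-cc is a drop-in replacement for clang, so ordinary clang flags apply.
    "$CC" -g -O2 -o target target.c
else
    echo "skipping build: $CC or target.c not found"
fi
```

For projects with their own build system, setting `CC=dataflow-cc` before running the configure or build step is the usual way to swap in a compiler wrapper.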
If the target uses custom memory allocation routines (i.e., wrappers around malloc, calloc, etc.), then a special case list naming these routines should be provided to dataflow-preprocess. Doing so ensures dynamically-allocated variable def sites are appropriately tagged. The list is provided via the --def-mem-funcs option. The special case list must be formatted as:

    [fuzzalloc]
    fun:malloc_wrapper
    fun:calloc_wrapper
    fun:realloc_wrapper

In addition to dataflow-cc and dataflow-c++, we provide the following tools:
Uses SVF to statically derive an upper bound on the number of def-use chains in a bitcode (BC) file. This tool generates JSON output tying these def-use chains to source-level variables (recovered through debug info).
Note that you must run CMake with the -DUSE_SVF=On option to build this tool.
Collect fuzzalloc stats from an instrumented bitcode file. Stats include: number of tagged variables, number of instrumented use sites, etc.
static-region-cov statically extracts Clang's source-based code coverage from an instrumented binary.
Generate data-flow coverage over time from an AFL++ queue output directory. Relies on a version of the target program instrumented with trace mode (i.e., setting FUZZALLOC_INST=trace) to replay the queue through, generating JSON reports logging covered def-use chains.
Generate control-flow coverage over time from an AFL++ queue output directory. Relies on a version of the target program instrumented with Clang's source-based coverage (i.e., compiled using Clang's -fprofile-instr-generate -fcoverage-mapping flags) to replay the queue through, generating JSON reports logging covered def-use chains.
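To illustrate the general idea of replaying a queue through a coverage build, the sketch below uses the standard LLVM tools (`llvm-profdata`, `llvm-cov`) directly. It is not the tool's implementation: it produces a single cumulative summary rather than coverage over time, and it assumes a target that takes the input file as its only argument. `QUEUE_DIR` and `TARGET` are placeholder names; on some systems the LLVM tools carry a version suffix (e.g., `llvm-profdata-12`).

```shell
#!/bin/sh
# Placeholders (assumptions): AFL++ queue directory and coverage-instrumented binary,
# i.e., one compiled with -fprofile-instr-generate -fcoverage-mapping.
QUEUE_DIR=${QUEUE_DIR:-out/default/queue}
TARGET=${TARGET:-./target-cov}
PROF_DIR=$(mktemp -d)

replayed=0
if [ -x "$TARGET" ] && [ -d "$QUEUE_DIR" ]; then
    for input in "$QUEUE_DIR"/id:*; do
        [ -f "$input" ] || continue
        # Each run writes its raw profile to a separate file; crashes are ignored.
        LLVM_PROFILE_FILE="$PROF_DIR/$replayed.profraw" \
            "$TARGET" "$input" >/dev/null 2>&1 || true
        replayed=$((replayed + 1))
    done
fi

if [ "$replayed" -gt 0 ] && command -v llvm-profdata >/dev/null 2>&1; then
    # Merge the raw profiles, then export a JSON coverage summary.
    llvm-profdata merge -sparse "$PROF_DIR"/*.profraw -o "$PROF_DIR/merged.profdata"
    llvm-cov export "$TARGET" -instr-profile="$PROF_DIR/merged.profdata" \
        -summary-only > coverage.json
fi
echo "replayed $replayed inputs"
```

Merging the profiles after each input (or after each time bucket) instead of once at the end would yield a coverage-over-time series.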
See README.magma.md and README.ddfuzz.md for reproducing the Magma and DDFuzz experiments, respectively.