[PATCH 1/2] docs/fuzz: rST-ify the fuzzing documentation

From: Alexander Bulekov <alxndr@bu.edu>
To: qemu-devel@nongnu.org
Cc: Alexander Bulekov <alxndr@bu.edu>, Bandan Das <bsd@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH 1/2] docs/fuzz: rST-ify the fuzzing documentation
Date: Fri,  6 Nov 2020 13:05:59 -0500	[thread overview]
Message-ID: <20201106180600.360110-2-alxndr@bu.edu> (raw)
In-Reply-To: <20201106180600.360110-1-alxndr@bu.edu>

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 MAINTAINERS            |   2 +-
 docs/devel/fuzzing.rst | 236 +++++++++++++++++++++++++++++++++++++++++
 docs/devel/fuzzing.txt | 214 -------------------------------------
 docs/devel/index.rst   |   1 +
 4 files changed, 238 insertions(+), 215 deletions(-)
 create mode 100644 docs/devel/fuzzing.rst
 delete mode 100644 docs/devel/fuzzing.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 63223e1183..da1ef68ff1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2517,7 +2517,7 @@ R: Thomas Huth <thuth@redhat.com>
 S: Maintained
 F: tests/qtest/fuzz/
 F: scripts/oss-fuzz/
-F: docs/devel/fuzzing.txt
+F: docs/devel/fuzzing.rst
 
 Register API
 M: Alistair Francis <alistair@alistair23.me>
diff --git a/docs/devel/fuzzing.rst b/docs/devel/fuzzing.rst
new file mode 100644
index 0000000000..f19d75ceff
--- /dev/null
+++ b/docs/devel/fuzzing.rst
@@ -0,0 +1,236 @@
+========
+Fuzzing
+========
+
+This document describes the virtual-device fuzzing infrastructure in QEMU and
+how to use it to implement additional fuzzers.
+
+Basics
+------
+
+Fuzzing operates by passing inputs to an entry point/target function. The
+fuzzer tracks the code coverage triggered by the input. Based on these
+findings, the fuzzer mutates the input and repeats the fuzzing.
+
+To fuzz QEMU, we rely on libfuzzer. Unlike other fuzzers such as AFL, libfuzzer
+is an *in-process* fuzzer. For the developer, this means that it is their
+responsibility to ensure that state is reset between fuzzing-runs.
+
+Building the fuzzers
+--------------------
+
+*NOTE*: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
+much faster, since the page-map has a smaller size. This is due to the fact that
+AddressSanitizer maps ~20TB of memory, as part of its detection. This results
+in a large page-map, and a much slower ``fork()``.
+
+To build the fuzzers, install a recent version of clang:
+Configure with (substitute the clang binaries with the version you installed).
+Here, enable-sanitizers, is optional but it allows us to reliably detect bugs
+such as out-of-bounds accesses, use-after-frees, double-frees etc.::
+
+    CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
+                                                --enable-sanitizers
+
+Fuzz targets are built similarly to system targets::
+
+    make i386-softmmu/fuzz
+
+This builds ``./i386-softmmu/qemu-fuzz-i386``
+
+The first option to this command is: ``--fuzz-target=FUZZ_NAME``
+To list all of the available fuzzers run ``qemu-fuzz-i386`` with no arguments.
+
+For example::
+
+    ./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz
+
+Internally, libfuzzer parses all arguments that do not begin with ``"--"``.
+Information about these is available by passing ``-help=1``
+
+Now the only thing left to do is wait for the fuzzer to trigger potential
+crashes.
+
+Useful libFuzzer flags
+----------------------
+
+As mentioned above, libFuzzer accepts some arguments. Passing ``-help=1`` will
+list the available arguments. In particular, these arguments might be helpful:
+
+* ``CORPUS_DIR/`` : Specify a directory as the last argument to libFuzzer.
+  libFuzzer stores each "interesting" input in this corpus directory. The next
+  time you run libFuzzer, it will read all of the inputs from the corpus, and
+  continue fuzzing from there. You can also specify multiple directories.
+  libFuzzer loads existing inputs from all specified directories, but will only
+  write new ones to the first one specified.
+
+* ``-max_len=4096`` : specify the maximum byte-length of the inputs libFuzzer
+  will generate.
+
+* ``-close_fd_mask={1,2,3}`` : close, stderr, or both. Useful for targets that
+  trigger many debug/error messages, or create output on the serial console.
+
+* ``-jobs=4 -workers=4`` : These arguments configure libFuzzer to run 4 fuzzers in
+  parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
+  ``-jobs=N``, libFuzzer automatically spawns a number of workers less than or equal
+  to half the available CPU cores. Replace 4 with a number appropriate for your
+  machine. Make sure to specify a ``CORPUS_DIR``, which will allow the parallel
+  fuzzers to share information about the interesting inputs they find.
+
+* ``-use_value_profile=1`` : For each comparison operation, libFuzzer computes
+  ``(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12)`` and places this in the
+  coverage table. Useful for targets with "magic" constants. If Arg1 came from
+  the fuzzer's input and Arg2 is a magic constant, then each time the Hamming
+  distance between Arg1 and Arg2 decreases, libFuzzer adds the input to the
+  corpus.
+
+* ``-shrink=1`` : Tries to make elements of the corpus "smaller". Might lead to
+  better coverage performance, depending on the target.
+
+Note that libFuzzer's exact behavior will depend on the version of
+clang and libFuzzer used to build the device fuzzers.
+
+Generating Coverage Reports
+---------------------------
+
+Code coverage is a crucial metric for evaluating a fuzzer's performance.
+libFuzzer's output provides a "cov: " column that provides a total number of
+unique blocks/edges covered. To examine coverage on a line-by-line basis we
+can use Clang coverage:
+
+ 1. Configure libFuzzer to store a corpus of all interesting inputs (see
+    CORPUS_DIR above)
+ 2. ``./configure`` the QEMU build with ::
+
+    --enable-fuzzing \
+    --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"
+
+ 3. Re-run the fuzzer. Specify $CORPUS_DIR/* as an argument, telling libfuzzer
+    to execute all of the inputs in $CORPUS_DIR and exit. Once the process
+    exits, you should find a file, "default.profraw" in the working directory.
+ 4. Execute these commands to generate a detailed HTML coverage-report::
+
+      llvm-profdata merge -output=default.profdata default.profraw
+      llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
+      --format html -output-dir=/path/to/output/report
+
+Adding a new fuzzer
+-------------------
+
+Coverage over virtual devices can be improved by adding additional fuzzers.
+Fuzzers are kept in ``tests/qtest/fuzz/`` and should be added to
+``tests/qtest/fuzz/Makefile.include``
+
+Fuzzers can rely on both qtest and libqos to communicate with virtual devices.
+
+1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.
+
+2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
+   for reference.
+
+3. Register the fuzzer in ``tests/fuzz/Makefile.include`` by appending the
+   corresponding object to fuzz-obj-y
+
+Fuzzers can be more-or-less thought of as special qtest programs which can
+modify the qtest commands and/or qtest command arguments based on inputs
+provided by libfuzzer. Libfuzzer passes a byte array and length. Commonly the
+fuzzer loops over the byte-array interpreting it as a list of qtest commands,
+addresses, or values.
+
+The Generic Fuzzer
+------------------
+
+Writing a fuzz target can be a lot of effort (especially if a device driver has
+not be built-out within libqos). Many devices can be fuzzed to some degree,
+without any device-specific code, using the generic-fuzz target.
+
+The generic-fuzz target is capable of fuzzing devices over their PIO, MMIO,
+and DMA input-spaces. To apply the generic-fuzz to a device, we need to define
+two env-variables, at minimum:
+
+* ``QEMU_FUZZ_ARGS=`` is the set of QEMU arguments used to configure a machine, with
+  the device attached. For example, if we want to fuzz the virtio-net device
+  attached to a pc-i440fx machine, we can specify::
+
+    QEMU_FUZZ_ARGS="-M pc -nodefaults -netdev user,id=user0 \
+    -device virtio-net,netdev=user0"
+
+* ``QEMU_FUZZ_OBJECTS=`` is a set of space-delimited strings used to identify
+  the MemoryRegions that will be fuzzed. These strings are compared against
+  MemoryRegion names and MemoryRegion owner names, to decide whether each
+  MemoryRegion should be fuzzed. These strings support globbing. For the
+  virtio-net example, we could use one of ::
+
+    QEMU_FUZZ_OBJECTS='virtio-net'
+    QEMU_FUZZ_OBJECTS='virtio*'
+    QEMU_FUZZ_OBJECTS='virtio* pcspk' # Fuzz the virtio devices and the speaker
+    QEMU_FUZZ_OBJECTS='*' # Fuzz the whole machine``
+
+The ``"info mtree"`` and ``"info qom-tree"`` monitor commands can be especially
+useful for identifying the ``MemoryRegion`` and ``Object`` names used for
+matching.
+
+As a generic rule-of-thumb, the more ``MemoryRegions``/Devices we match, the
+greater the input-space, and the smaller the probability of finding crashing
+inputs for individual devices. As such, it is usually a good idea to limit the
+fuzzer to only a few ``MemoryRegions``.
+
+To ensure that these env variables have been configured correctly, we can use::
+
+    ./qemu-fuzz-i386 --fuzz-target=generic-fuzz -runs=0
+
+The output should contain a complete list of matched MemoryRegions.
+
+Implementation Details / Fuzzer Lifecycle
+-----------------------------------------
+
+The fuzzer has two entrypoints that libfuzzer calls. libfuzzer provides it's
+own ``main()``, which performs some setup, and calls the entrypoints:
+
+``LLVMFuzzerInitialize``: called prior to fuzzing. Used to initialize all of the
+necessary state
+
+``LLVMFuzzerTestOneInput``: called for each fuzzing run. Processes the input and
+resets the state at the end of each run.
+
+In more detail:
+
+``LLVMFuzzerInitialize`` parses the arguments to the fuzzer (must start with two
+dashes, so they are ignored by libfuzzer ``main()``). Currently, the arguments
+select the fuzz target. Then, the qtest client is initialized. If the target
+requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
+Then the QGraph is walked and the QEMU cmd_line is determined and saved.
+
+After this, the ``vl.c:qemu_main`` is called to set up the guest. There are
+target-specific hooks that can be called before and after qemu_main, for
+additional setup(e.g. PCI setup, or VM snapshotting).
+
+``LLVMFuzzerTestOneInput``: Uses qtest/qos functions to act based on the fuzz
+input. It is also responsible for manually calling ``main_loop_wait`` to ensure
+that bottom halves are executed and any cleanup required before the next input.
+
+Since the same process is reused for many fuzzing runs, QEMU state needs to
+be reset at the end of each run. There are currently two implemented
+options for resetting state:
+
+- Reboot the guest between runs.
+  - *Pros*: Straightforward and fast for simple fuzz targets.
+
+  - *Cons*: Depending on the device, does not reset all device state. If the
+    device requires some initialization prior to being ready for fuzzing (common
+    for QOS-based targets), this initialization needs to be done after each
+    reboot.
+
+  - *Example target*: ``i440fx-qtest-reboot-fuzz``
+
+- Run each test case in a separate forked process and copy the coverage
+   information back to the parent. This is fairly similar to AFL's "deferred"
+   fork-server mode [3]
+
+  - *Pros*: Relatively fast. Devices only need to be initialized once. No need to
+    do slow reboots or vmloads.
+
+  - *Cons*: Not officially supported by libfuzzer. Does not work well for
+     devices that rely on dedicated threads.
+
+  - *Example target*: ``virtio-net-fork-fuzz``
diff --git a/docs/devel/fuzzing.txt b/docs/devel/fuzzing.txt
deleted file mode 100644
index 03585c1a9b..0000000000
--- a/docs/devel/fuzzing.txt
+++ /dev/null
@@ -1,214 +0,0 @@
-= Fuzzing =
-
-== Introduction ==
-
-This document describes the virtual-device fuzzing infrastructure in QEMU and
-how to use it to implement additional fuzzers.
-
-== Basics ==
-
-Fuzzing operates by passing inputs to an entry point/target function. The
-fuzzer tracks the code coverage triggered by the input. Based on these
-findings, the fuzzer mutates the input and repeats the fuzzing.
-
-To fuzz QEMU, we rely on libfuzzer. Unlike other fuzzers such as AFL, libfuzzer
-is an _in-process_ fuzzer. For the developer, this means that it is their
-responsibility to ensure that state is reset between fuzzing-runs.
-
-== Building the fuzzers ==
-
-NOTE: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
-much faster, since the page-map has a smaller size. This is due to the fact that
-AddressSanitizer mmaps ~20TB of memory, as part of its detection. This results
-in a large page-map, and a much slower fork().
-
-To build the fuzzers, install a recent version of clang:
-Configure with (substitute the clang binaries with the version you installed).
-Here, enable-sanitizers, is optional but it allows us to reliably detect bugs
-such as out-of-bounds accesses, use-after-frees, double-frees etc.
-
-    CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
-                                                --enable-sanitizers
-
-Fuzz targets are built similarly to system/softmmu:
-
-    make i386-softmmu/fuzz
-
-This builds ./i386-softmmu/qemu-fuzz-i386
-
-The first option to this command is: --fuzz-target=FUZZ_NAME
-To list all of the available fuzzers run qemu-fuzz-i386 with no arguments.
-
-For example:
-    ./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz
-
-Internally, libfuzzer parses all arguments that do not begin with "--".
-Information about these is available by passing -help=1
-
-Now the only thing left to do is wait for the fuzzer to trigger potential
-crashes.
-
-== Useful libFuzzer flags ==
-
-As mentioned above, libFuzzer accepts some arguments. Passing -help=1 will list
-the available arguments. In particular, these arguments might be helpful:
-
-$CORPUS_DIR/ : Specify a directory as the last argument to libFuzzer. libFuzzer
-stores each "interesting" input in this corpus directory. The next time you run
-libFuzzer, it will read all of the inputs from the corpus, and continue fuzzing
-from there. You can also specify multiple directories. libFuzzer loads existing
-inputs from all specified directories, but will only write new ones to the
-first one specified.
-
--max_len=4096 : specify the maximum byte-length of the inputs libFuzzer will
-generate.
-
--close_fd_mask={1,2,3} : close, stderr, or both. Useful for targets that
-trigger many debug/error messages, or create output on the serial console.
-
--jobs=4 -workers=4 : These arguments configure libFuzzer to run 4 fuzzers in
-parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
--jobs=N, libFuzzer automatically spawns a number of workers less than or equal
-to half the available CPU cores. Replace 4 with a number appropriate for your
-machine. Make sure to specify a $CORPUS_DIR, which will allow the parallel
-fuzzers to share information about the interesting inputs they find.
-
--use_value_profile=1 : For each comparison operation, libFuzzer computes 
-(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12) and places this in the coverage
-table. Useful for targets with "magic" constants. If Arg1 came from the fuzzer's
-input and Arg2 is a magic constant, then each time the Hamming distance
-between Arg1 and Arg2 decreases, libFuzzer adds the input to the corpus.
-
--shrink=1 : Tries to make elements of the corpus "smaller". Might lead to
-better coverage performance, depending on the target.
-
-Note that libFuzzer's exact behavior will depend on the version of
-clang and libFuzzer used to build the device fuzzers.
-
-== Generating Coverage Reports ==
-Code coverage is a crucial metric for evaluating a fuzzer's performance.
-libFuzzer's output provides a "cov: " column that provides a total number of
-unique blocks/edges covered. To examine coverage on a line-by-line basis we
-can use Clang coverage:
-
- 1. Configure libFuzzer to store a corpus of all interesting inputs (see
-    CORPUS_DIR above)
- 2. ./configure the QEMU build with:
-    --enable-fuzzing \
-    --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"
- 3. Re-run the fuzzer. Specify $CORPUS_DIR/* as an argument, telling libfuzzer
-    to execute all of the inputs in $CORPUS_DIR and exit. Once the process
-    exits, you should find a file, "default.profraw" in the working directory.
- 4. Execute these commands to generate a detailed HTML coverage-report:
- llvm-profdata merge -output=default.profdata default.profraw
- llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
- --format html -output-dir=/path/to/output/report
-
-== Adding a new fuzzer ==
-Coverage over virtual devices can be improved by adding additional fuzzers.
-Fuzzers are kept in tests/qtest/fuzz/ and should be added to
-tests/qtest/fuzz/Makefile.include
-
-Fuzzers can rely on both qtest and libqos to communicate with virtual devices.
-
-1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.
-
-2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
-for reference.
-
-3. Register the fuzzer in ``tests/fuzz/Makefile.include`` by appending the
-corresponding object to fuzz-obj-y
-
-Fuzzers can be more-or-less thought of as special qtest programs which can
-modify the qtest commands and/or qtest command arguments based on inputs
-provided by libfuzzer. Libfuzzer passes a byte array and length. Commonly the
-fuzzer loops over the byte-array interpreting it as a list of qtest commands,
-addresses, or values.
-
-== The Generic Fuzzer ==
-Writing a fuzz target can be a lot of effort (especially if a device driver has
-not be built-out within libqos). Many devices can be fuzzed to some degree,
-without any device-specific code, using the generic-fuzz target.
-
-The generic-fuzz target is capable of fuzzing devices over their PIO, MMIO,
-and DMA input-spaces. To apply the generic-fuzz to a device, we need to define
-two env-variables, at minimum:
-
-QEMU_FUZZ_ARGS= is the set of QEMU arguments used to configure a machine, with
-the device attached. For example, if we want to fuzz the virtio-net device
-attached to a pc-i440fx machine, we can specify:
-QEMU_FUZZ_ARGS="-M pc -nodefaults -netdev user,id=user0 \
-                -device virtio-net,netdev=user0"
-
-QEMU_FUZZ_OBJECTS= is a set of space-delimited strings used to identify the
-MemoryRegions that will be fuzzed. These strings are compared against
-MemoryRegion names and MemoryRegion owner names, to decide whether each
-MemoryRegion should be fuzzed. These strings support globbing. For the
-virtio-net example, we could use QEMU_FUZZ_OBJECTS=
- * 'virtio-net'
- * 'virtio*'
- * 'virtio* pcspk' (Fuzz the virtio devices and the PC speaker...)
- * '*' (Fuzz the whole machine)
-
-The "info mtree" and "info qom-tree" monitor commands can be especially useful
-for identifying the MemoryRegion and Object names used for matching.
-
-As a generic rule-of-thumb, the more MemoryRegions/Devices we match, the greater
-the input-space, and the smaller the probability of finding crashing inputs for
-individual devices. As such, it is usually a good idea to limit the fuzzer to
-only a few MemoryRegions.
-
-To ensure that these env variables have been configured correctly, we can use:
-
-./qemu-fuzz-i386 --fuzz-target=generic-fuzz -runs=0
-
-The output should contain a complete list of matched MemoryRegions.
-
-= Implementation Details =
-
-== The Fuzzer's Lifecycle ==
-
-The fuzzer has two entrypoints that libfuzzer calls. libfuzzer provides it's
-own main(), which performs some setup, and calls the entrypoints:
-
-LLVMFuzzerInitialize: called prior to fuzzing. Used to initialize all of the
-necessary state
-
-LLVMFuzzerTestOneInput: called for each fuzzing run. Processes the input and
-resets the state at the end of each run.
-
-In more detail:
-
-LLVMFuzzerInitialize parses the arguments to the fuzzer (must start with two
-dashes, so they are ignored by libfuzzer main()). Currently, the arguments
-select the fuzz target. Then, the qtest client is initialized. If the target
-requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
-Then the QGraph is walked and the QEMU cmd_line is determined and saved.
-
-After this, the vl.c:qemu__main is called to set up the guest. There are
-target-specific hooks that can be called before and after qemu_main, for
-additional setup(e.g. PCI setup, or VM snapshotting).
-
-LLVMFuzzerTestOneInput: Uses qtest/qos functions to act based on the fuzz
-input. It is also responsible for manually calling the main loop/main_loop_wait
-to ensure that bottom halves are executed and any cleanup required before the
-next input.
-
-Since the same process is reused for many fuzzing runs, QEMU state needs to
-be reset at the end of each run. There are currently two implemented
-options for resetting state:
-1. Reboot the guest between runs.
-   Pros: Straightforward and fast for simple fuzz targets.
-   Cons: Depending on the device, does not reset all device state. If the
-   device requires some initialization prior to being ready for fuzzing
-   (common for QOS-based targets), this initialization needs to be done after
-   each reboot.
-   Example target: i440fx-qtest-reboot-fuzz
-2. Run each test case in a separate forked process and copy the coverage
-   information back to the parent. This is fairly similar to AFL's "deferred"
-   fork-server mode [3]
-   Pros: Relatively fast. Devices only need to be initialized once. No need
-   to do slow reboots or vmloads.
-   Cons: Not officially supported by libfuzzer. Does not work well for devices
-   that rely on dedicated threads.
-   Example target: virtio-net-fork-fuzz
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 77baae5c77..f10ed77e4c 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -22,6 +22,7 @@ Contents:
    stable-process
    testing
    qtest
+   fuzzing
    decodetree
    secure-coding-practices
    tcg
-- 
2.28.0