All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration
@ 2017-10-10 16:20 George Dunlap
  2017-10-10 16:20 ` [PATCH v3 02/12] fuzz/x86_emulate: Improve failure descriptions in x86_emulate harness George Dunlap
                   ` (12 more replies)
  0 siblings, 13 replies; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

Once feof() returns true for a stream, it will continue to return true
for that stream until clearerr() is called (or the stream is closed
and re-opened).

In llvm-clang-fast-mode, the same file descriptor is used for each
iteration of the loop, meaning that the "Input too large" check was
broken -- feof() would return true even if the fread() hadn't hit the
end of the file.  The result is that AFL generates testcases of
arbitrary size.

Fix this by fseek'ing to the beginning of the file on every iteration;
this resets the EOF marker and other state.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
Changes in v3:
- Fix the issue in the official sanctioned way

This is a candidate for backport to 4.9.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/afl-harness.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index b4d15451b5..57b4542556 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -77,6 +77,17 @@ int main(int argc, char **argv)
                 exit(-1);
             }
         }
+#ifdef __AFL_HAVE_MANUAL_CONTROL
+        else
+        {
+            /* 
+             * This will ensure we're dealing with a clean stream
+             * state after the afl-fuzz process messes with the open
+             * file handle.
+             */
+            fseek(fp, 0, SEEK_SET);
+        }
+#endif
 
         size = fread(input, 1, INPUT_SIZE, fp);
 
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 02/12] fuzz/x86_emulate: Improve failure descriptions in x86_emulate harness
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-10 16:20 ` [PATCH v3 03/12] fuzz/x86_emulate: Implement input_read() and input_avail() George Dunlap
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

- Print the symbolic name rather than the number
- Explicitly state when data_read() fails due to EOI

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes in v4:
- Make array 'static const char* const'
Changes in v2:
- Add spaces around '='

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index 48a879cc88..999f417716 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -52,6 +52,14 @@ struct fuzz_state
     struct x86_emulate_ops ops;
 };
 
+static const char* const x86emul_return_string[] = {
+    [X86EMUL_OKAY] = "X86EMUL_OKAY",
+    [X86EMUL_UNHANDLEABLE] = "X86EMUL_UNHANDLEABLE",
+    [X86EMUL_EXCEPTION] = "X86EMUL_EXCEPTION",
+    [X86EMUL_RETRY] = "X86EMUL_RETRY",
+    [X86EMUL_DONE] = "X86EMUL_DONE",
+};
+
 /*
  * Randomly return success or failure when processing data.  If
  * `exception` is false, this function turns _EXCEPTION to _OKAY.
@@ -84,7 +92,7 @@ static int maybe_fail(struct x86_emulate_ctxt *ctxt,
     if ( rc == X86EMUL_EXCEPTION && !exception )
         rc = X86EMUL_OKAY;
 
-    printf("maybe_fail %s: %d\n", why, rc);
+    printf("maybe_fail %s: %s\n", why, x86emul_return_string[rc]);
 
     if ( rc == X86EMUL_EXCEPTION )
         /* Fake up a pagefault. */
@@ -113,6 +121,7 @@ static int data_read(struct x86_emulate_ctxt *ctxt,
             x86_emul_hw_exception(13, 0, ctxt);
 
         rc = X86EMUL_EXCEPTION;
+        printf("data_read %s: X86EMUL_EXCEPTION (end of input)\n", why);
     }
     else
         rc = maybe_fail(ctxt, why, true);
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 03/12] fuzz/x86_emulate: Implement input_read() and input_avail()
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
  2017-10-10 16:20 ` [PATCH v3 02/12] fuzz/x86_emulate: Improve failure descriptions in x86_emulate harness George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-10 16:52   ` Andrew Cooper
  2017-10-10 17:24   ` Ian Jackson
  2017-10-10 16:20 ` [PATCH v3 04/12] fuzz/x86_emulate: Rename the file containing the wrapper code George Dunlap
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

Rather than open-coding the "read" from the input file.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v3:
 - s/input_available/input_avail/;
 - Constify argument to input_avail
 - Fix off-by-one error in input_avail
 - Return false / true rather than 0 / 1 in input_read
v2:
- Use less dread-ful names
- Return bool rather than int

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 31 ++++++++++++++++++-------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index 999f417716..5fb8586955 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -52,6 +52,22 @@ struct fuzz_state
     struct x86_emulate_ops ops;
 };
 
+static inline bool input_avail(const struct fuzz_state *s, size_t size)
+{
+    return s->data_index + size <= s->data_num;
+}
+
+static inline bool input_read(struct fuzz_state *s, void *dst, size_t size)
+{
+    if ( !input_avail(s, size) )
+        return false;
+
+    memcpy(dst, &s->corpus->data[s->data_index], size);
+    s->data_index += size;
+
+    return true;
+}
+
 static const char* const x86emul_return_string[] = {
     [X86EMUL_OKAY] = "X86EMUL_OKAY",
     [X86EMUL_UNHANDLEABLE] = "X86EMUL_UNHANDLEABLE",
@@ -68,10 +84,10 @@ static int maybe_fail(struct x86_emulate_ctxt *ctxt,
                       const char *why, bool exception)
 {
     struct fuzz_state *s = ctxt->data;
-    const struct fuzz_corpus *c = s->corpus;
+    unsigned char c;
     int rc;
 
-    if ( s->data_index >= s->data_num )
+    if ( !input_read(s, &c, sizeof(c)) )
         rc = X86EMUL_EXCEPTION;
     else
     {
@@ -80,13 +96,12 @@ static int maybe_fail(struct x86_emulate_ctxt *ctxt,
          * 25% unhandlable
          * 25% exception
          */
-        if ( c->data[s->data_index] > 0xc0 )
+        if ( c > 0xc0 )
             rc = X86EMUL_EXCEPTION;
-        else if ( c->data[s->data_index] > 0x80 )
+        else if ( c > 0x80 )
             rc = X86EMUL_UNHANDLEABLE;
         else
             rc = X86EMUL_OKAY;
-        s->data_index++;
     }
 
     if ( rc == X86EMUL_EXCEPTION && !exception )
@@ -106,11 +121,10 @@ static int data_read(struct x86_emulate_ctxt *ctxt,
                      const char *why, void *dst, unsigned int bytes)
 {
     struct fuzz_state *s = ctxt->data;
-    const struct fuzz_corpus *c = s->corpus;
     unsigned int i;
     int rc;
 
-    if ( s->data_index + bytes > s->data_num )
+    if ( !input_avail(s, bytes) )
     {
         /*
          * Fake up a segment limit violation.  System segment limit volations
@@ -128,8 +142,7 @@ static int data_read(struct x86_emulate_ctxt *ctxt,
 
     if ( rc == X86EMUL_OKAY )
     {
-        memcpy(dst, &c->data[s->data_index], bytes);
-        s->data_index += bytes;
+        input_read(s, dst, bytes);
 
         printf("%s: ", why);
         for ( i = 0; i < bytes; i++ )
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 04/12] fuzz/x86_emulate: Rename the file containing the wrapper code
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
  2017-10-10 16:20 ` [PATCH v3 02/12] fuzz/x86_emulate: Improve failure descriptions in x86_emulate harness George Dunlap
  2017-10-10 16:20 ` [PATCH v3 03/12] fuzz/x86_emulate: Implement input_read() and input_avail() George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-11  9:03   ` Jan Beulich
  2017-10-10 16:20 ` [PATCH v3 05/12] fuzz/x86_emulate: Add 'afl-cov' target George Dunlap
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

When generating coverage output, by default gcov generates output
filenames based only on the coverage file and the "leaf" source file,
not the full path.  As a result, it uses the same name for
x86_emulate.c and x86_emulate/x86_emulate.c, generally overwriting the
second (which we actually are about) with the first (which is just a
wrapper).

Rename the user-space wrapper helpers to x86-emulate.[ch], so
that it generates separate files.

There is actually an option to gcov, `--preserve-paths`, which will
cause the full path name to be included in the filename, properly
distinguishing between the two.  However, given that the user-space
wrapper doesn't actually do any emulation (and the poor state of gcov
documentation making it difficult to find the option in the first
place), it seems to make more sense to rename the file anyway.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
v3:
- x86-emulate.* rather than x86_emulate_user.*
- Also update .gitignore to ignore the new files

NB: I discovered the `-p` option to gcov after writing this patch.
But I think the patch itself still makes sense.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 .gitignore                                                |  3 ++-
 tools/fuzz/x86_instruction_emulator/Makefile              | 12 ++++++------
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c           |  2 +-
 tools/tests/x86_emulator/Makefile                         |  6 +++---
 tools/tests/x86_emulator/test_x86_emulator.c              |  2 +-
 tools/tests/x86_emulator/{x86_emulate.c => x86-emulate.c} |  2 +-
 tools/tests/x86_emulator/{x86_emulate.h => x86-emulate.h} |  0
 7 files changed, 14 insertions(+), 13 deletions(-)
 rename tools/tests/x86_emulator/{x86_emulate.c => x86-emulate.c} (99%)
 rename tools/tests/x86_emulator/{x86_emulate.h => x86-emulate.h} (100%)

diff --git a/.gitignore b/.gitignore
index f36ddd284d..ef27553a2d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -160,7 +160,8 @@ tools/flask/utils/flask-set-bool
 tools/flask/utils/flask-label-pci
 tools/fuzz/libelf/afl-libelf-fuzzer
 tools/fuzz/x86_instruction_emulator/asm
-tools/fuzz/x86_instruction_emulator/x86_emulate*
+tools/fuzz/x86_instruction_emulator/x86_emulate
+tools/fuzz/x86_instruction_emulator/x86-emulate.[ch]
 tools/fuzz/x86_instruction_emulator/afl-harness
 tools/helpers/_paths.h
 tools/helpers/init-xenstore-domain
diff --git a/tools/fuzz/x86_instruction_emulator/Makefile b/tools/fuzz/x86_instruction_emulator/Makefile
index a3f6b2c754..107bf62a21 100644
--- a/tools/fuzz/x86_instruction_emulator/Makefile
+++ b/tools/fuzz/x86_instruction_emulator/Makefile
@@ -18,22 +18,22 @@ asm:
 
 asm/%: asm ;
 
-x86_emulate.c x86_emulate.h: %:
+x86-emulate.c x86-emulate.h: %:
 	[ -L $* ] || ln -sf $(XEN_ROOT)/tools/tests/x86_emulator/$*
 
 CFLAGS += $(CFLAGS_xeninclude) -D__XEN_TOOLS__ -I.
 
 x86.h := asm/x86-vendors.h asm/x86-defns.h asm/msr-index.h
-x86_emulate.h := x86_emulate.h x86_emulate/x86_emulate.h $(x86.h)
+x86_emulate.h := x86-emulate.h x86_emulate/x86_emulate.h $(x86.h)
 
-x86_emulate.o: x86_emulate.c x86_emulate/x86_emulate.c $(x86_emulate.h)
+x86-emulate.o: x86-emulate.c x86_emulate/x86_emulate.c $(x86_emulate.h)
 
 fuzz-emul.o: $(x86_emulate.h)
 
-x86-insn-fuzzer.a: fuzz-emul.o x86_emulate.o
+x86-insn-fuzzer.a: fuzz-emul.o x86-emulate.o
 	$(AR) rc $@ $^
 
-afl-harness: afl-harness.o fuzz-emul.o x86_emulate.o
+afl-harness: afl-harness.o fuzz-emul.o x86-emulate.o
 	$(CC) $(CFLAGS) $^ -o $@
 
 # Common targets
@@ -42,7 +42,7 @@ all: x86-insn-fuzz-all
 
 .PHONY: distclean
 distclean: clean
-	rm -f x86_emulate x86_emulate.c x86_emulate.h asm
+	rm -f x86_emulate x86-emulate.c x86-emulate.h asm
 
 .PHONY: clean
 clean:
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index 5fb8586955..8998f21fe1 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -15,7 +15,7 @@
 #include <unistd.h>
 #include <xen/xen.h>
 
-#include "x86_emulate.h"
+#include "x86-emulate.h"
 
 #define MSR_INDEX_MAX 16
 
diff --git a/tools/tests/x86_emulator/Makefile b/tools/tests/x86_emulator/Makefile
index d7be77d98a..ed0fd9710e 100644
--- a/tools/tests/x86_emulator/Makefile
+++ b/tools/tests/x86_emulator/Makefile
@@ -77,7 +77,7 @@ $(addsuffix .h,$(TESTCASES)): %.h: %.c testcase.mk Makefile
 $(addsuffix .c,$(SIMD)) $(addsuffix -avx.c,$(filter sse%,$(SIMD))):
 	ln -sf simd.c $@
 
-$(TARGET): x86_emulate.o test_x86_emulator.o
+$(TARGET): x86-emulate.o test_x86_emulator.o
 	$(HOSTCC) $(HOSTCFLAGS) -o $@ $^
 
 .PHONY: clean
@@ -105,9 +105,9 @@ $(call cc-option-add,HOSTCFLAGS-x86_64,HOSTCC,-no-pie)
 HOSTCFLAGS += $(CFLAGS_xeninclude) -I. $(HOSTCFLAGS-$(XEN_COMPILE_ARCH))
 
 x86.h := asm/x86-vendors.h asm/x86-defns.h asm/msr-index.h
-x86_emulate.h := x86_emulate.h x86_emulate/x86_emulate.h $(x86.h)
+x86_emulate.h := x86-emulate.h x86_emulate/x86_emulate.h $(x86.h)
 
-x86_emulate.o: x86_emulate.c x86_emulate/x86_emulate.c $(x86_emulate.h)
+x86-emulate.o: x86-emulate.c x86_emulate/x86_emulate.c $(x86_emulate.h)
 	$(HOSTCC) $(HOSTCFLAGS) -D__XEN_TOOLS__ -c -g -o $@ $<
 
 test_x86_emulator.o: test_x86_emulator.c $(addsuffix .h,$(TESTCASES)) $(x86_emulate.h)
diff --git a/tools/tests/x86_emulator/test_x86_emulator.c b/tools/tests/x86_emulator/test_x86_emulator.c
index 2bb4b1e04b..7a8df419cd 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -3,7 +3,7 @@
 #include <stdio.h>
 #include <sys/mman.h>
 
-#include "x86_emulate.h"
+#include "x86-emulate.h"
 #include "blowfish.h"
 #include "sse.h"
 #include "sse2.h"
diff --git a/tools/tests/x86_emulator/x86_emulate.c b/tools/tests/x86_emulator/x86-emulate.c
similarity index 99%
rename from tools/tests/x86_emulator/x86_emulate.c
rename to tools/tests/x86_emulator/x86-emulate.c
index 79661d5c2b..975ddc7e53 100644
--- a/tools/tests/x86_emulator/x86_emulate.c
+++ b/tools/tests/x86_emulator/x86-emulate.c
@@ -1,4 +1,4 @@
-#include "x86_emulate.h"
+#include "x86-emulate.h"
 
 #include <sys/mman.h>
 
diff --git a/tools/tests/x86_emulator/x86_emulate.h b/tools/tests/x86_emulator/x86-emulate.h
similarity index 100%
rename from tools/tests/x86_emulator/x86_emulate.h
rename to tools/tests/x86_emulator/x86-emulate.h
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 05/12] fuzz/x86_emulate: Add 'afl-cov' target
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (2 preceding siblings ...)
  2017-10-10 16:20 ` [PATCH v3 04/12] fuzz/x86_emulate: Rename the file containing the wrapper code George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-10 16:53   ` Andrew Cooper
  2017-10-10 16:20 ` [PATCH v3 06/12] fuzz/x86_emulate: Take multiple test files for inputs George Dunlap
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

...to generate a "normal" coverage-instrumented binary, suitable for
use with gcov or afl-cov.

This is slightly annoying because:

 - Every object file needs to have been instrumented to work
   effectively

 - You generally want to have both an afl-instrumented binary and a
   gcov-instrumented binary at the same time, but

 - gcov instrumentation and afl instrumentation are mutually exclusive

So when making the `afl-cov` target, generate a second set of object
files and a second binary with the `-cov` suffix.

While we're here, remove the redundant x86-emulate.c dependency for
x86-emulate.o.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v3:
- Rebase on new versions of previous patch (mainly x86-emulate.* rename)
- Tighten up build rules
- Add newline at the end of README.afl
- Use := for GCOV_FLAGS in Makefile
Changes in v2:
- Pull 'inputs' to x86_emulate_user* into a make variable to avoid duplication

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 .gitignore                                   |  1 +
 tools/fuzz/README.afl                        | 14 ++++++++++++++
 tools/fuzz/x86_instruction_emulator/Makefile | 17 ++++++++++++++---
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/.gitignore b/.gitignore
index ef27553a2d..0514842dae 100644
--- a/.gitignore
+++ b/.gitignore
@@ -163,6 +163,7 @@ tools/fuzz/x86_instruction_emulator/asm
 tools/fuzz/x86_instruction_emulator/x86_emulate
 tools/fuzz/x86_instruction_emulator/x86-emulate.[ch]
 tools/fuzz/x86_instruction_emulator/afl-harness
+tools/fuzz/x86_instruction_emulator/afl-harness-cov
 tools/helpers/_paths.h
 tools/helpers/init-xenstore-domain
 tools/helpers/xen-init-dom0
diff --git a/tools/fuzz/README.afl b/tools/fuzz/README.afl
index 4758de2490..8b58b8cdea 100644
--- a/tools/fuzz/README.afl
+++ b/tools/fuzz/README.afl
@@ -41,3 +41,17 @@ Use the x86 instruction emulator fuzzer as an example.
    $ $AFLPATH/afl-fuzz -t 1000 -i testcase_dir -o findings_dir -- ./afl-harness
 
 Please see AFL documentation for more information.
+
+# GENERATING COVERAGE INFORMATION
+
+To use afl-cov or gcov, you need a separate binary instrumented to
+generate coverage data.  To do this, use the target `afl-cov`:
+
+    $ make afl-cov #produces afl-harness-cov
+
+NOTE: Please also note that the coverage instrumentation hard-codes
+the absolute path for the instrumentation read and write files in the
+binary; so coverage data will always show up in the build directory no
+matter where you run the binary from.
+
+Please see afl-cov and/or gcov documentation for more information.
diff --git a/tools/fuzz/x86_instruction_emulator/Makefile b/tools/fuzz/x86_instruction_emulator/Makefile
index 107bf62a21..cb561aec3f 100644
--- a/tools/fuzz/x86_instruction_emulator/Makefile
+++ b/tools/fuzz/x86_instruction_emulator/Makefile
@@ -23,12 +23,17 @@ x86-emulate.c x86-emulate.h: %:
 
 CFLAGS += $(CFLAGS_xeninclude) -D__XEN_TOOLS__ -I.
 
+GCOV_FLAGS := --coverage
+%-cov.o: %.c
+	$(CC) -c $(CFLAGS) $(GCOV_FLAGS) $< -o $@
+
 x86.h := asm/x86-vendors.h asm/x86-defns.h asm/msr-index.h
 x86_emulate.h := x86-emulate.h x86_emulate/x86_emulate.h $(x86.h)
 
-x86-emulate.o: x86-emulate.c x86_emulate/x86_emulate.c $(x86_emulate.h)
+# x86-emulate.c will be implicit for both
+x86-emulate.o x86-emulate-cov.o: x86_emulate/x86_emulate.c $(x86_emulate.h)
 
-fuzz-emul.o: $(x86_emulate.h)
+fuzz-emul.o fuzz-emulate-cov.o: $(x86_emulate.h)
 
 x86-insn-fuzzer.a: fuzz-emul.o x86-emulate.o
 	$(AR) rc $@ $^
@@ -36,6 +41,9 @@ x86-insn-fuzzer.a: fuzz-emul.o x86-emulate.o
 afl-harness: afl-harness.o fuzz-emul.o x86-emulate.o
 	$(CC) $(CFLAGS) $^ -o $@
 
+afl-harness-cov: afl-harness-cov.o fuzz-emul-cov.o x86-emulate-cov.o
+	$(CC) $(CFLAGS) $(GCOV_FLAGS) $^ -o $@
+
 # Common targets
 .PHONY: all
 all: x86-insn-fuzz-all
@@ -46,7 +54,7 @@ distclean: clean
 
 .PHONY: clean
 clean:
-	rm -f *.a *.o .*.d afl-harness
+	rm -f *.a *.o .*.d afl-harness afl-harness-cov *.gcda *.gcno *.gcov
 
 .PHONY: install
 install: all
@@ -55,3 +63,6 @@ install: all
 
 .PHONY: afl
 afl: afl-harness
+
+.PHONY: afl-cov
+afl-cov: afl-harness-cov
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 06/12] fuzz/x86_emulate: Take multiple test files for inputs
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (3 preceding siblings ...)
  2017-10-10 16:20 ` [PATCH v3 05/12] fuzz/x86_emulate: Add 'afl-cov' target George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-10 16:56   ` Andrew Cooper
  2017-10-10 16:20 ` [PATCH v3 07/12] fuzz/x86_emulate: Move all state into fuzz_state George Dunlap
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

Finding aggregate coverage for a set of test files means running each
afl-generated test case through the harness.  At the moment, this is
done by re-executing afl-harness-cov with each input file.  When a
large number of test cases have been generated, this can take a
significant amonut of time; a recent test with 30k total files
generated by 4 parallel fuzzers took over 7 minutes.

The vast majority of this time is taken up with 'exec', however.
Since the harness is already designed to loop over multiple inputs for
llvm "persistent mode", just allow it to take a large number of inputs
on the same when *not* running in llvm "persistent mode"..  Then the
command can be efficiently executed like this:

  ls */queue/id* | xargs $path/afl-harness-cov

For the above-mentioned test on 30k files, the time to generate
coverage data was reduced from 7 minutes to under 30 seconds.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v3:
- Combine some variable declarations
- Make sure that count is set only once no matter how it's compiled
v2:
- Make check for batch processing more clear

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/README.afl                             |  7 +++++++
 tools/fuzz/x86_instruction_emulator/afl-harness.c | 25 +++++++++++++++--------
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/tools/fuzz/README.afl b/tools/fuzz/README.afl
index 8b58b8cdea..a59564985a 100644
--- a/tools/fuzz/README.afl
+++ b/tools/fuzz/README.afl
@@ -49,6 +49,13 @@ generate coverage data.  To do this, use the target `afl-cov`:
 
     $ make afl-cov #produces afl-harness-cov
 
+In order to speed up the process of checking total coverage,
+`afl-harness-cov` can take several test inputs on its command-line;
+the speed-up effect should be similar to that of using afl-clang-fast.
+You can use xargs to do this most efficiently, like so:
+
+    $ ls queue/id* | xargs $path/afl-harness-cov
+
 NOTE: Please also note that the coverage instrumentation hard-codes
 the absolute path for the instrumentation read and write files in the
 binary; so coverage data will always show up in the build directory no
diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index 57b4542556..26b710cb3f 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -16,6 +16,7 @@ int main(int argc, char **argv)
 {
     size_t size;
     FILE *fp = NULL;
+    int max, count;
 
     setbuf(stdin, NULL);
     setbuf(stdout, NULL);
@@ -42,8 +43,7 @@ int main(int argc, char **argv)
             break;
 
         case '?':
-        usage:
-            printf("Usage: %s $FILE | [--min-input-size]\n", argv[0]);
+            printf("Usage: %s $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
             exit(-1);
             break;
 
@@ -54,10 +54,13 @@ int main(int argc, char **argv)
         }
     }
 
-    if ( optind == argc ) /* No positional parameters.  Use stdin. */
+    max = argc - optind;
+
+    if ( !max ) /* No positional parameters.  Use stdin. */
+    {
+        max = 1;
         fp = stdin;
-    else if ( optind != (argc - 1) )
-        goto usage;
+    }
 
     if ( LLVMFuzzerInitialize(&argc, &argv) )
         exit(-1);
@@ -65,12 +68,15 @@ int main(int argc, char **argv)
 #ifdef __AFL_HAVE_MANUAL_CONTROL
     __AFL_INIT();
 
-    while ( __AFL_LOOP(1000) )
+    for( count = 0; __AFL_LOOP(1000); )
+#else
+    for( count = 0; count < max; count++ )
 #endif
     {
         if ( fp != stdin ) /* If not using stdin, open the provided file. */
         {
-            fp = fopen(argv[optind], "rb");
+            printf("Opening file %s\n", argv[optind]);
+            fp = fopen(argv[optind + count], "rb");
             if ( fp == NULL )
             {
                 perror("fopen");
@@ -100,7 +106,10 @@ int main(int argc, char **argv)
         if ( !feof(fp) )
         {
             printf("Input too large\n");
-            exit(-1);
+            /* Don't exit if we're doing batch processing */
+            if ( max == 1 )
+                exit(-1);
+            continue;
         }
 
         if ( fp != stdin )
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 07/12] fuzz/x86_emulate: Move all state into fuzz_state
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (4 preceding siblings ...)
  2017-10-10 16:20 ` [PATCH v3 06/12] fuzz/x86_emulate: Take multiple test files for inputs George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-10 18:20   ` Andrew Cooper
  2017-10-10 16:20 ` [PATCH v3 08/12] fuzz/x86_emulate: Move definitions into a header George Dunlap
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

This is in preparation for adding the option for a more "compact"
interpretation of the fuzzing data, in which we only change select
bits of the state.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v3:
 - Move DATA_OFFSET inside the structure
 - Remove a stray blank line
v2: Port over previous changes

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 89 +++++++++++++------------
 1 file changed, 45 insertions(+), 44 deletions(-)

diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index 8998f21fe1..20d52b33f8 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -24,14 +24,8 @@
 /* Layout of data expected as fuzzing input. */
 struct fuzz_corpus
 {
-    unsigned long cr[5];
-    uint64_t msr[MSR_INDEX_MAX];
-    struct cpu_user_regs regs;
-    struct segment_register segments[SEG_NUM];
-    unsigned long options;
     unsigned char data[4096];
 } input;
-#define DATA_OFFSET offsetof(struct fuzz_corpus, data)
 
 /*
  * Internal state of the fuzzing harness.  Calculated initially from the input
@@ -39,7 +33,14 @@ struct fuzz_corpus
  */
 struct fuzz_state
 {
+    unsigned long options;
+    unsigned long cr[5];
+    uint64_t msr[MSR_INDEX_MAX];
+    struct segment_register segments[SEG_NUM];
+    struct cpu_user_regs regs;
+
     /* Fuzzer's input data. */
+#define DATA_OFFSET offsetof(struct fuzz_state, corpus)
     struct fuzz_corpus *corpus;
 
     /* Real amount of data backing corpus->data[]. */
@@ -392,11 +393,10 @@ static int fuzz_read_segment(
     struct x86_emulate_ctxt *ctxt)
 {
     const struct fuzz_state *s = ctxt->data;
-    const struct fuzz_corpus *c = s->corpus;
 
     assert(is_x86_user_segment(seg) || is_x86_system_segment(seg));
 
-    *reg = c->segments[seg];
+    *reg = s->segments[seg];
 
     return X86EMUL_OKAY;
 }
@@ -407,7 +407,6 @@ static int fuzz_write_segment(
     struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
-    struct fuzz_corpus *c = s->corpus;
     int rc;
 
     assert(is_x86_user_segment(seg) || is_x86_system_segment(seg));
@@ -415,7 +414,7 @@ static int fuzz_write_segment(
     rc = maybe_fail(ctxt, "write_segment", true);
 
     if ( rc == X86EMUL_OKAY )
-        c->segments[seg] = *reg;
+        s->segments[seg] = *reg;
 
     return rc;
 }
@@ -426,12 +425,11 @@ static int fuzz_read_cr(
     struct x86_emulate_ctxt *ctxt)
 {
     const struct fuzz_state *s = ctxt->data;
-    const struct fuzz_corpus *c = s->corpus;
 
-    if ( reg >= ARRAY_SIZE(c->cr) )
+    if ( reg >= ARRAY_SIZE(s->cr) )
         return X86EMUL_UNHANDLEABLE;
 
-    *val = c->cr[reg];
+    *val = s->cr[reg];
 
     return X86EMUL_OKAY;
 }
@@ -442,17 +440,16 @@ static int fuzz_write_cr(
     struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
-    struct fuzz_corpus *c = s->corpus;
     int rc;
 
-    if ( reg >= ARRAY_SIZE(c->cr) )
+    if ( reg >= ARRAY_SIZE(s->cr) )
         return X86EMUL_UNHANDLEABLE;
 
     rc = maybe_fail(ctxt, "write_cr", true);
     if ( rc != X86EMUL_OKAY )
         return rc;
 
-    c->cr[reg] = val;
+    s->cr[reg] = val;
 
     return X86EMUL_OKAY;
 }
@@ -487,7 +484,6 @@ static int fuzz_read_msr(
     struct x86_emulate_ctxt *ctxt)
 {
     const struct fuzz_state *s = ctxt->data;
-    const struct fuzz_corpus *c = s->corpus;
     unsigned int idx;
 
     switch ( reg )
@@ -501,10 +497,10 @@ static int fuzz_read_msr(
          */
         return data_read(ctxt, x86_seg_none, "read_msr", val, sizeof(*val));
     case MSR_EFER:
-        *val = c->msr[MSRI_EFER];
+        *val = s->msr[MSRI_EFER];
         *val &= ~EFER_LMA;
-        if ( (*val & EFER_LME) && (c->cr[4] & X86_CR4_PAE) &&
-             (c->cr[0] & X86_CR0_PG) )
+        if ( (*val & EFER_LME) && (s->cr[4] & X86_CR4_PAE) &&
+             (s->cr[0] & X86_CR0_PG) )
         {
             printf("Setting EFER_LMA\n");
             *val |= EFER_LMA;
@@ -516,7 +512,7 @@ static int fuzz_read_msr(
     {
         if ( msr_index[idx] == reg )
         {
-            *val = c->msr[idx];
+            *val = s->msr[idx];
             return X86EMUL_OKAY;
         }
     }
@@ -531,7 +527,6 @@ static int fuzz_write_msr(
     struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
-    struct fuzz_corpus *c = s->corpus;
     unsigned int idx;
     int rc;
 
@@ -550,7 +545,7 @@ static int fuzz_write_msr(
     {
         if ( msr_index[idx] == reg )
         {
-            c->msr[idx] = val;
+            s->msr[idx] = val;
             return X86EMUL_OKAY;
         }
     }
@@ -600,15 +595,14 @@ static void setup_fpu_exception_handler(void)
 static void dump_state(struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
-    const struct fuzz_corpus *c = s->corpus;
     struct cpu_user_regs *regs = ctxt->regs;
     uint64_t val = 0;
 
     printf(" -- State -- \n");
     printf("addr / sp size: %d / %d\n", ctxt->addr_size, ctxt->sp_size);
-    printf(" cr0: %lx\n", c->cr[0]);
-    printf(" cr3: %lx\n", c->cr[3]);
-    printf(" cr4: %lx\n", c->cr[4]);
+    printf(" cr0: %lx\n", s->cr[0]);
+    printf(" cr3: %lx\n", s->cr[3]);
+    printf(" cr4: %lx\n", s->cr[4]);
 
     printf(" rip: %"PRIx64"\n", regs->rip);
 
@@ -629,15 +623,13 @@ static bool long_mode_active(struct x86_emulate_ctxt *ctxt)
 static bool in_longmode(struct x86_emulate_ctxt *ctxt)
 {
     const struct fuzz_state *s = ctxt->data;
-    const struct fuzz_corpus *c = s->corpus;
 
-    return long_mode_active(ctxt) && c->segments[x86_seg_cs].l;
+    return long_mode_active(ctxt) && s->segments[x86_seg_cs].l;
 }
 
 static void set_sizes(struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
-    const struct fuzz_corpus *c = s->corpus;
 
     ctxt->lma = long_mode_active(ctxt);
 
@@ -645,11 +637,20 @@ static void set_sizes(struct x86_emulate_ctxt *ctxt)
         ctxt->addr_size = ctxt->sp_size = 64;
     else
     {
-        ctxt->addr_size = c->segments[x86_seg_cs].db ? 32 : 16;
-        ctxt->sp_size   = c->segments[x86_seg_ss].db ? 32 : 16;
+        ctxt->addr_size = s->segments[x86_seg_cs].db ? 32 : 16;
+        ctxt->sp_size   = s->segments[x86_seg_ss].db ? 32 : 16;
     }
 }
 
+static void setup_state(struct x86_emulate_ctxt *ctxt)
+{
+    struct fuzz_state *s = ctxt->data;
+
+    /* Fuzz all of the state in one go */
+    if (!input_read(s, s, DATA_OFFSET))
+        exit(-1);
+}
+
 #define CANONICALIZE(x)                                   \
     do {                                                  \
         uint64_t _y = (x);                                \
@@ -709,8 +710,7 @@ enum {
 static void disable_hooks(struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
-    const struct fuzz_corpus *c = s->corpus;
-    unsigned long bitmap = c->options;
+    unsigned long bitmap = s->options;
 
     /* See also sanitize_input, some hooks can't be disabled. */
     MAYBE_DISABLE_HOOK(read);
@@ -760,12 +760,11 @@ static void disable_hooks(struct x86_emulate_ctxt *ctxt)
 static void sanitize_input(struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
-    struct fuzz_corpus *c = s->corpus;
-    struct cpu_user_regs *regs = &c->regs;
-    unsigned long bitmap = c->options;
+    struct cpu_user_regs *regs = ctxt->regs;
+    unsigned long bitmap = s->options;
 
     /* Some hooks can't be disabled. */
-    c->options &= ~((1<<HOOK_read)|(1<<HOOK_insn_fetch));
+    s->options &= ~((1<<HOOK_read)|(1<<HOOK_insn_fetch));
 
     /* Zero 'private' entries */
     regs->error_code = 0;
@@ -779,8 +778,8 @@ static void sanitize_input(struct x86_emulate_ctxt *ctxt)
      * CR0.PG can't be set if CR0.PE isn't set.  Set is more interesting, so
      * set PE if PG is set.
      */
-    if ( c->cr[0] & X86_CR0_PG )
-        c->cr[0] |= X86_CR0_PE;
+    if ( s->cr[0] & X86_CR0_PG )
+        s->cr[0] |= X86_CR0_PE;
 
     /* EFLAGS.VM not available in long mode */
     if ( long_mode_active(ctxt) )
@@ -789,8 +788,8 @@ static void sanitize_input(struct x86_emulate_ctxt *ctxt)
     /* EFLAGS.VM implies 16-bit mode */
     if ( regs->rflags & X86_EFLAGS_VM )
     {
-        c->segments[x86_seg_cs].db = 0;
-        c->segments[x86_seg_ss].db = 0;
+        s->segments[x86_seg_cs].db = 0;
+        s->segments[x86_seg_ss].db = 0;
     }
 }
 
@@ -812,7 +811,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
     };
     struct x86_emulate_ctxt ctxt = {
         .data = &state,
-        .regs = &input.regs,
+        .regs = &state.regs,
         .addr_size = 8 * sizeof(void *),
         .sp_size = 8 * sizeof(void *),
     };
@@ -836,7 +835,9 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
     memcpy(&input, data_p, size);
 
     state.corpus = &input;
-    state.data_num = size - DATA_OFFSET;
+    state.data_num = size;
+
+    setup_state(&ctxt);
 
     sanitize_input(&ctxt);
 
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 08/12] fuzz/x86_emulate: Move definitions into a header
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (5 preceding siblings ...)
  2017-10-10 16:20 ` [PATCH v3 07/12] fuzz/x86_emulate: Move all state into fuzz_state George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-10 17:25   ` Ian Jackson
  2017-10-10 16:20 ` [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact George Dunlap
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

Move fuzz-emul.c function prototypes into a header.  Also share the
definition of the input size (rather than hard-coding it in
fuzz-emul.c).

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
RFC: Worth trying to BUILD_BUG_ON(INPUT_SIZE < DATA_SIZE_FULL)?

v3:
- New in this version

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/afl-harness.c |  6 +-----
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c   |  3 ++-
 tools/fuzz/x86_instruction_emulator/fuzz-emul.h   | 10 ++++++++++
 3 files changed, 13 insertions(+), 6 deletions(-)
 create mode 100644 tools/fuzz/x86_instruction_emulator/fuzz-emul.h

diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index 26b710cb3f..891e56f448 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -4,12 +4,8 @@
 #include <stdlib.h>
 #include <string.h>
 #include <getopt.h>
+#include "fuzz-emul.h"
 
-extern int LLVMFuzzerInitialize(int *argc, char ***argv);
-extern int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size);
-extern unsigned int fuzz_minimal_input_size(void);
-
-#define INPUT_SIZE  4096
 static uint8_t input[INPUT_SIZE];
 
 int main(int argc, char **argv)
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index 20d52b33f8..9bbe973fd0 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -16,6 +16,7 @@
 #include <xen/xen.h>
 
 #include "x86-emulate.h"
+#include "fuzz-emul.h"
 
 #define MSR_INDEX_MAX 16
 
@@ -24,7 +25,7 @@
 /* Layout of data expected as fuzzing input. */
 struct fuzz_corpus
 {
-    unsigned char data[4096];
+    unsigned char data[INPUT_SIZE];
 } input;
 
 /*
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.h b/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
new file mode 100644
index 0000000000..30dd8de21e
--- /dev/null
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
@@ -0,0 +1,10 @@
+#ifndef FUZZ_EMUL_H
+# define FUZZ_EMUL_H
+
+extern int LLVMFuzzerInitialize(int *argc, char ***argv);
+extern int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size);
+extern unsigned int fuzz_minimal_input_size(void);
+
+#define INPUT_SIZE  4096
+
+#endif /* ifdef FUZZ_EMUL_H */
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (6 preceding siblings ...)
  2017-10-10 16:20 ` [PATCH v3 08/12] fuzz/x86_emulate: Move definitions into a header George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-10 16:59   ` Andrew Cooper
                     ` (2 more replies)
  2017-10-10 16:20 ` [PATCH v3 10/12] fuzz/x86_emulate: Add --rerun option to try to track down instability George Dunlap
                   ` (4 subsequent siblings)
  12 siblings, 3 replies; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

At the moment, AFL reckons that for any given input, 87% of it is
completely irrelevant: that is, it can change it as much as it wants
but have no impact on the result of the test; and yet it can't remove
it.

This is largely because we interpret the blob handed to us as a large
struct, including CR values, MSR values, segment registers, and a full
cpu_user_regs.

Instead, modify our interpretation to have a "set state" stanza at the
front.  Begin by reading a 16-bit value; if it is lower than a certain
threshold, set some state according to what byte it is, and repeat.
Continue until the byte is above a certain threshold.

This allows AFL to compact any given test case much smaller; to the
point where now it reckons there is not a single byte of the test file
which becomes irrelevant.  Testing have shown that this option both
allows AFL to reach coverage much faster, and to have a total coverage
higher than with the old format.

Make this an option (rather than a unilateral change) to enable
side-by-side performance comparison of the old and new formats.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v3:
 - Set default for opt_compact statically in fuzz-emul.c rather than afl-harness
 - Make input size / unconditional input reading more consistent
 - Require only minimum input read, not first instruction byte
 - Use ARRAY_SIZE() for hardcoded values
 - Use `for ( ; ; )` rather than `while(1)`
 - Some style issues
 - Move opt_compact declaration into fuzz-emul.h
v2: Port over previous changes

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/afl-harness.c |  9 ++-
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c   | 96 ++++++++++++++++++++---
 tools/fuzz/x86_instruction_emulator/fuzz-emul.h   |  2 +
 3 files changed, 96 insertions(+), 11 deletions(-)

diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index 891e56f448..052239cea4 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -4,6 +4,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <getopt.h>
+#include <stdbool.h>
 #include "fuzz-emul.h"
 
 static uint8_t input[INPUT_SIZE];
@@ -21,9 +22,11 @@ int main(int argc, char **argv)
     {
         enum {
             OPT_MIN_SIZE,
+            OPT_COMPACT,
         };
         static const struct option lopts[] = {
             { "min-input-size", no_argument, NULL, OPT_MIN_SIZE },
+            { "compact", required_argument, NULL, OPT_COMPACT },
             { 0, 0, 0, 0 }
         };
         int c = getopt_long_only(argc, argv, "", lopts, NULL);
@@ -38,8 +41,12 @@ int main(int argc, char **argv)
             exit(0);
             break;
 
+        case OPT_COMPACT:
+            opt_compact = atoi(optarg);
+            break;
+            
         case '?':
-            printf("Usage: %s $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
+            printf("Usage: %s [--compact=0|1] $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
             exit(-1);
             break;
 
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index 9bbe973fd0..b6ebcebc19 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -35,13 +35,14 @@ struct fuzz_corpus
 struct fuzz_state
 {
     unsigned long options;
+#define DATA_SIZE_COMPACT offsetof(struct fuzz_state, cr)
     unsigned long cr[5];
     uint64_t msr[MSR_INDEX_MAX];
     struct segment_register segments[SEG_NUM];
     struct cpu_user_regs regs;
 
     /* Fuzzer's input data. */
-#define DATA_OFFSET offsetof(struct fuzz_state, corpus)
+#define DATA_SIZE_FULL offsetof(struct fuzz_state, corpus)
     struct fuzz_corpus *corpus;
 
     /* Real amount of data backing corpus->data[]. */
@@ -54,6 +55,13 @@ struct fuzz_state
     struct x86_emulate_ops ops;
 };
 
+bool opt_compact = true;
+
+unsigned int fuzz_minimal_input_size(void)
+{
+    return opt_compact ? DATA_SIZE_COMPACT : DATA_SIZE_FULL;
+}
+
 static inline bool input_avail(const struct fuzz_state *s, size_t size)
 {
     return s->data_index + size <= s->data_num;
@@ -647,9 +655,82 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
 
-    /* Fuzz all of the state in one go */
-    if (!input_read(s, s, DATA_OFFSET))
-        exit(-1);
+    if ( !opt_compact )
+    {
+        /* Fuzz all of the state in one go */
+        if ( !input_read(s, s, DATA_SIZE_FULL) )
+            exit(-1);
+        return;
+    }
+
+    /* Modify only select bits of state */
+
+    /* Always read 'options' */
+    if ( !input_read(s, s, DATA_SIZE_COMPACT) )
+        return;
+    
+    for ( ; ; )
+    {
+        uint16_t offset;
+
+        /* Read 16 bits to decide what bit of state to modify */
+        if ( !input_read(s, &offset, sizeof(offset)) )
+            return;
+
+        /* 
+         * Then decide if it's "pointing to" different bits of the
+         * state 
+         */
+
+        /* cr[]? */
+        if ( offset < ARRAY_SIZE(s->cr) )
+        {
+            if ( !input_read(s, s->cr + offset, sizeof(*s->cr)) )
+                return;
+            printf("Setting CR %d to %lx\n", offset, s->cr[offset]);
+            continue;
+        }
+        
+        offset -= ARRAY_SIZE(s->cr);
+
+        /* msr[]? */
+        if ( offset < ARRAY_SIZE(s->msr) )
+        {
+            if ( !input_read(s, s->msr + offset, sizeof(*s->msr)) )
+                return;
+            printf("Setting MSR i%d (%x) to %lx\n", offset,
+                   msr_index[offset], s->msr[offset]);
+            continue;
+        }
+
+        offset -= ARRAY_SIZE(s->msr);
+
+        /* segments[]? */
+        if ( offset < ARRAY_SIZE(s->segments) )
+        {
+            if ( !input_read(s, s->segments + offset, sizeof(*s->segments)) )
+                return;
+            printf("Setting Segment %d\n", offset);
+            continue;
+            
+        }
+
+        offset -= ARRAY_SIZE(s->segments);
+
+        /* regs? */
+        if ( offset < sizeof(struct cpu_user_regs)
+             && offset + sizeof(uint64_t) <= sizeof(struct cpu_user_regs) )
+        {
+            if ( !input_read(s, ((char *)ctxt->regs) + offset, sizeof(uint64_t)) )
+                return;
+            printf("Setting cpu_user_regs offset %x\n", offset);
+            continue;
+        }
+
+        /* None of the above -- take that as "start emulating" */
+        
+        return;
+    }
 }
 
 #define CANONICALIZE(x)                                   \
@@ -821,7 +902,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
     /* Reset all global state variables */
     memset(&input, 0, sizeof(input));
 
-    if ( size <= DATA_OFFSET )
+    if ( size < fuzz_minimal_input_size() )
     {
         printf("Input too small\n");
         return 1;
@@ -858,11 +939,6 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
     return 0;
 }
 
-unsigned int fuzz_minimal_input_size(void)
-{
-    return DATA_OFFSET + 1;
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.h b/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
index 30dd8de21e..85d21cbf0f 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
@@ -5,6 +5,8 @@ extern int LLVMFuzzerInitialize(int *argc, char ***argv);
 extern int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size);
 extern unsigned int fuzz_minimal_input_size(void);
 
+extern bool opt_compact;
+
 #define INPUT_SIZE  4096
 
 #endif /* ifdef FUZZ_EMUL_H */
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 10/12] fuzz/x86_emulate: Add --rerun option to try to track down instability
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (7 preceding siblings ...)
  2017-10-10 16:20 ` [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-10 18:44   ` Andrew Cooper
  2017-10-10 16:20 ` [PATCH v3 11/12] fuzz/x86_emulate: Set and fuzz more CPU state George Dunlap
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

Current stability numbers are not 100%.  In order to help track this
down, add a --rerun option which will run the same input twice,
resetting the state between each run, and comparing the state
afterwards.  If the state differs, call abort().

This allows AFL to help the process of tracking down what state is not
being reset properly between runs by proving testcases that
demonstrate the behavior.

To do this:

- Move ctxt into struct fuzz-state to simplify handling

- Rather than copying the data into input, treat the data handed as
  immutable and point each "copy" to it

- Factor out various steps (setting up fuzz state, running an
  individual test) so that they can be efficiently run either once or
  twice (as necessary)

- Compare the states afterwards, printing what's different and calling
  abort() if anything is found.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v3:
- Make new local functions static
- Avoid losing const-ness of pointer in setup_fuzz_state()
- Remove useless *_size initialization
- Remove extra blank line
- Use ARRAY_SIZE() when appropriate
- Move opt_rerun declaration into fuzz-emul.h
- Change loop variable to unsigned int
- Print segment contents when segments differ
v2:
- Fix some coding style issues
- Port over previous changes

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/afl-harness.c |   8 +-
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c   | 194 ++++++++++++++++++----
 tools/fuzz/x86_instruction_emulator/fuzz-emul.h   |   1 +
 3 files changed, 171 insertions(+), 32 deletions(-)

diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index 052239cea4..4a55ac3c3f 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -23,10 +23,12 @@ int main(int argc, char **argv)
         enum {
             OPT_MIN_SIZE,
             OPT_COMPACT,
+            OPT_RERUN,
         };
         static const struct option lopts[] = {
             { "min-input-size", no_argument, NULL, OPT_MIN_SIZE },
             { "compact", required_argument, NULL, OPT_COMPACT },
+            { "rerun", no_argument, NULL, OPT_RERUN },
             { 0, 0, 0, 0 }
         };
         int c = getopt_long_only(argc, argv, "", lopts, NULL);
@@ -45,8 +47,12 @@ int main(int argc, char **argv)
             opt_compact = atoi(optarg);
             break;
             
+        case OPT_RERUN:
+            opt_rerun = true;
+            break;
+
         case '?':
-            printf("Usage: %s [--compact=0|1] $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
+            printf("Usage: %s [--compact=0|1] [--rerun] $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
             exit(-1);
             break;
 
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index b6ebcebc19..7685e976b8 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -43,7 +43,7 @@ struct fuzz_state
 
     /* Fuzzer's input data. */
 #define DATA_SIZE_FULL offsetof(struct fuzz_state, corpus)
-    struct fuzz_corpus *corpus;
+    const struct fuzz_corpus *corpus;
 
     /* Real amount of data backing corpus->data[]. */
     size_t data_num;
@@ -53,6 +53,7 @@ struct fuzz_state
 
     /* Emulation ops, some of which are disabled based on corpus->options. */
     struct x86_emulate_ops ops;
+    struct x86_emulate_ctxt ctxt;
 };
 
 bool opt_compact = true;
@@ -495,6 +496,12 @@ static int fuzz_read_msr(
     const struct fuzz_state *s = ctxt->data;
     unsigned int idx;
 
+    /* 
+     * NB at the moment dump_state() relies on the fact that this
+     * cannot fail.  If we add in fuzzed failures we'll have to handle
+     * that differently.
+     */
+    
     switch ( reg )
     {
     case MSR_TSC_AUX:
@@ -615,6 +622,7 @@ static void dump_state(struct x86_emulate_ctxt *ctxt)
 
     printf(" rip: %"PRIx64"\n", regs->rip);
 
+    /* read_msr() never fails at the moment */
     fuzz_read_msr(MSR_EFER, &val, ctxt);
     printf("EFER: %"PRIx64"\n", val);
 }
@@ -659,7 +667,10 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
     {
         /* Fuzz all of the state in one go */
         if ( !input_read(s, s, DATA_SIZE_FULL) )
+        {
+            printf("Input size too small\n");
             exit(-1);
+        }
         return;
     }
 
@@ -789,9 +800,8 @@ enum {
         printf("Disabling hook "#h"\n");               \
     }
 
-static void disable_hooks(struct x86_emulate_ctxt *ctxt)
+static void disable_hooks(struct fuzz_state *s)
 {
-    struct fuzz_state *s = ctxt->data;
     unsigned long bitmap = s->options;
 
     /* See also sanitize_input, some hooks can't be disabled. */
@@ -839,7 +849,7 @@ static void disable_hooks(struct x86_emulate_ctxt *ctxt)
  *  - ...bases to below 1Mb, 16-byte aligned
  *  - ...selectors to (base >> 4)
  */
-static void sanitize_input(struct x86_emulate_ctxt *ctxt)
+static void sanitize_state(struct x86_emulate_ctxt *ctxt)
 {
     struct fuzz_state *s = ctxt->data;
     struct cpu_user_regs *regs = ctxt->regs;
@@ -886,21 +896,144 @@ int LLVMFuzzerInitialize(int *argc, char ***argv)
     return 0;
 }
 
-int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
+static void setup_fuzz_state(struct fuzz_state *state, const void *data_p, size_t size)
 {
-    struct fuzz_state state = {
-        .ops = all_fuzzer_ops,
-    };
-    struct x86_emulate_ctxt ctxt = {
-        .data = &state,
-        .regs = &state.regs,
-        .addr_size = 8 * sizeof(void *),
-        .sp_size = 8 * sizeof(void *),
-    };
+    memset(state, 0, sizeof(*state));
+    state->corpus = data_p;
+    state->data_num = size;
+}
+
+static int runtest(struct fuzz_state *state) {
     int rc;
 
-    /* Reset all global state variables */
-    memset(&input, 0, sizeof(input));
+    struct x86_emulate_ctxt *ctxt = &state->ctxt;
+    
+    state->ops = all_fuzzer_ops;
+
+    ctxt->data = state;
+    ctxt->regs = &state->regs;
+
+    setup_state(ctxt);
+
+    sanitize_state(ctxt);
+
+    disable_hooks(state);
+
+    do {
+        /* FIXME: Until we actually implement SIGFPE handling properly */
+        setup_fpu_exception_handler();
+
+        set_sizes(ctxt);
+        dump_state(ctxt);
+
+        rc = x86_emulate(ctxt, &state->ops);
+        printf("Emulation result: %d\n", rc);
+    } while ( rc == X86EMUL_OKAY );
+
+    return 0;
+}
+
+static void compare_states(struct fuzz_state state[2])
+{
+    // First zero any "internal" pointers
+    state[0].corpus = state[1].corpus = NULL;
+    state[0].ctxt.data = state[1].ctxt.data = NULL;
+    state[0].ctxt.regs = state[1].ctxt.regs = NULL;
+
+    if ( memcmp(&state[0], &state[1], sizeof(struct fuzz_state)) )
+    {
+        unsigned int i;
+
+        printf("State mismatch\n");
+
+        for ( i=0; i<ARRAY_SIZE(state[0].cr); i++ )
+            if ( state[0].cr[i] != state[1].cr[i] )
+                printf("cr[%u]: %lx != %lx\n",
+                       i, state[0].cr[i], state[1].cr[i]);
+        
+        for ( i=0; i<ARRAY_SIZE(state[0].msr); i++ )
+            if ( state[0].msr[i] != state[1].msr[i] )
+                printf("msr[%u]: %lx != %lx\n",
+                       i, state[0].msr[i], state[1].msr[i]);
+        
+        for ( i=0; i<ARRAY_SIZE(state[0].segments); i++ )
+            if ( memcmp(&state[0].segments[i], &state[1].segments[i],
+                        sizeof(state[0].segments[0])) )
+                printf("segments[%u]: [%x:%x:%x:%lx] != [%x:%x:%x:%lx]!\n", i,
+                       (unsigned)state[0].segments[i].sel,
+                       (unsigned)state[0].segments[i].attr,
+                       state[0].segments[i].limit,
+                       state[0].segments[i].base,
+                       (unsigned)state[1].segments[i].sel,
+                       (unsigned)state[1].segments[i].attr,
+                       state[1].segments[i].limit,
+                       state[1].segments[i].base);
+
+        if ( state[0].data_num != state[1].data_num )
+            printf("data_num: %lx !=  %lx\n", state[0].data_num,
+                   state[1].data_num);
+        if ( state[0].data_index != state[1].data_index )
+            printf("data_index: %lx !=  %lx\n", state[0].data_index,
+                   state[1].data_index);
+
+        if ( memcmp(&state[0].regs, &state[1].regs, sizeof(state[0].regs)) )
+        {
+            printf("registers differ!\n");
+            /* Print If Not Equal */
+#define PINE(elem)\
+            if ( state[0].elem != state[1].elem ) \
+                printf(#elem " differ: %lx != %lx\n", \
+                       (unsigned long)state[0].elem, \
+                       (unsigned long)state[1].elem)
+            PINE(regs.r15);
+            PINE(regs.r14);
+            PINE(regs.r13);
+            PINE(regs.r12);
+            PINE(regs.rbp);
+            PINE(regs.rbx);
+            PINE(regs.r10);
+            PINE(regs.r11);
+            PINE(regs.r9);
+            PINE(regs.r8);
+            PINE(regs.rax);
+            PINE(regs.rcx);
+            PINE(regs.rdx);
+            PINE(regs.rsi);
+            PINE(regs.rdi);
+
+            for ( i = offsetof(struct cpu_user_regs, error_code) / sizeof(unsigned);
+                  i < sizeof(state[1].regs)/sizeof(unsigned); i++ )
+            {
+                printf("[%04lu] %08x %08x\n",
+                        i * sizeof(unsigned), ((unsigned *)&state[0].regs)[i],
+                       ((unsigned *)&state[1].regs)[i]);
+            }
+        }
+
+        if ( memcmp(&state[0].ops, &state[1].ops, sizeof(state[0].ops)) )
+            printf("ops differ!\n");
+
+        if ( memcmp(&state[0].ctxt, &state[1].ctxt, sizeof(state[0].ctxt)) )
+        {
+            printf("ctxt differs!\n");
+            for ( i = 0;  i < sizeof(state[0].ctxt)/sizeof(unsigned); i++ )
+            {
+                printf("[%04lu] %08x %08x\n",
+                        i * sizeof(unsigned), ((unsigned *)&state[0].ctxt)[i],
+                       ((unsigned *)&state[1].ctxt)[i]);
+            }
+            
+        }
+
+        abort();
+    }
+}
+
+bool opt_rerun = false;
+
+int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
+{
+    struct fuzz_state state[2];
 
     if ( size < fuzz_minimal_input_size() )
     {
@@ -908,7 +1041,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
         return 1;
     }
 
-    if ( size > sizeof(input) )
+    if ( size > sizeof(struct fuzz_corpus) )
     {
         printf("Input too large\n");
         return 1;
@@ -916,25 +1049,24 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
 
     memcpy(&input, data_p, size);
 
-    state.corpus = &input;
-    state.data_num = size;
-
-    setup_state(&ctxt);
+    setup_fuzz_state(&state[0], data_p, size);
+    
+    if ( opt_rerun )
+        printf("||| INITIAL RUN |||\n");
+    
+    runtest(&state[0]);
 
-    sanitize_input(&ctxt);
+    if ( !opt_rerun )
+        return 0;
 
-    disable_hooks(&ctxt);
+    /* Reset all global state variables again */
+    setup_fuzz_state(&state[1], data_p, size);
 
-    do {
-        /* FIXME: Until we actually implement SIGFPE handling properly */
-        setup_fpu_exception_handler();
+    printf("||| SECOND RUN |||\n");
 
-        set_sizes(&ctxt);
-        dump_state(&ctxt);
+    runtest(&state[1]);
 
-        rc = x86_emulate(&ctxt, &state.ops);
-        printf("Emulation result: %d\n", rc);
-    } while ( rc == X86EMUL_OKAY );
+    compare_states(state);
 
     return 0;
 }
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.h b/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
index 85d21cbf0f..4863bf2166 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
@@ -6,6 +6,7 @@ extern int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size);
 extern unsigned int fuzz_minimal_input_size(void);
 
 extern bool opt_compact;
+extern bool opt_rerun;
 
 #define INPUT_SIZE  4096
 
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 11/12] fuzz/x86_emulate: Set and fuzz more CPU state
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (8 preceding siblings ...)
  2017-10-10 16:20 ` [PATCH v3 10/12] fuzz/x86_emulate: Add --rerun option to try to track down instability George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-11  9:31   ` Jan Beulich
  2017-10-10 16:20 ` [PATCH v3 12/12] fuzz/x86_emulate: Add an option to limit the number of instructions executed George Dunlap
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

x86_emulate() operates not only on state passed to it in
cpu_user_regs, but also on state currently found on the cpu: namely,
the FPU and XMM registers.  At the moment, we re-zero (and/or
re-initialize) cpu_user_regs on every invocation, but leave the
cpu-stored state alone.  In "persistent mode", this causes test cases
to behave differently -- sometimes significantly so -- depending on
which test cases have been run beforehand.

Zero out the state before each test run, and then fuzz it based on the
corpus input.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v3:
- Make type 512 bytes rather than 464
- Style changes
- Change argument from 'store' to 'write'
- Add a comment explaining why we always 'save' even for a write
- Sanitize mxcsr with mxcrs_mask when writing instead of zeroing it in sanitize_state
- Get rid of redundant mxcsr_mask setting
- Add comments explaining why we're arbitrarily writing 32 bits
v2: Rebase on top of previous changes

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 82 ++++++++++++++++++++++++-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index 7685e976b8..79dd36ec30 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -40,6 +40,8 @@ struct fuzz_state
     uint64_t msr[MSR_INDEX_MAX];
     struct segment_register segments[SEG_NUM];
     struct cpu_user_regs regs;
+    char fxsave[512] __attribute__((aligned(16)));
+
 
     /* Fuzzer's input data. */
 #define DATA_SIZE_FULL offsetof(struct fuzz_state, corpus)
@@ -596,6 +598,54 @@ static const struct x86_emulate_ops all_fuzzer_ops = {
 };
 #undef SET
 
+/*
+ * This funciton will read or write fxsave to the fpu.  When writing,
+ * it 'sanitizes' the state: It will mask off the appropriate bits in
+ * the mxcsr, 'restore' the state to the fpu, then 'save' it again so
+ * that the data in fxsave reflects what's actually in the FPU.
+ *
+ * TODO: Extend state beyond just FPU (ymm registers, &c)
+ */
+static void _set_fpu_state(char *fxsave, bool write)
+{
+    if ( cpu_has_fxsr )
+    {
+        static union __attribute__((__aligned__(16))) {
+            char x[512];
+            struct {
+                uint32_t other[6];
+                uint32_t mxcsr;
+                uint32_t mxcsr_mask;
+                /* ... */
+            };
+        } *fxs;
+
+        fxs = (typeof(fxs)) fxsave;
+
+        if ( write )
+        {
+            char null[512] __attribute__((aligned(16))) = { };
+            
+            fxs->mxcsr &= mxcsr_mask;
+
+            asm volatile( "fxrstor %0" :: "m" (*null) );
+            asm volatile( "fxrstor %0" :: "m" (*fxs) );
+        }
+
+        asm volatile( "fxsave %0" : "=m" (*fxs) );
+    }
+}
+
+static void set_fpu_state(char *fxsave)
+{
+    _set_fpu_state(fxsave, true);
+}
+
+static void save_fpu_state(char *fxsave)
+{
+    _set_fpu_state(fxsave, false);
+}
+
 static void setup_fpu_exception_handler(void)
 {
     /* FIXME - just disable exceptions for now */
@@ -674,7 +724,11 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
         return;
     }
 
-    /* Modify only select bits of state */
+    /*
+     * Modify only select bits of state.  In general, try not to fuzz less
+     * than 32 bits at a time; otherwise we're reading 2 bytes in order to fuzz only
+     * one byte. 
+     */
 
     /* Always read 'options' */
     if ( !input_read(s, s, DATA_SIZE_COMPACT) )
@@ -737,6 +791,18 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
             printf("Setting cpu_user_regs offset %x\n", offset);
             continue;
         }
+        offset -= sizeof(struct cpu_user_regs);
+
+        /* Fuzz fxsave state */
+        if ( offset < 128 )
+        {
+            /* 32-bit size is arbitrary; see comment above */
+            if ( !input_read(s, s->fxsave + (offset * 4), 4) )
+                return;
+            printf("Setting fxsave offset %x\n", offset * 4);
+            continue;
+        }
+        offset -= 128;
 
         /* None of the above -- take that as "start emulating" */
         
@@ -919,6 +985,8 @@ static int runtest(struct fuzz_state *state) {
 
     disable_hooks(state);
 
+    set_fpu_state(state->fxsave);
+
     do {
         /* FIXME: Until we actually implement SIGFPE handling properly */
         setup_fpu_exception_handler();
@@ -930,6 +998,8 @@ static int runtest(struct fuzz_state *state) {
         printf("Emulation result: %d\n", rc);
     } while ( rc == X86EMUL_OKAY );
 
+    save_fpu_state(state->fxsave);
+    
     return 0;
 }
 
@@ -1013,6 +1083,16 @@ static void compare_states(struct fuzz_state state[2])
         if ( memcmp(&state[0].ops, &state[1].ops, sizeof(state[0].ops)) )
             printf("ops differ!\n");
 
+        if ( memcmp(&state[0].fxsave, &state[1].fxsave, sizeof(state[0].fxsave)) )
+        {
+            printf("fxsave differs!\n");
+            for ( i = 0;  i < sizeof(state[0].fxsave)/sizeof(unsigned); i++ )
+            {
+                printf("[%04lu] %08x %08x\n",
+                        i * sizeof(unsigned), ((unsigned *)&state[0].fxsave)[i], ((unsigned *)&state[1].fxsave)[i]);
+            }
+        }
+
         if ( memcmp(&state[0].ctxt, &state[1].ctxt, sizeof(state[0].ctxt)) )
         {
             printf("ctxt differs!\n");
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 12/12] fuzz/x86_emulate: Add an option to limit the number of instructions executed
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (9 preceding siblings ...)
  2017-10-10 16:20 ` [PATCH v3 11/12] fuzz/x86_emulate: Set and fuzz more CPU state George Dunlap
@ 2017-10-10 16:20 ` George Dunlap
  2017-10-11  9:34   ` Jan Beulich
  2017-10-10 16:47 ` [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
  2017-10-10 17:22 ` Ian Jackson
  12 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

AFL considers a testcase to be a useful addition not only if there are
tuples exercised by that testcase which were not exercised otherwise,
but also if the *number* of times an individual tuple is exercised
changes significantly; in particular, if the number of the highest
non-zero bit changes (i.e., if it is run 1, 2-3, 4-7, 8-15, &c).

One simple way to increase these stats it to execute the same (or
similar) instructions multiple times: If executing a given instruction
once hits a particular tuple 2 times, executing it twice will hit the
tuple 4 times, four times will hit the tuple 8 times, and so on.  All
of these will look different to AFL, and so it is likely that many of
the "unique test cases" will simply be the same instruction repeated
powers of 2 times until the tuple counts max out (at 128).

It is unlikely that executing a single instruction more than a handful
of times will generate any state we actually care about; but such long
testcases take exponentially longer to fuzz: the fuzzer spends more
time flipping bits looking for meaningful changes, and each execution
takes longer because it is doing more things.  So long paths which add
nothing to the actual code coverage but effectively "distract" the
fuzzer, making it less effective.

Experiments have shown that not allowing infinite number of
instruction retries for the old (non-compact) format does indeed speed
up and increase code coverage.  However, it has also shown that on the
new, more compact format, having no instruction limit causes the highest
throughput in code coverage.

So leave the option in, but have it default to 0 (no limit).

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v3:
- Change opt_instruction_limit to unsigned, default to UINT_MAX
- Simplify limit checking (now that the actual variable itself will never be 0)
- Change counter to unsigned
- Update changelog to try to be a bit more clear


CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/afl-harness.c | 11 ++++++++++-
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c   |  5 ++++-
 tools/fuzz/x86_instruction_emulator/fuzz-emul.h   |  1 +
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index 4a55ac3c3f..7d09cc29c6 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -1,4 +1,5 @@
 #include <assert.h>
+#include <limits.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -24,11 +25,13 @@ int main(int argc, char **argv)
             OPT_MIN_SIZE,
             OPT_COMPACT,
             OPT_RERUN,
+            OPT_INSTRUCTION_LIMIT,
         };
         static const struct option lopts[] = {
             { "min-input-size", no_argument, NULL, OPT_MIN_SIZE },
             { "compact", required_argument, NULL, OPT_COMPACT },
             { "rerun", no_argument, NULL, OPT_RERUN },
+            { "instruction-limit", required_argument, NULL, OPT_INSTRUCTION_LIMIT },
             { 0, 0, 0, 0 }
         };
         int c = getopt_long_only(argc, argv, "", lopts, NULL);
@@ -51,8 +54,14 @@ int main(int argc, char **argv)
             opt_rerun = true;
             break;
 
+        case OPT_INSTRUCTION_LIMIT:
+            opt_instruction_limit = atoi(optarg);
+            if ( !opt_instruction_limit )
+                opt_instruction_limit = UINT_MAX;
+            break;
+
         case '?':
-            printf("Usage: %s [--compact=0|1] [--rerun] $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
+            printf("Usage: %s [--compact=0|1] [--rerun] [--instruction-limit=N] $FILE [$FILE...] | [--min-input-size]\n", argv[0]);
             exit(-1);
             break;
 
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index 79dd36ec30..8aaec93973 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -969,10 +969,13 @@ static void setup_fuzz_state(struct fuzz_state *state, const void *data_p, size_
     state->data_num = size;
 }
 
+unsigned int opt_instruction_limit = UINT_MAX;
+
 static int runtest(struct fuzz_state *state) {
     int rc;
 
     struct x86_emulate_ctxt *ctxt = &state->ctxt;
+    unsigned int icount = 0;
     
     state->ops = all_fuzzer_ops;
 
@@ -996,7 +999,7 @@ static int runtest(struct fuzz_state *state) {
 
         rc = x86_emulate(ctxt, &state->ops);
         printf("Emulation result: %d\n", rc);
-    } while ( rc == X86EMUL_OKAY );
+    } while ( rc == X86EMUL_OKAY && ++icount < opt_instruction_limit );
 
     save_fpu_state(state->fxsave);
     
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.h b/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
index 4863bf2166..746e1b542d 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.h
@@ -7,6 +7,7 @@ extern unsigned int fuzz_minimal_input_size(void);
 
 extern bool opt_compact;
 extern bool opt_rerun;
+extern unsigned int opt_instruction_limit;
 
 #define INPUT_SIZE  4096
 
-- 
2.14.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (10 preceding siblings ...)
  2017-10-10 16:20 ` [PATCH v3 12/12] fuzz/x86_emulate: Add an option to limit the number of instructions executed George Dunlap
@ 2017-10-10 16:47 ` George Dunlap
  2017-10-10 16:47   ` Andrew Cooper
  2017-10-11  8:59   ` Jan Beulich
  2017-10-10 17:22 ` Ian Jackson
  12 siblings, 2 replies; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 931 bytes --]

On 10/10/2017 05:20 PM, George Dunlap wrote:
> Once feof() returns true for a stream, it will continue to return true
> for that stream until clearerr() is called (or the stream is closed
> and re-opened).
> 
> In llvm-clang-fast-mode, the same file descriptor is used for each
> iteration of the loop, meaning that the "Input too large" check was
> broken -- feof() would return true even if the fread() hadn't hit the
> end of the file.  The result is that AFL generates testcases of
> arbitrary size.
> 
> Fix this by fseek'ing to the beginning of the file on every iteration;
> this resets the EOF marker and other state.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> Changes in v3:
> - Fix the issue in the official sanctioned way

Hmm, seems v2 of this patch was checked in; review had flagged up that
"clearerr()" was too big of a hammer.

Attached is a revised v1/12 patch that fixes this.

 -George

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-fuzz-x86_emulate-Clear-errors-in-the-officially-sanc.patch --]
[-- Type: text/x-patch; name="0001-fuzz-x86_emulate-Clear-errors-in-the-officially-sanc.patch", Size: 1947 bytes --]

From d07b2d68085957bf3d7a2567dce9c4f031fb5966 Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@citrix.com>
Date: Wed, 4 Oct 2017 17:09:10 +0100
Subject: [PATCH] fuzz/x86_emulate: Clear errors in the officially sanctioned
 way

Commit 849a1f10c9 was checked in in appropriately; review flagged up
that clearerr() was too big a hammer, as it would clear both the EOF
flag and stream errors.

Stream errors shouldn't be cleared; we only want the EOF and other
stream-related state cleared.  To do this, it is sufficient to fseek()
to zero.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
This is a candidate for backport to 4.9.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 tools/fuzz/x86_instruction_emulator/afl-harness.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/fuzz/x86_instruction_emulator/afl-harness.c b/tools/fuzz/x86_instruction_emulator/afl-harness.c
index b4d15451b5..31ae1daef1 100644
--- a/tools/fuzz/x86_instruction_emulator/afl-harness.c
+++ b/tools/fuzz/x86_instruction_emulator/afl-harness.c
@@ -77,6 +77,17 @@ int main(int argc, char **argv)
                 exit(-1);
             }
         }
+#ifdef __AFL_HAVE_MANUAL_CONTROL
+        else
+        {
+            /* 
+             * This will ensure we're dealing with a clean stream
+             * state after the afl-fuzz process messes with the open
+             * file handle.
+             */
+            fseek(fp, 0, SEEK_SET);
+        }
+#endif
 
         size = fread(input, 1, INPUT_SIZE, fp);
 
@@ -97,8 +108,6 @@ int main(int argc, char **argv)
             fclose(fp);
             fp = NULL;
         }
-        else
-            clearerr(fp);
 
         LLVMFuzzerTestOneInput(input, size);
     }
-- 
2.14.2


[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration
  2017-10-10 16:47 ` [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
@ 2017-10-10 16:47   ` Andrew Cooper
  2017-10-11  8:59   ` Jan Beulich
  1 sibling, 0 replies; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 16:47 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 17:47, George Dunlap wrote:
> On 10/10/2017 05:20 PM, George Dunlap wrote:
>> Once feof() returns true for a stream, it will continue to return true
>> for that stream until clearerr() is called (or the stream is closed
>> and re-opened).
>>
>> In llvm-clang-fast-mode, the same file descriptor is used for each
>> iteration of the loop, meaning that the "Input too large" check was
>> broken -- feof() would return true even if the fread() hadn't hit the
>> end of the file.  The result is that AFL generates testcases of
>> arbitrary size.
>>
>> Fix this by fseek'ing to the beginning of the file on every iteration;
>> this resets the EOF marker and other state.
>>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> ---
>> Changes in v3:
>> - Fix the issue in the official sanctioned way
> Hmm, seems v2 of this patch was checked in; review had flagged up that
> "clearerr()" was too big of a hammer.
>
> Attached is a revised v1/12 patch that fixes this.
>
>  -George

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 03/12] fuzz/x86_emulate: Implement input_read() and input_avail()
  2017-10-10 16:20 ` [PATCH v3 03/12] fuzz/x86_emulate: Implement input_read() and input_avail() George Dunlap
@ 2017-10-10 16:52   ` Andrew Cooper
  2017-10-10 17:24   ` Ian Jackson
  1 sibling, 0 replies; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 16:52 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 17:20, George Dunlap wrote:
> Rather than open-coding the "read" from the input file.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 05/12] fuzz/x86_emulate: Add 'afl-cov' target
  2017-10-10 16:20 ` [PATCH v3 05/12] fuzz/x86_emulate: Add 'afl-cov' target George Dunlap
@ 2017-10-10 16:53   ` Andrew Cooper
  0 siblings, 0 replies; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 16:53 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 17:20, George Dunlap wrote:
> ...to generate a "normal" coverage-instrumented binary, suitable for
> use with gcov or afl-cov.
>
> This is slightly annoying because:
>
>  - Every object file needs to have been instrumented to work
>    effectively
>
>  - You generally want to have both an afl-instrumented binary and a
>    gcov-instrumented binary at the same time, but
>
>  - gcov instrumentation and afl instrumentation are mutually exclusive
>
> So when making the `afl-cov` target, generate a second set of object
> files and a second binary with the `-cov` suffix.
>
> While we're here, remove the redundant x86-emulate.c dependency for
> x86-emulate.o.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 06/12] fuzz/x86_emulate: Take multiple test files for inputs
  2017-10-10 16:20 ` [PATCH v3 06/12] fuzz/x86_emulate: Take multiple test files for inputs George Dunlap
@ 2017-10-10 16:56   ` Andrew Cooper
  2017-10-10 16:58     ` George Dunlap
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 16:56 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 17:20, George Dunlap wrote:
> @@ -65,12 +68,15 @@ int main(int argc, char **argv)
>  #ifdef __AFL_HAVE_MANUAL_CONTROL
>      __AFL_INIT();
>  
> -    while ( __AFL_LOOP(1000) )
> +    for( count = 0; __AFL_LOOP(1000); )
> +#else
> +    for( count = 0; count < max; count++ )
>  #endif
>      {
>          if ( fp != stdin ) /* If not using stdin, open the provided file. */
>          {
> -            fp = fopen(argv[optind], "rb");
> +            printf("Opening file %s\n", argv[optind]);
> +            fp = fopen(argv[optind + count], "rb");

I presume the printf() wants adjusting to match the fopen() ?

~Andrew

>              if ( fp == NULL )
>              {
>                  perror("fopen");
> @@ -100,7 +106,10 @@ int main(int argc, char **argv)
>          if ( !feof(fp) )
>          {
>              printf("Input too large\n");
> -            exit(-1);
> +            /* Don't exit if we're doing batch processing */
> +            if ( max == 1 )
> +                exit(-1);
> +            continue;
>          }
>  
>          if ( fp != stdin )


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 06/12] fuzz/x86_emulate: Take multiple test files for inputs
  2017-10-10 16:56   ` Andrew Cooper
@ 2017-10-10 16:58     ` George Dunlap
  2017-10-10 17:56       ` Andrew Cooper
  0 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 16:58 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/2017 05:56 PM, Andrew Cooper wrote:
> On 10/10/17 17:20, George Dunlap wrote:
>> @@ -65,12 +68,15 @@ int main(int argc, char **argv)
>>  #ifdef __AFL_HAVE_MANUAL_CONTROL
>>      __AFL_INIT();
>>  
>> -    while ( __AFL_LOOP(1000) )
>> +    for( count = 0; __AFL_LOOP(1000); )
>> +#else
>> +    for( count = 0; count < max; count++ )
>>  #endif
>>      {
>>          if ( fp != stdin ) /* If not using stdin, open the provided file. */
>>          {
>> -            fp = fopen(argv[optind], "rb");
>> +            printf("Opening file %s\n", argv[optind]);
>> +            fp = fopen(argv[optind + count], "rb");
> 
> I presume the printf() wants adjusting to match the fopen() ?

Oh!  I thought I'd fixed that.  Indeed it does.

I can fix that on check-in, if we don't find anything bigger worth
re-sending for.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 16:20 ` [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact George Dunlap
@ 2017-10-10 16:59   ` Andrew Cooper
  2017-10-10 17:01     ` George Dunlap
  2017-10-10 17:26   ` Ian Jackson
  2017-10-11  9:18   ` Jan Beulich
  2 siblings, 1 reply; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 16:59 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 17:20, George Dunlap wrote:
> At the moment, AFL reckons that for any given input, 87% of it is
> completely irrelevant: that is, it can change it as much as it wants
> but have no impact on the result of the test; and yet it can't remove
> it.
>
> This is largely because we interpret the blob handed to us as a large
> struct, including CR values, MSR values, segment registers, and a full
> cpu_user_regs.
>
> Instead, modify our interpretation to have a "set state" stanza at the
> front.  Begin by reading a 16-bit value; if it is lower than a certain
> threshold, set some state according to what byte it is, and repeat.
> Continue until the byte is above a certain threshold.
>
> This allows AFL to compact any given test case much smaller; to the
> point where now it reckons there is not a single byte of the test file
> which becomes irrelevant.  Testing have shown that this option both
> allows AFL to reach coverage much faster, and to have a total coverage
> higher than with the old format.
>
> Make this an option (rather than a unilateral change) to enable
> side-by-side performance comparison of the old and new formats.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

I am still of the opinion that this is a waste of effort, which would be
better spent actually removing the irrelevant state in the first place;
not building an obfuscation algorithm.

I'm not going to nack the patch because that is probably over the top,
but I'm not in favour if this change going in.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 16:59   ` Andrew Cooper
@ 2017-10-10 17:01     ` George Dunlap
  2017-10-10 17:11       ` Andrew Cooper
  0 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 17:01 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/2017 05:59 PM, Andrew Cooper wrote:
> On 10/10/17 17:20, George Dunlap wrote:
>> At the moment, AFL reckons that for any given input, 87% of it is
>> completely irrelevant: that is, it can change it as much as it wants
>> but have no impact on the result of the test; and yet it can't remove
>> it.
>>
>> This is largely because we interpret the blob handed to us as a large
>> struct, including CR values, MSR values, segment registers, and a full
>> cpu_user_regs.
>>
>> Instead, modify our interpretation to have a "set state" stanza at the
>> front.  Begin by reading a 16-bit value; if it is lower than a certain
>> threshold, set some state according to what byte it is, and repeat.
>> Continue until the byte is above a certain threshold.
>>
>> This allows AFL to compact any given test case much smaller; to the
>> point where now it reckons there is not a single byte of the test file
>> which becomes irrelevant.  Testing have shown that this option both
>> allows AFL to reach coverage much faster, and to have a total coverage
>> higher than with the old format.
>>
>> Make this an option (rather than a unilateral change) to enable
>> side-by-side performance comparison of the old and new formats.
>>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> 
> I am still of the opinion that this is a waste of effort, which would be
> better spent actually removing the irrelevant state in the first place;
> not building an obfuscation algorithm.
> 
> I'm not going to nack the patch because that is probably over the top,
> but I'm not in favour if this change going in.

Did you look at the evidence I presented, demonstrating that this
significantly increases the effectiveness of AFL?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 17:01     ` George Dunlap
@ 2017-10-10 17:11       ` Andrew Cooper
  2017-10-10 17:13         ` George Dunlap
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 17:11 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 18:01, George Dunlap wrote:
> On 10/10/2017 05:59 PM, Andrew Cooper wrote:
>> On 10/10/17 17:20, George Dunlap wrote:
>>> At the moment, AFL reckons that for any given input, 87% of it is
>>> completely irrelevant: that is, it can change it as much as it wants
>>> but have no impact on the result of the test; and yet it can't remove
>>> it.
>>>
>>> This is largely because we interpret the blob handed to us as a large
>>> struct, including CR values, MSR values, segment registers, and a full
>>> cpu_user_regs.
>>>
>>> Instead, modify our interpretation to have a "set state" stanza at the
>>> front.  Begin by reading a 16-bit value; if it is lower than a certain
>>> threshold, set some state according to what byte it is, and repeat.
>>> Continue until the byte is above a certain threshold.
>>>
>>> This allows AFL to compact any given test case much smaller; to the
>>> point where now it reckons there is not a single byte of the test file
>>> which becomes irrelevant.  Testing have shown that this option both
>>> allows AFL to reach coverage much faster, and to have a total coverage
>>> higher than with the old format.
>>>
>>> Make this an option (rather than a unilateral change) to enable
>>> side-by-side performance comparison of the old and new formats.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> I am still of the opinion that this is a waste of effort, which would be
>> better spent actually removing the irrelevant state in the first place;
>> not building an obfuscation algorithm.
>>
>> I'm not going to nack the patch because that is probably over the top,
>> but I'm not in favour if this change going in.
> Did you look at the evidence I presented, demonstrating that this
> significantly increases the effectiveness of AFL?

I can easily believe that you've found an obfucation algorithm which
does better than the current state layout.

I do not believe that any amount of obfuscation will be better than
actually fixing the root cause of the problem; that the current state
really is mostly irrelevant, and can easily be shrunk.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 17:11       ` Andrew Cooper
@ 2017-10-10 17:13         ` George Dunlap
  2017-10-10 17:31           ` Andrew Cooper
  0 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-10 17:13 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/2017 06:11 PM, Andrew Cooper wrote:
> On 10/10/17 18:01, George Dunlap wrote:
>> On 10/10/2017 05:59 PM, Andrew Cooper wrote:
>>> On 10/10/17 17:20, George Dunlap wrote:
>>>> At the moment, AFL reckons that for any given input, 87% of it is
>>>> completely irrelevant: that is, it can change it as much as it wants
>>>> but have no impact on the result of the test; and yet it can't remove
>>>> it.
>>>>
>>>> This is largely because we interpret the blob handed to us as a large
>>>> struct, including CR values, MSR values, segment registers, and a full
>>>> cpu_user_regs.
>>>>
>>>> Instead, modify our interpretation to have a "set state" stanza at the
>>>> front.  Begin by reading a 16-bit value; if it is lower than a certain
>>>> threshold, set some state according to what byte it is, and repeat.
>>>> Continue until the byte is above a certain threshold.
>>>>
>>>> This allows AFL to compact any given test case much smaller; to the
>>>> point where now it reckons there is not a single byte of the test file
>>>> which becomes irrelevant.  Testing have shown that this option both
>>>> allows AFL to reach coverage much faster, and to have a total coverage
>>>> higher than with the old format.
>>>>
>>>> Make this an option (rather than a unilateral change) to enable
>>>> side-by-side performance comparison of the old and new formats.
>>>>
>>>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>> I am still of the opinion that this is a waste of effort, which would be
>>> better spent actually removing the irrelevant state in the first place;
>>> not building an obfuscation algorithm.
>>>
>>> I'm not going to nack the patch because that is probably over the top,
>>> but I'm not in favour if this change going in.
>> Did you look at the evidence I presented, demonstrating that this
>> significantly increases the effectiveness of AFL?
> 
> I can easily believe that you've found an obfucation algorithm which
> does better than the current state layout.
> 
> I do not believe that any amount of obfuscation will be better than
> actually fixing the root cause of the problem; that the current state
> really is mostly irrelevant, and can easily be shrunk.

Right; well I've already explained why I don't think "obfuscation" is
the right term.  For the time being, we have something which improves
efficiency; let's check it in now, and in the future if you or someone
else finds a way to fix it "properly" we can do that.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration
  2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
                   ` (11 preceding siblings ...)
  2017-10-10 16:47 ` [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
@ 2017-10-10 17:22 ` Ian Jackson
  2017-10-11  9:00   ` Jan Beulich
  12 siblings, 1 reply; 45+ messages in thread
From: Ian Jackson @ 2017-10-10 17:22 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

George Dunlap writes ("[PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration"):
> Once feof() returns true for a stream, it will continue to return true
> for that stream until clearerr() is called (or the stream is closed
> and re-opened).
> 
> In llvm-clang-fast-mode, the same file descriptor is used for each
> iteration of the loop, meaning that the "Input too large" check was
> broken -- feof() would return true even if the fread() hadn't hit the
> end of the file.  The result is that AFL generates testcases of
> arbitrary size.
> 
> Fix this by fseek'ing to the beginning of the file on every iteration;
> this resets the EOF marker and other state.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

> This is a candidate for backport to 4.9.

Please let me know when it is committed and I will add it to my
backport list.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 03/12] fuzz/x86_emulate: Implement input_read() and input_avail()
  2017-10-10 16:20 ` [PATCH v3 03/12] fuzz/x86_emulate: Implement input_read() and input_avail() George Dunlap
  2017-10-10 16:52   ` Andrew Cooper
@ 2017-10-10 17:24   ` Ian Jackson
  1 sibling, 0 replies; 45+ messages in thread
From: Ian Jackson @ 2017-10-10 17:24 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

George Dunlap writes ("[PATCH v3 03/12] fuzz/x86_emulate: Implement input_read() and input_avail()"):
> Rather than open-coding the "read" from the input file.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 08/12] fuzz/x86_emulate: Move definitions into a header
  2017-10-10 16:20 ` [PATCH v3 08/12] fuzz/x86_emulate: Move definitions into a header George Dunlap
@ 2017-10-10 17:25   ` Ian Jackson
  2017-10-11  9:09     ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Ian Jackson @ 2017-10-10 17:25 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

George Dunlap writes ("[PATCH v3 08/12] fuzz/x86_emulate: Move definitions into a header"):
> Move fuzz-emul.c function prototypes into a header.  Also share the
> definition of the input size (rather than hard-coding it in
> fuzz-emul.c).
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>

> RFC: Worth trying to BUILD_BUG_ON(INPUT_SIZE < DATA_SIZE_FULL)?

I don't mind.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 16:20 ` [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact George Dunlap
  2017-10-10 16:59   ` Andrew Cooper
@ 2017-10-10 17:26   ` Ian Jackson
  2017-10-10 18:57     ` George Dunlap
  2017-10-11  9:18   ` Jan Beulich
  2 siblings, 1 reply; 45+ messages in thread
From: Ian Jackson @ 2017-10-10 17:26 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

George Dunlap writes ("[PATCH v3 09/12] fuzz/x86_emulate: Make input more compact"):
> At the moment, AFL reckons that for any given input, 87% of it is
> completely irrelevant: that is, it can change it as much as it wants
> but have no impact on the result of the test; and yet it can't remove
> it.
> 
> This is largely because we interpret the blob handed to us as a large
> struct, including CR values, MSR values, segment registers, and a full
> cpu_user_regs.
> 
> Instead, modify our interpretation to have a "set state" stanza at the
> front.  Begin by reading a 16-bit value; if it is lower than a certain
> threshold, set some state according to what byte it is, and repeat.
> Continue until the byte is above a certain threshold.
> 
> This allows AFL to compact any given test case much smaller; to the
> point where now it reckons there is not a single byte of the test file
> which becomes irrelevant.  Testing have shown that this option both
> allows AFL to reach coverage much faster, and to have a total coverage
> higher than with the old format.

This is basically a compression scheme.  How odd that it should help.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 17:13         ` George Dunlap
@ 2017-10-10 17:31           ` Andrew Cooper
  2017-10-10 20:55             ` George Dunlap
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 17:31 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 18:13, George Dunlap wrote:
> On 10/10/2017 06:11 PM, Andrew Cooper wrote:
>> On 10/10/17 18:01, George Dunlap wrote:
>>> On 10/10/2017 05:59 PM, Andrew Cooper wrote:
>>>> On 10/10/17 17:20, George Dunlap wrote:
>>>>> At the moment, AFL reckons that for any given input, 87% of it is
>>>>> completely irrelevant: that is, it can change it as much as it wants
>>>>> but have no impact on the result of the test; and yet it can't remove
>>>>> it.
>>>>>
>>>>> This is largely because we interpret the blob handed to us as a large
>>>>> struct, including CR values, MSR values, segment registers, and a full
>>>>> cpu_user_regs.
>>>>>
>>>>> Instead, modify our interpretation to have a "set state" stanza at the
>>>>> front.  Begin by reading a 16-bit value; if it is lower than a certain
>>>>> threshold, set some state according to what byte it is, and repeat.
>>>>> Continue until the byte is above a certain threshold.
>>>>>
>>>>> This allows AFL to compact any given test case much smaller; to the
>>>>> point where now it reckons there is not a single byte of the test file
>>>>> which becomes irrelevant.  Testing have shown that this option both
>>>>> allows AFL to reach coverage much faster, and to have a total coverage
>>>>> higher than with the old format.
>>>>>
>>>>> Make this an option (rather than a unilateral change) to enable
>>>>> side-by-side performance comparison of the old and new formats.
>>>>>
>>>>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>>> I am still of the opinion that this is a waste of effort, which would be
>>>> better spent actually removing the irrelevant state in the first place;
>>>> not building an obfuscation algorithm.
>>>>
>>>> I'm not going to nack the patch because that is probably over the top,
>>>> but I'm not in favour if this change going in.
>>> Did you look at the evidence I presented, demonstrating that this
>>> significantly increases the effectiveness of AFL?
>> I can easily believe that you've found an obfucation algorithm which
>> does better than the current state layout.
>>
>> I do not believe that any amount of obfuscation will be better than
>> actually fixing the root cause of the problem; that the current state
>> really is mostly irrelevant, and can easily be shrunk.
> Right; well I've already explained why I don't think "obfuscation" is
> the right term.

How else would describe it?  You are purposefully hiding the structure
of the data by doing conditional initialisation based on earlier values.

> For the time being, we have something which improves
> efficiency;

How much of this measured efficiently is actually ALF finding paths
around setup_state() rather than finding new paths in the emulator
itself?  I can't think of a test which would isolate this properly.

> let's check it in now, and in the future if you or someone
> else finds a way to fix it "properly" we can do that.

I wouldn't really mind so much if this change didn't make it harder to
fix the root cause.  As it is, your prerequisite patch prohibits any
ability to overlay a minimal structure over the fuzzing corpus.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 06/12] fuzz/x86_emulate: Take multiple test files for inputs
  2017-10-10 16:58     ` George Dunlap
@ 2017-10-10 17:56       ` Andrew Cooper
  0 siblings, 0 replies; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 17:56 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 17:58, George Dunlap wrote:
> On 10/10/2017 05:56 PM, Andrew Cooper wrote:
>> On 10/10/17 17:20, George Dunlap wrote:
>>> @@ -65,12 +68,15 @@ int main(int argc, char **argv)
>>>  #ifdef __AFL_HAVE_MANUAL_CONTROL
>>>      __AFL_INIT();
>>>  
>>> -    while ( __AFL_LOOP(1000) )
>>> +    for( count = 0; __AFL_LOOP(1000); )
>>> +#else
>>> +    for( count = 0; count < max; count++ )
>>>  #endif
>>>      {
>>>          if ( fp != stdin ) /* If not using stdin, open the provided file. */
>>>          {
>>> -            fp = fopen(argv[optind], "rb");
>>> +            printf("Opening file %s\n", argv[optind]);
>>> +            fp = fopen(argv[optind + count], "rb");
>> I presume the printf() wants adjusting to match the fopen() ?
> Oh!  I thought I'd fixed that.  Indeed it does.
>
> I can fix that on check-in, if we don't find anything bigger worth
> re-sending for.

I can't see anything else needing fixing.  Acked-by: Andrew Cooper
<andrew.cooper3@citrix.com>

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 07/12] fuzz/x86_emulate: Move all state into fuzz_state
  2017-10-10 16:20 ` [PATCH v3 07/12] fuzz/x86_emulate: Move all state into fuzz_state George Dunlap
@ 2017-10-10 18:20   ` Andrew Cooper
  2017-10-11 11:30     ` George Dunlap
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 18:20 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 17:20, George Dunlap wrote:
> This is in preparation for adding the option for a more "compact"
> interpretation of the fuzzing data, in which we only change select
> bits of the state.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> ---
> v3:
>  - Move DATA_OFFSET inside the structure
>  - Remove a stray blank line
> v2: Port over previous changes
>
> CC: Ian Jackson <ian.jackson@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Jan Beulich <jbeulich@suse.com>
> ---
>  tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 89 +++++++++++++------------
>  1 file changed, 45 insertions(+), 44 deletions(-)
>
> diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> index 8998f21fe1..20d52b33f8 100644
> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> @@ -24,14 +24,8 @@
>  /* Layout of data expected as fuzzing input. */
>  struct fuzz_corpus
>  {
> -    unsigned long cr[5];
> -    uint64_t msr[MSR_INDEX_MAX];
> -    struct cpu_user_regs regs;
> -    struct segment_register segments[SEG_NUM];
> -    unsigned long options;
>      unsigned char data[4096];
>  } input;
> -#define DATA_OFFSET offsetof(struct fuzz_corpus, data)
>  
>  /*
>   * Internal state of the fuzzing harness.  Calculated initially from the input
> @@ -39,7 +33,14 @@ struct fuzz_corpus
>   */

You've invalidated a number of the comments describing behaviours,
including the description of the difference between fuzz_state and
fuzz_corpus.

Why do you need to move any of this in the first place?  If you insist
on keeping the compact mode, what's wrong with conditionally reading the
AFL input into either fuzz_copus entirely, or into fuzz_corpus.data[]
and then conditionally deriving this state?

That way, you don't block work to fix the root cause, which needs to end
up with architectural values in fuzz_state, derived from a bitfields in
fuzz_corpus.

~Andrew

>  struct fuzz_state
>  {
> +    unsigned long options;
> +    unsigned long cr[5];
> +    uint64_t msr[MSR_INDEX_MAX];
> +    struct segment_register segments[SEG_NUM];
> +    struct cpu_user_regs regs;
> +
>      /* Fuzzer's input data. */
> +#define DATA_OFFSET offsetof(struct fuzz_state, corpus)
>      struct fuzz_corpus *corpus;
>  
>      /* Real amount of data backing corpus->data[]. */
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 10/12] fuzz/x86_emulate: Add --rerun option to try to track down instability
  2017-10-10 16:20 ` [PATCH v3 10/12] fuzz/x86_emulate: Add --rerun option to try to track down instability George Dunlap
@ 2017-10-10 18:44   ` Andrew Cooper
  2017-10-11  9:20     ` Jan Beulich
  2017-10-11 15:56     ` George Dunlap
  0 siblings, 2 replies; 45+ messages in thread
From: Andrew Cooper @ 2017-10-10 18:44 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/17 17:20, George Dunlap wrote:
> @@ -659,7 +667,10 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
>      {
>          /* Fuzz all of the state in one go */
>          if ( !input_read(s, s, DATA_SIZE_FULL) )
> +        {
> +            printf("Input size too small\n");
>              exit(-1);
> +        }

Is this hunk intended to be in the previous patch?

> @@ -886,21 +896,144 @@ int LLVMFuzzerInitialize(int *argc, char ***argv)
>      return 0;
>  }
>  
> -int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
> +static void setup_fuzz_state(struct fuzz_state *state, const void *data_p, size_t size)
>  {
> -    struct fuzz_state state = {
> -        .ops = all_fuzzer_ops,
> -    };
> -    struct x86_emulate_ctxt ctxt = {
> -        .data = &state,
> -        .regs = &state.regs,
> -        .addr_size = 8 * sizeof(void *),
> -        .sp_size = 8 * sizeof(void *),
> -    };
> +    memset(state, 0, sizeof(*state));
> +    state->corpus = data_p;
> +    state->data_num = size;
> +}
> +
> +static int runtest(struct fuzz_state *state) {
>      int rc;
>  
> -    /* Reset all global state variables */
> -    memset(&input, 0, sizeof(input));
> +    struct x86_emulate_ctxt *ctxt = &state->ctxt;
> +    
> +    state->ops = all_fuzzer_ops;
> +
> +    ctxt->data = state;
> +    ctxt->regs = &state->regs;
> +
> +    setup_state(ctxt);
> +
> +    sanitize_state(ctxt);
> +
> +    disable_hooks(state);
> +
> +    do {
> +        /* FIXME: Until we actually implement SIGFPE handling properly */
> +        setup_fpu_exception_handler();
> +
> +        set_sizes(ctxt);
> +        dump_state(ctxt);
> +
> +        rc = x86_emulate(ctxt, &state->ops);
> +        printf("Emulation result: %d\n", rc);
> +    } while ( rc == X86EMUL_OKAY );
> +
> +    return 0;
> +}
> +
> +static void compare_states(struct fuzz_state state[2])
> +{
> +    // First zero any "internal" pointers

/* */

> +    state[0].corpus = state[1].corpus = NULL;
> +    state[0].ctxt.data = state[1].ctxt.data = NULL;
> +    state[0].ctxt.regs = state[1].ctxt.regs = NULL;
> +
> +    if ( memcmp(&state[0], &state[1], sizeof(struct fuzz_state)) )
> +    {
> +        unsigned int i;
> +
> +        printf("State mismatch\n");
> +
> +        for ( i=0; i<ARRAY_SIZE(state[0].cr); i++ )

Various spaces.

> +            if ( state[0].cr[i] != state[1].cr[i] )
> +                printf("cr[%u]: %lx != %lx\n",
> +                       i, state[0].cr[i], state[1].cr[i]);
> +        
> +        for ( i=0; i<ARRAY_SIZE(state[0].msr); i++ )
> +            if ( state[0].msr[i] != state[1].msr[i] )
> +                printf("msr[%u]: %lx != %lx\n",
> +                       i, state[0].msr[i], state[1].msr[i]);
> +        
> +        for ( i=0; i<ARRAY_SIZE(state[0].segments); i++ )
> +            if ( memcmp(&state[0].segments[i], &state[1].segments[i],
> +                        sizeof(state[0].segments[0])) )
> +                printf("segments[%u]: [%x:%x:%x:%lx] != [%x:%x:%x:%lx]!\n", i,
> +                       (unsigned)state[0].segments[i].sel,
> +                       (unsigned)state[0].segments[i].attr,
> +                       state[0].segments[i].limit,
> +                       state[0].segments[i].base,
> +                       (unsigned)state[1].segments[i].sel,
> +                       (unsigned)state[1].segments[i].attr,
> +                       state[1].segments[i].limit,
> +                       state[1].segments[i].base);
> +
> +        if ( state[0].data_num != state[1].data_num )
> +            printf("data_num: %lx !=  %lx\n", state[0].data_num,
> +                   state[1].data_num);
> +        if ( state[0].data_index != state[1].data_index )
> +            printf("data_index: %lx !=  %lx\n", state[0].data_index,
> +                   state[1].data_index);
> +
> +        if ( memcmp(&state[0].regs, &state[1].regs, sizeof(state[0].regs)) )
> +        {
> +            printf("registers differ!\n");
> +            /* Print If Not Equal */
> +#define PINE(elem)\

PRINT_NE() ?

> +            if ( state[0].elem != state[1].elem ) \
> +                printf(#elem " differ: %lx != %lx\n", \
> +                       (unsigned long)state[0].elem, \
> +                       (unsigned long)state[1].elem)
> +            PINE(regs.r15);

Hmm - this is going to cause problems for the 32bit build.  I can't
think of an easy suggestion to fix it.

> +            PINE(regs.r14);
> +            PINE(regs.r13);
> +            PINE(regs.r12);
> +            PINE(regs.rbp);
> +            PINE(regs.rbx);
> +            PINE(regs.r10);
> +            PINE(regs.r11);
> +            PINE(regs.r9);
> +            PINE(regs.r8);
> +            PINE(regs.rax);
> +            PINE(regs.rcx);
> +            PINE(regs.rdx);
> +            PINE(regs.rsi);
> +            PINE(regs.rdi);
> +
> +            for ( i = offsetof(struct cpu_user_regs, error_code) / sizeof(unsigned);
> +                  i < sizeof(state[1].regs)/sizeof(unsigned); i++ )
> +            {
> +                printf("[%04lu] %08x %08x\n",
> +                        i * sizeof(unsigned), ((unsigned *)&state[0].regs)[i],
> +                       ((unsigned *)&state[1].regs)[i]);

It would be helpful to pull out at least rflags individually, because
that has the highest individual chance of being unstable.  The segment
selectors might also be nice to pull out individually, whereas
everything else should be zero and remain zero throughout (at which
point, hex-dumping is fine).

> +            }
> +        }
> +
> +        if ( memcmp(&state[0].ops, &state[1].ops, sizeof(state[0].ops)) )
> +            printf("ops differ!\n");
> +
> +        if ( memcmp(&state[0].ctxt, &state[1].ctxt, sizeof(state[0].ctxt)) )
> +        {
> +            printf("ctxt differs!\n");
> +            for ( i = 0;  i < sizeof(state[0].ctxt)/sizeof(unsigned); i++ )
> +            {
> +                printf("[%04lu] %08x %08x\n",

%04zu for size_t

> +                        i * sizeof(unsigned), ((unsigned *)&state[0].ctxt)[i],
> +                       ((unsigned *)&state[1].ctxt)[i]);
> +            }
> +            
> +        }
> +
> +        abort();
> +    }
> +}
> +
> +bool opt_rerun = false;

Should this not be somewhere near the top of the file?

~Andrew

> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
> +{
> +    struct fuzz_state state[2];
>  
>      if ( size < fuzz_minimal_input_size() )
>      {


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 17:26   ` Ian Jackson
@ 2017-10-10 18:57     ` George Dunlap
  0 siblings, 0 replies; 45+ messages in thread
From: George Dunlap @ 2017-10-10 18:57 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper


> On Oct 10, 2017, at 6:26 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> 
> George Dunlap writes ("[PATCH v3 09/12] fuzz/x86_emulate: Make input more compact"):
>> At the moment, AFL reckons that for any given input, 87% of it is
>> completely irrelevant: that is, it can change it as much as it wants
>> but have no impact on the result of the test; and yet it can't remove
>> it.
>> 
>> This is largely because we interpret the blob handed to us as a large
>> struct, including CR values, MSR values, segment registers, and a full
>> cpu_user_regs.
>> 
>> Instead, modify our interpretation to have a "set state" stanza at the
>> front.  Begin by reading a 16-bit value; if it is lower than a certain
>> threshold, set some state according to what byte it is, and repeat.
>> Continue until the byte is above a certain threshold.
>> 
>> This allows AFL to compact any given test case much smaller; to the
>> point where now it reckons there is not a single byte of the test file
>> which becomes irrelevant.  Testing have shown that this option both
>> allows AFL to reach coverage much faster, and to have a total coverage
>> higher than with the old format.
> 
> This is basically a compression scheme.  How odd that it should help.

Well I’m pretty sure the size of the input file is more or less the precise cause for the difference in speed: Fuzzing a 32-byte file is just a lot faster than fuzzing a 1-k file.  Running them side by side makes the effect more obvious — I’ll show you tomorrow if you’re interested.

Since the file size is the direct cause of the speed difference, having a “compressed” file will naturally make things faster.


> 
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Thanks.

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 17:31           ` Andrew Cooper
@ 2017-10-10 20:55             ` George Dunlap
  0 siblings, 0 replies; 45+ messages in thread
From: George Dunlap @ 2017-10-10 20:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Jan Beulich, Ian Jackson


> On Oct 10, 2017, at 6:31 PM, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
> 
> On 10/10/17 18:13, George Dunlap wrote:
>> On 10/10/2017 06:11 PM, Andrew Cooper wrote:
>>> On 10/10/17 18:01, George Dunlap wrote:
>>>> On 10/10/2017 05:59 PM, Andrew Cooper wrote:
>>>>> On 10/10/17 17:20, George Dunlap wrote:
>>>>>> At the moment, AFL reckons that for any given input, 87% of it is
>>>>>> completely irrelevant: that is, it can change it as much as it wants
>>>>>> but have no impact on the result of the test; and yet it can't remove
>>>>>> it.
>>>>>> 
>>>>>> This is largely because we interpret the blob handed to us as a large
>>>>>> struct, including CR values, MSR values, segment registers, and a full
>>>>>> cpu_user_regs.
>>>>>> 
>>>>>> Instead, modify our interpretation to have a "set state" stanza at the
>>>>>> front.  Begin by reading a 16-bit value; if it is lower than a certain
>>>>>> threshold, set some state according to what byte it is, and repeat.
>>>>>> Continue until the byte is above a certain threshold.
>>>>>> 
>>>>>> This allows AFL to compact any given test case much smaller; to the
>>>>>> point where now it reckons there is not a single byte of the test file
>>>>>> which becomes irrelevant.  Testing have shown that this option both
>>>>>> allows AFL to reach coverage much faster, and to have a total coverage
>>>>>> higher than with the old format.
>>>>>> 
>>>>>> Make this an option (rather than a unilateral change) to enable
>>>>>> side-by-side performance comparison of the old and new formats.
>>>>>> 
>>>>>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>>>> I am still of the opinion that this is a waste of effort, which would be
>>>>> better spent actually removing the irrelevant state in the first place;
>>>>> not building an obfuscation algorithm.
>>>>> 
>>>>> I'm not going to nack the patch because that is probably over the top,
>>>>> but I'm not in favour if this change going in.
>>>> Did you look at the evidence I presented, demonstrating that this
>>>> significantly increases the effectiveness of AFL?
>>> I can easily believe that you've found an obfucation algorithm which
>>> does better than the current state layout.
>>> 
>>> I do not believe that any amount of obfuscation will be better than
>>> actually fixing the root cause of the problem; that the current state
>>> really is mostly irrelevant, and can easily be shrunk.
>> Right; well I've already explained why I don't think "obfuscation" is
>> the right term.
> 
> 

> How else would describe it?  You are purposefully hiding the structure
> of the data by doing conditional initialisation based on earlier values.

Consider the following progression:

1. Have a big structure, which includes RAX, and have the corpus be loaded into the whole thing (meaning we always fuzz the whole thing).

2. Have <state, value> tuples which will set specific state to specific values; e.g., <FUZZ_rax, 0xfefefefefefefefe>.  

3. Have <offset, value> tuples which write $value into a specific offset of the processor state (including RAX).

I would consider #3 a generalized form of #2.


>> For the time being, we have something which improves
>> efficiency;
> 
> How much of this measured efficiently is actually ALF finding paths
> around setup_state() rather than finding new paths in the emulator
> itself?  I can't think of a test which would isolate this properly.

Come now; we’re talking about thousands of unique “tuples” difference between “all paths found after 24 hours” for the compact and non-compact versions.  (Like, 12k vs 8k.)  Do you really think there are 4000 unique tuples inside that one function?

We now have a way to generate gcov data from AFL test cases, so that’s a theory that can be tested now, if you care to do so.

> 
>> let's check it in now, and in the future if you or someone
>> else finds a way to fix it "properly" we can do that.
> 
> I wouldn't really mind so much if this change didn't make it harder to
> fix the root cause.  As it is, your prerequisite patch prohibits any
> ability to overlay a minimal structure over the fuzzing corpus.

I’d be interested to know what you mean by “a minimal structure” (and “architectural values in fuzz_state derived from bitfields in fuzz_corpus” in another email).

Consider the following three input formats.

1. We bits to fuzz_state->options for whether we should set the GPRs to fuzzed data or not.  If, for instance, the FUZZ_rax bit is set, then we read 8 bytes out of the fuzz state into RAX as part of the setup.

2. A format similar to what I have, but we have an explicit enumeration.  Read “fuzz_target” byte.  If fuzz_target == FUZZ_rax, then you read 64 bytes and write it into regs->rax.

3. The format I have: read an offset, write bytes into that offset.

Clearly there will be similarly ‘compact’ corpus files in all three that specify: “Set RAX to 0xfefefefefefefefe, then execute mov RAX, RBX”.  In all three formats AFL can efficiently mutate the corpus to setting RAX to 0xffffffffffffffff, or RBX to 0xfefefefefefefefe, or running “mov RAX, RDX” instead.  

From a fuzzing perspective, I think #2 and #3 are almost identical.

One advantage of #2 is that it could in theory align more closely with AFL’s “deterministic” fuzzing steps: If the value that ends up being written into RAX always stored as an aligned 64-bit number inside the file, for example, then maybe AFL’s “arithmetic” and “special value” heuristics might be more effective at finding interesting testcases, rather than relying mainly on “havoc” finding those same testcases by random bit-mashing.

One *disadvantage* of #2 is that it will only fuzz the parts of the data structure you expect to be set.  #3 will set *all* parts of your cpu structure, even the parts you don’t think are relevant.  From a “finding bugs in the emulator” perspective, I think #3 is clearly superior.

See the response I gave to Jan about clearing the upper bits of GPRs in 16- or 32-bit mode. Yes, in theory the upper bits should have no effect on the outcome of the emulation.  But if they *did* have an effect, we’d want to know, wouldn’t we?  Similarly, yes, in theory bits of the cpu structure that aren’t used shouldn’t have any effect on the emulation.  But if they *did*, we’d want to know: it would definitely be a bug, and we’d want to make sure that nobody could trigger that state 

I’m not sure if #2 is anything like you had in mind, but it wouldn’t actually be very difficult to morph what I’ve done here into it: we already have a “range” of “offset” values for MSRs, segment registers, and so on; all it would take is adding specifically-enumerated state into that list.

If that’s not your idea, all I can say is this: the primary way the more “compact” state increases effectiveness is by reducing file size.  Anything that doesn’t reduce file size will, I predict, be ineffective.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration
  2017-10-10 16:47 ` [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
  2017-10-10 16:47   ` Andrew Cooper
@ 2017-10-11  8:59   ` Jan Beulich
  1 sibling, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-10-11  8:59 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Wei Liu, xen-devel, Ian Jackson

>>> On 10.10.17 at 18:47, <george.dunlap@citrix.com> wrote:
> On 10/10/2017 05:20 PM, George Dunlap wrote:
>> Once feof() returns true for a stream, it will continue to return true
>> for that stream until clearerr() is called (or the stream is closed
>> and re-opened).
>> 
>> In llvm-clang-fast-mode, the same file descriptor is used for each
>> iteration of the loop, meaning that the "Input too large" check was
>> broken -- feof() would return true even if the fread() hadn't hit the
>> end of the file.  The result is that AFL generates testcases of
>> arbitrary size.
>> 
>> Fix this by fseek'ing to the beginning of the file on every iteration;
>> this resets the EOF marker and other state.
>> 
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> ---
>> Changes in v3:
>> - Fix the issue in the official sanctioned way
> 
> Hmm, seems v2 of this patch was checked in; review had flagged up that
> "clearerr()" was too big of a hammer.

Oh, I'm sorry for having overlooked that feedback.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration
  2017-10-10 17:22 ` Ian Jackson
@ 2017-10-11  9:00   ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-10-11  9:00 UTC (permalink / raw)
  To: Ian Jackson; +Cc: AndrewCooper, Wei Liu, George Dunlap, xen-devel

>>> On 10.10.17 at 19:22, <ian.jackson@eu.citrix.com> wrote:
> George Dunlap writes ("[PATCH v3 01/12] fuzz/x86_emulate: Clear errors after 
> each iteration"):
>> Once feof() returns true for a stream, it will continue to return true
>> for that stream until clearerr() is called (or the stream is closed
>> and re-opened).
>> 
>> In llvm-clang-fast-mode, the same file descriptor is used for each
>> iteration of the loop, meaning that the "Input too large" check was
>> broken -- feof() would return true even if the fread() hadn't hit the
>> end of the file.  The result is that AFL generates testcases of
>> arbitrary size.
>> 
>> Fix this by fseek'ing to the beginning of the file on every iteration;
>> this resets the EOF marker and other state.
> 
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
> 
>> This is a candidate for backport to 4.9.
> 
> Please let me know when it is committed and I will add it to my
> backport list.

I have the original one on mine already, so I can easily add this
one then as well; perhaps I would want to even fold the two into
just a single (good) commit).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 04/12] fuzz/x86_emulate: Rename the file containing the wrapper code
  2017-10-10 16:20 ` [PATCH v3 04/12] fuzz/x86_emulate: Rename the file containing the wrapper code George Dunlap
@ 2017-10-11  9:03   ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-10-11  9:03 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Wei Liu, xen-devel, Ian Jackson

>>> On 10.10.17 at 18:20, <george.dunlap@citrix.com> wrote:
> When generating coverage output, by default gcov generates output
> filenames based only on the coverage file and the "leaf" source file,
> not the full path.  As a result, it uses the same name for
> x86_emulate.c and x86_emulate/x86_emulate.c, generally overwriting the
> second (which we actually are about) with the first (which is just a
> wrapper).
> 
> Rename the user-space wrapper helpers to x86-emulate.[ch], so
> that it generates separate files.
> 
> There is actually an option to gcov, `--preserve-paths`, which will
> cause the full path name to be included in the filename, properly
> distinguishing between the two.  However, given that the user-space
> wrapper doesn't actually do any emulation (and the poor state of gcov
> documentation making it difficult to find the option in the first
> place), it seems to make more sense to rename the file anyway.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 08/12] fuzz/x86_emulate: Move definitions into a header
  2017-10-10 17:25   ` Ian Jackson
@ 2017-10-11  9:09     ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-10-11  9:09 UTC (permalink / raw)
  To: George Dunlap, Ian Jackson; +Cc: AndrewCooper, Wei Liu, xen-devel

>>> On 10.10.17 at 19:25, <ian.jackson@eu.citrix.com> wrote:
> George Dunlap writes ("[PATCH v3 08/12] fuzz/x86_emulate: Move definitions 
> into a header"):
>> Move fuzz-emul.c function prototypes into a header.  Also share the
>> definition of the input size (rather than hard-coding it in
>> fuzz-emul.c).
>> 
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> 
> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
(also if you add ...

>> RFC: Worth trying to BUILD_BUG_ON(INPUT_SIZE < DATA_SIZE_FULL)?
> 
> I don't mind.

... this)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact
  2017-10-10 16:20 ` [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact George Dunlap
  2017-10-10 16:59   ` Andrew Cooper
  2017-10-10 17:26   ` Ian Jackson
@ 2017-10-11  9:18   ` Jan Beulich
  2 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-10-11  9:18 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Wei Liu, xen-devel, Ian Jackson

>>> On 10.10.17 at 18:20, <george.dunlap@citrix.com> wrote:
> At the moment, AFL reckons that for any given input, 87% of it is
> completely irrelevant: that is, it can change it as much as it wants
> but have no impact on the result of the test; and yet it can't remove
> it.
> 
> This is largely because we interpret the blob handed to us as a large
> struct, including CR values, MSR values, segment registers, and a full
> cpu_user_regs.
> 
> Instead, modify our interpretation to have a "set state" stanza at the
> front.  Begin by reading a 16-bit value; if it is lower than a certain
> threshold, set some state according to what byte it is, and repeat.
> Continue until the byte is above a certain threshold.
> 
> This allows AFL to compact any given test case much smaller; to the
> point where now it reckons there is not a single byte of the test file
> which becomes irrelevant.  Testing have shown that this option both
> allows AFL to reach coverage much faster, and to have a total coverage
> higher than with the old format.
> 
> Make this an option (rather than a unilateral change) to enable
> side-by-side performance comparison of the old and new formats.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Without meaning to override Andrew's objections, in case
he can grudgingly accept this going in
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 10/12] fuzz/x86_emulate: Add --rerun option to try to track down instability
  2017-10-10 18:44   ` Andrew Cooper
@ 2017-10-11  9:20     ` Jan Beulich
  2017-10-11 15:56     ` George Dunlap
  1 sibling, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-10-11  9:20 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, George Dunlap, xen-devel

>>> On 10.10.17 at 20:44, <andrew.cooper3@citrix.com> wrote:
> On 10/10/17 17:20, George Dunlap wrote:
>> +            if ( state[0].elem != state[1].elem ) \
>> +                printf(#elem " differ: %lx != %lx\n", \
>> +                       (unsigned long)state[0].elem, \
>> +                       (unsigned long)state[1].elem)
>> +            PINE(regs.r15);
> 
> Hmm - this is going to cause problems for the 32bit build.  I can't
> think of an easy suggestion to fix it.

The fuzzer isn't being built as a 32-bit binary:

ifeq ($(CONFIG_X86_64),y)
x86-insn-fuzz-all: x86-insn-fuzzer.a fuzz-emul.o afl
else
x86-insn-fuzz-all:
endif

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 11/12] fuzz/x86_emulate: Set and fuzz more CPU state
  2017-10-10 16:20 ` [PATCH v3 11/12] fuzz/x86_emulate: Set and fuzz more CPU state George Dunlap
@ 2017-10-11  9:31   ` Jan Beulich
  2017-10-11 16:52     ` George Dunlap
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2017-10-11  9:31 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Wei Liu, xen-devel, Ian Jackson

>>> On 10.10.17 at 18:20, <george.dunlap@citrix.com> wrote:
> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> @@ -40,6 +40,8 @@ struct fuzz_state
>      uint64_t msr[MSR_INDEX_MAX];
>      struct segment_register segments[SEG_NUM];
>      struct cpu_user_regs regs;
> +    char fxsave[512] __attribute__((aligned(16)));
> +
>  
>      /* Fuzzer's input data. */

No double blank lines please.

> @@ -596,6 +598,54 @@ static const struct x86_emulate_ops all_fuzzer_ops = {
>  };
>  #undef SET
>  
> +/*
> + * This funciton will read or write fxsave to the fpu.  When writing,
> + * it 'sanitizes' the state: It will mask off the appropriate bits in
> + * the mxcsr, 'restore' the state to the fpu, then 'save' it again so
> + * that the data in fxsave reflects what's actually in the FPU.
> + *
> + * TODO: Extend state beyond just FPU (ymm registers, &c)
> + */
> +static void _set_fpu_state(char *fxsave, bool write)
> +{
> +    if ( cpu_has_fxsr )
> +    {
> +        static union __attribute__((__aligned__(16))) {
> +            char x[512];
> +            struct {
> +                uint32_t other[6];
> +                uint32_t mxcsr;
> +                uint32_t mxcsr_mask;
> +                /* ... */
> +            };
> +        } *fxs;
> +
> +        fxs = (typeof(fxs)) fxsave;

Stray blank after the cast.

> +        if ( write )
> +        {
> +            char null[512] __attribute__((aligned(16))) = { };
> +            
> +            fxs->mxcsr &= mxcsr_mask;
> +
> +            asm volatile( "fxrstor %0" :: "m" (*null) );
> +            asm volatile( "fxrstor %0" :: "m" (*fxs) );

Without a comment I still don't really understand what good this
double load is intended to do. In particular I don't think there are
any state components that the first may alter but the second
won't. Quite the opposite, on AMD I think you may end up not
altering FIP/FDP/FOP despite this double load (iirc they're being
loaded only when an exception is indicated to be pending in the
status word; see fpu_fxrstor() in the hypervisor).

> @@ -737,6 +791,18 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
>              printf("Setting cpu_user_regs offset %x\n", offset);
>              continue;
>          }
> +        offset -= sizeof(struct cpu_user_regs);
> +
> +        /* Fuzz fxsave state */
> +        if ( offset < 128 )

sizeof(s->fxsave) / 4

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 12/12] fuzz/x86_emulate: Add an option to limit the number of instructions executed
  2017-10-10 16:20 ` [PATCH v3 12/12] fuzz/x86_emulate: Add an option to limit the number of instructions executed George Dunlap
@ 2017-10-11  9:34   ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-10-11  9:34 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Wei Liu, xen-devel, Ian Jackson

>>> On 10.10.17 at 18:20, <george.dunlap@citrix.com> wrote:
> AFL considers a testcase to be a useful addition not only if there are
> tuples exercised by that testcase which were not exercised otherwise,
> but also if the *number* of times an individual tuple is exercised
> changes significantly; in particular, if the number of the highest
> non-zero bit changes (i.e., if it is run 1, 2-3, 4-7, 8-15, &c).
> 
> One simple way to increase these stats it to execute the same (or
> similar) instructions multiple times: If executing a given instruction
> once hits a particular tuple 2 times, executing it twice will hit the
> tuple 4 times, four times will hit the tuple 8 times, and so on.  All
> of these will look different to AFL, and so it is likely that many of
> the "unique test cases" will simply be the same instruction repeated
> powers of 2 times until the tuple counts max out (at 128).
> 
> It is unlikely that executing a single instruction more than a handful
> of times will generate any state we actually care about; but such long
> testcases take exponentially longer to fuzz: the fuzzer spends more
> time flipping bits looking for meaningful changes, and each execution
> takes longer because it is doing more things.  So long paths which add
> nothing to the actual code coverage but effectively "distract" the
> fuzzer, making it less effective.
> 
> Experiments have shown that not allowing infinite number of
> instruction retries for the old (non-compact) format does indeed speed
> up and increase code coverage.  However, it has also shown that on the
> new, more compact format, having no instruction limit causes the highest
> throughput in code coverage.
> 
> So leave the option in, but have it default to 0 (no limit).
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 07/12] fuzz/x86_emulate: Move all state into fuzz_state
  2017-10-10 18:20   ` Andrew Cooper
@ 2017-10-11 11:30     ` George Dunlap
  2017-10-11 14:50       ` George Dunlap
  0 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-11 11:30 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/2017 07:20 PM, Andrew Cooper wrote:
> On 10/10/17 17:20, George Dunlap wrote:
>> This is in preparation for adding the option for a more "compact"
>> interpretation of the fuzzing data, in which we only change select
>> bits of the state.
>>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> Acked-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v3:
>>  - Move DATA_OFFSET inside the structure
>>  - Remove a stray blank line
>> v2: Port over previous changes
>>
>> CC: Ian Jackson <ian.jackson@citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Jan Beulich <jbeulich@suse.com>
>> ---
>>  tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 89 +++++++++++++------------
>>  1 file changed, 45 insertions(+), 44 deletions(-)
>>
>> diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>> index 8998f21fe1..20d52b33f8 100644
>> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>> @@ -24,14 +24,8 @@
>>  /* Layout of data expected as fuzzing input. */
>>  struct fuzz_corpus
>>  {
>> -    unsigned long cr[5];
>> -    uint64_t msr[MSR_INDEX_MAX];
>> -    struct cpu_user_regs regs;
>> -    struct segment_register segments[SEG_NUM];
>> -    unsigned long options;
>>      unsigned char data[4096];
>>  } input;
>> -#define DATA_OFFSET offsetof(struct fuzz_corpus, data)
>>  
>>  /*
>>   * Internal state of the fuzzing harness.  Calculated initially from the input
>> @@ -39,7 +33,14 @@ struct fuzz_corpus
>>   */
> 
> You've invalidated a number of the comments describing behaviours,
> including the description of the difference between fuzz_state and
> fuzz_corpus.

Well completely apart from the 'compact' format, I think this move makes
sense.  The state moved is actually the state of the "emulated cpu" --
the emulator actually modifies this state as instructions are executed.
I think it makes sense to keep the "current state of the virtual
processor" separate from "input we get from a file".

In fact, the comment above this says:  "Internal state of the fuzzing
harness.  Calculated initially from the input corpus, and later
[mutated] by the emulation callbacks."

This actually makes that comment *more* true, since before this patch
almost the only state modified by the emulation callbacks was actually
in fuzz_corpus, not fuzz_state.

> Why do you need to move any of this in the first place?  If you insist
> on keeping the compact mode, what's wrong with conditionally reading the
> AFL input into either fuzz_copus entirely, or into fuzz_corpus.data[]
> and then conditionally deriving this state?

I don't see any advantage of that.  In fact it would mean that the name
"input" and "fuzz_corpus" even more misleading than they are before this
patch.

> That way, you don't block work to fix the root cause, which needs to end
> up with architectural values in fuzz_state, derived from a bitfields in
> fuzz_corpus.

x86_emulate() needs a cpu_user_regs structure; so if you want the data
in fuzz_corpus not to look exactly like cpu_user_regs, then you'll need
to have another place to put cpu_user_regs, and "populate" it based on
the data (whatever that looks like).  The most obvious thing to do is to
is to do exactly what I've done -- place cpu_user_regs inside fuzz_state.

The same thing is true for the other bits of data -- the read_* and
write* callbacks need "unpacked" state which they can read and modify.
The most obvious thing to do is to have arrays in fuzz_state which they
can simply read and write, and to populate them based on whatever
structure "compact" structure you end up with.

If you want to re-introduce a more compact structure format for the
input file, there are lots of options.  We can make fuzz_corpus into a
union.  We can have 'input' be a pure char array, and cast it into
several different structures depending on how we want to interpret it.
Or, we can define a local structure on a stack and "read" from the main
input file via input_read() and friends.

This patch makes all of those options easier, not harder.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 07/12] fuzz/x86_emulate: Move all state into fuzz_state
  2017-10-11 11:30     ` George Dunlap
@ 2017-10-11 14:50       ` George Dunlap
  0 siblings, 0 replies; 45+ messages in thread
From: George Dunlap @ 2017-10-11 14:50 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/11/2017 12:30 PM, George Dunlap wrote:
> On 10/10/2017 07:20 PM, Andrew Cooper wrote:
>> On 10/10/17 17:20, George Dunlap wrote:
>>> This is in preparation for adding the option for a more "compact"
>>> interpretation of the fuzzing data, in which we only change select
>>> bits of the state.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>> Acked-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> v3:
>>>  - Move DATA_OFFSET inside the structure
>>>  - Remove a stray blank line
>>> v2: Port over previous changes
>>>
>>> CC: Ian Jackson <ian.jackson@citrix.com>
>>> CC: Wei Liu <wei.liu2@citrix.com>
>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>> CC: Jan Beulich <jbeulich@suse.com>
>>> ---
>>>  tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 89 +++++++++++++------------
>>>  1 file changed, 45 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>>> index 8998f21fe1..20d52b33f8 100644
>>> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>>> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>>> @@ -24,14 +24,8 @@
>>>  /* Layout of data expected as fuzzing input. */
>>>  struct fuzz_corpus
>>>  {
>>> -    unsigned long cr[5];
>>> -    uint64_t msr[MSR_INDEX_MAX];
>>> -    struct cpu_user_regs regs;
>>> -    struct segment_register segments[SEG_NUM];
>>> -    unsigned long options;
>>>      unsigned char data[4096];
>>>  } input;
>>> -#define DATA_OFFSET offsetof(struct fuzz_corpus, data)
>>>  
>>>  /*
>>>   * Internal state of the fuzzing harness.  Calculated initially from the input
>>> @@ -39,7 +33,14 @@ struct fuzz_corpus
>>>   */
>>
>> You've invalidated a number of the comments describing behaviours,
>> including the description of the difference between fuzz_state and
>> fuzz_corpus.
> 
> Well completely apart from the 'compact' format, I think this move makes
> sense.  The state moved is actually the state of the "emulated cpu" --
> the emulator actually modifies this state as instructions are executed.
> I think it makes sense to keep the "current state of the virtual
> processor" separate from "input we get from a file".

It's also necessary for when we add the `--rerun` parameter: We have to
make sure we leave the input data alone, and have two parallel states
that we set up and can compare.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 10/12] fuzz/x86_emulate: Add --rerun option to try to track down instability
  2017-10-10 18:44   ` Andrew Cooper
  2017-10-11  9:20     ` Jan Beulich
@ 2017-10-11 15:56     ` George Dunlap
  1 sibling, 0 replies; 45+ messages in thread
From: George Dunlap @ 2017-10-11 15:56 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Ian Jackson, Wei Liu, Jan Beulich

On 10/10/2017 07:44 PM, Andrew Cooper wrote:
> On 10/10/17 17:20, George Dunlap wrote:
>> @@ -659,7 +667,10 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
>>      {
>>          /* Fuzz all of the state in one go */
>>          if ( !input_read(s, s, DATA_SIZE_FULL) )
>> +        {
>> +            printf("Input size too small\n");
>>              exit(-1);
>> +        }
> 
> Is this hunk intended to be in the previous patch?

Looking more into it, I suppose it shouldn't be needed at all, since we
check the input size in the main fuzzing function.

> 
>> @@ -886,21 +896,144 @@ int LLVMFuzzerInitialize(int *argc, char ***argv)
>>      return 0;
>>  }
>>  
>> -int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
>> +static void setup_fuzz_state(struct fuzz_state *state, const void *data_p, size_t size)
>>  {
>> -    struct fuzz_state state = {
>> -        .ops = all_fuzzer_ops,
>> -    };
>> -    struct x86_emulate_ctxt ctxt = {
>> -        .data = &state,
>> -        .regs = &state.regs,
>> -        .addr_size = 8 * sizeof(void *),
>> -        .sp_size = 8 * sizeof(void *),
>> -    };
>> +    memset(state, 0, sizeof(*state));
>> +    state->corpus = data_p;
>> +    state->data_num = size;
>> +}
>> +
>> +static int runtest(struct fuzz_state *state) {
>>      int rc;
>>  
>> -    /* Reset all global state variables */
>> -    memset(&input, 0, sizeof(input));
>> +    struct x86_emulate_ctxt *ctxt = &state->ctxt;
>> +    
>> +    state->ops = all_fuzzer_ops;
>> +
>> +    ctxt->data = state;
>> +    ctxt->regs = &state->regs;
>> +
>> +    setup_state(ctxt);
>> +
>> +    sanitize_state(ctxt);
>> +
>> +    disable_hooks(state);
>> +
>> +    do {
>> +        /* FIXME: Until we actually implement SIGFPE handling properly */
>> +        setup_fpu_exception_handler();
>> +
>> +        set_sizes(ctxt);
>> +        dump_state(ctxt);
>> +
>> +        rc = x86_emulate(ctxt, &state->ops);
>> +        printf("Emulation result: %d\n", rc);
>> +    } while ( rc == X86EMUL_OKAY );
>> +
>> +    return 0;
>> +}
>> +
>> +static void compare_states(struct fuzz_state state[2])
>> +{
>> +    // First zero any "internal" pointers
> 
> /* */

Ack

> 
>> +    state[0].corpus = state[1].corpus = NULL;
>> +    state[0].ctxt.data = state[1].ctxt.data = NULL;
>> +    state[0].ctxt.regs = state[1].ctxt.regs = NULL;
>> +
>> +    if ( memcmp(&state[0], &state[1], sizeof(struct fuzz_state)) )
>> +    {
>> +        unsigned int i;
>> +
>> +        printf("State mismatch\n");
>> +
>> +        for ( i=0; i<ARRAY_SIZE(state[0].cr); i++ )
> 
> Various spaces.

Ack

> 
>> +            if ( state[0].cr[i] != state[1].cr[i] )
>> +                printf("cr[%u]: %lx != %lx\n",
>> +                       i, state[0].cr[i], state[1].cr[i]);
>> +        
>> +        for ( i=0; i<ARRAY_SIZE(state[0].msr); i++ )
>> +            if ( state[0].msr[i] != state[1].msr[i] )
>> +                printf("msr[%u]: %lx != %lx\n",
>> +                       i, state[0].msr[i], state[1].msr[i]);
>> +        
>> +        for ( i=0; i<ARRAY_SIZE(state[0].segments); i++ )
>> +            if ( memcmp(&state[0].segments[i], &state[1].segments[i],
>> +                        sizeof(state[0].segments[0])) )
>> +                printf("segments[%u]: [%x:%x:%x:%lx] != [%x:%x:%x:%lx]!\n", i,
>> +                       (unsigned)state[0].segments[i].sel,
>> +                       (unsigned)state[0].segments[i].attr,
>> +                       state[0].segments[i].limit,
>> +                       state[0].segments[i].base,
>> +                       (unsigned)state[1].segments[i].sel,
>> +                       (unsigned)state[1].segments[i].attr,
>> +                       state[1].segments[i].limit,
>> +                       state[1].segments[i].base);
>> +
>> +        if ( state[0].data_num != state[1].data_num )
>> +            printf("data_num: %lx !=  %lx\n", state[0].data_num,
>> +                   state[1].data_num);
>> +        if ( state[0].data_index != state[1].data_index )
>> +            printf("data_index: %lx !=  %lx\n", state[0].data_index,
>> +                   state[1].data_index);
>> +
>> +        if ( memcmp(&state[0].regs, &state[1].regs, sizeof(state[0].regs)) )
>> +        {
>> +            printf("registers differ!\n");
>> +            /* Print If Not Equal */
>> +#define PINE(elem)\
> 
> PRINT_NE() ?

OK.

> 
>> +            if ( state[0].elem != state[1].elem ) \
>> +                printf(#elem " differ: %lx != %lx\n", \
>> +                       (unsigned long)state[0].elem, \
>> +                       (unsigned long)state[1].elem)
>> +            PINE(regs.r15);
> 
> Hmm - this is going to cause problems for the 32bit build.  I can't
> think of an easy suggestion to fix it.
> 
>> +            PINE(regs.r14);
>> +            PINE(regs.r13);
>> +            PINE(regs.r12);
>> +            PINE(regs.rbp);
>> +            PINE(regs.rbx);
>> +            PINE(regs.r10);
>> +            PINE(regs.r11);
>> +            PINE(regs.r9);
>> +            PINE(regs.r8);
>> +            PINE(regs.rax);
>> +            PINE(regs.rcx);
>> +            PINE(regs.rdx);
>> +            PINE(regs.rsi);
>> +            PINE(regs.rdi);
>> +
>> +            for ( i = offsetof(struct cpu_user_regs, error_code) / sizeof(unsigned);
>> +                  i < sizeof(state[1].regs)/sizeof(unsigned); i++ )
>> +            {
>> +                printf("[%04lu] %08x %08x\n",
>> +                        i * sizeof(unsigned), ((unsigned *)&state[0].regs)[i],
>> +                       ((unsigned *)&state[1].regs)[i]);
> 
> It would be helpful to pull out at least rflags individually, because
> that has the highest individual chance of being unstable.  The segment
> selectors might also be nice to pull out individually, whereas
> everything else should be zero and remain zero throughout (at which
> point, hex-dumping is fine).
> 
>> +            }
>> +        }
>> +
>> +        if ( memcmp(&state[0].ops, &state[1].ops, sizeof(state[0].ops)) )
>> +            printf("ops differ!\n");
>> +
>> +        if ( memcmp(&state[0].ctxt, &state[1].ctxt, sizeof(state[0].ctxt)) )
>> +        {
>> +            printf("ctxt differs!\n");
>> +            for ( i = 0;  i < sizeof(state[0].ctxt)/sizeof(unsigned); i++ )
>> +            {
>> +                printf("[%04lu] %08x %08x\n",
> 
> %04zu for size_t

Ack

> 
>> +                        i * sizeof(unsigned), ((unsigned *)&state[0].ctxt)[i],
>> +                       ((unsigned *)&state[1].ctxt)[i]);
>> +            }
>> +            
>> +        }
>> +
>> +        abort();
>> +    }
>> +}
>> +
>> +bool opt_rerun = false;
> 
> Should this not be somewhere near the top of the file?

I've been introducing these just before they're used so it's easier to
see the default value.  If you prefer them all to be at the top of the
file I can move them there.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 11/12] fuzz/x86_emulate: Set and fuzz more CPU state
  2017-10-11  9:31   ` Jan Beulich
@ 2017-10-11 16:52     ` George Dunlap
  2017-10-12  9:58       ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: George Dunlap @ 2017-10-11 16:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Ian Jackson

On 10/11/2017 10:31 AM, Jan Beulich wrote:
>>>> On 10.10.17 at 18:20, <george.dunlap@citrix.com> wrote:
>> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>> @@ -40,6 +40,8 @@ struct fuzz_state
>>      uint64_t msr[MSR_INDEX_MAX];
>>      struct segment_register segments[SEG_NUM];
>>      struct cpu_user_regs regs;
>> +    char fxsave[512] __attribute__((aligned(16)));
>> +
>>  
>>      /* Fuzzer's input data. */
> 
> No double blank lines please.

<smacks head>

> 
>> @@ -596,6 +598,54 @@ static const struct x86_emulate_ops all_fuzzer_ops = {
>>  };
>>  #undef SET
>>  
>> +/*
>> + * This funciton will read or write fxsave to the fpu.  When writing,
>> + * it 'sanitizes' the state: It will mask off the appropriate bits in
>> + * the mxcsr, 'restore' the state to the fpu, then 'save' it again so
>> + * that the data in fxsave reflects what's actually in the FPU.
>> + *
>> + * TODO: Extend state beyond just FPU (ymm registers, &c)
>> + */
>> +static void _set_fpu_state(char *fxsave, bool write)
>> +{
>> +    if ( cpu_has_fxsr )
>> +    {
>> +        static union __attribute__((__aligned__(16))) {
>> +            char x[512];
>> +            struct {
>> +                uint32_t other[6];
>> +                uint32_t mxcsr;
>> +                uint32_t mxcsr_mask;
>> +                /* ... */
>> +            };
>> +        } *fxs;
>> +
>> +        fxs = (typeof(fxs)) fxsave;
> 
> Stray blank after the cast.
> 
>> +        if ( write )
>> +        {
>> +            char null[512] __attribute__((aligned(16))) = { };
>> +            
>> +            fxs->mxcsr &= mxcsr_mask;
>> +
>> +            asm volatile( "fxrstor %0" :: "m" (*null) );
>> +            asm volatile( "fxrstor %0" :: "m" (*fxs) );
> 
> Without a comment I still don't really understand what good this
> double load is intended to do. In particular I don't think there are
> any state components that the first may alter but the second
> won't. Quite the opposite, on AMD I think you may end up not
> altering FIP/FDP/FOP despite this double load (iirc they're being
> loaded only when an exception is indicated to be pending in the
> status word; see fpu_fxrstor() in the hypervisor).

As I said, somewhere online I think I read that doing an fxrstor with a
certain part of the state as all zeros would "reset" the fpu.  But I
don't see that in the manual, so it's probably wrong (or at least not
architectural); in which case I should just remove the first fxrstor.

But you haven't given me clear direction about what I should do instead.
 Should I attempt to copy the functionality of fpu_fxrstor() somehow?

>> @@ -737,6 +791,18 @@ static void setup_state(struct x86_emulate_ctxt *ctxt)
>>              printf("Setting cpu_user_regs offset %x\n", offset);
>>              continue;
>>          }
>> +        offset -= sizeof(struct cpu_user_regs);
>> +
>> +        /* Fuzz fxsave state */
>> +        if ( offset < 128 )
> 
> sizeof(s->fxsave) / 4

Ack.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 11/12] fuzz/x86_emulate: Set and fuzz more CPU state
  2017-10-11 16:52     ` George Dunlap
@ 2017-10-12  9:58       ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-10-12  9:58 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Wei Liu, xen-devel, Ian Jackson

>>> On 11.10.17 at 18:52, <george.dunlap@citrix.com> wrote:
> On 10/11/2017 10:31 AM, Jan Beulich wrote:
>>>>> On 10.10.17 at 18:20, <george.dunlap@citrix.com> wrote:
>>> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>>> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
>>> @@ -40,6 +40,8 @@ struct fuzz_state
>>>      uint64_t msr[MSR_INDEX_MAX];
>>>      struct segment_register segments[SEG_NUM];
>>>      struct cpu_user_regs regs;
>>> +    char fxsave[512] __attribute__((aligned(16)));
>>> +
>>>  
>>>      /* Fuzzer's input data. */
>> 
>> No double blank lines please.
> 
> <smacks head>

Please don't.

>>> +        if ( write )
>>> +        {
>>> +            char null[512] __attribute__((aligned(16))) = { };
>>> +            
>>> +            fxs->mxcsr &= mxcsr_mask;
>>> +
>>> +            asm volatile( "fxrstor %0" :: "m" (*null) );
>>> +            asm volatile( "fxrstor %0" :: "m" (*fxs) );
>> 
>> Without a comment I still don't really understand what good this
>> double load is intended to do. In particular I don't think there are
>> any state components that the first may alter but the second
>> won't. Quite the opposite, on AMD I think you may end up not
>> altering FIP/FDP/FOP despite this double load (iirc they're being
>> loaded only when an exception is indicated to be pending in the
>> status word; see fpu_fxrstor() in the hypervisor).
> 
> As I said, somewhere online I think I read that doing an fxrstor with a
> certain part of the state as all zeros would "reset" the fpu.  But I
> don't see that in the manual, so it's probably wrong (or at least not
> architectural); in which case I should just remove the first fxrstor.
> 
> But you haven't given me clear direction about what I should do instead.
>  Should I attempt to copy the functionality of fpu_fxrstor() somehow?

Well, what (if anything) to do about this depends on what you
want to achieve. Random input to FXRSTOR isn't going to be a
problem if you don't expect said fields to never change across
emulation. FPU insns may result in the fields changing in any
event. If the fields appearing to have changed without any FPU
insn having been emulated is not a problem, not doing anything
may be good enough. Otherwise cloning some of the logic from
fpu_fxrstor() may be necessary, but to give more precise advice
I'd prefer to know some more detail on what unreliable state
transition is seen with what insn sequence.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2017-10-12  9:58 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-10 16:20 [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
2017-10-10 16:20 ` [PATCH v3 02/12] fuzz/x86_emulate: Improve failure descriptions in x86_emulate harness George Dunlap
2017-10-10 16:20 ` [PATCH v3 03/12] fuzz/x86_emulate: Implement input_read() and input_avail() George Dunlap
2017-10-10 16:52   ` Andrew Cooper
2017-10-10 17:24   ` Ian Jackson
2017-10-10 16:20 ` [PATCH v3 04/12] fuzz/x86_emulate: Rename the file containing the wrapper code George Dunlap
2017-10-11  9:03   ` Jan Beulich
2017-10-10 16:20 ` [PATCH v3 05/12] fuzz/x86_emulate: Add 'afl-cov' target George Dunlap
2017-10-10 16:53   ` Andrew Cooper
2017-10-10 16:20 ` [PATCH v3 06/12] fuzz/x86_emulate: Take multiple test files for inputs George Dunlap
2017-10-10 16:56   ` Andrew Cooper
2017-10-10 16:58     ` George Dunlap
2017-10-10 17:56       ` Andrew Cooper
2017-10-10 16:20 ` [PATCH v3 07/12] fuzz/x86_emulate: Move all state into fuzz_state George Dunlap
2017-10-10 18:20   ` Andrew Cooper
2017-10-11 11:30     ` George Dunlap
2017-10-11 14:50       ` George Dunlap
2017-10-10 16:20 ` [PATCH v3 08/12] fuzz/x86_emulate: Move definitions into a header George Dunlap
2017-10-10 17:25   ` Ian Jackson
2017-10-11  9:09     ` Jan Beulich
2017-10-10 16:20 ` [PATCH v3 09/12] fuzz/x86_emulate: Make input more compact George Dunlap
2017-10-10 16:59   ` Andrew Cooper
2017-10-10 17:01     ` George Dunlap
2017-10-10 17:11       ` Andrew Cooper
2017-10-10 17:13         ` George Dunlap
2017-10-10 17:31           ` Andrew Cooper
2017-10-10 20:55             ` George Dunlap
2017-10-10 17:26   ` Ian Jackson
2017-10-10 18:57     ` George Dunlap
2017-10-11  9:18   ` Jan Beulich
2017-10-10 16:20 ` [PATCH v3 10/12] fuzz/x86_emulate: Add --rerun option to try to track down instability George Dunlap
2017-10-10 18:44   ` Andrew Cooper
2017-10-11  9:20     ` Jan Beulich
2017-10-11 15:56     ` George Dunlap
2017-10-10 16:20 ` [PATCH v3 11/12] fuzz/x86_emulate: Set and fuzz more CPU state George Dunlap
2017-10-11  9:31   ` Jan Beulich
2017-10-11 16:52     ` George Dunlap
2017-10-12  9:58       ` Jan Beulich
2017-10-10 16:20 ` [PATCH v3 12/12] fuzz/x86_emulate: Add an option to limit the number of instructions executed George Dunlap
2017-10-11  9:34   ` Jan Beulich
2017-10-10 16:47 ` [PATCH v3 01/12] fuzz/x86_emulate: Clear errors after each iteration George Dunlap
2017-10-10 16:47   ` Andrew Cooper
2017-10-11  8:59   ` Jan Beulich
2017-10-10 17:22 ` Ian Jackson
2017-10-11  9:00   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.