All of lore.kernel.org
 help / color / mirror / Atom feed
From: George Dunlap <george.dunlap@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: Ian Jackson <ian.jackson@citrix.com>,
	Wei Liu <wei.liu2@citrix.com>, Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH 11/14] fuzz/x86_emulate: Make input more compact
Date: Mon, 28 Aug 2017 10:10:18 +0100	[thread overview]
Message-ID: <cf91db57-f289-fee9-b836-737d96f99fee@citrix.com> (raw)
In-Reply-To: <aa1a6bb3-96b2-d3df-f7b2-95d8b67ca8ef@citrix.com>

On 08/25/2017 06:59 PM, Andrew Cooper wrote:
> On 25/08/17 17:43, George Dunlap wrote:
>> At the moment, AFL reckons that for any given input, 87% of it is
>> completely irrelevant: that is, it can change it as much as it wants
>> but have no impact on the result of the test; and yet it can't remove
>> it.
>>
>> This is largely because we interpret the blob handed to us as a large
>> struct, including CR values, MSR values, segment registers, and a full
>> cpu_user_regs.
>>
>> Instead, modify our interpretation to have a "set state" stanza at the
>> front.  Begin by reading a byte; if it is lower than a certain
>> threshold, set some state according to what byte it is, and repeat.
>> Continue until the byte is above a certain threshold.
>>
>> This allows AFL to compact any given test case much smaller; to the
>> point where now it reckons there is not a single byte of the test file
>> which becomes irrelevant.  Testing have shown that this option both
>> allows AFL to reach coverage much faster, and to have a total coverage
>> higher than with the old format.
>>
>> Make this an option (rather than a unilateral change) to enable
>> side-by-side performance comparison of the old and new formats.
>>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> 
> I continue to think this is a bad idea.  You are taking a genuine
> problem and adding a complicated algorithm to try and fool alf, rather
> than fixing the problem.
> 
> The reason 87% of input is irrelevant is because it really is.  The
> input state is full of 64bit values being used for a one or two bits
> which we ever look at.

My take on it is this.

The state struct is very large -- I think it's around 500 bytes without
the FPU state, and over 1k with it.

Nearly every bit of the state has *some* instruction that interacts with
it.  But in order for AFL to notice that, it has to find a combination
<state, instruction> such that the instruction and the state actually
interact.  For any given <state, instruction> tuple, changing the vast
majority of the state will have absolutely no effect -- meaning that
AFL's fuzzing is almost entirely wasted.  AFL will spend a huge amount
of time fuzzing bits of the state struct that cannot possibly have an
effect on the outcome, since the instuctions in the test case don't
interact with those bits at all.

With the "compact" input interpretation, once AFL finds a <offset,
content, instruction> tuple that correspond, changing any of the three
is pretty much guaranteed to have some effect; finding a set that
correspond should be much easier, and in any case fuzzing it should be a
lot faster.

As such, I don't think I'm trying to "fool" AFL: I'm just giving it
tools to search more effectively.

> The solution to this problem is remove the irrelevant information from
> fuzz_corpus.  I already started doing this with the alf-fast work for
> the Xen 4.9 release, but I've basically been doing security work ever
> since and haven't had time to continue it.
> 
> For the record, this hunk is how I intended to continue the work:
> 
> diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> index 74e8c85..dafe435 100644
> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> @@ -24,7 +24,27 @@
>  /* Layout of data expected as fuzzing input. */
>  struct fuzz_corpus
>  {
> -    unsigned long cr[5];
> +    /* %cr0 */
> +    bool pe:1;
> +    bool mp:1;
> +    bool em:1;
> +    bool ts:1;
> +    bool pg:1;
> +
> +    /* %cr4 */
> +    bool vme:1;
> +    bool pvi:1;
> +    bool tsd:1;
> +    bool osfxsr:1;
> +    bool osxmmexcpt:1;
> +    bool umip:1;
> +    bool fsgsbase:1;
> +    bool osxsave:1;
> +
> +    /* EFER */
> +    bool sce:1;
> +    bool lme:1;

I'm 98% sure that the handful of bits in cr0 and cr4 aren't the problem.
 AFL isnt' looking at *bits* in the cr0 register, it's looking at that
as an interger, and I'm sure that it notices, "I add X to this and it
changes stuff, so it must be used."

The problem is these:

>      uint64_t msr[MSR_INDEX_MAX];
>      struct cpu_user_regs regs;
>      struct segment_register segments[SEG_NUM];

And in particular this one:
    char fxsave[512] __attribute__((aligned(16)));

Your proposal doesn't change the fact that for any given test case, the
vast majority of state won't interact with the instructions it contains
at all.

I've still got the scripts and analysis stuff, so if you send me a patch
you think is better, I'm happy to run it through and add it to the
comparison.

In any case, prediction based on expert intuition is good, but empirical
evidence is better*.  The graph I sent clearly shows that the 'compact'
format, with unlimited number of instructions, generates far more map
coverage in far less time than any other combination.  Unless you can
find some flaw in my methodology, or an alternate interpretation of the
data, I think we should go with the more 'compact' format.  The old
format is still available, and it would be easy to switch the default
back (or entirely remove the 'compact' format) anytime it is shown
empirically to be more effective.

-George

* For instance, if I hadn't done these graphs, I would have suggested an
instruction limit of 2 or 3, because my intiution told me that unlimited
instructions wasted AFL's time.  Which it does in the old format, but
apparently does not in the new.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  reply	other threads:[~2017-08-28  9:10 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-25 16:43 [PATCH 01/14] fuzz/x86_emulate: Remove redundant AFL hook George Dunlap
2017-08-25 16:43 ` [PATCH 02/14] x86emul/fuzz: add rudimentary limit checking George Dunlap
2017-08-25 16:43 ` [PATCH 03/14] fuzz/x86_emulate: Actually use cpu_regs input George Dunlap
2017-09-15 11:21   ` Wei Liu
2017-08-25 16:43 ` [PATCH 04/14] fuzz/x86_emulate: Add a better input size check George Dunlap
2017-08-25 17:42   ` Andrew Cooper
2017-09-15 11:39   ` Wei Liu
2017-09-25  9:36     ` George Dunlap
2017-09-25 11:08       ` George Dunlap
2017-08-25 16:43 ` [PATCH 05/14] fuzz/x86_emulate: Improve failure descriptions in x86_emulate harness George Dunlap
2017-09-15 11:41   ` Wei Liu
2017-09-15 11:47     ` George Dunlap
2017-08-25 16:43 ` [PATCH 06/14] fuzz/x86_emulate: Implement dread() and davail() George Dunlap
2017-08-25 17:45   ` Andrew Cooper
2017-09-14 17:06     ` George Dunlap
2017-09-25 11:40     ` George Dunlap
2017-08-25 16:43 ` [PATCH 07/14] fuzz/x86_emulate: Rename the file containing the wrapper code George Dunlap
2017-09-15 11:45   ` Wei Liu
2017-08-25 16:43 ` [PATCH 08/14] fuzz/x86_emulate: Add 'afl-cov' target George Dunlap
2017-09-15 12:55   ` Wei Liu
2017-09-15 12:57   ` Wei Liu
2017-09-15 13:28     ` George Dunlap
2017-08-25 16:43 ` [PATCH 09/14] fuzz/x86_emulate: Take multiple test files for inputs George Dunlap
2017-09-15 13:07   ` Wei Liu
2017-09-15 13:27     ` George Dunlap
2017-09-15 13:42       ` Wei Liu
2017-08-25 16:43 ` [PATCH 10/14] fuzz/x86_emulate: Move all state into fuzz_state George Dunlap
2017-08-25 16:43 ` [PATCH 11/14] fuzz/x86_emulate: Make input more compact George Dunlap
2017-08-25 16:52   ` George Dunlap
2017-08-25 17:59   ` Andrew Cooper
2017-08-28  9:10     ` George Dunlap [this message]
2017-08-25 16:43 ` [PATCH 12/14] fuzz/x86_emulate: Add --rerun option to try to track down instability George Dunlap
2017-09-15 13:30   ` Wei Liu
2017-08-25 16:43 ` [PATCH 13/14] fuzz/x86_emulate: Set and fuzz more CPU state George Dunlap
2017-08-25 16:43 ` [PATCH 14/14] fuzz/x86_emulate: Add an option to limit the number of instructions executed George Dunlap
2017-09-15 13:38   ` Wei Liu
2017-09-15 13:55     ` George Dunlap
2017-09-19 10:05       ` Wei Liu
2017-08-25 17:37 ` [PATCH 01/14] fuzz/x86_emulate: Remove redundant AFL hook Andrew Cooper
2017-08-28 10:34   ` George Dunlap
2017-09-14 15:26     ` George Dunlap
2017-09-22 15:47   ` George Dunlap
2017-09-22 16:09     ` Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf91db57-f289-fee9-b836-737d96f99fee@citrix.com \
    --to=george.dunlap@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ian.jackson@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.