Re: [PATCH 11/14] fuzz/x86_emulate: Make input more compact

From: George Dunlap <george.dunlap@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: Ian Jackson <ian.jackson@citrix.com>,
	Wei Liu <wei.liu2@citrix.com>, Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH 11/14] fuzz/x86_emulate: Make input more compact
Date: Mon, 28 Aug 2017 10:10:18 +0100	[thread overview]
Message-ID: <cf91db57-f289-fee9-b836-737d96f99fee@citrix.com> (raw)
In-Reply-To: <aa1a6bb3-96b2-d3df-f7b2-95d8b67ca8ef@citrix.com>

On 08/25/2017 06:59 PM, Andrew Cooper wrote:
> On 25/08/17 17:43, George Dunlap wrote:
>> At the moment, AFL reckons that for any given input, 87% of it is
>> completely irrelevant: that is, it can change it as much as it wants
>> but have no impact on the result of the test; and yet it can't remove
>> it.
>>
>> This is largely because we interpret the blob handed to us as a large
>> struct, including CR values, MSR values, segment registers, and a full
>> cpu_user_regs.
>>
>> Instead, modify our interpretation to have a "set state" stanza at the
>> front.  Begin by reading a byte; if it is lower than a certain
>> threshold, set some state according to what byte it is, and repeat.
>> Continue until the byte is above a certain threshold.
>>
>> This allows AFL to compact any given test case much smaller; to the
>> point where now it reckons there is not a single byte of the test file
>> which becomes irrelevant.  Testing have shown that this option both
>> allows AFL to reach coverage much faster, and to have a total coverage
>> higher than with the old format.
>>
>> Make this an option (rather than a unilateral change) to enable
>> side-by-side performance comparison of the old and new formats.
>>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> 
> I continue to think this is a bad idea.  You are taking a genuine
> problem and adding a complicated algorithm to try and fool alf, rather
> than fixing the problem.
> 
> The reason 87% of input is irrelevant is because it really is.  The
> input state is full of 64bit values being used for a one or two bits
> which we ever look at.

My take on it is this.

The state struct is very large -- I think it's around 500 bytes without
the FPU state, and over 1k with it.

Nearly every bit of the state has *some* instruction that interacts with
it.  But in order for AFL to notice that, it has to find a combination
<state, instruction> such that the instruction and the state actually
interact.  For any given <state, instruction> tuple, changing the vast
majority of the state will have absolutely no effect -- meaning that
AFL's fuzzing is almost entirely wasted.  AFL will spend a huge amount
of time fuzzing bits of the state struct that cannot possibly have an
effect on the outcome, since the instuctions in the test case don't
interact with those bits at all.

With the "compact" input interpretation, once AFL finds a <offset,
content, instruction> tuple that correspond, changing any of the three
is pretty much guaranteed to have some effect; finding a set that
correspond should be much easier, and in any case fuzzing it should be a
lot faster.

As such, I don't think I'm trying to "fool" AFL: I'm just giving it
tools to search more effectively.

> The solution to this problem is remove the irrelevant information from
> fuzz_corpus.  I already started doing this with the alf-fast work for
> the Xen 4.9 release, but I've basically been doing security work ever
> since and haven't had time to continue it.
> 
> For the record, this hunk is how I intended to continue the work:
> 
> diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> index 74e8c85..dafe435 100644
> --- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> +++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
> @@ -24,7 +24,27 @@
>  /* Layout of data expected as fuzzing input. */
>  struct fuzz_corpus
>  {
> -    unsigned long cr[5];
> +    /* %cr0 */
> +    bool pe:1;
> +    bool mp:1;
> +    bool em:1;
> +    bool ts:1;
> +    bool pg:1;
> +
> +    /* %cr4 */
> +    bool vme:1;
> +    bool pvi:1;
> +    bool tsd:1;
> +    bool osfxsr:1;
> +    bool osxmmexcpt:1;
> +    bool umip:1;
> +    bool fsgsbase:1;
> +    bool osxsave:1;
> +
> +    /* EFER */
> +    bool sce:1;
> +    bool lme:1;

I'm 98% sure that the handful of bits in cr0 and cr4 aren't the problem.
 AFL isnt' looking at *bits* in the cr0 register, it's looking at that
as an interger, and I'm sure that it notices, "I add X to this and it
changes stuff, so it must be used."

The problem is these:

>      uint64_t msr[MSR_INDEX_MAX];
>      struct cpu_user_regs regs;
>      struct segment_register segments[SEG_NUM];

And in particular this one:
    char fxsave[512] __attribute__((aligned(16)));

Your proposal doesn't change the fact that for any given test case, the
vast majority of state won't interact with the instructions it contains
at all.

I've still got the scripts and analysis stuff, so if you send me a patch
you think is better, I'm happy to run it through and add it to the
comparison.

In any case, prediction based on expert intuition is good, but empirical
evidence is better*.  The graph I sent clearly shows that the 'compact'
format, with unlimited number of instructions, generates far more map
coverage in far less time than any other combination.  Unless you can
find some flaw in my methodology, or an alternate interpretation of the
data, I think we should go with the more 'compact' format.  The old
format is still available, and it would be easy to switch the default
back (or entirely remove the 'compact' format) anytime it is shown
empirically to be more effective.

-George

* For instance, if I hadn't done these graphs, I would have suggested an
instruction limit of 2 or 3, because my intiution told me that unlimited
instructions wasted AFL's time.  Which it does in the old format, but
apparently does not in the new.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel