From: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com>
To: Eric Biggers <ebiggers@kernel.org>
Cc: "linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	"davem@davemloft.net" <davem@davemloft.net>
Subject: RE: Testmgr fuzz testing
Date: Tue, 23 Jul 2019 00:56:36 +0000	[thread overview]
Message-ID: <MN2PR20MB297309E29BFDFA5C6377BEBECAC70@MN2PR20MB2973.namprd20.prod.outlook.com> (raw)
In-Reply-To: <20190722230641.GA22126@gmail.com>

> -----Original Message-----
> From: Eric Biggers <ebiggers@kernel.org>
> Sent: Tuesday, July 23, 2019 1:07 AM
> To: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com>
> Cc: linux-crypto@vger.kernel.org; Herbert Xu <herbert@gondor.apana.org.au>; davem@davemloft.net
> Subject: Re: Testmgr fuzz testing
> 
> > >
> > > Sure, it's not always repeatable, but that's the nature of fuzz testing.
> > >
> > No. It's not repeatable *at all*. The odds of getting the exact same sequence of random
> > numbers should approach zero, assuming the random generator is half decent and
> > properly seeded with another (true?) random value.
> >
> > For us hardware engineers, (constrained) random testing is our bread and butter.
> > Given the design complexity we're dealing with today and the fact that any serious bug may
> > actually put us out of business (considering mask costs and respin turnaround), it's the only
> > way to cope. So I have many years of experience and I can assure you that being "not always
> > repeatable" does not NEED to be the "nature of fuzz testing". One of the first things you
> > learn (the hard way) is to log the random seed(s) and make them controllable ... somehow.
> > Because nothing is as frustrating as finding a bug after days of simulation and then
> > not being able to reproduce it with waves and debugging enabled ...
> >
> > >We *could* start with a constant seed, but then all the random numbers would change
> > > every time anyone made any minor change to the fuzz tests anyway.
> > >
> > Yes, this is a well known fact: any change to either the design, the test environment or
> > the test itself may affect the random generation and cause different behavior.
> > Which may be a problem if you want to hit a specific case with 100% certainty - in
> > which case you should indeed make a dedicated test for that instead.
> > (but usually, you don't realise the corner case exists until you first hit it ...)
> >
> > *However* this is NOT relevant to the repeatability-for-debugging situation, as in
> > that case, you should not change *anything* until you've thoroughly root-
> > caused the issue. (or created a baseline in your source control system such that you
> > can always go back to the *exact* situation that caused the error).
> > This is (hardware) verification 101.
> 
> I meant repeatable in the sense that the same bug is hit, even if the generated
> test case is not 100% identical.
> 
It may not need to be 100% identical, sure, but I would still assert that the odds
of hitting the same obscure corner case bug may not be as high as you might hope.
And for me, "repeatable" means that I can *depend* on hitting it and don't need
to be "lucky" on the next boot. My recent personal experiences are not so good :-(

What's trickier: now that I've implemented a fix, my *expectation* is that it passes.
And it actually does. But does that mean my fix *really* worked? Or am I just not
hitting the relevant corner case anymore? Perhaps even due to my change itself
influencing the random generation? How do I possibly know for sure?

I may end up submitting a patch that does not work at all. It may even introduce a
new corner case issue that's not hit either. If there were a lot of people out there
trying my patch, I might have a reasonable chance that one of them finds the problem,
but the fact of the matter is, AFAIK, I only have one person able to try my patches.

> Anyway, you're welcome to send a patch changing the fuzzing code to use a
> constant seed, if you really think it would be useful and can provide proper
> justification for it.  I'm not sure why you keep sending these long rants, when
> you could simply send a patch yourself.
> 
Ok, oops, I did not mean for that to be a rant.  And I'm usually pretty verbose, so 
feel free to TL;DR :-)

I did have a quick look at the random generation to see if I could fix the seed, but
the state currently seems to be global per CPU, not local to testmgr, so there is
probably no guarantee of getting the same sequence inside testmgr even if I do
manage to fix that seed (I haven't tried yet). Something like the sketch below
might be one way around that:
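
(A minimal sketch of what I have in mind, assuming a PRNG stream private to
testmgr; the "fuzz_seed" parameter name is hypothetical and none of this is in
the current code.)

#include <linux/module.h>
#include <linux/random.h>

static u64 fuzz_seed;
module_param(fuzz_seed, ullong, 0444);
MODULE_PARM_DESC(fuzz_seed, "Fuzz test PRNG seed (0 = pick one randomly)");

/* Private generator state, so the fuzz vector sequence no longer
 * depends on the global per-CPU prandom state. */
static struct rnd_state fuzz_rng;

static void fuzz_rng_init(void)
{
	if (!fuzz_seed)
		fuzz_seed = get_random_u64();
	/* Log the seed so a failing run can be reproduced exactly */
	pr_info("alg: fuzz_seed=%llu\n", fuzz_seed);
	prandom_seed_state(&fuzz_rng, fuzz_seed);
}

/* Drop-in replacement for prandom_u32() inside the fuzz tests */
static u32 fuzz_rand32(void)
{
	return prandom_u32_state(&fuzz_rng);
}

Booting with the logged seed (via whatever the parameter ends up being called)
would then replay the exact same vector sequence.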

To give you an honest answer on that last question: lack of time and skills.
I'm not a software engineer and my boss merely tolerates me working on the 
driver, so I can only spend idle cycles on this.

> >
> > > In my experience the bugs found by the fuzz tests tend to be found within a
> > > couple hundred iterations, so are seen within a few boots at most with the
> > > default fuzz_iterations=100, and are "always" seen with fuzz_iterations=1000.
> > > Raising fuzz_iterations to 10000 didn't find anything else.
> > >
> > That may be because you've been working mostly with software implementations
> > which were already in pretty good shape to begin with. Hardware tends to have
> > many more (tricky) corner cases.
> > The odds of hitting a specific corner case in just 100 or 1000 vectors are really not
> > as high as you may think. Especially if many of the vectors being generated are
> > actually illegal and just test for proper error response from the driver.
> >
> > Just try to compute the (current) odds of getting an AAD length and ciphertext
> > length that are zero at the same time. Which is a relevant corner case at least for
> > our hardware. In fact, having the digestsize zero at the same time as well is yet
> > another corner case requiring yet another workaround. The odds of those 3
> > cases colliding while generating random lengths over a decent range are really,
> > really slim.
> >
> > Also, how fast you can reboot depends very much on the platform you're
> > working on. It quickly becomes annoying if rebooting takes minutes and requires
> > plenty of manual interaction. Speaking from experience.
> >
> > > If you find otherwise and come across some really rare case, you should either
> > > add a real test vector (i.e. not part of the fuzz tests) for it,
> > >
> > The problem with "generating a test vector for it" is that it requires a known-
> > good reference implementation, which is usually hard to find. And then you
> > have to convert it to testmgr.h format and add it there manually. Which is both
> > cumbersome and will cause testmgr.h (and kernel size) to explode at some point.
> >
> > While in the overall majority of cases you don't really care about the input data
> > itself at all, so it's fine to generate that randomly, what you care about are things
> > like specific lengths, alignments, sizes, IV (counter) values and combinations thereof.
> >
> > Also, adding those as "normal" test vectors will cause the normal boot procedure to
> > slow to a crawl verifying loads of obscure corner cases that are really only relevant
> > during development anyway.
> >
> > And why bother if you have the generic implementation available to do all
> > this on-the-fly and only with the extra tests enabled, just like you do with the full
> > random vectors?
> >
> > It was just a crazy idea anyway, based on a real-life observation I made.
> 
> Being too lazy to add a test vector isn't really an excuse
>
Laziness has very little to do with it. Adding generated vectors to testmgr.h just
feels like a waste of effort and space, as you could generate them on the fly
(if the generic implementation is known to be good for those cases) - see the
rough sketch below.
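
Roughly the kind of on-the-fly reference check I mean, for one specific corner
case (zero-length AAD plus zero-length plaintext; GCM can't do a zero authsize,
so 16 here). My own sketch, not how testmgr currently structures this:

#include <crypto/aead.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Encrypt a vector with assoclen = 0 and cryptlen = 0; the output is
 * just the 16-byte tag, which can then be compared against what the
 * hardware driver produces for the same degenerate input. The tag
 * buffer must not live on the stack for real (DMA-capable) drivers. */
static int ref_zero_len_gcm(u8 *tag /* 16 bytes */)
{
	static const u8 key[16]; /* all-zero placeholder key */
	static u8 iv[12];        /* all-zero placeholder IV */
	struct crypto_aead *tfm;
	struct aead_request *req;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	int err;

	/* Note: testmgr would specifically request the generic driver
	 * here; plain "gcm(aes)" may resolve to an accelerated one. */
	tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_aead_setkey(tfm, key, sizeof(key));
	if (err)
		goto out_free_tfm;
	err = crypto_aead_setauthsize(tfm, 16);
	if (err)
		goto out_free_tfm;

	req = aead_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out_free_tfm;
	}
	sg_init_one(&sg, tag, 16); /* dst only needs room for the tag */
	aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				  crypto_req_done, &wait);
	aead_request_set_ad(req, 0);                  /* zero-length AAD */
	aead_request_set_crypt(req, &sg, &sg, 0, iv); /* zero-length PT */
	err = crypto_wait_req(crypto_aead_encrypt(req), &wait);

	aead_request_free(req);
out_free_tfm:
	crypto_free_aead(tfm);
	return err;
}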

> , and it won't bloat
> the kernel size or boot time unless you add a massive number of test vectors or
> if they use very large lengths. 
>
It will increase the kernel size and boot time. Beyond what threshold that will be
considered bloat is in the eye of the beholder, I suppose. And some corner cases
(e.g. length restrictions, hash block size boundaries, counter sizes, ...) may indeed
require significant lengths, so with so many cipher suites that adds up quickly.

> We could even make certain test vectors
> conditional on CONFIG_CRYPTO_MANAGER_EXTRA_TESTS if it's an issue...
> 
Ok, I guess that would solve the bloat issue as well (not counting testmgr.h itself).
Something like the sketch below, perhaps.
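
(Hypothetical array name; the digest shown is just the well-known SHA-256 of
the empty message, so no invented data:)

static const struct hash_testvec sha256_corner_tv_template[] = {
	/* ... regular boot-time vectors above ... */
#ifdef CONFIG_CRYPTO_MANAGER_EXTRA_TESTS
	{ /* zero-length message corner case, only checked when the
	   * extra self-tests are enabled */
		.plaintext = "",
		.psize	= 0,
		.digest	= "\xe3\xb0\xc4\x42\x98\xfc\x1c\x14"
			  "\x9a\xfb\xf4\xc8\x99\x6f\xb9\x24"
			  "\x27\xae\x41\xe4\x64\x9b\x93\x4c"
			  "\xa4\x95\x99\x1b\x78\x52\xb8\x55",
	},
#endif
};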

> >
> > > or you should
> > > update the fuzz tests to generate the case more often so that it's likely to be
> > > hit with the default fuzz_iterations=100.
> > >
> > Running just 10 times more iterations is really not going to be sufficient to hit
> > the really tricky *combinations* of parameters, considering the ranges of the
> > individual parameters. Just do some statistics on that and you'll soon realise.
> >
> > Just some example for the AEAD corner case I mentioned before:
> >
> > The odds of generating a zero authsize are 1/4 * 1/17 = 1/68. But that will still
> > need to coincide with all other parameters (including key sizes) being legal.
> > So already there, your 100 iterations come up short.
> >
> > A zero-length AAD *and* plaintext happens only once every 8000+ vectors.
> > And that also still has to coincide with key sizes etc. being legal. So there, even
> > 10000 iterations would not be enough to be *sure* to hit that.
> >
> > To have some reasonable shot at hitting the combination of those two cases
> > you'd need well over a million iterations ...
> >
> > > I don't think it's necessary to split
> > > the fuzz testing into 2 parts; instead we just need to boost the probability of
> > > generating known edge cases (e.g. see what generate_random_bytes() already does).
> > >
> > I guess somehow tweaking the random generation such that the probability of
> > generating the interesting cases becomes *significantly* higher would work too.
> > To me, that's just implementation detail though.
> >
> 
> Like I said, if you encounter bugs that the fuzz tests should be finding but
> aren't, it would be really helpful if you added test vectors 
>
The only vectors I have easy access to are the ones we use to verify our HW.
And that's exactly the problem here - these are often cases that we don't formally
support with our HW and that require driver workarounds. Hence, I don't have
those vectors readily available.

> for them and/or
> updated the fuzz tests to generate those cases more often.  
>
Sounds like a plan, but easier said than done ... relevant corner cases may
differ from cipher suite to cipher suite, so you need some way to control the
random generation of the various parameters from alg_test_descs[x] - something
along the lines of the sketch below, perhaps.
But OK, fair enough, I'll try and see if I can come up with something myself ...
if I can find some cycles to actually work on that ... don't hold your breath.
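
(A purely hypothetical sketch - none of these fields or helpers exist in
testmgr today:)

#include <linux/random.h>	/* prandom_u32() */

/* Per-algorithm hints that could hang off alg_test_descs[x],
 * biasing the fuzzer toward that suite's known corner cases. */
struct fuzz_corner_hints {
	const unsigned int *aad_lens;	/* AAD lengths to favor, e.g. { 0 } */
	unsigned int naad_lens;
	const unsigned int *data_lens;	/* payload lengths to favor, e.g.
					 * { 0, blocksize - 1, blocksize } */
	unsigned int ndata_lens;
	unsigned int bias_percent;	/* chance of using a hint instead of
					 * a fully random length */
};

/* Usually pick a random length, but regularly force a hinted one, so
 * corner-case *combinations* still show up within ~100 vectors. */
static unsigned int pick_len(const unsigned int *hints, unsigned int nhints,
			     unsigned int bias_percent, unsigned int max)
{
	if (nhints && prandom_u32() % 100 < bias_percent)
		return hints[prandom_u32() % nhints];
	return prandom_u32() % (max + 1);
}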

> I was simply
> pointing out that to do the latter, we don't really need to split the tests into
> 2 parts; it would be sufficient just to change the probabilities with which
> different things are generated.  Note that the probabilities can be conditional
> on other things, which can get around the issue where a bunch of small
> independent probabilities are multiplied together.
> 
> - Eric

Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Verimatrix
www.insidesecure.com
