RE: [PATCH linux-kselftest/test v2] ext4: add kunit test for decoding extended timestamps

From: <Tim.Bird@sony.com>
To: <tytso@mit.edu>, <skhan@linuxfoundation.org>
Cc: <brendanhiggins@google.com>, <yzaikin@google.com>,
	<linux-kselftest@vger.kernel.org>, <linux-ext4@vger.kernel.org>,
	<adilger.kernel@dilger.ca>, <kunit-dev@googlegroups.com>
Subject: RE: [PATCH linux-kselftest/test v2] ext4: add kunit test for decoding extended timestamps
Date: Thu, 17 Oct 2019 22:25:35 +0000
Message-ID: <ECADFF3FD767C149AD96A924E7EA6EAF977D0023@USCULXMSG01.am.sony.com>
In-Reply-To: <20191017120833.GA25548@mit.edu>

> -----Original Message-----
> From: Theodore Y. Ts'o on October 17, 2019 2:09 AM
> 
> On Wed, Oct 16, 2019 at 05:26:29PM -0600, Shuah Khan wrote:
> >
> > I don't really buy the argument that unit tests should be deterministic
> > Possibly, but I would opt for having the ability to feed test data.
> 
> I strongly believe that unit tests should be deterministic.
> Non-deterministic tests are essentially fuzz tests.  And fuzz tests
> should be different from unit tests.

I'm not sure I have the entire context here, but I think deterministic
might not be the right word, or it might not capture the exact meaning
intended.

I think there are multiple issues here:
 1. Does the test enclose all its data, including working data and expected results?
Or, does the test allow someone to provide working data?  This alternative
implies that either some of the testcases or the results might be different depending on
the data that is provided.  IMHO the test would be deterministic if it always produced
the same results based on the same data inputs, and if the input data was itself
deterministic.  I would call this a data-driven test.

Since the results would be dependent on the data provided, the results
from tests using different data would not be comparable.  Essentially,
changing the input data changes the test so maybe it's best to consider
this a different test.  Like 'test-with-data-A' and 'test-with-data-B'.
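
As a rough illustration of (1), here is a minimal sketch in plain userspace C
(not KUnit) of one piece of test code run over two data tables.  Everything in
it is made up for illustration: sat_add_u8() stands in for the code under test,
and the struct, table, and runner names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function under test: 8-bit saturating add. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
	unsigned int sum = (unsigned int)a + b;

	return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* One testcase: working data plus the expected result. */
struct sat_add_case {
	uint8_t a;
	uint8_t b;
	uint8_t expected;
};

/* Data enclosed in the test itself ("test-with-data-A"). */
static const struct sat_add_case data_a[] = {
	{ 0, 0, 0 },
	{ 1, 2, 3 },
	{ 200, 100, 255 },
};

/* Different data is effectively a different test ("test-with-data-B"). */
static const struct sat_add_case data_b[] = {
	{ 255, 255, 255 },
	{ 127, 128, 255 },
};

/*
 * Run the same test code over whatever table is supplied and return
 * the number of failing cases.  The same table always gives the same
 * return value, so the test is deterministic for fixed data.
 */
static size_t run_sat_add_cases(const struct sat_add_case *cases, size_t n)
{
	size_t failures = 0;

	assert(cases != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (sat_add_u8(cases[i].a, cases[i].b) != cases[i].expected)
			failures++;
	}
	return failures;
}
```

Reporting the results of run_sat_add_cases(data_a, ...) and
run_sat_add_cases(data_b, ...) under separate names keeps the two result
streams from being compared as if they were the same test.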

 2. Does the test automatically detect some attribute of the system, and adjust
its operation based on that (does the test probe)?  This is actually quite common
if you include things like a test that requires root access to run.  Sometimes such tests,
when run without root privilege, run as many testcases as possible as non-root, and skip
the testcases that require root.

In general, altering the test based on probed data is a form of data-driven test,
except the data is not provided by the user.  Whether
this is deterministic in the sense of (1) depends on whether the data that
is probed is deterministic.  In the case of requiring root, the probed value should
not change from run to run (and it should probably be reflected in the characterization
of the results).
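
A minimal sketch of the root-probing case, again in userspace C rather than
KUnit.  The testcase names and the probe_case/probe_plan types are
hypothetical; geteuid() is the standard POSIX call.  The probe is kept
separate from the planning step so that the plan is a pure function of the
probed value, and the summary records what was probed and how many cases
were skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical testcase descriptor; needs_root marks root-only cases. */
struct probe_case {
	const char *name;
	bool needs_root;
};

static const struct probe_case probe_cases[] = {
	{ "read_config", false },
	{ "parse_args", false },
	{ "mount_scratch_fs", true },
};

/* What gets reported with the results, including the probed value. */
struct probe_plan {
	bool is_root;
	size_t to_run;
	size_t skipped;
};

/* The probe itself: effective UID 0 means root. */
static bool probe_is_root(void)
{
	return geteuid() == 0;
}

/* Decide which cases run.  Same probed value in, same plan out. */
static struct probe_plan plan_cases(const struct probe_case *cases, size_t n,
				    bool is_root)
{
	struct probe_plan plan = { .is_root = is_root, .to_run = 0, .skipped = 0 };

	assert(cases != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (cases[i].needs_root && !is_root)
			plan.skipped++;
		else
			plan.to_run++;
	}
	return plan;
}
```

A real run would call plan_cases(probe_cases, n, probe_is_root()) and print
the is_root and skipped fields next to the pass/fail counts.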

Maybe neither of the above cases falls in the category of unit tests, but
they are not necessarily fuzzing tests.  IMHO a fuzzing test is one which randomizes
the data for a data-driven test (hence using non-deterministic data).  Once the fuzzer
has found a bug, and the data and code for a test are fixed into a reproducer program,
then at that point it should be deterministic (modulo what I say about race condition
tests below).
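
The fuzzer/reproducer distinction can be sketched as follows.  This is an
illustration only: buggy_midpoint() is a hypothetical function with a planted
bug, the xorshift generator is just a convenient self-contained PRNG, and none
of this reflects how any real fuzzer is built.  The fuzz loop is
non-deterministic only to the extent that its seed is; the reproducer has its
data fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical code under test; the sum is truncated to 8 bits too early. */
static uint8_t buggy_midpoint(uint8_t a, uint8_t b)
{
	return (uint8_t)(a + b) / 2;
}

/* Reference result computed without truncating the sum. */
static uint8_t good_midpoint(uint8_t a, uint8_t b)
{
	return (uint8_t)(((unsigned int)a + b) / 2);
}

/* Small xorshift PRNG: a given non-zero seed yields a fixed sequence. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

struct fuzz_result {
	bool found;
	uint8_t a;
	uint8_t b;
};

/* Fuzzing: a data-driven test fed with generated data. */
static struct fuzz_result fuzz_midpoint(uint32_t seed, unsigned int iterations)
{
	struct fuzz_result res = { .found = false, .a = 0, .b = 0 };
	uint32_t state = seed;

	assert(seed != 0);	/* xorshift never leaves the all-zero state */
	for (unsigned int i = 0; i < iterations; i++) {
		uint32_t r = xorshift32(&state);
		uint8_t a = r & 0xff;
		uint8_t b = (r >> 8) & 0xff;

		if (buggy_midpoint(a, b) != good_midpoint(a, b)) {
			res.found = true;
			res.a = a;
			res.b = b;
			break;
		}
	}
	return res;
}

/* Reproducer: data fixed, so the outcome is the same on every run. */
static bool reproducer_midpoint(void)
{
	return buggy_midpoint(200, 100) != good_midpoint(200, 100);
}
```

A fuzzer would normally take its seed from something that varies between runs.
Recording either the seed or the failing (a, b) pair is what turns a finding
into a deterministic test.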

> 
> We want unit tests to run quickly.  Fuzz tests need to be run for a
> large number of passes (perhaps hours) in order to be sure that we've
> hit any possible bad cases.  We want to be able to easily bisect fuzz
> tests --- preferably, automatically.  And any kind of flakey test is
> hell to bisect.
Agreed.

> It's bad enough when a test is flakey because of the underlying code.
> But when a test is flakey because the test inputs are
> non-deterministic, it's even worse.
I very much agree on this as well.

I'm not sure how one classes a program that seeks to invoke a race condition.
This can take variable time, so in that sense it is not deterministic.   But it should
produce the same result if the probabilities required for the race condition
to be hit are fulfilled.  Probably (see what I did there :-), one needs to take
a probabilistic approach to reproducing and bisecting such bugs.  The duration
or iterations required to reproduce the bug (to some confidence level) may
need to be included with the reproducer program.  I'm not sure if the syzkaller
reproducers do this or not, or if they just run forever.  One I looked at ran forever.
But you would want to limit this in order to produce results with some confidence
level (and not waste testing resources).
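
One way to put a number on this, as a sketch under a strong simplifying
assumption that is not from the discussion above: each iteration hits the race
independently with the same probability p_hit.  Then the chance of missing it
n times in a row is (1 - p_hit)^n, and the iteration limit is the smallest n
that pushes that below 1 - confidence.  The function name and parameters are
hypothetical.

```c
#include <assert.h>

/*
 * Smallest n such that 1 - (1 - p_hit)^n >= confidence, assuming
 * independent iterations with a constant hit probability.
 * Returns 0 for invalid arguments or if more than max_iters
 * iterations would be needed.
 */
static unsigned long iterations_for_confidence(double p_hit, double confidence,
					       unsigned long max_iters)
{
	double target_miss = 1.0 - confidence;	/* allowed chance of never hitting */
	double p_all_miss = 1.0;		/* chance that n iterations all miss */
	unsigned long n = 0;

	if (!(p_hit > 0.0 && p_hit <= 1.0))
		return 0;
	if (!(target_miss > 0.0 && target_miss < 1.0))
		return 0;

	while (p_all_miss > target_miss) {
		if (n == max_iters)
			return 0;
		p_all_miss *= 1.0 - p_hit;
		n++;
	}
	assert(n > 0);	/* target_miss < 1.0 forces at least one iteration */
	return n;
}
```

With p_hit = 0.5 and confidence = 0.95 this gives 5 iterations.  For bisection,
a commit where the bug does not show up within that many iterations could be
marked good at roughly the chosen confidence, provided the estimate of p_hit
still holds for that commit.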

---
The reason I want to get clarity on the issue of data-driven tests is that I think
data-driven tests and tests that probe are very much desirable.  This allows a
test to be more general and allows for specialization of the
test for more scenarios without re-coding it.
I'm not sure if this still qualifies as unit testing, but it's very useful as a means to
extend the value of a test.  We haven't trod into the mocking parts of kunit,
but I'm hoping that it may be possible to have that be data-driven (depending on
what's being mocked), to make it easier to test more things using the same code.

Finally, I think the issue of testing speed is orthogonal to whether a test is self-enclosed
or data-driven.  Definitely fuzzers, which are experimenting with system interaction
in a non-deterministic way, have speed problems.

Just my 2 cents.
 -- Tim