RE: RFC - kernel selftest result documentation (KTAP)

From: "Bird, Tim" <Tim.Bird@sony.com>
To: Kees Cook <keescook@chromium.org>
Cc: "shuah@kernel.org" <shuah@kernel.org>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	Brendan Higgins <brendanhiggins@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	David Gow <davidgow@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: RE: RFC - kernel selftest result documentation (KTAP)
Date: Mon, 15 Jun 2020 19:07:34 +0000	[thread overview]
Message-ID: <CY4PR13MB11757D57CD441C5CAEC3F257FD9C0@CY4PR13MB1175.namprd13.prod.outlook.com> (raw)
In-Reply-To: <202006141120.96FF8C5@keescook>

Kees,

Thanks for the great feedback.  See comments inline below.

> -----Original Message-----
> From: Kees Cook <keescook@chromium.org>
> 
> On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
> > The kernel test result format consists of 5 major elements,
> > 4 of which are line-based:
> >  * the output version line
> >  * the plan line
> 
> Note: making the plan line required differs from TAP13 and TAP14. I
> think it's the right choice, but we should be clear.

Noted.  In re-reading my doc, I've conflated my sections.  The first
section is "single-line", and the next section is "optional".  ???
I'll fix that.

With regards to making it optional or not, I don't have a strong
preference.  The extra info seems helpful in some circumstances.
I don't know if it's too onerous to make it a requirement or not.
I'd prefer if it was always there (either at the beginning or the end),
but if there is some situation where it's quite difficult to calculate,
then it would be best not to mandate it. I can't think of any impossible
situations at the moment.

> 
> >  * one or more test result lines (also called test result lines)
> >  * a possible "Bail out!" line
> 
> "Bail out!" to be moved to "optional" elements, since it may not appear.
> And we should clarify TAP13 and TAP14's language to say it should only
> appear when the test is aborting without running later tests -- for this
> reason, I think the optional "description" following "Bail out!" should
> be made required. I.e. it must be: "Bail out! $reason"

I'll make sure this is listed as optional.
I like adding a mandatory reason.
> 
> > optional elements:
> >  * diagnostic data
> 
> nit: diagnostic lines (not data)
OK.

> 
> > The 5th element is diagnostic information, which is used to describe
> > items running in the test, and possibly to explain test results.
> > A sample test result is show below:
> >
> > Some other lines may be placed the test harness, and are not emitted
> > by individual test programs:
> >  * one or more test identification lines
> >  * a possible results summary line
> >
> > Here is an example:
> >
> > 	TAP version 13
> > 	1..1
> > 	# selftests: cpufreq: main.sh
> > 	# pid 8101's current affinity mask: fff
> > 	# pid 8101's new affinity mask: 1
> > 	ok 1 selftests: cpufreq: main.sh
> 
> Nit: for examples, I this should should show more than one test.
> (Preferably, it should show all the possible cases, ok, not ok, SKIP,
> etc.)
Agree.  I will expand this.

> 
> > The output version line is: "TAP version 13"
> >
> > The test plan is "1..1".
> >
> > Element details
> > ===============
> >
> > Output version line
> > -------------------
> > The output version line is always "TAP version 13".
> >
> > Although the kernel test result format has some additions
> > to the TAP13 format, the version line reported by kselftest tests
> > is (currently) always the exact string "TAP version 13"
> >
> > This is always the first line of test output.
> >
> > Test plan line
> > --------------
> > The test plan indicates the number of individual test cases intended to
> > be executed by the test. It always starts with "1.." and is followed
> > by the number of tests cases.  In the example above, 1..1", indicates
> > that this test reports only 1 test case.
> >
> > The test plan line can be placed in two locations:
> >  * the second line of test output, when the number of test cases is known
> >    in advance
> >  * as the last line of test output, when the number of test cases is not
> >    known in advance.
> >
> > Most often, the number of test cases is known in advance, and the test plan
> > line appears as the second line of test output, immediately following
> > the output version line.  The number of test cases might not be known
> > in advance if the number of tests is calculated from runtime data.
> > In this case, the test plan line is emitted as the last line of test
> > output.
> 
> "... must be ..." ?
> 
> >
> > Test result lines
> > -----------------
> > The test output consists of one or more test result lines that indicate
> > the actual results for the test.  These have the format:
> >
> >   <result> <number> <description> [<directive>] [<diagnostic data>]
> 
> This should be:
> 
> <result> <number> <description> [# [<directive> ][<diagnostic data>]]
> 
> >
> > The ''result'' must appear at the start of a line (except for when a
> > test is nested, see below), and must consist of one of the following
> > two phrases:
> >   * ok
> >   * not ok
> >
> > If the test passed, then the result is reported as "ok".  If the test
> > failed, then the result is reported as "not ok".  These must be in
> > lower case, exactly as shown.
> >
> > The ''number'' in the test result line represents the number of the
> > test case being performed by the test program.  This is often used by
> > test harnesses as a unique identifier for each test case.  The test
> > number is a base-10 number, starting with 1.  It should increase by
> > one for each new test result line emitted.  If possible the number
> > for a test case should be kept the same over the lifetime of the test.
> >
> > The ''description'' is a short description of the test case.
> > This can be any string of words, but should avoid using colons (':')
> 
> Must also avoid "#".
ok.
> 
> > except as part of a fully qualifed test case name (see below).
> 
> TAP13/14 makes description optional, are we making it required (I think
> we should). There seems to be a TAP13/14 "convention" of starting
> <description> with "- ", which I'm on the fence about it. It does make
> parsing maybe a little easier.

I would like the description to be required.
I don't have a strong opinion on the dash.  I'm OK with either one (dash
or no dash), but we should make kselftest and KUnit consistent.

> 
> > Finally, it is possible to use a test directive to indicate another
> > possible outcome for a test: that it was skipped.  To report that
> > a test case was skipped, the result line should start with the
> > result "not ok", and the directive "# SKIP" should be placed after
> > the test description. (Note that this deviates from the TAP13
> > specification).
> 
> This is what TAP14 changed, I think (i.e. directive follows description
> now).
> 
> >
> > A test may be skipped for a variety of reasons, ranging for
> > insufficient privileges to missing features or resources required
> > to execute that test case.
> >
> > It is usually helpful if a diagnostic message is emitted to explain
> > the reasons for the skip.  If the message is a single line and is
> > short, the diagnostic message may be placed after the '# SKIP'
> > directive on the same line as the test result.  However, if it is
> > not on the test result line, it should precede the test line (see
> > diagnostic data, next).
> >
> > Diagnostic data
> > ---------------
> > Diagnostic data is text that reports on test conditions or test
> > operations, or that explains test results.  In the kernel test
> > result format, diagnostic data is placed on lines that start with a
> > hash sign, followed by a space ('# ').
> >
> > One special format of diagnostic data is a test identification line,
> > that has the fully qualified name of a test case.  Such a test
> > identification line marks the start of test output for a test case.
> >
> > In the example above, there are three lines that start with '#'
> > which precede the test result line:
> > 	# selftests: cpufreq: main.sh
> > 	# pid 8101's current affinity mask: fff
> > 	# pid 8101's new affinity mask: 1
> > These are used to indicate diagnostic data for the test case
> > 'selftests: cpufreq: main.sh'
> >
> > Material in comments between the identification line and the test
> > result line are diagnostic data that can help to interpret the
> > results of the test.
> >
> > The TAP specification indicates that automated test harnesses may
> > ignore any line that is not one of the mandatory prescribed lines
> > (that is, the output format version line, the plan line, a test
> > result line, or a "Bail out!" line.)
> >
> > Bail out!
> > ---------
> > If a line in the test output starts with 'Bail out!', it indicates
> > that the test was aborted for some reason.  It indicates that
> > the test is unable to proceed, and no additional tests will be
> > performed.
> >
> > This can be used at the very beginning of a test, or anywhere in the
> > middle of the test, to indicate that the test can not continue.
> 
> I think the required syntax should be:
> 
> Bail out! <reason>
> 
> And to make it clear that this is optionally used to indicate an early
> abort. (Though with a leading plan line, a parser should be able to
> determine this on its own.)
> 
> > --- from here on is not-yet-organized material
> >
> > Tip:
> >  - don't change the test plan based on skipped tests.
> >    - it is better to report that a test case was skipped, than to
> >      not report it
> >    - that is, don't adjust the number of test cases based on skipped
> >      tests
> 
> Yes, totally.
> 
> > Other things to mention:
> > TAP13 elements not used:
> >  - yaml for diagnostic messages
> >    - reason: try to keep things line-based, since output from other things
> >    may be interspersed with messages from the test itself
> 
> Agreed: the yaml extension is not sensible for our use.
> 
> >  - TODO directive
> 
> Agreed: SKIP should cover everything TODO does.
> 
> > KTAP Extensions beyond TAP13:
> >  - nesting
> 
> (I would call this 'subtests')
Sounds good.  Will do.

> 
> >    - via indentation
> >      - indentation makes it easier for humans to read
> 
> And allows for separable parsing of subtests.
Agree.  I'll try to work that into the doc.

> 
> >  - test identifier
> >     - multiple parts, separated by ':'
> 
> This is interesting... is the goal to be able to report test status over
> time by name?
Yes.  KernelCI and Fuego have the notions of a testcase namespace
hierarchy.  As the number of tests expands it is helpful to have
the name-space for a sub-test be limited, just like a filesystem hierarchy
provides scope for the names of objects (directories and files) that
it contains.

> 
> >  - summary lines
> >    - can be skipped by CI systems that do their own calculations
> 
> Right -- I think per-test summary line should be included for the humans
> reading a single test (which may scroll off the screen).
> 
> > Other notes:
> >  - automatic assignment of result status based on exit code
> 
> This is, I think, a matter for the kselftest running infrastructure, not
> the KTAP output?
Agreed.  This doesn't have anything to do with the API between
the tests and the results processor.  I'll take it out.
> 
> > Tips:
> >  - do NOT describe the result in the test line
> >    - the test case description should be the same whether the test
> >      succeeds or fails
> >    - use diagnostic lines to describe or explain results, if this is
> >      desirable
> 
> Right.
> 
> >  - test numbers are considered harmful
> >    - test harnesses should use the test description as the identifier
> >    - test numbers change when testcases are added or removed
> >      - which means that results can't be compared between different
> >        versions of the test
> 
> Right.
> 
> >  - recommendations for diagnostic messages:
> >    - reason for failure
> >    - reason for skip
> >    - diagnostic data should always preceding the result line
> >      - problem: harness may emit result before test can do assessment
> >        to determine reason for result
> >      - this is what the kernel uses
> 
> Right.
> 
> > Differences between kernel test result format and TAP13:
> >  - in KTAP the "# SKIP" directive is placed after the description on
> >    the test result line
> 
> Right, this is the same as TAP14, IIRC. KTAP's big deltas are the "#"
> diagnostic lines and the subtest handling.
> 
> > ====== start of ktap-doc-rfc.txt ======
> > OK - that's the end of the RFC doc.
> >
> > Here are a few questions:
> >  - is this document desired or not?
> 
> Yes.
Great.  I'll make this a priority to work on.

> 
> >     - is it too long or too short?
> 
> Should be slightly longer: more examples.
> 
> >  - if the document is desired, where should it be placed?
> >    I assume somewhere under Documentation, and put into
> >    .rst format. Suggestions for a name and location are welcome.
> 
> Yes Documentation/*.rst Not sure on name yet, but where do kselftest
> docs live? :)
Documentation/dev-tools/kselftest.rst

I'll put this at: Documentation/dev-tools/test-results-format.rst

> 
> >  - is this document accurate?
> >    I think KUNIT does a few things differently than this description.
> 
> Let's fix it. :)
> 
> >    - is the intent to have kunit and kselftest have the same output format?
> >       if so, then these should be rationalized.
> 
> Yes please.
> 
> > Finally,
> >   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> > See https://testanything.org/tap-version-13-specification.html
> 
> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
> relate? Neither SKIP nor XFAIL count toward failure, though, so both
> should be "ok"? I guess we should change it to "ok".

I have the same initial impression.  In my mind, a skip is "not ok", since
the test didn't run. However, a SKIP and should be treated differently
from either "ok" or "not ok" by the results interpreter, so I don't think it
matters.  Originally I was averse to changing the SKIP result to "ok"
(as suggested by Paulo Bonzini [1]), but I checked and it's pretty trivial to
change the parser in Fuego, and it would make the kernel results format
match the TAP13 spec.  I don't see a strong reason for us to be different
from TAP13 here.

I raised this issue on our automated testing conference call last week
(which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
so people should be chiming in if their parser will have a problem with this change.)

[1]  https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/

Thanks very much for the feedback.
 -- Tim