linux-kselftest.vger.kernel.org archive mirror
* RFC - kernel selftest result documentation (KTAP)
@ 2020-06-10 18:11 Bird, Tim
  2020-06-13  5:07 ` David Gow
                   ` (4 more replies)
  0 siblings, 5 replies; 43+ messages in thread
From: Bird, Tim @ 2020-06-10 18:11 UTC (permalink / raw)
  To: shuah, linux-kselftest, Brendan Higgins; +Cc: linux-kernel

Some months ago I started work on a document to formalize how
kselftest implements the TAP specification.  However, I didn't finish
that work.  Maybe it's time to do so now.

kselftest has developed a few differences from the original
TAP specification, and  some extensions that I believe are worth
documenting.

Essentially, we have created our own KTAP (kernel TAP)
format.  I think it is worth documenting our conventions, in order to
keep everyone on the same page.

Below is a partially completed document on my understanding
of KTAP, based on examination of some of the kselftest test
output.  I have not reconciled this with the kunit output format,
which I believe has some differences (which maybe we should
resolve before we get too far into this).

I submit the document now, before it is finished, because a patch
was recently introduced to alter one of the result conventions
(from SKIP='not ok' to SKIP='ok').

See the document included inline below.

====== start of ktap-doc-rfc.txt ======
Selftest preferred output format
--------------------------------

The linux kernel selftest system uses TAP (Test Anything Protocol)
output to make testing results consumable by automated systems.  A
number of Continuous Integration (CI) systems test the kernel every
day.  It is useful for these systems that output from selftest
programs be consistent and machine-parsable.

At the same time, it is useful for test results to be human-readable
as well.

The kernel test result format is based on a variation of TAP.
TAP is a simple text-based format that is
documented on the TAP home page (http://testanything.org/).  There
is an official TAP13 specification here:
http://testanything.org/tap-version-13-specification.html

The kernel test result format consists of 5 major elements,
4 of which are line-based:
 * the output version line
 * the plan line
 * one or more test result lines (also called test lines)
 * a possible "Bail out!" line

optional elements:
 * diagnostic data

The 5th element is diagnostic information, which is used to describe
items running in the test, and possibly to explain test results.
A sample test result is shown below.

Some other lines may be placed in the output by the test harness, and
are not emitted by individual test programs:
 * one or more test identification lines
 * a possible results summary line

Here is an example:

	TAP version 13
	1..1
	# selftests: cpufreq: main.sh
	# pid 8101's current affinity mask: fff
	# pid 8101's new affinity mask: 1
	ok 1 selftests: cpufreq: main.sh

The output version line is: "TAP version 13"

The test plan is "1..1".

Element details
===============

Output version line
-------------------
The output version line is always "TAP version 13".

Although the kernel test result format has some additions
to the TAP13 format, the version line reported by kselftest tests
is (currently) always the exact string "TAP version 13"

This is always the first line of test output.

Test plan line
--------------
The test plan indicates the number of individual test cases intended to
be executed by the test. It always starts with "1.." and is followed
by the number of test cases.  In the example above, "1..1" indicates
that this test reports only 1 test case.

The test plan line can be placed in two locations:
 * the second line of test output, when the number of test cases is known
   in advance
 * as the last line of test output, when the number of test cases is not
   known in advance.

Most often, the number of test cases is known in advance, and the test plan
line appears as the second line of test output, immediately following
the output version line.  The number of test cases might not be known
in advance if the number of tests is calculated from runtime data.
In this case, the test plan line is emitted as the last line of test
output.
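
For example, a hypothetical test that computes its number of test
cases from runtime data might emit the plan line last:

	TAP version 13
	ok 1 check-feature-A
	ok 2 check-feature-B
	1..2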

Test result lines
-----------------
The test output consists of one or more test result lines that indicate
the actual results for the test.  These have the format:

  <result> <number> <description> [<directive>] [<diagnostic data>]

The ''result'' must appear at the start of a line (except for when a
test is nested, see below), and must consist of one of the following
two phrases:
  * ok
  * not ok

If the test passed, then the result is reported as "ok".  If the test
failed, then the result is reported as "not ok".  These must be in
lower case, exactly as shown.

The ''number'' in the test result line represents the number of the
test case being performed by the test program.  This is often used by
test harnesses as a unique identifier for each test case.  The test
number is a base-10 number, starting with 1.  It should increase by
one for each new test result line emitted.  If possible the number
for a test case should be kept the same over the lifetime of the test.

The ''description'' is a short description of the test case.
This can be any string of words, but should avoid using colons (':')
except as part of a fully qualified test case name (see below).
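
For example, hypothetical result lines for a passing and a failing
test case might look like this:

	ok 1 selftests: example: test-one.sh
	not ok 2 selftests: example: test-two.sh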

Finally, it is possible to use a test directive to indicate another
possible outcome for a test: that it was skipped.  To report that
a test case was skipped, the result line should start with the
result "not ok", and the directive "# SKIP" should be placed after
the test description. (Note that this deviates from the TAP13
specification).

A test may be skipped for a variety of reasons, ranging from
insufficient privileges to missing features or resources required
to execute that test case.

It is usually helpful if a diagnostic message is emitted to explain
the reasons for the skip.  If the message is a single line and is
short, the diagnostic message may be placed after the '# SKIP'
directive on the same line as the test result.  However, if it is
not on the test result line, it should precede the test result line (see
diagnostic data, next).
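
For example, a hypothetical skipped test case with a short reason
might be reported as:

	not ok 3 selftests: example: test-three.sh # SKIP need root privileges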

Diagnostic data
---------------
Diagnostic data is text that reports on test conditions or test
operations, or that explains test results.  In the kernel test
result format, diagnostic data is placed on lines that start with a
hash sign, followed by a space ('# ').

One special format of diagnostic data is a test identification line,
which has the fully qualified name of a test case.  Such a test
identification line marks the start of test output for a test case.

In the example above, there are three lines that start with '#'
which precede the test result line:
	# selftests: cpufreq: main.sh
	# pid 8101's current affinity mask: fff
	# pid 8101's new affinity mask: 1
These are used to indicate diagnostic data for the test case
'selftests: cpufreq: main.sh'

Material in comments between the identification line and the test
result line is diagnostic data that can help to interpret the
results of the test.

The TAP specification indicates that automated test harnesses may
ignore any line that is not one of the mandatory prescribed lines
(that is, the output format version line, the plan line, a test
result line, or a "Bail out!" line.) 
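
As a rough illustration of these rules (this sketch is not part of
any existing kernel tool, and the names in it are made up), a result
consumer might classify output lines like this:

import re

# Illustrative KTAP line classifier; real harnesses do considerably
# more, such as handling nested/indented output.
RESULT_RE = re.compile(r"^(ok|not ok) (\d+)([^#]*)(#.*)?$")

def classify(line):
    line = line.rstrip("\n")
    if line.startswith("TAP version "):
        return "version"
    if re.match(r"^1\.\.\d+$", line):
        return "plan"
    if line.startswith("Bail out!"):
        return "bail out"
    if line.startswith("# "):
        return "diagnostic"
    if RESULT_RE.match(line):
        return "result"
    return "other"  # may be ignored, per the TAP rule above

Anything classified as "other" can be ignored by an automated
harness, although keeping such lines as diagnostic context is often
useful to humans reading a log.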

Bail out!
---------
If a line in the test output starts with 'Bail out!', it indicates
that the test was aborted for some reason.  The test is unable to
proceed, and no additional tests will be performed.

This can be used at the very beginning of a test, or anywhere in the
middle of the test, to indicate that the test can not continue.
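
For example, a hypothetical test that detects a fatal setup problem
might end its output with:

	Bail out! could not mount debugfs, unable to continue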

--- from here on is not-yet-organized material

Tip:
 - don't change the test plan based on skipped tests.
   - it is better to report that a test case was skipped, than to
     not report it
   - that is, don't adjust the number of test cases based on skipped
     tests

Other things to mention:
TAP13 elements not used:
 - yaml for diagnostic messages
   - reason: try to keep things line-based, since output from other things
   may be interspersed with messages from the test itself
 - TODO directive

KTAP Extensions beyond TAP13:
 - nesting
   - via indentation
     - indentation makes it easier for humans to read
 - test identifier
    - multiple parts, separated by ':'
 - summary lines
   - can be skipped by CI systems that do their own calculations

Other notes:
 - automatic assignment of result status based on exit code

Tips:
 - do NOT describe the result in the test line
   - the test case description should be the same whether the test
     succeeds or fails
   - use diagnostic lines to describe or explain results, if this is
     desirable
 - test numbers are considered harmful
   - test harnesses should use the test description as the identifier
   - test numbers change when testcases are added or removed
     - which means that results can't be compared between different
       versions of the test
 - recommendations for diagnostic messages:
   - reason for failure
   - reason for skip
   - diagnostic data should always precede the result line
     - problem: harness may emit result before test can do assessment
       to determine reason for result
     - this is what the kernel uses

Differences between kernel test result format and TAP13:
 - in KTAP the "# SKIP" directive is placed after the description on
   the test result line

====== end of ktap-doc-rfc.txt ======
OK - that's the end of the RFC doc.

Here are a few questions:
 - is this document desired or not?
    - is it too long or too short?
 - if the document is desired, where should it be placed?
   I assume somewhere under Documentation, and put into
   .rst format. Suggestions for a name and location are welcome.
 - is this document accurate?
   I think KUNIT does a few things differently than this description.
   - is the intent to have kunit and kselftest have the same output format?
      if so, then these should be rationalized.

Finally,
  - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
See https://testanything.org/tap-version-13-specification.html

Regards,
 -- Tim

   


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-10 18:11 RFC - kernel selftest result documentation (KTAP) Bird, Tim
@ 2020-06-13  5:07 ` David Gow
  2020-06-15 17:34   ` Bird, Tim
  2020-06-14 18:17 ` Kees Cook
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: David Gow @ 2020-06-13  5:07 UTC (permalink / raw)
  To: Bird, Tim; +Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel

On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim <Tim.Bird@sony.com> wrote:
>
> Some months ago I started work on a document to formalize how
> kselftest implements the TAP specification.  However, I didn't finish
> that work.  Maybe it's time to do so now.
>
> kselftest has developed a few differences from the original
> TAP specification, and  some extensions that I believe are worth
> documenting.
>
> Essentially, we have created our own KTAP (kernel TAP)
> format.  I think it is worth documenting our conventions, in order to
> keep everyone on the same page.
>
> Below is a partially completed document on my understanding
> of KTAP, based on examination of some of the kselftest test
> output.  I have not reconciled this with the kunit output format,
> which I believe has some differences (which maybe we should
> resolve before we get too far into this).

Thanks for doing this! This is something we've wanted to have for a while!

On the KUnit side of things, we've not (intentionally) deviated too
much from TAP/kselftest.
It's certainly our intention to hew as close as possible to what
kselftest is doing: I don't think there are any real conflicts
conceptually (at least at the moment), but we're almost certainly
handling a few details differently.

One other thing worth noting is that KUnit has a parser for our TAP
results: './tools/testing/kunit/kunit.py parse' will do some basic
parsing and print out results, a summary, etc.

A few other comments below:

> I submit the document now, before it is finished, because a patch
> was recently introduced to alter one of the result conventions
> (from SKIP='not ok' to SKIP='ok').
>
> See the document included inline below.
>
> ====== start of ktap-doc-rfc.txt ======
> Selftest preferred output format
> --------------------------------
>
> The linux kernel selftest system uses TAP (Test Anything Protocol)
> output to make testing results consumable by automated systems.  A
> number of Continuous Integration (CI) systems test the kernel every
> day.  It is useful for these systems that output from selftest
> programs be consistent and machine-parsable.
>
> At the same time, it is useful for test results to be human-readable
> as well.
>
> The kernel test result format is based on a variation of TAP.
> TAP is a simple text-based format that is
> documented on the TAP home page (http://testanything.org/).  There
> is an official TAP13 specification here:
> http://testanything.org/tap-version-13-specification.html
>
> The kernel test result format consists of 5 major elements,
> 4 of which are line-based:
>  * the output version line
>  * the plan line
>  * one or more test result lines (also called test lines)
>  * a possible "Bail out!" line
>
> optional elements:
>  * diagnostic data
>
> The 5th element is diagnostic information, which is used to describe
> items running in the test, and possibly to explain test results.
> A sample test result is shown below.
>
> Some other lines may be placed in the output by the test harness, and
> are not emitted by individual test programs:
>  * one or more test identification lines
>  * a possible results summary line
>
> Here is an example:
>
>         TAP version 13
>         1..1
>         # selftests: cpufreq: main.sh
>         # pid 8101's current affinity mask: fff
>         # pid 8101's new affinity mask: 1
>         ok 1 selftests: cpufreq: main.sh
>
> The output version line is: "TAP version 13"
>
> The test plan is "1..1".
>
> Element details
> ===============
>
> Output version line
> -------------------
> The output version line is always "TAP version 13".
>
> Although the kernel test result format has some additions
> to the TAP13 format, the version line reported by kselftest tests
> is (currently) always the exact string "TAP version 13"
>
> This is always the first line of test output.

KUnit is currently outputting "TAP version 14", as we were hoping some
of our changes would get into the TAP14 spec. (Any comments, Brendan?)
Maybe this should end up saying "KTAP version 1" or something?

> Test plan line
> --------------
> The test plan indicates the number of individual test cases intended to
> be executed by the test. It always starts with "1.." and is followed
> by the number of test cases.  In the example above, "1..1" indicates
> that this test reports only 1 test case.
>
> The test plan line can be placed in two locations:
>  * the second line of test output, when the number of test cases is known
>    in advance
>  * as the last line of test output, when the number of test cases is not
>    known in advance.
>
> Most often, the number of test cases is known in advance, and the test plan
> line appears as the second line of test output, immediately following
> the output version line.  The number of test cases might not be known
> in advance if the number of tests is calculated from runtime data.
> In this case, the test plan line is emitted as the last line of test
> output.

KUnit is currently including the test plan line only for subtests, as
the current version doesn't actually know how many test suites will
run in advance.
This is something there's work underway to fix, though.

> Test result lines
> -----------------
> The test output consists of one or more test result lines that indicate
> the actual results for the test.  These have the format:
>
>   <result> <number> <description> [<directive>] [<diagnostic data>]
>
> The ''result'' must appear at the start of a line (except for when a
> test is nested, see below), and must consist of one of the following
> two phrases:
>   * ok
>   * not ok
>
> If the test passed, then the result is reported as "ok".  If the test
> failed, then the result is reported as "not ok".  These must be in
> lower case, exactly as shown.
>
> The ''number'' in the test result line represents the number of the
> test case being performed by the test program.  This is often used by
> test harnesses as a unique identifier for each test case.  The test
> number is a base-10 number, starting with 1.  It should increase by
> one for each new test result line emitted.  If possible the number
> for a test case should be kept the same over the lifetime of the test.
>
> The ''description'' is a short description of the test case.
> This can be any string of words, but should avoid using colons (':')
> except as part of a fully qualified test case name (see below).
>
> Finally, it is possible to use a test directive to indicate another
> possible outcome for a test: that it was skipped.  To report that
> a test case was skipped, the result line should start with the
> result "not ok", and the directive "# SKIP" should be placed after
> the test description. (Note that this deviates from the TAP13
> specification).
>
> A test may be skipped for a variety of reasons, ranging from
> insufficient privileges to missing features or resources required
> to execute that test case.
>
> It is usually helpful if a diagnostic message is emitted to explain
> the reasons for the skip.  If the message is a single line and is
> short, the diagnostic message may be placed after the '# SKIP'
> directive on the same line as the test result.  However, if it is
> not on the test result line, it should precede the test line (see
> diagnostic data, next).

We're in the process of supporting test skipping in KUnit at the
moment[1], and haven't totally formalised what the syntax here should
be. The only output issues thus far have been on the "ok"/"not ok"
point (my in-progress patch is using 'ok', the previous RFC could
output either). At the moment, the reason a test is skipped has to be
on the same line as the result for the tools to pick it up (and the
KUnit API always requests such a 'status comment', even if it ends up
as the empty string).

We'll probably follow whatever kselftest does here, though, but will
be able to do more with skip reasons on the result line.

> Diagnostic data
> ---------------
> Diagnostic data is text that reports on test conditions or test
> operations, or that explains test results.  In the kernel test
> result format, diagnostic data is placed on lines that start with a
> hash sign, followed by a space ('# ').
>
> One special format of diagnostic data is a test identification line,
> that has the fully qualified name of a test case.  Such a test
> identification line marks the start of test output for a test case.
>
> In the example above, there are three lines that start with '#'
> which precede the test result line:
>         # selftests: cpufreq: main.sh
>         # pid 8101's current affinity mask: fff
>         # pid 8101's new affinity mask: 1
> These are used to indicate diagnostic data for the test case
> 'selftests: cpufreq: main.sh'
>
> Material in comments between the identification line and the test
> result line is diagnostic data that can help to interpret the
> results of the test.
>
> The TAP specification indicates that automated test harnesses may
> ignore any line that is not one of the mandatory prescribed lines
> (that is, the output format version line, the plan line, a test
> result line, or a "Bail out!" line.)
>
> Bail out!
> ---------
> If a line in the test output starts with 'Bail out!', it indicates
> that the test was aborted for some reason.  It indicates that
> the test is unable to proceed, and no additional tests will be
> performed.
>
> This can be used at the very beginning of a test, or anywhere in the
> middle of the test, to indicate that the test can not continue.
>
> --- from here on is not-yet-organized material
>
> Tip:
>  - don't change the test plan based on skipped tests.
>    - it is better to report that a test case was skipped, than to
>      not report it
>    - that is, don't adjust the number of test cases based on skipped
>      tests
>
> Other things to mention:
> TAP13 elements not used:
>  - yaml for diagnostic messages
>    - reason: try to keep things line-based, since output from other things
>    may be interspersed with messages from the test itself
We're not using this in KUnit, either.
>  - TODO directive
Ditto: the upcoming SKIP support leaves room for this to easily be
added, though.
>
> KTAP Extensions beyond TAP13:
>  - nesting
>    - via indentation
>      - indentation makes it easier for humans to read
We're using this a lot in KUnit, as all tests are split into suites.
The syntax is basically a full nested TAP document, indented with four
spaces. (There are a couple of tests which output some non-indented
lines to our log, though.)

I've included some example output at the end of this email of what
we're doing currently.

>  - test identifier
>     - multiple parts, separated by ':'

>  - summary lines
>    - can be skipped by CI systems that do their own calculations

We're not outputting any summary lines for the tests as a whole, but
the success of a test suite is determined from the success of nested
tests.

> Other notes:
>  - automatic assignment of result status based on exit code
>
> Tips:
>  - do NOT describe the result in the test line
>    - the test case description should be the same whether the test
>      succeeds or fails
>    - use diagnostic lines to describe or explain results, if this is
>      desirable
>  - test numbers are considered harmful
>    - test harnesses should use the test description as the identifier
>    - test numbers change when testcases are added or removed
>      - which means that results can't be compared between different
>        versions of the test
>  - recommendations for diagnostic messages:
>    - reason for failure
>    - reason for skip
>    - diagnostic data should always precede the result line
>      - problem: harness may emit result before test can do assessment
>        to determine reason for result
>      - this is what the kernel uses
>
> Differences between kernel test result format and TAP13:
>  - in KTAP the "# SKIP" directive is placed after the description on
>    the test result line

That's what we're planning to do with KUnit as well: clearly I didn't
read the TAP13 spec as thoroughly as I'd intended, as I'd naively
assumed that this was TAP13 spec compliant. Oops.
I'm very much in favour of this change.

>
> ====== end of ktap-doc-rfc.txt ======
> OK - that's the end of the RFC doc.
>
> Here are a few questions:
>  - is this document desired or not?

This is definitely a good thing for us: thanks a lot!

>     - is it too long or too short?
>  - if the document is desired, where should it be placed?
>    I assume somewhere under Documentation, and put into
>    .rst format. Suggestions for a name and location are welcome.
>  - is this document accurate?
>    I think KUNIT does a few things differently than this description.
>    - is the intent to have kunit and kselftest have the same output format?
>       if so, then these should be rationalized.

As above, we'd love to at least try to have kunit and kselftest using
the same format.


> Finally,
>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> See https://testanything.org/tap-version-13-specification.html

I have a very mild preference for 'ok': but it doesn't really matter
much one way or the other. Our tooling throws the result away if it
sees a SKIP.

> Regards,
>  -- Tim
>
>

Example KUnit output (including the in-progress "skip test" support):
TAP version 14
    # Subtest: kunit-try-catch-test
    1..2
    ok 1 - kunit_test_try_catch_successful_try_no_catch
    ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
ok 1 - kunit-try-catch-test
    # Subtest: example
    1..2
    # example_simple_test: initializing
    ok 1 - example_simple_test
    # example_skip_test: initializing
    ok 2 - example_skip_test # SKIP this test should be skipped
ok 2 - example



[1]: https://lore.kernel.org/linux-kselftest/20200513042956.109987-1-davidgow@google.com/T/#u


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-10 18:11 RFC - kernel selftest result documentation (KTAP) Bird, Tim
  2020-06-13  5:07 ` David Gow
@ 2020-06-14 18:17 ` Kees Cook
  2020-06-15 17:45   ` Bird, Tim
  2020-06-14 18:39 ` Kees Cook
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-06-14 18:17 UTC (permalink / raw)
  To: David Gow
  Cc: Vitor Massaru Iha, KUnit Development, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, Linux Kernel Mailing List,
	Brendan Higgins, linux-kernel-mentees, linux

On Sat, Jun 13, 2020 at 02:51:17PM +0800, David Gow wrote:
> On Sat, Jun 13, 2020 at 6:36 AM Kees Cook <keescook@chromium.org> wrote:
> > Regarding output:
> >
> > [   36.611358] TAP version 14
> > [   36.611953]     # Subtest: overflow
> > [   36.611954]     1..3
> > ...
> > [   36.622914]     # overflow_calculation_test: s64: 21 arithmetic tests
> > [   36.624020]     ok 1 - overflow_calculation_test
> > ...
> > [   36.731096]     # overflow_shift_test: ok: (s64)(0 << 63) == 0
> > [   36.731840]     ok 2 - overflow_shift_test
> > ...
> > [   36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
> > ...
> > [   36.805350]     # overflow_allocation_test: devm_kzalloc detected saturation
> > [   36.807763]     ok 3 - overflow_allocation_test
> > [   36.807765] ok 1 - overflow
> >
> > A few things here....
>
> Tim Bird has just sent out an RFC for a "KTAP" specification, which
> we'll hope to support in KUnit:

Ah-ha! Thanks for the heads-up. :)

> https://lore.kernel.org/linux-kselftest/CY4PR13MB1175B804E31E502221BC8163FD830@CY4PR13MB1175.namprd13.prod.outlook.com/T/#u

*thread split/merge*

This is coming from:
https://lore.kernel.org/linux-kselftest/CABVgOSnofuJQ_fiCL-8KdKezg3Hnqk3A+X509c4YP_toKeBVBg@mail.gmail.com/
But I'm attempting a thread jump... ;)

> That's probably where we'll end up trying to hash out exactly what
> this format should be. Fortunately, I don't think any of these will
> require any per-test work, just changes to the KUnit implementation.

Yup, good.

> > - On the outer test report, there is no "plan" line (I was expecting
> >   "1..1"). Technically it's optional, but it seems like the information
> >   is available. :)
>
> There's work underway to support this, but it's hit a few minor snags:
> https://lkml.org/lkml/2020/2/27/2155

Okay, cool. It's not critical, I don't think.

> > - The subtest should have its own "TAP version 14" line, and it should
> >   be using the diagnostic line prefix for the top-level test (this is
> >   what kselftest is doing).
>
> Alas, TAP itself hasn't standardised subtests. Personally, I think it
> doesn't fundamentally matter which way we do this (I actually prefer
> the way we're doing it currently slightly), but converging with what
> kselftest does would be ideal.

I see the KTAP RFC doesn't discuss subtests at all, but kselftest actually
already handles subtests. I strongly feel that line-start formatting is
the correct way to deal with this, with each subtest having its own
self-contained KTAP. This allows for several important features:

- the subtest, run on its own, needs no knowledge about its execution
  environment: it simply emits its standard KTAP output.

- subtest output can be externally parsed separably, without any
  knowledge or parsing of the enclosing test.

For example, with my recent series[1], "make -C seccomp run_tests"
produces:

TAP version 13
1..2
# selftests: seccomp: seccomp_bpf
# TAP version 13
# 1..77
# # Starting 77 tests from 6 test cases.
# #  RUN           global.mode_strict_support ...
# #            OK  global.mode_strict_support
# ok 1 global.mode_strict_support
...
# ok 77 TSYNC.two_siblings_not_under_filter
# # FAILED: 73 / 77 tests passed.
# # Totals: pass:72 fail:4 xfail:1 xpass:0 skip:0 error:0
not ok 1 selftests: seccomp: seccomp_bpf # exit=1
# selftests: seccomp: seccomp_benchmark
#
not ok 2 selftests: seccomp: seccomp_benchmark # TIMEOUT

> > - There is no way to distinguish top-level TAP output from kernel log
> >   lines. I think we should stick with the existing marker, which is
> >   "# ", so that kernel output has no way to be interpreted as TAP
> >   details -- unless it intentionally starts adding "#"s. ;)
>
> At the moment, we're doing this in KUnit tool by stripping anything
> before "TAP version 14" (e.g., the timestamp), and then only including
> lines which parse correctly (are a test plan, result, or a diagnostic
> line beginning with '#').
> This has worked pretty well thus far, and we do have the ability to
> get results from debugfs instead of the kernel log, which won't have
> the same problems.
>
> It's worth considering, though, particularly since our parser should
> handle this anyway without any changes.

For the KTAP parsing, I actually think it's very important to include
kernel log lines in the test output (as diagnostic lines), since they
are "unexpected" (they fail to have the correct indentation) and may
provide additional context for test failures when reading log files.
(As in the "vmalloc: allocation failure" line in the quoted section
above, to be included as a diagnostic line for test #3.)

> > - There is no summary line (to help humans). For example, the kselftest
> >   API produces a final pass/fail report.
>
> Currently, we're relying on the kunit.py script to produce this, but
> it shouldn't be impossible to add to the kernel, particularly once the
> "KUnit Executor" changes mentioned above land.

Cool. Yeah, it's not required, but I think there are two use cases:
humans running a single test (where is a summary is valuable, especially
for long tests that scroll off the screen), and automation (where it can
ignore the summary, as it will produce its own in a regularized fashion).

> > Taken together, I was expecting the output to be:
> >
> > [   36.611358] # TAP version 14
> > [   36.611953] # 1..1
> > [   36.611958] # # TAP version 14
> > [   36.611954] # # 1..3
> > ...
> > [   36.622914] # # # overflow_calculation_test: s64: 21 arithmetic tests
> > [   36.624020] # # ok 1 - overflow_calculation_test
> > ...
> > [   36.731096] # # # overflow_shift_test: ok: (s64)(0 << 63) == 0
> > [   36.731840] # # ok 2 - overflow_shift_test
> > ...
> > [   36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
> > ...
> > [   36.805350] # # # overflow_allocation_test: devm_kzalloc detected saturation
> > [   36.807763] # # ok 3 - overflow_allocation_test
> > [   36.807763] # # # overflow: PASS
> > [   36.807765] # ok 1 - overflow
> > [   36.807767] # # kunit: PASS
> >
> > But I assume there are threads on this that I've not read... :)
>
> These discussions all seem to be coming to a head now, so this is
> probably just the kick we need to prioritise this a bit more. The KTAP
> thread hasn't covered all of these (particularly the subtest stuff)
> yet, so I confess I hadn't realised the extent of the divergence
> between KUnit and kselftest here.

Cool. Yeah, I'd like to keep things as close as possible. In looking at
this again, I wonder if perhaps it would be better to change the
"indent" from "# " to "  ", which might make things more readable for
both dmesg and terminal output:

[   36.611358]   TAP version 14
[   36.611953]   1..1
[   36.611958]     TAP version 14
[   36.611954]     1..3
...
[   36.622914]     # overflow_calculation_test: s64: 21 arithmetic tests
[   36.624020]     ok 1 - overflow_calculation_test
...
[   36.731096]     # overflow_shift_test: ok: (s64)(0 << 63) == 0
[   36.731840]     ok 2 - overflow_shift_test
...
[   36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
...
[   36.805350]     # overflow_allocation_test: devm_kzalloc detected saturation
[   36.807763]     ok 3 - overflow_allocation_test
[   36.807763]     # overflow: PASS
[   36.807765]   ok 1 - overflow
[   36.807767]   # kunit: PASS

As it happens, subtests are a bit rare in kselftests right now, but
for kunit, the "common" output (IIUC) will fundamentally be a top-level
test running all the subtests, so we should get it right. :)

-Kees

[1] https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=selftests/harness/tap

--
Kees Cook


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-10 18:11 RFC - kernel selftest result documentation (KTAP) Bird, Tim
  2020-06-13  5:07 ` David Gow
  2020-06-14 18:17 ` Kees Cook
@ 2020-06-14 18:39 ` Kees Cook
  2020-06-15 19:07   ` Bird, Tim
  2020-06-16 20:48 ` Brendan Higgins
  2020-06-19 17:13 ` Frank Rowand
  4 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-06-14 18:39 UTC (permalink / raw)
  To: Bird, Tim; +Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel

On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
> The kernel test result format consists of 5 major elements,
> 4 of which are line-based:
>  * the output version line
>  * the plan line

Note: making the plan line required differs from TAP13 and TAP14. I
think it's the right choice, but we should be clear.

>  * one or more test result lines (also called test lines)
>  * a possible "Bail out!" line

"Bail out!" to be moved to "optional" elements, since it may not appear.
And we should clarify TAP13 and TAP14's language to say it should only
appear when the test is aborting without running later tests -- for this
reason, I think the optional "description" following "Bail out!" should
be made required. I.e. it must be: "Bail out! $reason"

> optional elements:
>  * diagnostic data

nit: diagnostic lines (not data)

> The 5th element is diagnostic information, which is used to describe
> items running in the test, and possibly to explain test results.
> A sample test result is shown below.
> 
> Some other lines may be placed in the output by the test harness, and
> are not emitted by individual test programs:
>  * one or more test identification lines
>  * a possible results summary line
> 
> Here is an example:
> 
> 	TAP version 13
> 	1..1
> 	# selftests: cpufreq: main.sh
> 	# pid 8101's current affinity mask: fff
> 	# pid 8101's new affinity mask: 1
> 	ok 1 selftests: cpufreq: main.sh

Nit: for examples, I think this should show more than one test.
(Preferably, it should show all the possible cases, ok, not ok, SKIP,
etc.)

> The output version line is: "TAP version 13"
> 
> The test plan is "1..1".
> 
> Element details
> ===============
> 
> Output version line
> -------------------
> The output version line is always "TAP version 13".
> 
> Although the kernel test result format has some additions
> to the TAP13 format, the version line reported by kselftest tests
> is (currently) always the exact string "TAP version 13"
> 
> This is always the first line of test output.
> 
> Test plan line
> --------------
> The test plan indicates the number of individual test cases intended to
> be executed by the test. It always starts with "1.." and is followed
> by the number of test cases.  In the example above, "1..1" indicates
> that this test reports only 1 test case.
> 
> The test plan line can be placed in two locations:
>  * the second line of test output, when the number of test cases is known
>    in advance
>  * as the last line of test output, when the number of test cases is not
>    known in advance.
> 
> Most often, the number of test cases is known in advance, and the test plan
> line appears as the second line of test output, immediately following
> the output version line.  The number of test cases might not be known
> in advance if the number of tests is calculated from runtime data.
> In this case, the test plan line is emitted as the last line of test
> output.

"... must be ..." ?

> 
> Test result lines
> -----------------
> The test output consists of one or more test result lines that indicate
> the actual results for the test.  These have the format:
> 
>   <result> <number> <description> [<directive>] [<diagnostic data>]

This should be:

<result> <number> <description> [# [<directive> ][<diagnostic data>]]
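
A hypothetical line in that form (test name and reason made up):

  not ok 3 check-mmap-flags # SKIP requires CONFIG_MMU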

> 
> The ''result'' must appear at the start of a line (except for when a
> test is nested, see below), and must consist of one of the following
> two phrases:
>   * ok
>   * not ok
> 
> If the test passed, then the result is reported as "ok".  If the test
> failed, then the result is reported as "not ok".  These must be in
> lower case, exactly as shown.
> 
> The ''number'' in the test result line represents the number of the
> test case being performed by the test program.  This is often used by
> test harnesses as a unique identifier for each test case.  The test
> number is a base-10 number, starting with 1.  It should increase by
> one for each new test result line emitted.  If possible the number
> for a test case should be kept the same over the lifetime of the test.
> 
> The ''description'' is a short description of the test case.
> This can be any string of words, but should avoid using colons (':')

Must also avoid "#".

> except as part of a fully qualified test case name (see below).

TAP13/14 makes the description optional; are we making it required? (I
think we should.) There seems to be a TAP13/14 "convention" of starting
<description> with "- ", which I'm on the fence about. It does make
parsing maybe a little easier.

> Finally, it is possible to use a test directive to indicate another
> possible outcome for a test: that it was skipped.  To report that
> a test case was skipped, the result line should start with the
> result "not ok", and the directive "# SKIP" should be placed after
> the test description. (Note that this deviates from the TAP13
> specification).

This is what TAP14 changed, I think (i.e. directive follows description
now).

> 
> A test may be skipped for a variety of reasons, ranging from
> insufficient privileges to missing features or resources required
> to execute that test case.
> 
> It is usually helpful if a diagnostic message is emitted to explain
> the reasons for the skip.  If the message is a single line and is
> short, the diagnostic message may be placed after the '# SKIP'
> directive on the same line as the test result.  However, if it is
> not on the test result line, it should precede the test line (see
> diagnostic data, next).
> 
> Diagnostic data
> ---------------
> Diagnostic data is text that reports on test conditions or test
> operations, or that explains test results.  In the kernel test
> result format, diagnostic data is placed on lines that start with a
> hash sign, followed by a space ('# ').
> 
> One special format of diagnostic data is a test identification line,
> that has the fully qualified name of a test case.  Such a test
> identification line marks the start of test output for a test case.
> 
> In the example above, there are three lines that start with '#'
> which precede the test result line:
> 	# selftests: cpufreq: main.sh
> 	# pid 8101's current affinity mask: fff
> 	# pid 8101's new affinity mask: 1
> These are used to indicate diagnostic data for the test case
> 'selftests: cpufreq: main.sh'
> 
> Material in comments between the identification line and the test
> result line is diagnostic data that can help to interpret the
> results of the test.
> 
> The TAP specification indicates that automated test harnesses may
> ignore any line that is not one of the mandatory prescribed lines
> (that is, the output format version line, the plan line, a test
> result line, or a "Bail out!" line.) 
> 
> Bail out!
> ---------
> If a line in the test output starts with 'Bail out!', it indicates
> that the test was aborted for some reason.  It indicates that 
> the test is unable to proceed, and no additional tests will be
> performed.
> 
> This can be used at the very beginning of a test, or anywhere in the
> middle of the test, to indicate that the test can not continue.

I think the required syntax should be:

Bail out! <reason>

And to make it clear that this is optionally used to indicate an early
abort. (Though with a leading plan line, a parser should be able to
determine this on its own.)

> --- from here on is not-yet-organized material
> 
> Tip:
>  - don't change the test plan based on skipped tests.
>    - it is better to report that a test case was skipped, than to
>      not report it
>    - that is, don't adjust the number of test cases based on skipped
>      tests

Yes, totally.

> Other things to mention:
> TAP13 elements not used:
>  - yaml for diagnostic messages
>    - reason: try to keep things line-based, since output from other things
>    may be interspersed with messages from the test itself

Agreed: the yaml extension is not sensible for our use.

>  - TODO directive

Agreed: SKIP should cover everything TODO does.

> KTAP Extensions beyond TAP13:
>  - nesting

(I would call this 'subtests')

>    - via indentation
>      - indentation makes it easier for humans to read

And allows for separable parsing of subtests.

>  - test identifier
>     - multiple parts, separated by ':'

This is interesting... is the goal to be able to report test status over
time by name?

>  - summary lines
>    - can be skipped by CI systems that do their own calculations

Right -- I think a per-test summary line should be included for the humans
reading a single test (which may scroll off the screen).

> Other notes:
>  - automatic assignment of result status based on exit code

This is, I think, a matter for the kselftest running infrastructure, not
the KTAP output?

> Tips:
>  - do NOT describe the result in the test line
>    - the test case description should be the same whether the test
>      succeeds or fails
>    - use diagnostic lines to describe or explain results, if this is
>      desirable

Right.

>  - test numbers are considered harmful
>    - test harnesses should use the test description as the identifier
>    - test numbers change when testcases are added or removed
>      - which means that results can't be compared between different
>        versions of the test

Right.

>  - recommendations for diagnostic messages:
>    - reason for failure
>    - reason for skip
>    - diagnostic data should always precede the result line
>      - problem: harness may emit result before test can do assessment
>        to determine reason for result
>      - this is what the kernel uses

Right.

> Differences between kernel test result format and TAP13:
>  - in KTAP the "# SKIP" directive is placed after the description on
>    the test result line

Right, this is the same as TAP14, IIRC. KTAP's big deltas are the "#"
diagnostic lines and the subtest handling.

> ====== end of ktap-doc-rfc.txt ======
> OK - that's the end of the RFC doc.
> 
> Here are a few questions:
>  - is this document desired or not?

Yes.

>     - is it too long or too short?

Should be slightly longer: more examples.

>  - if the document is desired, where should it be placed?
>    I assume somewhere under Documentation, and put into
>    .rst format. Suggestions for a name and location are welcome.

Yes, Documentation/*.rst.  Not sure on name yet, but where do kselftest
docs live? :)

>  - is this document accurate?
>    I think KUNIT does a few things differently than this description.

Let's fix it. :)

>    - is the intent to have kunit and kselftest have the same output format?
>       if so, then these should be rationalized.

Yes please.

> Finally,
>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> See https://testanything.org/tap-version-13-specification.html

Oh! I totally missed this. Uhm. I think "not ok" makes sense to me: "it
did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
relate? Neither SKIP nor XFAIL count toward failure, though, so both
should be "ok"? I guess we should change it to "ok".

-- 
Kees Cook


* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-13  5:07 ` David Gow
@ 2020-06-15 17:34   ` Bird, Tim
  2020-06-16 20:03     ` Brendan Higgins
  0 siblings, 1 reply; 43+ messages in thread
From: Bird, Tim @ 2020-06-15 17:34 UTC (permalink / raw)
  To: David Gow
  Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel, Kees Cook

> -----Original Message-----
> From: David Gow <davidgow@google.com>
> 
> On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim <Tim.Bird@sony.com> wrote:
> >
> > Some months ago I started work on a document to formalize how
> > kselftest implements the TAP specification.  However, I didn't finish
> > that work.  Maybe it's time to do so now.
> >
> > kselftest has developed a few differences from the original
> > TAP specification, and  some extensions that I believe are worth
> > documenting.
> >
> > Essentially, we have created our own KTAP (kernel TAP)
> > format.  I think it is worth documenting our conventions, in order to
> > keep everyone on the same page.
> >
> > Below is a partially completed document on my understanding
> > of KTAP, based on examination of some of the kselftest test
> > output.  I have not reconciled this with the kunit output format,
> > which I believe has some differences (which maybe we should
> > resolve before we get too far into this).
> 
> Thanks for doing this! This is something we've wanted to have for a while!
> 
> On the KUnit side of things, we've not (intentionally) deviated too
> much from TAP/kselftest
> It's certainly our intention to hew as close as possible to what
> kselftest is doing: I don't think there are any real conflicts
> conceptually (at least at the moment), but we're almost certainly
> handling a few details differently.
> 
> One other thing worth noting is that KUnit has a parser for our TAP
> results: './tools/testing/kunit/kunit.py parse' will do some basic
> parsing and print out results, a summary, etc.
> 
> A few other comments below:
> 
> > I submit the document now, before it is finished, because a patch
> > was recently introduced to alter one of the result conventions
> > (from SKIP='not ok' to SKIP='ok').
> >
> > See the document included inline below.
> >
> > ====== start of ktap-doc-rfc.txt ======
> > Selftest preferred output format
> > --------------------------------
> >
> > The linux kernel selftest system uses TAP (Test Anything Protocol)
> > output to make testing results consumable by automated systems.  A
> > number of Continuous Integration (CI) systems test the kernel every
> > day.  It is useful for these systems that output from selftest
> > programs be consistent and machine-parsable.
> >
> > At the same time, it is useful for test results to be human-readable
> > as well.
> >
> > The kernel test result format is based on a variation of TAP.
> > TAP is a simple text-based format that is
> > documented on the TAP home page (http://testanything.org/).  There
> > is an official TAP13 specification here:
> > http://testanything.org/tap-version-13-specification.html
> >
> > The kernel test result format consists of 5 major elements,
> > 4 of which are line-based:
> >  * the output version line
> >  * the plan line
> >  * one or more test result lines (also called test lines)
> >  * a possible "Bail out!" line
> >
> > optional elements:
> >  * diagnostic data
> >
> > The 5th element is diagnostic information, which is used to describe
> > items running in the test, and possibly to explain test results.
> > A sample test result is shown below.
> >
> > Some other lines may be placed in the output by the test harness, and
> > are not emitted by individual test programs:
> >  * one or more test identification lines
> >  * a possible results summary line
> >
> > Here is an example:
> >
> >         TAP version 13
> >         1..1
> >         # selftests: cpufreq: main.sh
> >         # pid 8101's current affinity mask: fff
> >         # pid 8101's new affinity mask: 1
> >         ok 1 selftests: cpufreq: main.sh
> >
> > The output version line is: "TAP version 13"
> >
> > The test plan is "1..1".
> >
> > Element details
> > ===============
> >
> > Output version line
> > -------------------
> > The output version line is always "TAP version 13".
> >
> > Although the kernel test result format has some additions
> > to the TAP13 format, the version line reported by kselftest tests
> > is (currently) always the exact string "TAP version 13"
> >
> > This is always the first line of test output.
> 
> KUnit is currently outputting "TAP version 14", as we were hoping some
> of our changes would get into the TAP14 spec. (Any comments, Brendan?)
> Maybe this should end up saying "KTAP version 1" or something?

I don't know if this will break any existing results parsers or not.
I hesitate to use "TAP version 14", as TAP appears to be a dormant
initiative at the moment, and there's no guarantee that the kernel's
changes will get adopted into an official spec.

If we are a strict super-set of TAP, then I suppose we could just
start using TAP version 14, and unilaterally declare that our changes
make a new spec.  But since we don't control the web site this feels
like a hostile takeover.

I'm most comfortable with calling our thing KTAP, and just
referencing TAP as inspiration.  I don't have a strong opinion on
KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the
same version line (if we can get them to use the same conventions).

> 
> > Test plan line
> > --------------
> > The test plan indicates the number of individual test cases intended to
> > be executed by the test. It always starts with "1.." and is followed
> > by the number of test cases.  In the example above, "1..1" indicates
> > that this test reports only 1 test case.
> >
> > The test plan line can be placed in two locations:
> >  * the second line of test output, when the number of test cases is known
> >    in advance
> >  * as the last line of test output, when the number of test cases is not
> >    known in advance.
> >
> > Most often, the number of test cases is known in advance, and the test plan
> > line appears as the second line of test output, immediately following
> > the output version line.  The number of test cases might not be known
> > in advance if the number of tests is calculated from runtime data.
> > In this case, the test plan line is emitted as the last line of test
> > output.
> 
> KUnit is currently including the test plan line only for subtests, as
> the current version doesn't actually know how many test suites will
> run in advance.
> This is something there's work underway to fix, though.
Sounds good.  You can just put the line at the bottom if it's
obnoxious to calculate ahead of time.

Does this mean that KUnit treats each sub-test as an individual test case
of the "super-test"?

In results summaries for a super-test, are all sub-test cases counted,
or just the list of sub-tests?


> 
> > Test result lines
> > -----------------
> > The test output consists of one or more test result lines that indicate
> > the actual results for the test.  These have the format:
> >
> >   <result> <number> <description> [<directive>] [<diagnostic data>]
> >
> > The ''result'' must appear at the start of a line (except for when a
> > test is nested, see below), and must consist of one of the following
> > two phrases:
> >   * ok
> >   * not ok
> >
> > If the test passed, then the result is reported as "ok".  If the test
> > failed, then the result is reported as "not ok".  These must be in
> > lower case, exactly as shown.
> >
> > The ''number'' in the test result line represents the number of the
> > test case being performed by the test program.  This is often used by
> > test harnesses as a unique identifier for each test case.  The test
> > number is a base-10 number, starting with 1.  It should increase by
> > one for each new test result line emitted.  If possible the number
> > for a test case should be kept the same over the lifetime of the test.
> >
> > The ''description'' is a short description of the test case.
> > This can be any string of words, but should avoid using colons (':')
> > except as part of a fully qualified test case name (see below).
> >
> > Finally, it is possible to use a test directive to indicate another
> > possible outcome for a test: that it was skipped.  To report that
> > a test case was skipped, the result line should start with the
> > result "not ok", and the directive "# SKIP" should be placed after
> > the test description. (Note that this deviates from the TAP13
> > specification).
> >
> > A test may be skipped for a variety of reasons, ranging from
> > insufficient privileges to missing features or resources required
> > to execute that test case.
> >
> > It is usually helpful if a diagnostic message is emitted to explain
> > the reasons for the skip.  If the message is a single line and is
> > short, the diagnostic message may be placed after the '# SKIP'
> > directive on the same line as the test result.  However, if it is
> > not on the test result line, it should precede the test line (see
> > diagnostic data, next).
> 
> We're in the process of supporting test skipping in KUnit at the
> moment[1], and haven't totally formalised what the syntax here should
> be. The only output issues thus far have been on the "ok"/"not ok"
> point (my in-progress patch is using 'ok', the previous RFC could
> output either).

I'll comment on this in my reply to Kees' email.

> At the moment, the reason a test is skipped has to be
> on the same line as the result for the tools to pick it up (and the
> KUnit API always requests such a 'status comment', even if it ends up
> as the empty string).

OK - I think this is a good convention.

> 
> We'll probably follow whatever kselftest does here, though, but will
> be able to do more with skip reasons on the result line.
> 
> > Diagnostic data
> > ---------------
> > Diagnostic data is text that reports on test conditions or test
> > operations, or that explains test results.  In the kernel test
> > result format, diagnostic data is placed on lines that start with a
> > hash sign, followed by a space ('# ').
> >
> > One special format of diagnostic data is a test identification line,
> > that has the fully qualified name of a test case.  Such a test
> > identification line marks the start of test output for a test case.
> >
> > In the example above, there are three lines that start with '#'
> > which precede the test result line:
> >         # selftests: cpufreq: main.sh
> >         # pid 8101's current affinity mask: fff
> >         # pid 8101's new affinity mask: 1
> > These are used to indicate diagnostic data for the test case
> > 'selftests: cpufreq: main.sh'
> >
> > Material in comments between the identification line and the test
> > result line is diagnostic data that can help to interpret the
> > results of the test.
> >
> > The TAP specification indicates that automated test harnesses may
> > ignore any line that is not one of the mandatory prescribed lines
> > (that is, the output format version line, the plan line, a test
> > result line, or a "Bail out!" line.)
> >
> > Bail out!
> > ---------
> > If a line in the test output starts with 'Bail out!', it indicates
> > that the test was aborted for some reason.  It indicates that
> > the test is unable to proceed, and no additional tests will be
> > performed.
> >
> > This can be used at the very beginning of a test, or anywhere in the
> > middle of the test, to indicate that the test can not continue.
> >
> > --- from here on is not-yet-organized material
> >
> > Tip:
> >  - don't change the test plan based on skipped tests.
> >    - it is better to report that a test case was skipped, than to
> >      not report it
> >    - that is, don't adjust the number of test cases based on skipped
> >      tests
> >
> > Other things to mention:
> > TAP13 elements not used:
> >  - yaml for diagnostic messages
> >    - reason: try to keep things line-based, since output from other things
> >    may be interspersed with messages from the test itself
> We're not using this in KUnit, either.
> >  - TODO directive
> Ditto: the upcoming SKIP support leaves room for this to easily be
> added, though.
> >
> > KTAP Extensions beyond TAP13:
> >  - nesting
> >    - via indentation
> >      - indentation makes it easier for humans to read
> We're using this a lot in KUnit, as all tests are split into suites.
> The syntax is basically a full nested TAP document, indented with four
> spaces. (There are a couple of tests which output some non-indented
> lines to our log, though.)
> 
> I've included some example output at the end of this email of what
> we're doing currently.
> 
> >  - test identifier
> >     - multiple parts, separated by ':'
> 
> >  - summary lines
> >    - can be skipped by CI systems that do their own calculations
> 
> We're not outputting any summary lines for the tests as a whole, but
> the success of a test suite is determined from the success of nested
> tests.
> 
> > Other notes:
> >  - automatic assignment of result status based on exit code
> >
> > Tips:
> >  - do NOT describe the result in the test line
> >    - the test case description should be the same whether the test
> >      succeeds or fails
> >    - use diagnostic lines to describe or explain results, if this is
> >      desirable
> >  - test numbers are considered harmful
> >    - test harnesses should use the test description as the identifier
> >    - test numbers change when testcases are added or removed
> >      - which means that results can't be compared between different
> >        versions of the test
> >  - recommendations for diagnostic messages:
> >    - reason for failure
> >    - reason for skip
> >    - diagnostic data should always precede the result line
> >      - problem: harness may emit result before test can do assessment
> >        to determine reason for result
> >      - this is what the kernel uses
> >
> > Differences between kernel test result format and TAP13:
> >  - in KTAP the "# SKIP" directive is placed after the description on
> >    the test result line
> 
> That's what we're planning to do with KUnit as well: clearly I didn't
> read the TAP13 spec as thoroughly as I'd intended, as I'd naively
> assumed that this was TAP13 spec compliant. Oops.
> I'm very much in favour of this change.

OK - thanks for the feedback.   It's my preference also.

> 
> >
> > ====== end of ktap-doc-rfc.txt ======
> > OK - that's the end of the RFC doc.
> >
> > Here are a few questions:
> >  - is this document desired or not?
> 
> This is definitely a good thing for us: thanks a lot!
> 
> >     - is it too long or too short?
> >  - if the document is desired, where should it be placed?
> >    I assume somewhere under Documentation, and put into
> >    .rst format. Suggestions for a name and location are welcome.
> >  - is this document accurate?
> >    I think KUNIT does a few things differently than this description.
> >    - is the intent to have kunit and kselftest have the same output format?
> >       if so, then these should be rationalized.
> 
> As above, we'd love to at least try to have kunit and kselftest using
> the same format.
> 
> 
> > Finally,
> >   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> > See https://testanything.org/tap-version-13-specification.html
> 
> I have a very mild preference for 'ok': but it doesn't really matter
> much one way or the other. Our tooling throws the result away if it
> sees a SKIP.

Ok - thanks for the feedback.
 -- Tim

> 
> > Regards,
> >  -- Tim
> >
> >
> 
> Example KUnit output (including the in-progress "skip test" support):
> TAP version 14
>    # Subtest: kunit-try-catch-test
>    1..2
>    ok 1 - kunit_test_try_catch_successful_try_no_catch
>    ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
> ok 1 - kunit-try-catch-test
>     # Subtest: example
>    1..2
>    # example_simple_test: initializing
>    ok 1 - example_simple_test
>    # example_skip_test: initializing
>    ok 2 - example_skip_test # SKIP this test should be skipped
> ok 2 - example
> 
> 
> [1]: https://lore.kernel.org/linux-kselftest/20200513042956.109987-1-davidgow@google.com/T/#u

It's nice to have this example.  Thanks.
 -- Tim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-14 18:17 ` Kees Cook
@ 2020-06-15 17:45   ` Bird, Tim
  2020-06-15 18:44     ` Kees Cook
  0 siblings, 1 reply; 43+ messages in thread
From: Bird, Tim @ 2020-06-15 17:45 UTC (permalink / raw)
  To: Kees Cook, David Gow
  Cc: Vitor Massaru Iha, KUnit Development, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, Linux Kernel Mailing List,
	Brendan Higgins, linux-kernel-mentees, linux



> -----Original Message-----
> From: Kees Cook
> Sent: Sunday, June 14, 2020 12:18 PM
> To: David Gow <davidgow@google.com>
> Cc: Vitor Massaru Iha <vitor@massaru.org>; KUnit Development <kunit-dev@googlegroups.com>; Shuah Khan
> <skhan@linuxfoundation.org>; open list:KERNEL SELFTEST FRAMEWORK <linux-kselftest@vger.kernel.org>; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>; Brendan Higgins <brendanhiggins@google.com>; linux-kernel-mentees@lists.linuxfoundation.org;
> linux@rasmusvillemoes.dk
> Subject: Re: RFC - kernel selftest result documentation (KTAP)
> 
> On Sat, Jun 13, 2020 at 02:51:17PM +0800, David Gow wrote:
> > On Sat, Jun 13, 2020 at 6:36 AM Kees Cook <keescook@chromium.org> wrote:
> > > Regarding output:
> > >
> > > [   36.611358] TAP version 14
> > > [   36.611953]     # Subtest: overflow
> > > [   36.611954]     1..3
> > > ...
> > > [   36.622914]     # overflow_calculation_test: s64: 21 arithmetic tests
> > > [   36.624020]     ok 1 - overflow_calculation_test
> > > ...
> > > [   36.731096]     # overflow_shift_test: ok: (s64)(0 << 63) == 0
> > > [   36.731840]     ok 2 - overflow_shift_test
> > > ...
> > > [   36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> > > ...
> > > [   36.805350]     # overflow_allocation_test: devm_kzalloc detected saturation
> > > [   36.807763]     ok 3 - overflow_allocation_test
> > > [   36.807765] ok 1 - overflow
> > >
> > > A few things here....
> >
> > Tim Bird has just sent out an RFC for a "KTAP" specification, which
> > we'll hope to support in KUnit:
> 
> Ah-ha! Thanks for the heads-up. :)
> 
> > https://lore.kernel.org/linux-
> kselftest/CY4PR13MB1175B804E31E502221BC8163FD830@CY4PR13MB1175.namprd13.prod.outlook.com/T/#u
> 
> *thread split/merge*
> 
> This is coming from:
> https://lore.kernel.org/linux-kselftest/CABVgOSnofuJQ_fiCL-8KdKezg3Hnqk3A+X509c4YP_toKeBVBg@mail.gmail.com/
> But I'm attempting a thread jump... ;)
> 
> > That's probably where we'll end up trying to hash out exactly what
> > this format should be. Fortunately, I don't think any of these will
> > require any per-test work, just changes to the KUnit implementation.
> 
> Yup, good.
> 
> > > - On the outer test report, there is no "plan" line (I was expecting
> > >   "1..1"). Technically it's optional, but it seems like the information
> > >   is available. :)
> >
> > There's work underway to support this, but it's hit a few minor snags:
> > https://lkml.org/lkml/2020/2/27/2155
> 
> Okay, cool. It's not critical, I don't think.
> 
> > > - The subtest should have its own "TAP version 14" line, and it should
> > >   be using the diagnostic line prefix for the top-level test (this is
> > >   what kselftest is doing).
> >
> > Alas, TAP itself hasn't standardised subtests. Personally, I think it
> > doesn't fundamentally matter which way we do this (I actually prefer
> > the way we're doing it currently slightly), but converging with what
> > kselftest does would be ideal.
> 
> I see the KTAP RFC doesn't discuss subtests at all, but kselftest actually
> already handles subtests. I strongly feel that line-start formatting is
> the correct way to deal with this, with each subtest having its own
> self-contained KTAP. This allows for several important features:
> 
> - the subtest, run on its own, needs no knowledge about its execution
>   environment: it simply emits its standard KTAP output.
> 
> - subtest output can be externally parsed separably, without any
>   knowledge or parsing of the enclosing test.

I agree with both of these, but will save additional comments for the
thread on my KTAP doc RFC.

> 
> For example, with my recent series[1], "make -C seccomp run_tests"
> produces:
> 
> TAP version 13
> 1..2
> # selftests: seccomp: seccomp_bpf
> # TAP version 13
> # 1..77
> # # Starting 77 tests from 6 test cases.
> # #  RUN           global.mode_strict_support ...
> # #            OK  global.mode_strict_support
> # ok 1 global.mode_strict_support
> ...
> # ok 77 TSYNC.two_siblings_not_under_filter
> # # FAILED: 73 / 77 tests passed.
> # # Totals: pass:72 fail:4 xfail:1 xpass:0 skip:0 error:0
> not ok 1 selftests: seccomp: seccomp_bpf # exit=1
> # selftests: seccomp: seccomp_benchmark
> #
> not ok 2 selftests: seccomp: seccomp_benchmark # TIMEOUT
> 
> > > - There is no way to distinguish top-level TAP output from kernel log
> > >   lines. I think we should stick with the existing marker, which is
> > >   "# ", so that kernel output has no way to be interpreted as TAP
> > >   details -- unless it intentionally starts adding "#"s. ;)
> >
> > At the moment, we're doing this in KUnit tool by stripping anything
> before "TAP version 14" (e.g., the timestamp), and then only including
> > lines which parse correctly (are a test plan, result, or a diagnostic
> > line beginning with '#').
> > This has worked pretty well thus far, and we do have the ability to
> > get results from debugfs instead of the kernel log, which won't have
> > the same problems.
> >
> > It's worth considering, though, particularly since our parser should
> > handle this anyway without any changes.
> 
> For the KTAP parsing, I actually think it's very important to include
> kernel log lines in the test output (as diagnostic lines), since they
> are "unexpected" (they fail to have the correct indentation) and may
> provide additional context for test failures when reading log files.
> (As in the "vmalloc: allocation failure" line in the quoted section
> above, to be included as a diagnostic line for test #3.)
> 
> > > - There is no summary line (to help humans). For example, the kselftest
> > >   API produces a final pass/fail report.
> >
> > Currently, we're relying on the kunit.py script to produce this, but
> > it shouldn't be impossible to add to the kernel, particularly once the
> > "KUnit Executor" changes mentioned above land.
> 
> Cool. Yeah, it's not required, but I think there are two use cases:
> humans running a single test (where is a summary is valuable, especially
> for long tests that scroll off the screen), and automation (where it can
> ignore the summary, as it will produce its own in a regularized fashion).
> 
> > > Taken together, I was expecting the output to be:
> > >
> > > [   36.611358] # TAP version 14
> > > [   36.611953] # 1..1
> > > [   36.611958] # # TAP version 14
> > > [   36.611954] # # 1..3
> > > ...
> > > [   36.622914] # # # overflow_calculation_test: s64: 21 arithmetic tests
> > > [   36.624020] # # ok 1 - overflow_calculation_test
> > > ...
> > > [   36.731096] # # # overflow_shift_test: ok: (s64)(0 << 63) == 0
> > > [   36.731840] # # ok 2 - overflow_shift_test
> > > ...
> > > [   36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> > > ...
> > > [   36.805350] # # # overflow_allocation_test: devm_kzalloc detected saturation
> > > [   36.807763] # # ok 3 - overflow_allocation_test
> > > [   36.807763] # # # overflow: PASS
> > > [   36.807765] # ok 1 - overflow
> > > [   36.807767] # # kunit: PASS
> > >
> > > But I assume there are threads on this that I've not read... :)
> >
> > These discussions all seem to be coming to a head now, so this is
> > probably just the kick we need to prioritise this a bit more. The KTAP
> > thread hasn't covered all of these (particularly the subtest stuff)
> > yet, so I confess I hadn't realised the extent of the divergence
> > between KUnit and kselftest here.
> 
> Cool. Yeah, I'd like to keep things as close as possible. In looking at
> this again, I wonder if perhaps it would be better to change the
> "indent" from "# " to "  ", which might make things more readable for
> both dmesg and terminal output:
> 
> [   36.611358]   TAP version 14
> [   36.611953]   1..1
> [   36.611958]     TAP version 14
> [   36.611954]     1..3
> ...
> [   36.622914]     # overflow_calculation_test: s64: 21 arithmetic tests
> [   36.624020]     ok 1 - overflow_calculation_test
> ...
> [   36.731096]     # overflow_shift_test: ok: (s64)(0 << 63) == 0
> [   36.731840]     ok 2 - overflow_shift_test
> ...
> [   36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> ...
> [   36.805350]     # overflow_allocation_test: devm_kzalloc detected saturation
> [   36.807763]     ok 3 - overflow_allocation_test
> [   36.807763]     # overflow: PASS
> [   36.807765]   ok 1 - overflow
> [   36.807767]   # kunit: PASS
> 
> As it happens, subtests are a bit rare in kselftests right now, but
> for kunit, the "common" output (IIUC) will fundamentally be a top-level
> test running all the subtests, so we should get it right. :)

Personally, as a human I find the space prefix a bit easier to read.
However, I think that in "normal" kernel log output it is unusual for
a line to be prefixed with a hash (#), so a hash prefix might be easier
both to distinguish visually and to parse automatically.
So I'm torn.  I don't have a strong opinion on space vs. hash prefix
for indicating sub-test.  I think the KUnit convention of matching
whatever was the prefix of the "TAP version 14" line is clever, but
it would be nice to just pick a prefix and be done with it.

IMHO.
 -- Tim


> -Kees
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=selftests/harness/tap
> 
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-15 17:45   ` Bird, Tim
@ 2020-06-15 18:44     ` Kees Cook
  0 siblings, 0 replies; 43+ messages in thread
From: Kees Cook @ 2020-06-15 18:44 UTC (permalink / raw)
  To: Bird, Tim
  Cc: David Gow, Vitor Massaru Iha, KUnit Development, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, Linux Kernel Mailing List,
	Brendan Higgins, linux-kernel-mentees, linux

On Mon, Jun 15, 2020 at 05:45:28PM +0000, Bird, Tim wrote:
> Personally, as a human I find the space prefix a bit easier to read.
> However, I think that in "normal" kernel log output it is unusual for
> a line to be prefixed with a hash (#), so a hash prefix might be easier
> both to distinguish visually and to parse automatically.
> So I'm torn.  I don't have a strong opinion on space vs. hash prefix
> for indicating sub-test.  I think the KUnit convention of matching
> whatever was the prefix of the "TAP version 14" line is clever, but
> it would be nice to just pick a prefix and be done with it.

Are there plans in kernelci for doing any parsing of subtests? (As in,
what do we break if we move from "# " to "  "?)

I'm really thinking "  " makes sense now. :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-14 18:39 ` Kees Cook
@ 2020-06-15 19:07   ` Bird, Tim
  2020-06-16 12:08     ` Paolo Bonzini
                       ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Bird, Tim @ 2020-06-15 19:07 UTC (permalink / raw)
  To: Kees Cook
  Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel, David Gow,
	Paolo Bonzini

Kees,

Thanks for the great feedback.  See comments inline below.

> -----Original Message-----
> From: Kees Cook <keescook@chromium.org>
> 
> On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
> > The kernel test result format consists of 5 major elements,
> > 4 of which are line-based:
> >  * the output version line
> >  * the plan line
> 
> Note: making the plan line required differs from TAP13 and TAP14. I
> think it's the right choice, but we should be clear.

Noted.  In re-reading my doc, I've conflated my sections.  The first
section is "single-line", and the next section is "optional".  ???
I'll fix that.

With regards to making it optional or not, I don't have a strong
preference.  The extra info seems helpful in some circumstances.
I don't know if it's too onerous to make it a requirement or not.
I'd prefer if it was always there (either at the beginning or the end),
but if there is some situation where it's quite difficult to calculate,
then it would be best not to mandate it. I can't think of any impossible
situations at the moment.

> 
> >  * one or more test result lines (also called test lines)
> >  * a possible "Bail out!" line
> 
> "Bail out!" to be moved to "optional" elements, since it may not appear.
> And we should clarify TAP13 and TAP14's language to say it should only
> appear when the test is aborting without running later tests -- for this
> reason, I think the optional "description" following "Bail out!" should
> be made required. I.e. it must be: "Bail out! $reason"

I'll make sure this is listed as optional.
I like adding a mandatory reason.
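
For illustration, with a mandatory reason, a fatal setup problem might
then be reported as something like:

        Bail out! couldn't mount debugfs

(the reason string above is only a made-up example).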
> 
> > optional elements:
> >  * diagnostic data
> 
> nit: diagnostic lines (not data)
OK.

> 
> > The 5th element is diagnostic information, which is used to describe
> > items running in the test, and possibly to explain test results.
> > A sample test result is shown below:
> >
> > Some other lines may be placed by the test harness, and are not emitted
> > by individual test programs:
> >  * one or more test identification lines
> >  * a possible results summary line
> >
> > Here is an example:
> >
> > 	TAP version 13
> > 	1..1
> > 	# selftests: cpufreq: main.sh
> > 	# pid 8101's current affinity mask: fff
> > 	# pid 8101's new affinity mask: 1
> > 	ok 1 selftests: cpufreq: main.sh
> 
> Nit: for examples, I this should should show more than one test.
> (Preferably, it should show all the possible cases, ok, not ok, SKIP,
> etc.)
Agree.  I will expand this.

> 
> > The output version line is: "TAP version 13"
> >
> > The test plan is "1..1".
> >
> > Element details
> > ===============
> >
> > Output version line
> > -------------------
> > The output version line is always "TAP version 13".
> >
> > Although the kernel test result format has some additions
> > to the TAP13 format, the version line reported by kselftest tests
> > is (currently) always the exact string "TAP version 13"
> >
> > This is always the first line of test output.
> >
> > Test plan line
> > --------------
> > The test plan indicates the number of individual test cases intended to
> > be executed by the test. It always starts with "1.." and is followed
> > by the number of test cases.  In the example above, "1..1" indicates
> > that this test reports only 1 test case.
> >
> > The test plan line can be placed in two locations:
> >  * the second line of test output, when the number of test cases is known
> >    in advance
> >  * as the last line of test output, when the number of test cases is not
> >    known in advance.
> >
> > Most often, the number of test cases is known in advance, and the test plan
> > line appears as the second line of test output, immediately following
> > the output version line.  The number of test cases might not be known
> > in advance if the number of tests is calculated from runtime data.
> > In this case, the test plan line is emitted as the last line of test
> > output.
> 
> "... must be ..." ?
> 
> >
> > Test result lines
> > -----------------
> > The test output consists of one or more test result lines that indicate
> > the actual results for the test.  These have the format:
> >
> >   <result> <number> <description> [<directive>] [<diagnostic data>]
> 
> This should be:
> 
> <result> <number> <description> [# [<directive> ][<diagnostic data>]]
> 
> >
> > The ''result'' must appear at the start of a line (except for when a
> > test is nested, see below), and must consist of one of the following
> > two phrases:
> >   * ok
> >   * not ok
> >
> > If the test passed, then the result is reported as "ok".  If the test
> > failed, then the result is reported as "not ok".  These must be in
> > lower case, exactly as shown.
> >
> > The ''number'' in the test result line represents the number of the
> > test case being performed by the test program.  This is often used by
> > test harnesses as a unique identifier for each test case.  The test
> > number is a base-10 number, starting with 1.  It should increase by
> > one for each new test result line emitted.  If possible the number
> > for a test case should be kept the same over the lifetime of the test.
> >
> > The ''description'' is a short description of the test case.
> > This can be any string of words, but should avoid using colons (':')
> 
> Must also avoid "#".
ok.
> 
> > except as part of a fully qualified test case name (see below).
> 
> TAP13/14 makes description optional, are we making it required (I think
> we should). There seems to be a TAP13/14 "convention" of starting
> <description> with "- ", which I'm on the fence about it. It does make
> parsing maybe a little easier.

I would like the description to be required.
I don't have a strong opinion on the dash.  I'm OK with either one (dash
or no dash), but we should make kselftest and KUnit consistent.

> 
> > Finally, it is possible to use a test directive to indicate another
> > possible outcome for a test: that it was skipped.  To report that
> > a test case was skipped, the result line should start with the
> > result "not ok", and the directive "# SKIP" should be placed after
> > the test description. (Note that this deviates from the TAP13
> > specification).
> 
> This is what TAP14 changed, I think (i.e. directive follows description
> now).
> 
> >
> > A test may be skipped for a variety of reasons, ranging from
> > insufficient privileges to missing features or resources required
> > to execute that test case.
> >
> > It is usually helpful if a diagnostic message is emitted to explain
> > the reasons for the skip.  If the message is a single line and is
> > short, the diagnostic message may be placed after the '# SKIP'
> > directive on the same line as the test result.  However, if it is
> > not on the test result line, it should precede the test line (see
> > diagnostic data, next).
> >
> > Diagnostic data
> > ---------------
> > Diagnostic data is text that reports on test conditions or test
> > operations, or that explains test results.  In the kernel test
> > result format, diagnostic data is placed on lines that start with a
> > hash sign, followed by a space ('# ').
> >
> > One special format of diagnostic data is a test identification line,
> > that has the fully qualified name of a test case.  Such a test
> > identification line marks the start of test output for a test case.
> >
> > In the example above, there are three lines that start with '#'
> > which precede the test result line:
> > 	# selftests: cpufreq: main.sh
> > 	# pid 8101's current affinity mask: fff
> > 	# pid 8101's new affinity mask: 1
> > These are used to indicate diagnostic data for the test case
> > 'selftests: cpufreq: main.sh'
> >
> > Material in comments between the identification line and the test
> > result line is diagnostic data that can help to interpret the
> > results of the test.
> >
> > The TAP specification indicates that automated test harnesses may
> > ignore any line that is not one of the mandatory prescribed lines
> > (that is, the output format version line, the plan line, a test
> > result line, or a "Bail out!" line.)
> >
> > Bail out!
> > ---------
> > If a line in the test output starts with 'Bail out!', it indicates
> > that the test was aborted for some reason.  It indicates that
> > the test is unable to proceed, and no additional tests will be
> > performed.
> >
> > This can be used at the very beginning of a test, or anywhere in the
> > middle of the test, to indicate that the test can not continue.
> 
> I think the required syntax should be:
> 
> Bail out! <reason>
> 
> And to make it clear that this is optionally used to indicate an early
> abort. (Though with a leading plan line, a parser should be able to
> determine this on its own.)
> 
> > --- from here on is not-yet-organized material
> >
> > Tip:
> >  - don't change the test plan based on skipped tests.
> >    - it is better to report that a test case was skipped, than to
> >      not report it
> >    - that is, don't adjust the number of test cases based on skipped
> >      tests
> 
> Yes, totally.
> 
> > Other things to mention:
> > TAP13 elements not used:
> >  - yaml for diagnostic messages
> >    - reason: try to keep things line-based, since output from other things
> >    may be interspersed with messages from the test itself
> 
> Agreed: the yaml extension is not sensible for our use.
> 
> >  - TODO directive
> 
> Agreed: SKIP should cover everything TODO does.
> 
> > KTAP Extensions beyond TAP13:
> >  - nesting
> 
> (I would call this 'subtests')
Sounds good.  Will do.

> 
> >    - via indentation
> >      - indentation makes it easier for humans to read
> 
> And allows for separable parsing of subtests.
Agree.  I'll try to work that into the doc.

> 
> >  - test identifier
> >     - multiple parts, separated by ':'
> 
> This is interesting... is the goal to be able to report test status over
> time by name?
Yes.  KernelCI and Fuego have the notion of a test-case namespace
hierarchy.  As the number of tests expands, it is helpful to keep
the namespace for a sub-test limited, just as a filesystem hierarchy
provides scope for the names of the objects (directories and files)
it contains.
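
For illustration, the identification line from the earlier example,

        selftests: cpufreq: main.sh

is such a fully qualified name: each colon-separated component narrows
the scope, much like path components do.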

> 
> >  - summary lines
> >    - can be skipped by CI systems that do their own calculations
> 
> Right -- I think per-test summary line should be included for the humans
> reading a single test (which may scroll off the screen).
> 
> > Other notes:
> >  - automatic assignment of result status based on exit code
> 
> This is, I think, a matter for the kselftest running infrastructure, not
> the KTAP output?
Agreed.  This doesn't have anything to do with the API between
the tests and the results processor.  I'll take it out.
> 
> > Tips:
> >  - do NOT describe the result in the test line
> >    - the test case description should be the same whether the test
> >      succeeds or fails
> >    - use diagnostic lines to describe or explain results, if this is
> >      desirable
> 
> Right.
> 
> >  - test numbers are considered harmful
> >    - test harnesses should use the test description as the identifier
> >    - test numbers change when testcases are added or removed
> >      - which means that results can't be compared between different
> >        versions of the test
> 
> Right.
> 
> >  - recommendations for diagnostic messages:
> >    - reason for failure
> >    - reason for skip
> >    - diagnostic data should always precede the result line
> >      - problem: harness may emit result before test can do assessment
> >        to determine reason for result
> >      - this is what the kernel uses
> 
> Right.
> 
> > Differences between kernel test result format and TAP13:
> >  - in KTAP the "# SKIP" directive is placed after the description on
> >    the test result line
> 
> Right, this is the same as TAP14, IIRC. KTAP's big deltas are the "#"
> diagnostic lines and the subtest handling.
> 
> > ====== end of ktap-doc-rfc.txt ======
> > OK - that's the end of the RFC doc.
> >
> > Here are a few questions:
> >  - is this document desired or not?
> 
> Yes.
Great.  I'll make this a priority to work on.

> 
> >     - is it too long or too short?
> 
> Should be slightly longer: more examples.
> 
> >  - if the document is desired, where should it be placed?
> >    I assume somewhere under Documentation, and put into
> >    .rst format. Suggestions for a name and location are welcome.
> 
> Yes Documentation/*.rst Not sure on name yet, but where do kselftest
> docs live? :)
Documentation/dev-tools/kselftest.rst

I'll put this at: Documentation/dev-tools/test-results-format.rst

> 
> >  - is this document accurate?
> >    I think KUNIT does a few things differently than this description.
> 
> Let's fix it. :)
> 
> >    - is the intent to have kunit and kselftest have the same output format?
> >       if so, then these should be rationalized.
> 
> Yes please.
> 
> > Finally,
> >   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> > See https://testanything.org/tap-version-13-specification.html
> 
> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
> relate? Neither SKIP nor XFAIL count toward failure, though, so both
> should be "ok"? I guess we should change it to "ok".

I have the same initial impression.  In my mind, a skip is "not ok", since
the test didn't run. However, a SKIP should be treated differently
from either "ok" or "not ok" by the results interpreter, so I don't think it
matters.  Originally I was averse to changing the SKIP result to "ok"
(as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to
change the parser in Fuego, and it would make the kernel results format
match the TAP13 spec.  I don't see a strong reason for us to be different
from TAP13 here.

I raised this issue on our automated testing conference call last week
(which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
so people should be chiming in if their parser will have a problem with this change.

[1]  https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/

Thanks very much for the feedback.
 -- Tim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-15 19:07   ` Bird, Tim
@ 2020-06-16 12:08     ` Paolo Bonzini
  2020-06-16 16:42       ` Bird, Tim
  2020-06-19 17:58       ` Frank Rowand
  2020-06-16 23:52     ` Kees Cook
  2020-06-19 19:49     ` Frank Rowand
  2 siblings, 2 replies; 43+ messages in thread
From: Paolo Bonzini @ 2020-06-16 12:08 UTC (permalink / raw)
  To: Bird, Tim, Kees Cook
  Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel, David Gow

On 15/06/20 21:07, Bird, Tim wrote:
>> Note: making the plan line required differs from TAP13 and TAP14. I
>> think it's the right choice, but we should be clear.

As an aside, where is TAP14?

> With regards to making it optional or not, I don't have a strong
> preference.  The extra info seems helpful in some circumstances.
> I don't know if it's too onerous to make it a requirement or not.
> I'd prefer if it was always there (either at the beginning or the end),
> but if there is some situation where it's quite difficult to calculate,
> then it would be best not to mandate it. I can't think of any impossible
> situations at the moment.

I think making the plan mandatory is a good idea.  "Late plans" work
very well for cases where you cannot know in advance the number of tests
(for example in filters that produce TAP from other output), and provide
an additional safety net.
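
As a sketch of the "late plan" case, a test that only learns its case
count at runtime might emit something like:

        TAP version 13
        ok 1 - loaded_module_smoke_test
        ok 2 - unloaded_module_smoke_test
        1..2

with the plan line after the last result instead of right after the
version line (the test case names are invented for the example).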

>> "Bail out!" to be moved to "optional" elements, since it may not appear.
>> And we should clarify TAP13 and TAP14's language to say it should only
>> appear when the test is aborting without running later tests -- for this
>> reason, I think the optional "description" following "Bail out!" should
>> be made required. I.e. it must be: "Bail out! $reason"
> 
> I'll make sure this is listed as optional.
> I like adding a mandatory reason.

+1.

>> TAP13/14 makes description optional, are we making it required (I think
>> we should). There seems to be a TAP13/14 "convention" of starting
>> <description> with "- ", which I'm on the fence about it. It does make
>> parsing maybe a little easier.
> 
> I would like the description to be required.
> I don't have a strong opinion on the dash.  I'm OK with either one (dash
> or no dash), but we should make kselftest and KUnit consistent.

I think no mandatory dash is better (or even mandatory no-dash!).  We
can suggest removing it when formatting TAP output.

>>> Finally, it is possible to use a test directive to indicate another
>>> possible outcome for a test: that it was skipped.  To report that
>>> a test case was skipped, the result line should start with the
>>> result "not ok", and the directive "# SKIP" should be placed after
>>> the test description. (Note that this deviates from the TAP13
>>> specification).

How so?  The description comes first, but there can be a description of
the directive.

     not ok 4 - Summarized correctly # TODO Not written yet

>>> It is usually helpful if a diagnostic message is emitted to explain
>>> the reasons for the skip.  If the message is a single line and is
>>> short, the diagnostic message may be placed after the '# SKIP'
>>> directive on the same line as the test result.  However, if it is
>>> not on the test result line, it should precede the test line (see
>>> diagnostic data, next).
>>>
>>> Bail out!
>>> ---------
>>> If a line in the test output starts with 'Bail out!', it indicates
>>> that the test was aborted for some reason.  It indicates that
>>> the test is unable to proceed, and no additional tests will be
>>> performed.
>>>
>>> This can be used at the very beginning of a test, or anywhere in the
>>> middle of the test, to indicate that the test can not continue.
>>
>> I think the required syntax should be:
>>
>> Bail out! <reason>
>>
>> And to make it clear that this is optionally used to indicate an early
>> abort. (Though with a leading plan line, a parser should be able to
>> determine this on its own.)

True.  However, "Bail out!" allows distinguishing issues with the harness
(such as ENOSPC) from test aborts.

>>>  - TODO directive
>>
>> Agreed: SKIP should cover everything TODO does.

XFAIL/XPASS are different from SKIP.  I personally don't have a need for
them, but kselftests includes XFAIL/XPASS exit codes and they aren't
reflected into selftests/kselftest/runner.sh.

Likewise, kselftest.h has ksft_inc_xfail_cnt but not
ksft_test_result_xfail/ksft_test_result_xpass.

It's important to notice in the spec that the TODO directive inverts the
direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by
"not ok # TODO").

>>>  - test identifier
>>>     - multiple parts, separated by ':'
>>
>> This is interesting... is the goal to be able to report test status over
>> time by name?

What about "/" instead?

>>> Finally,
>>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
>>> See https://testanything.org/tap-version-13-specification.html
>>
>> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
>> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
>> relate? Neither SKIP nor XFAIL count toward failure, though, so both
>> should be "ok"? I guess we should change it to "ok".

See above for XFAIL.

I initially raised the issue with "SKIP" because I have a lot of tests
that depend on hardware availability---for example, a test that does not
run on some processor kinds (e.g. on AMD, or old Intel)---and for those
SKIP should be considered a success.

Paolo

> I have the same initial impression.  In my mind, a skip is "not ok", since
> the test didn't run. However, a SKIP and should be treated differently
> from either "ok" or "not ok" by the results interpreter, so I don't think it
> matters.  Originally I was averse to changing the SKIP result to "ok"
> (as suggested by Paulo Bonzini [1]), but I checked and it's pretty trivial to
> change the parser in Fuego, and it would make the kernel results format
> match the TAP13 spec.  I don't see a strong reason for us to be different
> from TAP13 here.
> 
> I raised this issue on our automated testing conference call last week
> (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
> so people should be chiming in if their parser will have a problem with this change.)
> 
> [1]  https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
> 
> Thanks very much for the feedback.
>  -- Tim
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 12:08     ` Paolo Bonzini
@ 2020-06-16 16:42       ` Bird, Tim
  2020-06-16 19:44         ` Brendan Higgins
  2020-06-19 18:33         ` Frank Rowand
  2020-06-19 17:58       ` Frank Rowand
  1 sibling, 2 replies; 43+ messages in thread
From: Bird, Tim @ 2020-06-16 16:42 UTC (permalink / raw)
  To: Paolo Bonzini, Kees Cook
  Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel, David Gow



> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> On 15/06/20 21:07, Bird, Tim wrote:
> >> Note: making the plan line required differs from TAP13 and TAP14. I
> >> think it's the right choice, but we should be clear.
> 
> As an aside, where is TAP14?
By TAP14, I was referring to the current, undocumented, KUnit
conventions.

> 
> > With regards to making it optional or not, I don't have a strong
> > preference.  The extra info seems helpful in some circumstances.
> > I don't know if it's too onerous to make it a requirement or not.
> > I'd prefer if it was always there (either at the beginning or the end),
> > but if there is some situation where it's quite difficult to calculate,
> > then it would be best not to mandate it. I can't think of any impossible
> > situations at the moment.
> 
> I think making the plan mandatory is a good idea.  "Late plans" work
> very well for cases where you cannot know in advance the number of tests
> (for example in filters that produce TAP from other output), and provide
> an additional safety net.
> 
> >> "Bail out!" to be moved to "optional" elements, since it may not appear.
> >> And we should clarify TAP13 and TAP14's language to say it should only
> >> appear when the test is aborting without running later tests -- for this
> >> reason, I think the optional "description" following "Bail out!" should
> >> be made required. I.e. it must be: "Bail out! $reason"
> >
> > I'll make sure this is listed as optional.
> > I like adding a mandatory reason.
> 
> +1.
> 
> >> TAP13/14 makes description optional, are we making it required (I think
> >> we should). There seems to be a TAP13/14 "convention" of starting
> >> <description> with "- ", which I'm on the fence about it. It does make
> >> parsing maybe a little easier.
> >
> > I would like the description to be required.
> > I don't have a strong opinion on the dash.  I'm OK with either one (dash
> > or no dash), but we should make kselftest and KUnit consistent.
> 
> I think no mandatory dash is better (or even mandatory no-dash!).  We
> can suggest removing it when formatting TAP output.

My personal preference is to have the dash.  I think it's more human readable.
I note that the TAP spec has examples of result lines both with and without
the dash, so even the spec is ambiguous on this.   I think not mandating it
either way is probably best.  For regex parsers, it's easy to ignore with '[-]?'
outside the pattern groups that grab the number and description.
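
For instance (purely as an illustration, not a reference
implementation), a result-line pattern along the lines of:

        ^(ok|not ok) ([0-9]+) ?(?:- )?(.*)$

captures the result, the number and the rest of the line while
tolerating an optional dash.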

> 
> >>> Finally, it is possible to use a test directive to indicate another
> >>> possible outcome for a test: that it was skipped.  To report that
> >>> a test case was skipped, the result line should start with the
> >>> result "not ok", and the directive "# SKIP" should be placed after
> >>> the test description. (Note that this deviates from the TAP13
> >>> specification).
> 
> How so?  The description comes first, but there can be a description of
> the directive.
None of the examples of skips in the TAP13 spec have a test description before
the '# SKIP' directive.  But maybe I read too much into the examples. There is a
format example, and a list of items in a result line that both have the test description
before the directive.  So maybe I read this wrong.
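
Either way, with the description placed before the directive, a
skipped case would look something like:

        not ok 4 check_cpufreq_governor # SKIP needs root privileges

(test name and skip reason invented for the example).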

> 
>      not ok 4 - Summarized correctly # TODO Not written yet
> 
> >>> It is usually helpful if a diagnostic message is emitted to explain
> >>> the reasons for the skip.  If the message is a single line and is
> >>> short, the diagnostic message may be placed after the '# SKIP'
> >>> directive on the same line as the test result.  However, if it is
> >>> not on the test result line, it should precede the test line (see
> >>> diagnostic data, next).
> >>>
> >>> Bail out!
> >>> ---------
> >>> If a line in the test output starts with 'Bail out!', it indicates
> >>> that the test was aborted for some reason.  It indicates that
> >>> the test is unable to proceed, and no additional tests will be
> >>> performed.
> >>>
> >>> This can be used at the very beginning of a test, or anywhere in the
> >>> middle of the test, to indicate that the test can not continue.
> >>
> >> I think the required syntax should be:
> >>
> >> Bail out! <reason>
> >>
> >> And to make it clear that this is optionally used to indicate an early
> >> abort. (Though with a leading plan line, a parser should be able to
> >> determine this on its own.)
> 
> True.  However, "Bail out!" allow to distinguish issues with the harness
> (such as ENOSPC) from test aborts.
> 
> >>>  - TODO directive
> >>
> >> Agreed: SKIP should cover everything TODO does.
> 
> XFAIL/XPASS are different from SKIP.  I personally don't have a need for
> them, but kselftests includes XFAIL/XPASS exit codes and they aren't
> reflected into selftests/kselftest/runner.sh.
> 
> Likewise, kselftest.h has ksft_inc_xfail_cnt but not
> ksft_test_result_xfail/ksft_test_result_xpass.
> 
> It's important to notice in the spec that the TODO directive inverts the
> direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by
> "not ok # TODO").

The TAP13 spec is not explicit about the result for TODO (and only provides
one example), but the text *does* say a TODO can represent a bug to be fixed.
This makes it the equivalent of XFAIL.  I hadn't noticed this before.  Thanks.
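
So, for illustration, an expected failure reported with a TODO
directive would look something like:

        not ok 7 check_legacy_ioctl # TODO known bug, fix pending

(description and reason invented for the example), and a parser should
count that line as an XFAIL rather than as a plain failure.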

> 
> >>>  - test identifier
> >>>     - multiple parts, separated by ':'
> >>
> >> This is interesting... is the goal to be able to report test status over
> >> time by name?
> 
> What about "/" instead?
In my experience, / is used in a lot of test descriptions when quoting
file paths (not in kselftest, but in lots of other tests).  Both Fuego
and KernelCI use colons, and that's what kselftest already uses,
so it seems like a good choice.

> 
> >>> Finally,
> >>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> >>> See https://testanything.org/tap-version-13-specification.html
> >>
> >> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
> >> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
> >> relate? Neither SKIP nor XFAIL count toward failure, though, so both
> >> should be "ok"? I guess we should change it to "ok".
> 
> See above for XFAIL.
> 
> I initially raised the issue with "SKIP" because I have a lot of tests
> that depend on hardware availability---for example, a test that does not
> run on some processor kinds (e.g. on AMD, or old Intel)---and for those
> SKIP should be considered a success.
> 
> Paolo
> 
> > I have the same initial impression.  In my mind, a skip is "not ok", since
> > the test didn't run. However, a SKIP should be treated differently
> > from either "ok" or "not ok" by the results interpreter, so I don't think it
> > matters.  Originally I was averse to changing the SKIP result to "ok"
> > (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to
> > change the parser in Fuego, and it would make the kernel results format
> > match the TAP13 spec.  I don't see a strong reason for us to be different
> > from TAP13 here.
> >
> > I raised this issue on our automated testing conference call last week
> > (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
> > so people should be chiming in if their parser will have a problem with this change.)
> >
> > [1]  https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
> >
> > Thanks very much for the feedback.
> >  -- Tim
> >


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 16:42       ` Bird, Tim
@ 2020-06-16 19:44         ` Brendan Higgins
  2020-06-16 20:30           ` Bird, Tim
  2020-06-16 23:58           ` Kees Cook
  2020-06-19 18:33         ` Frank Rowand
  1 sibling, 2 replies; 43+ messages in thread
From: Brendan Higgins @ 2020-06-16 19:44 UTC (permalink / raw)
  To: Bird, Tim
  Cc: Paolo Bonzini, Kees Cook, shuah, linux-kselftest, linux-kernel,
	David Gow

On Tue, Jun 16, 2020 at 9:42 AM Bird, Tim <Tim.Bird@sony.com> wrote:

Apologies for taking so long to get to this. I have been busy with
some stuff internally at Google.

> > -----Original Message-----
> > From: Paolo Bonzini <pbonzini@redhat.com>
> >
> > On 15/06/20 21:07, Bird, Tim wrote:
> > >> Note: making the plan line required differs from TAP13 and TAP14. I
> > >> think it's the right choice, but we should be clear.
> >
> > As an aside, where is TAP14?
> By TAP14, I was referring to the current, undocumented, KUnit
> conventions.

Not so. TAP14 is the proposed next version of TAP13:

https://github.com/TestAnything/testanything.github.io/pull/36
https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md

Based on the discussion, it seems like most of the things we wanted
from TAP14 would probably make it in if TAP ever accepts another pull
request.

Our only real extension is how we print out a violated expectation/assertion.

> > > With regards to making it optional or not, I don't have a strong
> > > preference.  The extra info seems helpful in some circumstances.
> > > I don't know if it's too onerous to make it a requirement or not.
> > > I'd prefer if it was always there (either at the beginning or the end),
> > > but if there is some situation where it's quite difficult to calculate,
> > > then it would be best not to mandate it. I can't think of any impossible
> > > situations at the moment.
> >
> > I think making the plan mandatory is a good idea.  "Late plans" work
> > very well for cases where you cannot know in advance the number of tests
> > (for example in filters that produce TAP from other output), and provide
> > an additional safety net.
> >
> > >> "Bail out!" to be moved to "optional" elements, since it may not appear.
> > >> And we should clarify TAP13 and TAP14's language to say it should only
> > >> appear when the test is aborting without running later tests -- for this
> > >> reason, I think the optional "description" following "Bail out!" should
> > >> be made required. I.e. it must be: "Bail out! $reason"
> > >
> > > I'll make sure this is listed as optional.
> > > I like adding a mandatory reason.
> >
> > +1.
> >
> > >> TAP13/14 makes description optional, are we making it required (I think
> > >> we should). There seems to be a TAP13/14 "convention" of starting
> > >> <description> with "- ", which I'm on the fence about it. It does make
> > >> parsing maybe a little easier.
> > >
> > > I would like the description to be required.
> > > I don't have a strong opinion on the dash.  I'm OK with either one (dash
> > > or no dash), but we should make kselftest and KUnit consistent.
> >
> > I think no mandatory dash is better (or even mandatory no-dash!).  We
> > can suggest removing it when formatting TAP output.
>
> My personal preference is to have the dash.  I think it's more human readable.
> I note that the TAP spec has examples of result lines both with and without
> the dash, so even the spec is ambiguous on this.   I think not mandating it
> either way is probably best.  For regex parsers, it's easy to ignore with '[-]?'
> outside the pattern groups that grab the number and description.

I don't think we care, because we don't use it.

> >
> > >>> Finally, it is possible to use a test directive to indicate another
> > >>> possible outcome for a test: that it was skipped.  To report that
> > >>> a test case was skipped, the result line should start with the
> > >>> result "not ok", and the directive "# SKIP" should be placed after
> > >>> the test description. (Note that this deviates from the TAP13
> > >>> specification).
> >
> > How so?  The description comes first, but there can be a description of
> > the directive.
> None of the examples of skips in the TAP13 spec have a test description before
> the '# SKIP' directive.  But maybe I read too much into the examples. There is a
> format example, and a list of items in a result line that both have the test description
> before the directive.  So maybe I read this wrong.
>
> >
> >      not ok 4 - Summarized correctly # TODO Not written yet
> >
> > >>> It is usually helpful if a diagnostic message is emitted to explain
> > >>> the reasons for the skip.  If the message is a single line and is
> > >>> short, the diagnostic message may be placed after the '# SKIP'
> > >>> directive on the same line as the test result.  However, if it is
> > >>> not on the test result line, it should precede the test line (see
> > >>> diagnostic data, next).
> > >>>
> > >>> Bail out!
> > >>> ---------
> > >>> If a line in the test output starts with 'Bail out!', it indicates
> > >>> that the test was aborted for some reason.  It indicates that
> > >>> the test is unable to proceed, and no additional tests will be
> > >>> performed.
> > >>>
> > >>> This can be used at the very beginning of a test, or anywhere in the
> > >>> middle of the test, to indicate that the test can not continue.
> > >>
> > >> I think the required syntax should be:
> > >>
> > >> Bail out! <reason>
> > >>
> > >> And to make it clear that this is optionally used to indicate an early
> > >> abort. (Though with a leading plan line, a parser should be able to
> > >> determine this on its own.)
> >
> > True.  However, "Bail out!" allow to distinguish issues with the harness
> > (such as ENOSPC) from test aborts.
> >
> > >>>  - TODO directive
> > >>
> > >> Agreed: SKIP should cover everything TODO does.
> >
> > XFAIL/XPASS are different from SKIP.  I personally don't have a need for
> > them, but kselftests includes XFAIL/XPASS exit codes and they aren't
> > reflected into selftests/kselftest/runner.sh.
> >
> > Likewise, kselftest.h has ksft_inc_xfail_cnt but not
> > ksft_test_result_xfail/ksft_test_result_xpass.
> >
> > It's important to notice in the spec that the TODO directive inverts the
> > direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by
> > "not ok # TODO").
>
> The TAP13 spec is not explicit about the result for TODO (and only provides
> one example), but the text *does* say a TODO can represent a bug to be fixed.
> This makes it the equivalent of XFAIL.  I hadn't noticed this before.  Thanks.
>
> >
> > >>>  - test identifier
> > >>>     - multiple parts, separated by ':'
> > >>
> > >> This is interesting... is the goal to be able to report test status over
> > >> time by name?
> >
> > What about "/" instead?
> In my experience, / is used in a lot of test descriptions when quoting
> file paths (not in kselftest, but in lots of other tests).  Both Fuego
> and KernelCI use colons, and that's what kselftest already uses,
> so it seems like a good choice.
>
> >
> > >>> Finally,
> > >>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> > >>> See https://testanything.org/tap-version-13-specification.html
> > >>
> > >> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
> > >> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
> > >> relate? Neither SKIP nor XFAIL count toward failure, though, so both
> > >> should be "ok"? I guess we should change it to "ok".
> >
> > See above for XFAIL.
> >
> > I initially raised the issue with "SKIP" because I have a lot of tests
> > that depend on hardware availability---for example, a test that does not
> > run on some processor kinds (e.g. on AMD, or old Intel)---and for those
> > SKIP should be considered a success.
> >
> > Paolo
> >
> > > I have the same initial impression.  In my mind, a skip is "not ok", since
> > > the test didn't run. However, a SKIP should be treated differently
> > > from either "ok" or "not ok" by the results interpreter, so I don't think it
> > > matters.  Originally I was averse to changing the SKIP result to "ok"
> > > (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to
> > > change the parser in Fuego, and it would make the kernel results format
> > > match the TAP13 spec.  I don't see a strong reason for us to be different
> > > from TAP13 here.
> > >
> > > I raised this issue on our automated testing conference call last week
> > > (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
> > > so people should be chiming in if their parser will have a problem with this change.)
> > >
> > > [1]  https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
> > >
> > > Thanks very much for the feedback.
> > >  -- Tim
> > >
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-15 17:34   ` Bird, Tim
@ 2020-06-16 20:03     ` Brendan Higgins
  2020-06-16 20:37       ` Bird, Tim
  2020-06-19 18:17       ` Frank Rowand
  0 siblings, 2 replies; 43+ messages in thread
From: Brendan Higgins @ 2020-06-16 20:03 UTC (permalink / raw)
  To: Bird, Tim; +Cc: David Gow, shuah, linux-kselftest, linux-kernel, Kees Cook

On Mon, Jun 15, 2020 at 10:34 AM Bird, Tim <Tim.Bird@sony.com> wrote:
>
> > -----Original Message-----
> > From: David Gow <davidgow@google.com>
> >
> > On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim <Tim.Bird@sony.com> wrote:
[...]
> > KUnit is currently outputting "TAP version 14", as we were hoping some
> > of our changes would get into the TAP14 spec. (Any comments, Brendan?)
> > Maybe this should end up saying "KTAP version 1" or something?
>
> I don't know if this will break any existing results parsers or not.
> I hesitate to use "TAP version 14", as TAP appears to be a dormant
> initiative at the moment, and there's no guarantee that the kernel's
> changes will get adopted into an official spec.

We were using "TAP version 14" since the "extensions" we are using
were all proposed among the TAP people to go into the next version of
TAP. Based on discussions among them they seem to like the subtest
stuff:

https://github.com/TestAnything/testanything.github.io/pull/36

Anyway, I can still appreciate that they might change their minds.

> If we are a strict super-set of TAP, then I suppose we could just
> start using TAP version 14, and unilaterally declare that our changes
> make a new spec.  But since we don't control the web site this feels
> like a hostile takeover.

I just thought it was okay because it was already in their proposed
TAP14 spec, but yeah, if we just decide amongst ourselves to use it,
we should probably do something else.

> I'm most comfortable with calling our thing KTAP, and just
> referencing TAP as inspiration.  I don't have a strong opinion on

I am cool with that.

> KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the
> same version line (if we can get them to use the same conventions).

Yeah, I agree: it would be better if there was just one version of
(K)TAP in the kernel.

> > > Test plan line
> > > --------------
> > > The test plan indicates the number of individual test cases intended to
> > > be executed by the test. It always starts with "1.." and is followed
> > > by the number of tests cases.  In the example above, 1..1", indicates
> > > that this test reports only 1 test case.
> > >
> > > The test plan line can be placed in two locations:
> > >  * the second line of test output, when the number of test cases is known
> > >    in advance
> > >  * as the last line of test output, when the number of test cases is not
> > >    known in advance.
> > >
> > > Most often, the number of test cases is known in advance, and the test plan
> > > line appears as the second line of test output, immediately following
> > > the output version line.  The number of test cases might not be known
> > > in advance if the number of tests is calculated from runtime data.
> > > In this case, the test plan line is emitted as the last line of test
> > > output.
> >
> > KUnit is currently including the test plan line only for subtests, as
> > the current version doesn't actually know how many test suites will
> > run in advance.
> > This is something there's work underway to fix, though.
> Sounds good.  You can just put the line at the bottom if it's
> obnoxious to calculate ahead of time.

I thought that wasn't in the TAP spec?

I kind of like printing out ahead of time how many tests we expect to
run, so if we crash we know how many tests weren't run.

In any case, until we get the change in that David is referencing, we
cannot print out the test plan for the "super test" before or after
because KUnit doesn't know when it is "done". So moving it to the
bottom doesn't really help us.

> Does this mean that KUnit treats each sub-test as an individual test case
> of the "super-test"?

Yes.

At the top level, we have all test suites. Each subtest in TAP is a
test suite in KUnit. Each case in each subtest in TAP is a test case
in KUnit.

> In results summaries for a super-test, are all sub-test cases counted,
> or just the list of sub-tests?

Just the sub-tests. Each subtest is responsible for counting its own cases:

https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md#subtests
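
Roughly, with made-up suite and case names, that counting looks like
this (the top-level plan counts only the two suites, and each suite
carries its own plan for its cases):

    TAP version 14
    1..2
        # Subtest: suite_foo
        1..2
        ok 1 - case_a
        ok 2 - case_b
    ok 1 - suite_foo
        # Subtest: suite_bar
        1..1
        ok 1 - case_c
    ok 2 - suite_bar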

Cheers

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 19:44         ` Brendan Higgins
@ 2020-06-16 20:30           ` Bird, Tim
  2020-06-16 23:58           ` Kees Cook
  1 sibling, 0 replies; 43+ messages in thread
From: Bird, Tim @ 2020-06-16 20:30 UTC (permalink / raw)
  To: Brendan Higgins
  Cc: Paolo Bonzini, Kees Cook, shuah, linux-kselftest, linux-kernel,
	David Gow



> -----Original Message-----
> From: Brendan Higgins
> 
> On Tue, Jun 16, 2020 at 9:42 AM Bird, Tim <Tim.Bird@sony.com> wrote:
> 
> Apologies for taking so long to get to this. I have been busy with
> some stuff internally at Google.
> 
> > > -----Original Message-----
> > > From: Paolo Bonzini <pbonzini@redhat.com>
> > >
> > > On 15/06/20 21:07, Bird, Tim wrote:
> > > >> Note: making the plan line required differs from TAP13 and TAP14. I
> > > >> think it's the right choice, but we should be clear.
> > >
> > > As an aside, where is TAP14?
> > By TAP14, I was referring to the current, undocumented, KUnit
> > conventions.
> 
> Not so. TAP14 is the proposed next version of TAP13:
> 
> https://github.com/TestAnything/testanything.github.io/pull/36
> https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md
> 

Thanks!  I thought there was a draft spec somewhere that hadn't been
pulled or approved or whatever.  I searched but couldn't find it.
Thanks for the links.

> Based on the discussion, it seems like most of the things we wanted
> from TAP14 would probably make it in if TAP ever accepts another pull
> request.
> 
> Our only real extension is how we print out a violated exception/assertion.

I guess I need to examine the TAP14 spec and see what it's got in it.
I didn't realize it had subtest nesting (it's been literally years since I
read it, and I didn't realize that KUnit was based on it).

> > > > With regards to making it optional or not, I don't have a strong
> > > > preference.  The extra info seems helpful in some circumstances.
> > > > I don't know if it's too onerous to make it a requirement or not.
> > > > I'd prefer if it was always there (either at the beginning or the end),
> > > > but if there is some situation where it's quite difficult to calculate,
> > > > then it would be best not to mandate it. I can't think of any impossible
> > > > situations at the moment.
> > >
> > > I think making the plan mandatory is a good idea.  "Late plans" work
> > > very well for cases where you cannot know in advance the number of tests
> > > (for example in filters that produce TAP from other output), and provide
> > > an additional safety net.
> > >
> > > >> "Bail out!" to be moved to "optional" elements, since it may not appear.
> > > >> And we should clarify TAP13 and TAP14's language to say it should only
> > > >> appear when the test is aborting without running later tests -- for this
> > > >> reason, I think the optional "description" following "Bail out!" should
> > > >> be made required. I.e. it must be: "Bail out! $reason"
> > > >
> > > > I'll make sure this is listed as optional.
> > > > I like adding a mandatory reason.
> > >
> > > +1.
> > >
> > > >> TAP13/14 makes description optional, are we making it required (I think
> > > >> we should). There seems to be a TAP13/14 "convention" of starting
> > > >> <description> with "- ", which I'm on the fence about it. It does make
> > > >> parsing maybe a little easier.
> > > >
> > > > I would like the description to be required.
> > > > I don't have a strong opinion on the dash.  I'm OK with either one (dash
> > > > or no dash), but we should make kselftest and KUnit consistent.
> > >
> > > I think no mandatory dash is better (or even mandatory no-dash!).  We
> > > can suggest removing it when formatting TAP output.
> >
> > My personal preference is to have the dash.  I think it's more human readable.
> > I note that the TAP spec has examples of result lines both with and without
> > the dash, so even the spec is ambiguous on this.   I think not mandating it
> > either way is probably best.  For regex parsers, it's easy to ignore with '[-]?'
> > outside the pattern groups that grab the number and description.
> 
> I don't think we care, because we don't use it.
OK.  If we decide to support both, it would be good to document that.
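(For the record, a sketch of a result-line pattern that accepts both
forms -- not taken from any existing parser -- could be:

    ^(ok|not ok) ([0-9]+)( -)? (.*)$

where the optional dash group is simply thrown away.)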

Thanks,
 -- Tim

> 
> > >
> > > >>> Finally, it is possible to use a test directive to indicate another
> > > >>> possible outcome for a test: that it was skipped.  To report that
> > > >>> a test case was skipped, the result line should start with the
> > > >>> result "not ok", and the directive "# SKIP" should be placed after
> > > >>> the test description. (Note that this deviates from the TAP13
> > > >>> specification).
> > >
> > > How so?  The description comes first, but there can be a description of
> > > the directive.
> > None of the examples of skips in the TAP13 spec have a test description before
> > the '# SKIP' directive.  But maybe I read too much into the examples. There is a
> > format example, and a list of items in a result line that both have the test description
> > before the directive.  So maybe I read this wrong.
> >
> > >
> > >      not ok 4 - Summarized correctly # TODO Not written yet
> > >
> > > >>> It is usually helpful if a diagnostic message is emitted to explain
> > > >>> the reasons for the skip.  If the message is a single line and is
> > > >>> short, the diagnostic message may be placed after the '# SKIP'
> > > >>> directive on the same line as the test result.  However, if it is
> > > >>> not on the test result line, it should precede the test line (see
> > > >>> diagnostic data, next).
> > > >>>
> > > >>> Bail out!
> > > >>> ---------
> > > >>> If a line in the test output starts with 'Bail out!', it indicates
> > > >>> that the test was aborted for some reason.  It indicates that
> > > >>> the test is unable to proceed, and no additional tests will be
> > > >>> performed.
> > > >>>
> > > >>> This can be used at the very beginning of a test, or anywhere in the
> > > >>> middle of the test, to indicate that the test can not continue.
> > > >>
> > > >> I think the required syntax should be:
> > > >>
> > > >> Bail out! <reason>
> > > >>
> > > >> And to make it clear that this is optionally used to indicate an early
> > > >> abort. (Though with a leading plan line, a parser should be able to
> > > >> determine this on its own.)
> > >
> > > True.  However, "Bail out!" allows distinguishing issues with the harness
> > > (such as ENOSPC) from test aborts.
> > >
> > > >>>  - TODO directive
> > > >>
> > > >> Agreed: SKIP should cover everything TODO does.
> > >
> > > XFAIL/XPASS are different from SKIP.  I personally don't have a need for
> > > them, but kselftests includes XFAIL/XPASS exit codes and they aren't
> > > reflected into selftests/kselftest/runner.sh.
> > >
> > > Likewise, kselftest.h has ksft_inc_xfail_cnt but not
> > > ksft_test_result_xfail/ksft_test_result_xpass.
> > >
> > > It's important to notice in the spec that the TODO directive inverts the
> > > direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by
> > > "not ok # TODO").
> >
> > The TAP13 spec is not explicit about the result for TODO (and only provides
> > one example), but the text *does* say a TODO can represent a bug to be fixed.
> > This makes it the equivalent of XFAIL.  I hadn't noticed this before.  Thanks.
> >
> > >
> > > >>>  - test identifier
> > > >>>     - multiple parts, separated by ':'
> > > >>
> > > >> This is interesting... is the goal to be able to report test status over
> > > >> time by name?
> > >
> > > What about "/" instead?
> > In my experience, / is used in a lot of test descriptions (when quoting
> > file paths) (not in kselftest, but in lots of other tests).  Both Fuego
> > and KernelCI use colons, and that's what kselftest already uses,
> > so it seems like a good choice.
> >
> > >
> > > >>> Finally,
> > > >>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> > > >>> See https://testanything.org/tap-version-13-specification.html
> > > >>
> > > >> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
> > > >> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
> > > >> relate? Neither SKIP nor XFAIL count toward failure, though, so both
> > > >> should be "ok"? I guess we should change it to "ok".
> > >
> > > See above for XFAIL.
> > >
> > > I initially raised the issue with "SKIP" because I have a lot of tests
> > > that depend on hardware availability---for example, a test that does not
> > > run on some processor kinds (e.g. on AMD, or old Intel)---and for those
> > > SKIP should be considered a success.
> > >
> > > Paolo
> > >
> > > > I have the same initial impression.  In my mind, a skip is "not ok", since
> > > > the test didn't run. However, a SKIP should be treated differently
> > > > from either "ok" or "not ok" by the results interpreter, so I don't think it
> > > > matters.  Originally I was averse to changing the SKIP result to "ok"
> > > > (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to
> > > > change the parser in Fuego, and it would make the kernel results format
> > > > match the TAP13 spec.  I don't see a strong reason for us to be different
> > > > from TAP13 here.
> > > >
> > > > I raised this issue on our automated testing conference call last week
> > > > (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
> > > > so people should be chiming in if their parser will have a problem with this change.)
> > > >
> > > > [1]  https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
> > > >
> > > > Thanks very much for the feedback.
> > > >  -- Tim
> > > >
> >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 20:03     ` Brendan Higgins
@ 2020-06-16 20:37       ` Bird, Tim
  2020-06-17  0:02         ` Kees Cook
  2020-06-19 19:32         ` Brendan Higgins
  2020-06-19 18:17       ` Frank Rowand
  1 sibling, 2 replies; 43+ messages in thread
From: Bird, Tim @ 2020-06-16 20:37 UTC (permalink / raw)
  To: Brendan Higgins
  Cc: David Gow, shuah, linux-kselftest, linux-kernel, Kees Cook,
	Paolo Bonzini

> -----Original Message-----
> From: Brendan Higgins <brendanhiggins@google.com>
> 
> On Mon, Jun 15, 2020 at 10:34 AM Bird, Tim <Tim.Bird@sony.com> wrote:
> >
> > > -----Original Message-----
> > > From: David Gow <davidgow@google.com>
> > >
> > > On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim <Tim.Bird@sony.com> wrote:
> [...]
> > > KUnit is currently outputting "TAP version 14", as we were hoping some
> > > of our changes would get into the TAP14 spec. (Any comments, Brendan?)
> > > Maybe this should end up saying "KTAP version 1" or something?
> >
> > I don't know if this will break any existing results parsers or not.
> > I hesitate to use "TAP version 14", as TAP appears to be a dormant
> > initiative at the moment, and there's no guarantee that the kernel's
> > changes will get adopted into an official spec.
> 
> We were using "TAP version 14" since the "extensions" we are using
> were all proposed among the TAP people to go into the next version of
> TAP. Based on discussions among them they seem to like the subtest
> stuff:
> 
> https://github.com/TestAnything/testanything.github.io/pull/36
> 
> Anyway, I can still appreciate that they might change their minds.
I don't know if there's anyone left around to change their mind.

I have to agree with isaacs (who proposed TAP14 5 years ago)
that the TAP specification is in a weird state.

https://github.com/TestAnything/testanything.github.io/pull/36#issuecomment-566321633

There have been commits to the github repository by a few different
people recently (3 commits in the last 9 months).  But there's no body to approve or
disapprove of a new spec.

> 
> > If we are a strict super-set of TAP, then I suppose we could just
> > start using TAP version 14, and unilaterally declare that our changes
> > make a new spec.  But since we don't control the web site this feels
> > like a hostile takeover.
> 
> I just thought it was okay because it was already in their proposed
> TAP14 spec, but yeah, if we just decide amongst ourselves to use it,
> we should probably do something else.
> 
> > I'm most comfortable with calling our thing KTAP, and just
> > referencing TAP as inspiration.  I don't have a strong opinion on
> 
> I am cool with that.
I hate forks, but if we do go with declaring our own fork as KTAP,
then we should probably pull in the TAP14 spec as a starting point.
Since it has no home other than in a pull request, it seems a bit tentative.
We may need to put the spec in the kernel source or something.

I think we're definitely not doing anything with the yaml blocks at
the moment (or maybe ever), so there's one big point of departure.

> 
> > KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the
> > same version line (if we can get them to use the same conventions).
> 
> Yeah, I agree: it would be better if there was just one version of
> (K)TAP in the kernel.

One thing that needs to be rationalized between KUnit and selftest
is the syntax for subtests.  KUnit follows the TAP14 spec, and starts
subtests with indented "# Subtest: name of the child test"
and selftests just indents the output from the child test, so it
starts with indented "TAP version 13".  One issue I have with the
TAP14/KUnit approach is that the output from the child is different
depending on whether the test is called in the context of another
test or not.

In KUnit, is it the parent test or the child test that prints out the
"# Subtest: ..." line?
 -- Tim


> 
> > > > Test plan line
> > > > --------------
> > > > The test plan indicates the number of individual test cases intended to
> > > > be executed by the test. It always starts with "1.." and is followed
> > > > by the number of test cases.  In the example above, "1..1" indicates
> > > > that this test reports only 1 test case.
> > > >
> > > > The test plan line can be placed in two locations:
> > > >  * the second line of test output, when the number of test cases is known
> > > >    in advance
> > > >  * as the last line of test output, when the number of test cases is not
> > > >    known in advance.
> > > >
> > > > Most often, the number of test cases is known in advance, and the test plan
> > > > line appears as the second line of test output, immediately following
> > > > the output version line.  The number of test cases might not be known
> > > > in advance if the number of tests is calculated from runtime data.
> > > > In this case, the test plan line is emitted as the last line of test
> > > > output.
> > >
> > > KUnit is currently including the test plan line only for subtests, as
> > > the current version doesn't actually know how many test suites will
> > > run in advance.
> > > This is something there's work underway to fix, though.
> > Sounds good.  You can just put the line at the bottom if it's
> > obnoxious to calculate ahead of time.
> 
> I thought that is not in the TAP spec?
> 
> I kind of like printing out ahead of time how many tests we expect to
> run, so if we crash we know how many tests weren't run.
> 
> In any case, until we get the change in that David is referencing, we
> cannot print out the test plan for the "super test" before or after
> because KUnit doesn't know when it is "done". So moving it to the
> bottom doesn't really help us.
> 
> > Does this mean that KUnit treats each sub-test as an individual test case
> > of the "super-test"?
> 
> Yes.
> 
> At the top level, we have all test suites. Each subtest in TAP is a
> test suite in KUnit. Each case in each subtest in TAP is a test case
> in KUnit.
> 
> > In results summaries for a super-test, are all sub-test cases counted,
> > or just the list of sub-tests?
> 
> Just the sub-tests. Each subtest is responsible for counting its own cases:
> 
> https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md#subtests
> 
> Cheers

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-10 18:11 RFC - kernel selftest result documentation (KTAP) Bird, Tim
                   ` (2 preceding siblings ...)
  2020-06-14 18:39 ` Kees Cook
@ 2020-06-16 20:48 ` Brendan Higgins
  2020-06-16 21:16   ` Bird, Tim
  2020-06-19 17:13 ` Frank Rowand
  4 siblings, 1 reply; 43+ messages in thread
From: Brendan Higgins @ 2020-06-16 20:48 UTC (permalink / raw)
  To: Bird, Tim; +Cc: shuah, linux-kselftest, linux-kernel

On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
> Some months ago I started work on a document to formalize how
> kselftest implements the TAP specification.  However, I didn't finish
> that work.  Maybe it's time to do so now.
> 
> kselftest has developed a few differences from the original
> TAP specification, and  some extensions that I believe are worth
> documenting.
> 
> Essentially, we have created our own KTAP (kernel TAP)
> format.  I think it is worth documenting our conventions, in order to
> keep everyone on the same page.
> 
> Below is a partially completed document on my understanding
> of KTAP, based on examination of some of the kselftest test
> output.  I have not reconciled this with the kunit output format,
> which I believe has some differences (which maybe we should
> resolve before we get too far into this).
> 
> I submit the document now, before it is finished, because a patch
> was recently introduced to alter one of the result conventions
> (from SKIP='not ok' to SKIP='ok').
> 
> See the document include inline below
> 
> ====== start of ktap-doc-rfc.txt ======

[...]

> --- from here on is not-yet-organized material
> 
> Tip:
>  - don't change the test plan based on skipped tests.
>    - it is better to report that a test case was skipped, than to
>      not report it
>    - that is, don't adjust the number of test cases based on skipped
>      tests
> 
> Other things to mention:
> TAP13 elements not used:
>  - yaml for diagnostic messages

We talked about this before, but I would like some way to get failed
expectation/assertion information in the test in a consistent
machine-parsable way. Currently we do the following:

    # Subtest: example
    1..1
    # example_simple_test: initializing
    # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
    Expected 1 + 1 == 3, but
        1 + 1 == 2
        3 == 3
    not ok 1 - example_simple_test
not ok 5 - example

Technically not TAP compliant, but no one seems to mind. I am okay with
keeping it the way it is, but if we don't want it in the KTAP spec, we
will need some kind of recourse.

>    - reason: try to keep things line-based, since output from other things
>    may be interspersed with messages from the test itself
>  - TODO directive

Is this more a statement of fact, or a desire? We don't use TODO either, but
it looks like it could be useful.

> KTAP Extensions beyond TAP13:
>  - nesting
>    - via indentation
>      - indentation makes it easier for humans to read
>  - test identifier
>     - multiple parts, separated by ':'

Can you elaborate on this more? I am not sure what you mean.

>  - summary lines
>    - can be skipped by CI systems that do their own calculations
> 
> Other notes:
>  - automatic assignment of result status based on exit code
> 
> Tips:
>  - do NOT describe the result in the test line
>    - the test case description should be the same whether the test
>      succeeds or fails
>    - use diagnostic lines to describe or explain results, if this is
>      desirable
>  - test numbers are considered harmful
>    - test harnesses should use the test description as the identifier
>    - test numbers change when testcases are added or removed
>      - which means that results can't be compared between different
>        versions of the test
>  - recommendations for diagnostic messages:
>    - reason for failure
>    - reason for skip
>    - diagnostic data should always precede the result line
>      - problem: harness may emit result before test can do assessment
>        to determine reason for result
>      - this is what the kernel uses
> 
> Differences between kernel test result format and TAP13:
>  - in KTAP the "# SKIP" directive is placed after the description on
>    the test result line
> 
> ====== end of ktap-doc-rfc.txt ======
> OK - that's the end of the RFC doc.
> 
> Here are a few questions:
>  - is this document desired or not?
>     - is it too long or too short?
>  - if the document is desired, where should it be placed?

I like it. I don't think we can rely on the TAP people updating their
stuff based on my interactions with them. So having a spec which is
actually maintained would be nice.

Maybe in Documentation/dev-tools/ ?

>    I assume somewhere under Documentation, and put into
>    .rst format. Suggestions for a name and location are welcome.
>  - is this document accurate?
>    I think KUNIT does a few things differently than this description.
>    - is the intent to have kunit and kselftest have the same output format?
>       if so, then these should be rationalized.

Yeah, I think it would be nice if all test frameworks/libraries for the
kernel output tests in the same language.

Cheers

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 20:48 ` Brendan Higgins
@ 2020-06-16 21:16   ` Bird, Tim
  2020-06-16 21:19     ` Bird, Tim
                       ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Bird, Tim @ 2020-06-16 21:16 UTC (permalink / raw)
  To: Brendan Higgins
  Cc: shuah, linux-kselftest, linux-kernel, Kees Cook, Paolo Bonzini,
	David Gow



> -----Original Message-----
> From: Brendan Higgins
> 
> On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
> > Some months ago I started work on a document to formalize how
> > kselftest implements the TAP specification.  However, I didn't finish
> > that work.  Maybe it's time to do so now.
> >
> > kselftest has developed a few differences from the original
> > TAP specification, and  some extensions that I believe are worth
> > documenting.
> >
> > Essentially, we have created our own KTAP (kernel TAP)
> > format.  I think it is worth documenting our conventions, in order to
> > keep everyone on the same page.
> >
> > Below is a partially completed document on my understanding
> > of KTAP, based on examination of some of the kselftest test
> > output.  I have not reconciled this with the kunit output format,
> > which I believe has some differences (which maybe we should
> > resolve before we get too far into this).
> >
> > I submit the document now, before it is finished, because a patch
> > was recently introduced to alter one of the result conventions
> > (from SKIP='not ok' to SKIP='ok').
> >
> > See the document include inline below
> >
> > ====== start of ktap-doc-rfc.txt ======
> 
> [...]
> 
> > --- from here on is not-yet-organized material
> >
> > Tip:
> >  - don't change the test plan based on skipped tests.
> >    - it is better to report that a test case was skipped, than to
> >      not report it
> >    - that is, don't adjust the number of test cases based on skipped
> >      tests
> >
> > Other things to mention:
> > TAP13 elements not used:
> >  - yaml for diagnostic messages
> 
> We talked about this before, but I would like some way to get failed
> expectation/assertion information in the test in a consistent
> machine-parsable way. Currently we do the following:
> 
>     # Subtest: example
>     1..1
>     # example_simple_test: initializing
>     # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
>     Expected 1 + 1 == 3, but
>         1 + 1 == 2
>         3 == 3
>     not ok 1 - example_simple_test
> not ok 5 - example
> 
> Technically not TAP compliant, but no one seems to mind. I am okay with
> keeping it the way it is, but if we don't want it in the KTAP spec, we
> will need some kind of recourse.

So far, most of the CI systems don't parse out diagnostic data, so it doesn't
really matter what the format is.  If it's useful for humans, it's valuable as is.
However, it would be nice if that could change.  But without some formalization
of the format of the diagnostic data, it's an intractable problem for CI systems
to parse it.  So it's really a chicken and egg problem.  To solve it, we would have
to determine what exactly needs to be provided on a consistent basis for diagnostic
data across many tests.  I think that it's too big a problem to handle right now.
I'm not opposed to migrating to some structure with yaml in the future, but free
form text output seems OK for now.

> 
> >    - reason: try to keep things line-based, since output from other things
> >    may be interspersed with messages from the test itself
> >  - TODO directive
> 
> Is this more of stating a fact or desire? We don't use TODO either, but
> it looks like it could be useful.
Just stating a fact.  I didn't find TODO in either KUnit or selftest in 
November when I initially wrote this up.  If TODO serves as a kind
of XFAIL, it could be useful.  I have nothing against it.

> 
> > KTAP Extensions beyond TAP13:
> >  - nesting
> >    - via indentation
> >      - indentation makes it easier for humans to read
> >  - test identifier
> >     - multiple parts, separated by ':'
> 
> Can you elabroate on this more? I am not sure what you mean.
An individual test case can have a name that is scoped by a containing
test or test suite.  For example: selftests: cpufreq: main.sh
This test identifier consists of the test system (selftests), the test
area (cpufreq), and the test case name (main.sh).  This one's a bit
weird because the test case name is just the name of the program
in that test area.  The program itself doesn't output data in TAP format,
and the harness uses its exit code to detect PASS/FAIL.  If main.sh had
multiple test cases, it might produce test identifiers like this:
selftests: cpufreq: main: check_change_affinity_mask
selftests: cpufreq: main: check_permissions_for_mask_operation
(Or it might just produce the last part of these strings, the
testcase names, and the testcase id might be something generated
by the harness or CI system.)

The value of having a single string to identify the testcase (like a
uniform resource locator) is that it's easier to use the string to
correlate results produced from different CI systems that are executing
the same test.
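(As a concrete illustration -- reusing the names above; the split itself
is hypothetical, not an existing tool -- a results database could take

    ok 1 selftests: cpufreq: main.sh

and split the identifier on ': ' into (selftests, cpufreq, main.sh) to key
results from different labs against the same test case.)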

> 
> >  - summary lines
> >    - can be skipped by CI systems that do their own calculations
> >
> > Other notes:
> >  - automatic assignment of result status based on exit code
> >
> > Tips:
> >  - do NOT describe the result in the test line
> >    - the test case description should be the same whether the test
> >      succeeds or fails
> >    - use diagnostic lines to describe or explain results, if this is
> >      desirable
> >  - test numbers are considered harmful
> >    - test harnesses should use the test description as the identifier
> >    - test numbers change when testcases are added or removed
> >      - which means that results can't be compared between different
> >        versions of the test
> >  - recommendations for diagnostic messages:
> >    - reason for failure
> >    - reason for skip
> >    - diagnostic data should always precede the result line
> >      - problem: harness may emit result before test can do assessment
> >        to determine reason for result
> >      - this is what the kernel uses
> >
> > Differences between kernel test result format and TAP13:
> >  - in KTAP the "# SKIP" directive is placed after the description on
> >    the test result line
> >
> > ====== end of ktap-doc-rfc.txt ======
> > OK - that's the end of the RFC doc.
> >
> > Here are a few questions:
> >  - is this document desired or not?
> >     - is it too long or too short?
> >  - if the document is desired, where should it be placed?
> 
> I like it. I don't think we can rely on the TAP people updating their
> stuff based on my interactions with them. So having a spec which is
> actually maintained would be nice.
> 
> Maybe in Documentation/dev-tools/ ?
I'm leaning towards Documentation/dev-tools/test-results_format.rst

> 
> >    I assume somewhere under Documentation, and put into
> >    .rst format. Suggestions for a name and location are welcome.
> >  - is this document accurate?
> >    I think KUNIT does a few things differently than this description.
> >    - is the intent to have kunit and kselftest have the same output format?
> >       if so, then these should be rationalized.
> 
> Yeah, I think it would be nice if all test frameworks/libraries for the
> kernel output tests in the same language.
Agreed.

 -- Tim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 21:16   ` Bird, Tim
@ 2020-06-16 21:19     ` Bird, Tim
  2020-06-17  0:06     ` Kees Cook
  2020-06-19 19:39     ` Brendan Higgins
  2 siblings, 0 replies; 43+ messages in thread
From: Bird, Tim @ 2020-06-16 21:19 UTC (permalink / raw)
  To: Bird, Tim, Brendan Higgins
  Cc: shuah, linux-kselftest, linux-kernel, Kees Cook, Paolo Bonzini,
	David Gow

> -----Original Message-----
> From: Bird, Tim
> 
> > Maybe in Documentation/dev-tools/ ?
> I'm leaning towards Documentation/dev-tools/test-results_format.rst
Ummm.  Make that "test-results-format.rst"
(with a dash, not an underscore.  I'm not insane, I swear...)
 -- Tim


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-15 19:07   ` Bird, Tim
  2020-06-16 12:08     ` Paolo Bonzini
@ 2020-06-16 23:52     ` Kees Cook
  2020-06-19 18:52       ` Frank Rowand
  2020-06-19 19:50       ` Brendan Higgins
  2020-06-19 19:49     ` Frank Rowand
  2 siblings, 2 replies; 43+ messages in thread
From: Kees Cook @ 2020-06-16 23:52 UTC (permalink / raw)
  To: Bird, Tim
  Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel, David Gow,
	Paolo Bonzini

On Mon, Jun 15, 2020 at 07:07:34PM +0000, Bird, Tim wrote:
> From: Kees Cook <keescook@chromium.org>
> > Note: making the plan line required differs from TAP13 and TAP14. I
> > think it's the right choice, but we should be clear.
> 
> [...]
> With regards to making it optional or not, I don't have a strong
> preference.  The extra info seems helpful in some circumstances.
> I don't know if it's too onerous to make it a requirement or not.
> I'd prefer if it was always there (either at the beginning or the end),
> but if there is some situation where it's quite difficult to calculate,
> then it would be best not to mandate it. I can't think of any impossible
> situations at the moment.

I think we should require one of:

- starting plan line
- ending plan line
- ending with something that indicates "I'm done, but I have no idea how
  many tests actually ran" (Maybe "1..?")

To me, the point of the plan line is to be able to say "this test did,
in fact, finish". So even if some test can't even count how many tests
it _ran_, it can at least say "I am now finished".
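
Sketching the three options (test names made up, and "1..?" purely
hypothetical, since it isn't in any spec today):

    # plan up front
    TAP version 13
    1..2
    ok 1 first_case
    ok 2 second_case

    # plan at the end
    TAP version 13
    ok 1 first_case
    ok 2 second_case
    1..2

    # finished, but count unknown
    TAP version 13
    ok 1 first_case
    ok 2 second_case
    1..?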

> > TAP13/14 makes description optional, are we making it required (I think
> > we should). There seems to be a TAP13/14 "convention" of starting
> > <description> with "- ", which I'm on the fence about it. It does make
> > parsing maybe a little easier.
> 
> I would like the description to be required.
> I don't have a strong opinion on the dash.  I'm OK with either one (dash
> or no dash), but we should make kselftest and KUnit consistent.

I find the dash to be distracting -- it doesn't help me scan it, and it
doesn't help a parser (which only needs to find "#").

> > > Differences between kernel test result format and TAP13:
> > >  - in KTAP the "# SKIP" directive is placed after the description on
> > >    the test result line

I sent a bunch of clean-ups for kselftest.h recently[1], but it looks
like we'll need more for adding "description" to skip (right now it only
prints the SKIP reason).

[1] https://lore.kernel.org/lkml/20200611224028.3275174-1-keescook@chromium.org/

> > Yes Documentation/*.rst Not sure on name yet, but where do kselftest
> > docs live? :)
> Documentation/dev-tools/kselftest.rst
> 
> I'll put this at: Documentation/dev-tools/test-results-format.rst

Sounds good!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 19:44         ` Brendan Higgins
  2020-06-16 20:30           ` Bird, Tim
@ 2020-06-16 23:58           ` Kees Cook
  2020-06-19 18:47             ` Frank Rowand
  1 sibling, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-06-16 23:58 UTC (permalink / raw)
  To: Brendan Higgins
  Cc: Bird, Tim, Paolo Bonzini, shuah, linux-kselftest, linux-kernel,
	David Gow

On Tue, Jun 16, 2020 at 12:44:28PM -0700, Brendan Higgins wrote:
> On Tue, Jun 16, 2020 at 9:42 AM Bird, Tim <Tim.Bird@sony.com> wrote:
> > > From: Paolo Bonzini <pbonzini@redhat.com>
> > >
> > > On 15/06/20 21:07, Bird, Tim wrote:
> > > >> Note: making the plan line required differs from TAP13 and TAP14. I
> > > >> think it's the right choice, but we should be clear.
> > >
> > > As an aside, where is TAP14?
> > By TAP14, I was referring to the current, undocumented, KUnit
> > conventions.
> 
> Not so. TAP14 is the proposed next version of TAP13:
> 
> https://github.com/TestAnything/testanything.github.io/pull/36
> https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md

I was reading this (I haven't compared to the blob above):

https://github.com/TestAnything/Specification/blob/tap-14-specification/specification.md

> Based on the discussion, it seems like most of the things we wanted
> from TAP14 would probably make it in if TAP ever accepts another pull
> request.

Were our leading diagnostic lines part of their latest spec? I thought
we were pretty far off in left field for that particular bit.

> > My personal preference is to have the dash.  I think it's more human readable.
> > I note that the TAP spec has examples of result lines both with and without
> > the dash, so even the spec is ambiguous on this.   I think not mandating it
> > either way is probably best.  For regex parsers, it's easy to ignore with '[-]?'
> > outside the pattern groups that grab the number and description.
> 
> I don't think we care, because we don't use it.

Yeah, I'm in the same place. I don't care -- I would just like a
determination. (The "implied" nature of it in TAP14 bothers me.)

> > > XFAIL/XPASS are different from SKIP.  I personally don't have a need for
> > > them, but kselftests includes XFAIL/XPASS exit codes and they aren't
> > > reflected into selftests/kselftest/runner.sh.
> > >
> > > Likewise, kselftest.h has ksft_inc_xfail_cnt but not
> > > ksft_test_result_xfail/ksft_test_result_xpass.

I proposed fixing that recently[1]. seccomp uses XFAIL for "I have
detected you lack the config to test this, so I can't say it's working
or not, because it only looks like a failure without the config."

[1] https://lore.kernel.org/lkml/20200611224028.3275174-7-keescook@chromium.org/
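
(In result-line terms that would presumably come out as something like

    not ok 3 ptrace_no_filter # TODO missing CONFIG_SECCOMP_FILTER

under the TODO-as-XFAIL reading discussed earlier -- the test name is made
up for the example.)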

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 20:37       ` Bird, Tim
@ 2020-06-17  0:02         ` Kees Cook
  2020-06-19 19:32         ` Brendan Higgins
  1 sibling, 0 replies; 43+ messages in thread
From: Kees Cook @ 2020-06-17  0:02 UTC (permalink / raw)
  To: Bird, Tim
  Cc: Brendan Higgins, David Gow, shuah, linux-kselftest, linux-kernel,
	Paolo Bonzini

On Tue, Jun 16, 2020 at 08:37:18PM +0000, Bird, Tim wrote:
> One thing that needs to be rationalized between KUnit and selftest
> is the syntax for subtests.  KUnit follows the TAP14 spec, and starts
> subtests with indented "# Subtest: name of the child test"
> and selftests just indents the output from the child test, so it
> starts with indented "TAP version 13".  One issue I have with the
> TAP14/KUnit approach is that the output from the child is different
> depending on whether the test is called in the context of another
> test or not.

Right -- I'd *really* like the subtests to be "separable", since the
primary issue is actually that a subtest may not know it is a subtest,
and passing that knowledge in may be difficult/disruptive.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 21:16   ` Bird, Tim
  2020-06-16 21:19     ` Bird, Tim
@ 2020-06-17  0:06     ` Kees Cook
  2020-06-17  2:30       ` Bird, Tim
  2020-06-19 19:39     ` Brendan Higgins
  2 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-06-17  0:06 UTC (permalink / raw)
  To: Bird, Tim
  Cc: Brendan Higgins, shuah, linux-kselftest, linux-kernel,
	Paolo Bonzini, David Gow

On Tue, Jun 16, 2020 at 09:16:01PM +0000, Bird, Tim wrote:
> So far, most of the CI systems don't parse out diagnostic data, so it doesn't
> really matter what the format is.  If it's useful for humans, it's valuable as is.
> However, it would be nice if that could change.  But without some formalization
> of the format of the diagnostic data, it's an intractable problem for CI systems
> to parse it.  So it's really a chicken and egg problem.  To solve it, we would have
> to determine what exactly needs to be provided on a consistent basis for diagnostic
> data across many tests.  I think that it's too big a problem to handle right now.
> I'm not opposed to migrating to some structure with yaml in the future, but free
> form text output seems OK for now.

For a CI system, if I see a test has failed, I expect to be able to
click a link to get the log of that test, which includes the diagnostic
lines. The other reason to have them there is to show progress during a
manual run.

> > Yeah, I think it would be nice if all test frameworks/libraries for the
> > kernel output tests in the same language.
> Agreed.

$ git grep "TAP version"
exec/binfmt_script:print("TAP version 1.3")
kselftest.h:            printf("TAP version 13\n");
kselftest/runner.sh:    echo "TAP version 13"
resctrl/resctrl_tests.c:        printf("TAP version 13\n");
size/get_size.c:        print("TAP version 13\n");

Looks like there are 2 tests to convert to kselftest.h, and then we can
just change the version to 14 in the header and the runner. ;)
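
For reference, a minimal sketch of what a converted test could look like
with the usual kselftest.h helpers (illustrative only, not a patch):

    #include "../kselftest.h"

    int main(void)
    {
            ksft_print_header();    /* emits the "TAP version ..." line */
            ksft_set_plan(1);       /* emits the "1..1" plan line */

            /* ... run the actual check here ... */
            ksft_test_result_pass("example: feature works\n");

            ksft_exit_pass();       /* prints the pass/fail counts and exits */
    }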

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-17  0:06     ` Kees Cook
@ 2020-06-17  2:30       ` Bird, Tim
  2020-06-17  3:36         ` Kees Cook
  0 siblings, 1 reply; 43+ messages in thread
From: Bird, Tim @ 2020-06-17  2:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: Brendan Higgins, shuah, linux-kselftest, linux-kernel,
	Paolo Bonzini, David Gow



> -----Original Message-----
> From: Kees Cook
> 
> On Tue, Jun 16, 2020 at 09:16:01PM +0000, Bird, Tim wrote:
> > So far, most of the CI systems don't parse out diagnostic data, so it doesn't
> > really matter what the format is.  If it's useful for humans, it's valuable as is.
> > However, it would be nice if that could change.  But without some formalization
> > of the format of the diagnostic data, it's an intractable problem for CI systems
> > to parse it.  So it's really a chicken and egg problem.  To solve it, we would have
> > to determine what exactly needs to be provided on a consistent basis for diagnostic
> > data across many tests.  I think that it's too big a problem to handle right now.
> > I'm not opposed to migrating to some structure with yaml in the future, but free
> > form text output seems OK for now.
> 
> For a CI system, if I see a test has failed, I expect to be able to
> click a link to get the log of that test, which includes the diagnostic
> lines. The other reason to have them there is to show progress during a
> manual run.
Agreed.  You only need machine-parsable data if you expect the CI
system to do something more with the data than just present it.
What that would be, that would be common for all tests (or at least
many test), is unclear.  Maybe there are patterns in the diagnostic
data that could lead to higher-level analysis, or even automated
fixes, that don't become apparent if the data is unstructured.  But
it's hard to know until you have lots of data.  I think just getting
the other things consistent is a good priority right now.
 -- Tim

> 
> > > Yeah, I think it would be nice if all test frameworks/libraries for the
> > > kernel output tests in the same language.
> > Agreed.
> 
> $ git grep "TAP version"
> exec/binfmt_script:print("TAP version 1.3")
> kselftest.h:            printf("TAP version 13\n");
> kselftest/runner.sh:    echo "TAP version 13"
> resctrl/resctrl_tests.c:        printf("TAP version 13\n");
> size/get_size.c:        print("TAP version 13\n");
> 
> Looks like there are 2 tests to convert to kselftest.h, and then we can
> just change the version to 14 in the header and the runner. ;)
> 
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-17  2:30       ` Bird, Tim
@ 2020-06-17  3:36         ` Kees Cook
  2020-06-17  4:05           ` David Gow
  0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2020-06-17  3:36 UTC (permalink / raw)
  To: Bird, Tim
  Cc: Brendan Higgins, shuah, linux-kselftest, linux-kernel,
	Paolo Bonzini, David Gow

On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
> Agreed.  You only need machine-parsable data if you expect the CI
> system to do something more with the data than just present it.
> What that would be, that would be common for all tests (or at least
> many test), is unclear.  Maybe there are patterns in the diagnostic
> data that could lead to higher-level analysis, or even automated
> fixes, that don't become apparent if the data is unstructured.  But
> it's hard to know until you have lots of data.  I think just getting
> the other things consistent is a good priority right now.

Yeah. I think the main place for this is performance analysis, but I
think that's a separate system entirely. TAP is really strictly yes/no,
whereas performance analysis is a whole other thing. The only other thing
I can think of is some kind of feature analysis, but that would be built
out of the standard yes/no output. i.e. if I create a test that checks
for specific security mitigation features (*cough*LKDTM*cough*), having
a dashboard that shows features down one axis and architectures and/or
kernel versions on other axes, then I get a pretty picture. But it's
still being built out of the yes/no info.

*shrug*

I think diagnostics should be expressly non-machine-oriented.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-17  3:36         ` Kees Cook
@ 2020-06-17  4:05           ` David Gow
  2020-06-19 19:44             ` Brendan Higgins
  2020-06-19 20:19             ` Frank Rowand
  0 siblings, 2 replies; 43+ messages in thread
From: David Gow @ 2020-06-17  4:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Bird, Tim, Brendan Higgins, shuah, linux-kselftest, linux-kernel,
	Paolo Bonzini

On Wed, Jun 17, 2020 at 11:36 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
> > Agreed.  You only need machine-parsable data if you expect the CI
> > system to do something more with the data than just present it.
> > What that would be, that would be common for all tests (or at least
> > many test), is unclear.  Maybe there are patterns in the diagnostic
> > data that could lead to higher-level analysis, or even automated
> > fixes, that don't become apparent if the data is unstructured.  But
> > it's hard to know until you have lots of data.  I think just getting
> > the other things consistent is a good priority right now.
>
> Yeah. I think the main place for this is performance analysis, but I
> think that's a separate system entirely. TAP is really strictly yes/no,
> whereas performance analysis is a whole other thing. The only other thing
> I can think of is some kind of feature analysis, but that would be built
> out of the standard yes/no output. i.e. if I create a test that checks
> for specific security mitigation features (*cough*LKDTM*cough*), having
> a dashboard that shows features down one axis and architectures and/or
> kernel versions on other axes, then I get a pretty picture. But it's
> still being built out of the yes/no info.
>
> *shrug*
>
> I think diagnostic should be expressly non-machine-oriented.

So from the KUnit side, we sort-of have three kinds of diagnostic lines:
- Lines printed directly from tests (typically using kunit_info() or
similar functions): as I understand it, these are basically the
equivalent of what kselftest typically uses diagnostics for --
test-specific, human-readable messages. I don't think we need/want to
parse these much.
- Kernel messages during test execution. If we get the results from
scraping the kernel log (which is still the default for KUnit, though
there is also debugfs info), other kernel logs can be interleaved
with the results. Sometimes these are irrelevant things happening on
another thread, sometimes they're something directly related to the
test which we'd like to capture (KASAN errors, for instance). I don't
think we want these to be machine oriented, but we may want to be able
to filter them out.
- Expectation failures: as Brendan mentioned, KUnit will print some
diagnostic messages for individual assertion/expectation failures,
including the expected and actual values. We'd ideally like to be able
to identify and parse these, but keeping them human-readable is
definitely also a goal.

Now, to be honest, I doubt that the distinction here would be of much
use to kselftest, but it could be nice to not go out of our way to
make parsing some diagnostic lines possible. That being said,
personally I'm all for avoiding the yaml for diagnostic messages stuff
and sticking to something simple and line-based, possibly
standardising the format of a few common diagnostic measurements
(e.g., assertions/expected values/etc) in a way that's both
human-readable and parsable if possible.

I agree that there's a lot of analysis that is possible with just the
yes/no data. There's probably some fancy correlation one could do even
with unstructured diagnostic logs, so I don't think overstructuring
things is a necessity by any means. Where we have different tests
doing similar sorts of things, though, consistency in message
formatting could help even if things are not explicitly parsed.
Ensuring that helper functions that log and the like are spitting
things out in the same format is probably a good starting step down
that path.

Cheers,
-- David

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-10 18:11 RFC - kernel selftest result documentation (KTAP) Bird, Tim
                   ` (3 preceding siblings ...)
  2020-06-16 20:48 ` Brendan Higgins
@ 2020-06-19 17:13 ` Frank Rowand
  4 siblings, 0 replies; 43+ messages in thread
From: Frank Rowand @ 2020-06-19 17:13 UTC (permalink / raw)
  To: Bird, Tim, shuah, linux-kselftest, Brendan Higgins; +Cc: linux-kernel

On 2020-06-10 13:11, Bird, Tim wrote:
> Some months ago I started work on a document to formalize how
> kselftest implements the TAP specification.  However, I didn't finish
> that work.  Maybe it's time to do so now.
> 
> kselftest has developed a few differences from the original
> TAP specification, and  some extensions that I believe are worth
> documenting.
> 
> Essentially, we have created our own KTAP (kernel TAP)
> format.  I think it is worth documenting our conventions, in order to
> keep everyone on the same page.
> 
> Below is a partially completed document on my understanding
> of KTAP, based on examination of some of the kselftest test
> output.  I have not reconciled this with the kunit output format,
> which I believe has some differences (which maybe we should
> resolve before we get too far into this).
> 
> I submit the document now, before it is finished, because a patch
> was recently introduced to alter one of the result conventions
> (from SKIP='not ok' to SKIP='ok').
> 
> See the document include inline below
> 
> ====== start of ktap-doc-rfc.txt ======
> Selftest preferred output format
> --------------------------------
> 
> The linux kernel selftest system uses TAP (Test Anything Protocol)
> output to make testing results consumable by automated systems.  A
> number of Continuous Integration (CI) systems test the kernel every
> day.  It is useful for these systems that output from selftest
> programs be consistent and machine-parsable.
> 
> At the same time, it is useful for test results to be human-readable
> as well.
> 
> The kernel test result format is based on a variation of TAP.
> TAP is a simple text-based format that is
> documented on the TAP home page (http://testanything.org/).  There
> is an official TAP13 specification here:
> http://testanything.org/tap-version-13-specification.html
> 
> The kernel test result format consists of 5 major elements,
> 4 of which are line-based:

This document should use the terminology used by the TAP spec,
if at all reasonable and possible.  It is very hard to go between
this document and the TAP specs (v13 and v14) when they use
different names for the same thing.  For example, the TAP spec
calls this "tests lines" instead of "test result".

-Frank

>  * the output version line
>  * the plan line
>  * one or more test result lines (also called test result lines)
>  * a possible "Bail out!" line
> 
> optional elements:
>  * diagnostic data
> 
> The 5th element is diagnostic information, which is used to describe
> items running in the test, and possibly to explain test results.
> A sample test result is shown below:
> 
> Some other lines may be placed by the test harness, and are not emitted
> by individual test programs:
>  * one or more test identification lines
>  * a possible results summary line
> 
> Here is an example:
> 
> 	TAP version 13
> 	1..1
> 	# selftests: cpufreq: main.sh
> 	# pid 8101's current affinity mask: fff
> 	# pid 8101's new affinity mask: 1
> 	ok 1 selftests: cpufreq: main.sh
> 
> The output version line is: "TAP version 13"
> 
> The test plan is "1..1".
> 
> Element details
> ===============
> 
> Output version line
> -------------------
> The output version line is always "TAP version 13".
> 
> Although the kernel test result format has some additions
> to the TAP13 format, the version line reported by kselftest tests
> is (currently) always the exact string "TAP version 13"
> 
> This is always the first line of test output.
> 
> Test plan line
> --------------
> The test plan indicates the number of individual test cases intended to
> be executed by the test. It always starts with "1.." and is followed
> by the number of test cases.  In the example above, "1..1" indicates
> that this test reports only 1 test case.
> 
> The test plan line can be placed in two locations:
>  * the second line of test output, when the number of test cases is known
>    in advance
>  * as the last line of test output, when the number of test cases is not
>    known in advance.
> 
> Most often, the number of test cases is known in advance, and the test plan
> line appears as the second line of test output, immediately following
> the output version line.  The number of test cases might not be known
> in advance if the number of tests is calculated from runtime data.
> In this case, the test plan line is emitted as the last line of test
> output.
> 
> Test result lines
> -----------------
> The test output consists of one or more test result lines that indicate
> the actual results for the test.  These have the format:
> 
>   <result> <number> <description> [<directive>] [<diagnostic data>]
> 
> The ''result'' must appear at the start of a line (except for when a
> test is nested, see below), and must consist of one of the following
> two phrases:
>   * ok
>   * not ok
> 
> If the test passed, then the result is reported as "ok".  If the test
> failed, then the result is reported as "not ok".  These must be in
> lower case, exactly as shown.
> 
> The ''number'' in the test result line represents the number of the
> test case being performed by the test program.  This is often used by
> test harnesses as a unique identifier for each test case.  The test
> number is a base-10 number, starting with 1.  It should increase by
> one for each new test result line emitted.  If possible the number
> for a test case should be kept the same over the lifetime of the test.
> 
> The ''description'' is a short description of the test case.
> This can be any string of words, but should avoid using colons (':')
> except as part of a fully qualified test case name (see below).
> 
> Finally, it is possible to use a test directive to indicate another
> possible outcome for a test: that it was skipped.  To report that
> a test case was skipped, the result line should start with the
> result "not ok", and the directive "# SKIP" should be placed after
> the test description. (Note that this deviates from the TAP13
> specification).
> 
> A test may be skipped for a variety of reasons, ranging from
> insufficient privileges to missing features or resources required
> to execute that test case.
> 
> It is usually helpful if a diagnostic message is emitted to explain
> the reasons for the skip.  If the message is a single line and is
> short, the diagnostic message may be placed after the '# SKIP'
> directive on the same line as the test result.  However, if it is
> not on the test result line, it should precede the test line (see
> diagnostic data, next).
> 
> Diagnostic data
> ---------------
> Diagnostic data is text that reports on test conditions or test
> operations, or that explains test results.  In the kernel test
> result format, diagnostic data is placed on lines that start with a
> hash sign, followed by a space ('# ').
> 
> One special format of diagnostic data is a test identification line,
> that has the fully qualified name of a test case.  Such a test
> identification line marks the start of test output for a test case.
> 
> In the example above, there are three lines that start with '#'
> which precede the test result line:
> 	# selftests: cpufreq: main.sh
> 	# pid 8101's current affinity mask: fff
> 	# pid 8101's new affinity mask: 1
> These are used to indicate diagnostic data for the test case
> 'selftests: cpufreq: main.sh'
> 
> Material in comments between the identification line and the test
> result line are diagnostic data that can help to interpret the
> results of the test.
> 
> The TAP specification indicates that automated test harnesses may
> ignore any line that is not one of the mandatory prescribed lines
> (that is, the output format version line, the plan line, a test
> result line, or a "Bail out!" line.) 
> 
> Bail out!
> ---------
> If a line in the test output starts with 'Bail out!', it indicates
> that the test was aborted for some reason: the test is unable to
> proceed, and no additional tests will be performed.
> 
> This can be used at the very beginning of a test, or anywhere in the
> middle of the test, to indicate that the test can not continue.
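> 
> A made-up example of a test that aborts early:
> 
>         TAP version 13
>         1..4
>         ok 1 mount test filesystem
>         Bail out! could not allocate test device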
> 
> --- from here on is not-yet-organized material
> 
> Tip:
>  - don't change the test plan based on skipped tests.
>    - it is better to report that a test case was skipped, than to
>      not report it
>    - that is, don't adjust the number of test cases based on skipped
>      tests (see the example below)
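> 
>    For example (made-up output), a test that plans three cases but can
>    only run two of them should still report all three:
> 
>         1..3
>         ok 1 first case
>         not ok 2 second case # SKIP missing hardware
>         ok 3 third case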
> 
> Other things to mention:
> TAP13 elements not used:
>  - yaml for diagnostic messages
>    - reason: try to keep things line-based, since output from other things
>      may be interspersed with messages from the test itself
>  - TODO directive
> 
> KTAP Extensions beyond TAP13:
>  - nesting
>    - via indentation
>      - indentation makes it easier for humans to read
>  - test identifier
>     - multiple parts, separated by ':'
>  - summary lines
>    - can be skipped by CI systems that do their own calculations
> 
> Other notes:
>  - automatic assignment of result status based on exit code
> 
> Tips:
>  - do NOT describe the result in the test line
>    - the test case description should be the same whether the test
>      succeeds or fails
>    - use diagnostic lines to describe or explain results, if this is
>      desirable
>  - test numbers are considered harmful
>    - test harnesses should use the test description as the identifier
>    - test numbers change when testcases are added or removed
>      - which means that results can't be compared between different
>        versions of the test
>  - recommendations for diagnostic messages:
>    - reason for failure
>    - reason for skip
>    - diagnostic data should always precede the result line
>      - problem: harness may emit result before test can do assessment
>        to determine reason for result
>      - this is what the kernel uses
> 
> Differences between kernel test result format and TAP13:
>  - in KTAP the "# SKIP" directive is placed after the description on
>    the test result line
> 
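> As a concrete illustration (a minimal hand-written sketch, assuming the
> helpers declared in tools/testing/selftests/kselftest.h behave as
> described above; it is not part of the spec itself), a selftest program
> that produces this format might look like:
> 
>         #include "../kselftest.h"  /* path assumes a selftests/<area>/ subdir */
> 
>         int main(void)
>         {
>                 ksft_print_header();        /* "TAP version 13" */
>                 ksft_set_plan(2);           /* "1..2" */
> 
>                 /* diagnostic line, printed with a "# " prefix */
>                 ksft_print_msg("checking affinity mask\n");
>                 ksft_test_result_pass("affinity mask can be read\n");
> 
>                 /* reported with the "# SKIP" directive and a reason */
>                 ksft_test_result_skip("missing CONFIG_CPU_FREQ\n");
> 
>                 return ksft_exit_pass();    /* prints totals and exits */
>         }
> 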
> ====== end of ktap-doc-rfc.txt ======
> OK - that's the end of the RFC doc.
> 
> Here are a few questions:
>  - is this document desired or not?
>     - is it too long or too short?
>  - if the document is desired, where should it be placed?
>    I assume somewhere under Documentation, and put into
>    .rst format. Suggestions for a name and location are welcome.
>  - is this document accurate?
>    I think KUNIT does a few things differently than this description.
>    - is the intent to have kunit and kselftest have the same output format?
>       if so, then these should be rationalized.
> 
> Finally,
>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> See https://testanything.org/tap-version-13-specification.html
> 
> Regards,
>  -- Tim
> 
>    
> 



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 12:08     ` Paolo Bonzini
  2020-06-16 16:42       ` Bird, Tim
@ 2020-06-19 17:58       ` Frank Rowand
  2020-06-20  6:44         ` David Gow
  1 sibling, 1 reply; 43+ messages in thread
From: Frank Rowand @ 2020-06-19 17:58 UTC (permalink / raw)
  To: Paolo Bonzini, Bird, Tim, Kees Cook
  Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel, David Gow

On 2020-06-16 07:08, Paolo Bonzini wrote:
> On 15/06/20 21:07, Bird, Tim wrote:
>>> Note: making the plan line required differs from TAP13 and TAP14. I
>>> think it's the right choice, but we should be clear.
> 
> As an aside, where is TAP14?
> 
>> With regards to making it optional or not, I don't have a strong
>> preference.  The extra info seems helpful in some circumstances.
>> I don't know if it's too onerous to make it a requirement or not.
>> I'd prefer if it was always there (either at the beginning or the end),
>> but if there is some situation where it's quite difficult to calculate,
>> then it would be best not to mandate it. I can't think of any impossible
>> situations at the moment.
> 
> I think making the plan mandatory is a good idea.  "Late plans" work
> very well for cases where you cannot know in advance the number of tests
> (for example in filters that produce TAP from other output), and provide
> an additional safety net.
> 
>>> "Bail out!" to be moved to "optional" elements, since it may not appear.
>>> And we should clarify TAP13 and TAP14's language to say it should only
>>> appear when the test is aborting without running later tests -- for this
>>> reason, I think the optional "description" following "Bail out!" should
>>> be made required. I.e. it must be: "Bail out! $reason"
>>
>> I'll make sure this is listed as optional.
>> I like adding a mandatory reason.
> 
> +1.
> 
>>> TAP13/14 makes description optional, are we making it required (I think
>>> we should). There seems to be a TAP13/14 "convention" of starting
>>> <description> with "- ", which I'm on the fence about it. It does make
>>> parsing maybe a little easier.
>>
>> I would like the description to be required.
>> I don't have a strong opinion on the dash.  I'm OK with either one (dash
>> or no dash), but we should make kselftest and KUnit consistent.
> 
> I think no mandatory dash is better (or even mandatory no-dash!).  We
> can suggest removing it when formatting TAP output.
> 
>>>> Finally, it is possible to use a test directive to indicate another
>>>> possible outcome for a test: that it was skipped.  To report that
>>>> a test case was skipped, the result line should start with the
>>>> result "not ok", and the directive "# SKIP" should be placed after
>>>> the test description. (Note that this deviates from the TAP13
>>>> specification).
> 
> How so?  The description comes first, but there can be a description of
> the directive.
> 
>      not ok 4 - Summarized correctly # TODO Not written yet
> 
>>>> It is usually helpful if a diagnostic message is emitted to explain
>>>> the reasons for the skip.  If the message is a single line and is
>>>> short, the diagnostic message may be placed after the '# SKIP'
>>>> directive on the same line as the test result.  However, if it is
>>>> not on the test result line, it should precede the test line (see
>>>> diagnostic data, next).
>>>>
>>>> Bail out!
>>>> ---------
>>>> If a line in the test output starts with 'Bail out!', it indicates
>>>> that the test was aborted for some reason.  It indicates that
>>>> the test is unable to proceed, and no additional tests will be
>>>> performed.
>>>>
>>>> This can be used at the very beginning of a test, or anywhere in the
>>>> middle of the test, to indicate that the test can not continue.
>>>
>>> I think the required syntax should be:
>>>
>>> Bail out! <reason>
>>>
>>> And to make it clear that this is optionally used to indicate an early
>>> abort. (Though with a leading plan line, a parser should be able to
>>> determine this on its own.)
> 
> True.  However, "Bail out!" allows us to distinguish issues with the harness
> (such as ENOSPC) from test aborts.
> 
>>>>  - TODO directive
>>>
>>> Agreed: SKIP should cover everything TODO does.
> 
> XFAIL/XPASS are different from SKIP.  I personally don't have a need for
> them, but kselftests includes XFAIL/XPASS exit codes and they aren't
> reflected into selftests/kselftest/runner.sh.
> 
> Likewise, kselftest.h has ksft_inc_xfail_cnt but not
> ksft_test_result_xfail/ksft_test_result_xpass.

It has ksft_exit_xfail() and ksft_exit_xpass().  But at 5.8-rc1 I do not
see any callers of those two functions.

> 
> It's important to notice in the spec that the TODO directive inverts the
> direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by
> "not ok # TODO").

For a slightly more verbose explanation, the mindset behind pytest "xfail"
and TAP "not ok # TODO explanation" is different, though they both provide
essentially the same information.

  pytest "xfail" (if I understand correctly) means the test is expected to
  fail, and it does indeed fail.

  pytest "xpass" means the test is expected to fail, but it does not fail.

  TAP "not ok # TODO explanation" means the test is expected to fail, and
  it does indeed fail.

  TAP "ok # TODO explanation" means the test is expected to fail, but it does not fail.

It does not seem to be a good idea to have two different ways of reporting
the same thing.

The TAP method seems cleaner to me.  It is consistently reported whether or
not the test passed, no matter what the expectation of pass/fail is.  "# TODO ..."
is a way for the code maintainer to convey information to the test community
that the test is expected to fail due to a known bug.  If the output of
the test changes to "ok # TODO explanation", then the tester can report
that as an issue to the code maintainer, and the code maintainer can then
determine whether the bug got fixed (and thus the "# TODO explanation"
should be removed from the test code).
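
For illustration (hand-written lines, not from a real test), the two
situations would look like:

  not ok 7 frobnicate the widget # TODO known bug in frobnication
  ok 7 frobnicate the widget # TODO known bug in frobnication

where the second line is the unexpected-pass case that the tester would
report to the code maintainer.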

Can we just remove ksft_exit_xfail() and ksft_exit_xpass() since no one is
using them, and there is an equivalent way to express the same information
in TAP?

> 
>>>>  - test identifier
>>>>     - multiple parts, separated by ':'
>>>
>>> This is interesting... is the goal to be able to report test status over
>>> time by name?
> 
> What about "/" instead?
> 
>>>> Finally,
>>>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
>>>> See https://testanything.org/tap-version-13-specification.html
>>>
>>> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
>>> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
>>> relate? Neither SKIP nor XFAIL count toward failure, though, so both
>>> should be "ok"? I guess we should change it to "ok".
> 
> See above for XFAIL.
> 
> I initially raised the issue with "SKIP" because I have a lot of tests
> that depend on hardware availability---for example, a test that does not
> run on some processor kinds (e.g. on AMD, or old Intel)---and for those
> SKIP should be considered a success.

No, SKIP should not be considered a success.  It should also not be considered
a failure.  Please do not blur the lines between success, failure, and
skipped.

-Frank

> 
> Paolo
> 
>> I have the same initial impression.  In my mind, a skip is "not ok", since
>> the test didn't run. However, a SKIP should be treated differently
>> from either "ok" or "not ok" by the results interpreter, so I don't think it
>> matters.  Originally I was averse to changing the SKIP result to "ok"
>> (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to
>> change the parser in Fuego, and it would make the kernel results format
>> match the TAP13 spec.  I don't see a strong reason for us to be different
>> from TAP13 here.
>>
>> I raised this issue on our automated testing conference call last week
>> (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
>> so people should be chiming in if their parser will have a problem with this change.)
>>
>> [1]  https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
>>
>> Thanks very much for the feedback.
>>  -- Tim
>>
> 
> .
> 



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 20:03     ` Brendan Higgins
  2020-06-16 20:37       ` Bird, Tim
@ 2020-06-19 18:17       ` Frank Rowand
  1 sibling, 0 replies; 43+ messages in thread
From: Frank Rowand @ 2020-06-19 18:17 UTC (permalink / raw)
  To: Brendan Higgins, Bird, Tim
  Cc: David Gow, shuah, linux-kselftest, linux-kernel, Kees Cook

On 2020-06-16 15:03, Brendan Higgins wrote:
> On Mon, Jun 15, 2020 at 10:34 AM Bird, Tim <Tim.Bird@sony.com> wrote:
>>
>>> -----Original Message-----
>>> From: David Gow <davidgow@google.com>
>>>
>>> On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim <Tim.Bird@sony.com> wrote:
> [...]
>>> KUnit is currently outputting "TAP version 14", as we were hoping some
>>> of our changes would get into the TAP14 spec. (Any comments, Brendan?)
>>> Maybe this should end up saying "KTAP version 1" or something?
>>
>> I don't know if this will break any existing results parsers or not.
>> I hesitate to use "TAP version 14", as TAP appears to be a dormant
>> initiative at the moment, and there's no guarantee that the kernel's
>> changes will get adopted into an official spec.
> 
> We were using "TAP version 14" since the "extensions" we are using
> were all proposed among the TAP people to go into the next version of
> TAP. Based on discussions among them they seem to like the subtest
> stuff:
> 
> https://github.com/TestAnything/testanything.github.io/pull/36
> 
> Anyway, I can still appreciate that they might change their minds.
> 
>> If we are a strict super-set of TAP, then I suppose we could just
>> start using TAP version 14, and unilaterally declare that our changes
>> make a new spec.  But since we don't control the web site this feels
>> like a hostile takeover.
> 
> I just thought it was okay because it was already in their proposed
> TAP14 spec, but yeah, if we just decide amongst ourselves to use it,
> we should probably do something else.
> 
>> I'm most comfortable with calling our thing KTAP, and just
>> referencing TAP as inspiration.  I don't have a strong opinion on
> 
> I am cool with that.

I like a KTAP specification, based on the proposed TAP 14, but with a
good edit, and with the extensions and changes that are coming out
of this conversation.

It would be great if we could just have a finished TAP 14, but my
understanding is that efforts to move that forward are not making
any progress.  And if we fork to make KTAP, maybe that will be
the incentive for TAP to move forward, and maybe KTAP could be
merged back into TAP.

-Frank

> 
>> KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the
>> same version line (if we can get them to use the same conventions).
> 
> Yeah, I agree: it would be better if there was just one version of
> (K)TAP in the kernel.
> 
>>>> Test plan line
>>>> --------------
>>>> The test plan indicates the number of individual test cases intended to
>>>> be executed by the test. It always starts with "1.." and is followed
>>>> by the number of test cases.  In the example above, "1..1" indicates
>>>> that this test reports only 1 test case.
>>>>
>>>> The test plan line can be placed in two locations:
>>>>  * the second line of test output, when the number of test cases is known
>>>>    in advance
>>>>  * as the last line of test output, when the number of test cases is not
>>>>    known in advance.
>>>>
>>>> Most often, the number of test cases is known in advance, and the test plan
>>>> line appears as the second line of test output, immediately following
>>>> the output version line.  The number of test cases might not be known
>>>> in advance if the number of tests is calculated from runtime data.
>>>> In this case, the test plan line is emitted as the last line of test
>>>> output.
>>>
>>> KUnit is currently including the test plan line only for subtests, as
>>> the current version doesn't actually know how many test suites will
>>> run in advance.
>>> This is something there's work underway to fix, though.
>> Sounds good.  You can just put the line at the bottom if it's
>> obnoxious to calculate ahead of time.
> 
> I thought that is not in the TAP spec?
> 
> I kind of like printing out ahead of time how many tests we expect to
> run, so if we crash we know how many tests weren't run.
> 
> In any case, until we get the change in that David is referencing, we
> cannot print out the test plan for the "super test" before or after
> because KUnit doesn't know when it is "done". So moving it to the
> bottom doesn't really help us.
> 
>> Does this mean that KUnit treats each sub-test as an individual test case
>> of the "super-test"?
> 
> Yes.
> 
> At the top level, we have all test suites. Each subtest in TAP is a
> test suite in KUnit. Each case in each subtest in TAP is a test case
> in KUnit.
> 
>> In results summaries for a super-test, are all sub-test cases counted,
>> or just the list of sub-tests?
> 
> Just the sub-tests. Each subtest is responsible for counting its own cases:
> 
> https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md#subtests
> 
> Cheers
> 



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 16:42       ` Bird, Tim
  2020-06-16 19:44         ` Brendan Higgins
@ 2020-06-19 18:33         ` Frank Rowand
  1 sibling, 0 replies; 43+ messages in thread
From: Frank Rowand @ 2020-06-19 18:33 UTC (permalink / raw)
  To: Bird, Tim, Paolo Bonzini, Kees Cook
  Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel, David Gow

On 2020-06-16 11:42, Bird, Tim wrote:
> 
> 
>> -----Original Message-----
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> On 15/06/20 21:07, Bird, Tim wrote:
>>>> Note: making the plan line required differs from TAP13 and TAP14. I
>>>> think it's the right choice, but we should be clear.
>>
>> As an aside, where is TAP14?
> By TAP14, I was referring to the current, undocumented, KUnit
> conventions.

When "TAP14" is used in this discussion, let's please use the current
proposed TAP14 spec which Brendan has provided a link to.  If you
want to describe current KUnit conventions, please say current
KUnit conventions.

> 
>>
>>> With regards to making it optional or not, I don't have a strong
>>> preference.  The extra info seems helpful in some circumstances.
>>> I don't know if it's too onerous to make it a requirement or not.
>>> I'd prefer if it was always there (either at the beginning or the end),
>>> but if there is some situation where it's quite difficult to calculate,
>>> then it would be best not to mandate it. I can't think of any impossible
>>> situations at the moment.
>>
>> I think making the plan mandatory is a good idea.  "Late plans" work
>> very well for cases where you cannot know in advance the number of tests
>> (for example in filters that produce TAP from other output), and provide
>> an additional safety net.
>>
>>>> "Bail out!" to be moved to "optional" elements, since it may not appear.
>>>> And we should clarify TAP13 and TAP14's language to say it should only
>>>> appear when the test is aborting without running later tests -- for this
>>>> reason, I think the optional "description" following "Bail out!" should
>>>> be made required. I.e. it must be: "Bail out! $reason"
>>>
>>> I'll make sure this is listed as optional.
>>> I like adding a mandatory reason.
>>
>> +1.
>>
>>>> TAP13/14 makes description optional, are we making it required (I think
>>>> we should). There seems to be a TAP13/14 "convention" of starting
>>>> <description> with "- ", which I'm on the fence about it. It does make
>>>> parsing maybe a little easier.
>>>
>>> I would like the description to be required.
>>> I don't have a strong opinion on the dash.  I'm OK with either one (dash
>>> or no dash), but we should make kselftest and KUnit consistent.
>>
>> I think no mandatory dash is better (or even mandatory no-dash!).  We
>> can suggest removing it when formatting TAP output.
> 
> My personal preference is to have the dash.  I think it's more human readable.
> I note that the TAP spec has examples of result lines both with and without
> the dash, so even the spec is ambiguous on this.   I think not mandating it
> either way is probably best.  For regex parsers, it's easy to ignore with '[-]?'
> outside the pattern groups that grab the number and description.
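> For instance, an illustrative (untested) pattern along the lines of
> '^(ok|not ok) ([0-9]+) (?:- )?(.*)$' captures the result, number and
> description whether or not the dash is present.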
> 
>>
>>>>> Finally, it is possible to use a test directive to indicate another
>>>>> possible outcome for a test: that it was skipped.  To report that
>>>>> a test case was skipped, the result line should start with the
>>>>> result "not ok", and the directive "# SKIP" should be placed after
>>>>> the test description. (Note that this deviates from the TAP13
>>>>> specification).
>>
>> How so?  The description comes first, but there can be a description of
>> the directive.
> None of the examples of skips in the TAP13 spec have a test description before
> the '# SKIP' directive.  But maybe I read too much into the examples. There is a
> format example, and a list of items in a result line that both have the test description
> before the directive.  So maybe I read this wrong.

Yes, I think you read too much into the examples.  I think the TAP spec
is very hard to read in the current form (v13 and proposed v14).  If
we create a KTAP spec, I can do editing to clean up some of the issues
or give review comments on the clarity issues that I see and how
to fix them.

I read the spec as saying that the description is optional, but if the
description exists in a test line that contains "# TODO explanation"
then the description will precede the "#".

(Yes, you are seeing "I volunteer" because the current spec is so
frustrating to me.)

> 
>>
>>      not ok 4 - Summarized correctly # TODO Not written yet
>>
>>>>> It is usually helpful if a diagnostic message is emitted to explain
>>>>> the reasons for the skip.  If the message is a single line and is
>>>>> short, the diagnostic message may be placed after the '# SKIP'
>>>>> directive on the same line as the test result.  However, if it is
>>>>> not on the test result line, it should precede the test line (see
>>>>> diagnostic data, next).
>>>>>
>>>>> Bail out!
>>>>> ---------
>>>>> If a line in the test output starts with 'Bail out!', it indicates
>>>>> that the test was aborted for some reason.  It indicates that
>>>>> the test is unable to proceed, and no additional tests will be
>>>>> performed.
>>>>>
>>>>> This can be used at the very beginning of a test, or anywhere in the
>>>>> middle of the test, to indicate that the test can not continue.
>>>>
>>>> I think the required syntax should be:
>>>>
>>>> Bail out! <reason>
>>>>
>>>> And to make it clear that this is optionally used to indicate an early
>>>> abort. (Though with a leading plan line, a parser should be able to
>>>> determine this on its own.)
>>
>> True.  However, "Bail out!" allows us to distinguish issues with the harness
>> (such as ENOSPC) from test aborts.
>>
>>>>>  - TODO directive
>>>>
>>>> Agreed: SKIP should cover everything TODO does.
>>
>> XFAIL/XPASS are different from SKIP.  I personally don't have a need for
>> them, but kselftests includes XFAIL/XPASS exit codes and they aren't
>> reflected into selftests/kselftest/runner.sh.
>>
>> Likewise, kselftest.h has ksft_inc_xfail_cnt but not
>> ksft_test_result_xfail/ksft_test_result_xpass.
>>
>> It's important to notice in the spec that the TODO directive inverts the
>> direction of ok/not-ok (i.e. XFAIL, the "good" result, is represented by
>> "not ok # TODO").
> 
> The TAP13 spec is not explicit about the result for TODO (and only provides
> one example), but the text *does* say a TODO can represent a bug to be fixed.
> This makes it the equivalent of XFAIL.  I hadn't noticed this before.  Thanks.

TAP 13 spec:

  "Note that if the TODO has an explanation it must be separated from TODO by a
  space. These tests represent a feature to be implemented or a bug to be fixed
  and act as something of an executable “things to do” list. They are not expected
  to succeed. Should a todo test point begin succeeding, the harness should report
  it as a bonus. This indicates that whatever you were supposed to do has been done 
  and you should promote this to a normal test point."

That seems very clear and explicit to me about what the intent and usage of TODO is.


> 
>>
>>>>>  - test identifier
>>>>>     - multiple parts, separated by ':'

I don't find "identifier" when I grep for it in KUnit related places.  And
it has been a while since I've read through the KUnit code.  So I am a
little lost about what a "test identifier" is.  Is this part of the test
line "Description"?  Any pointers to what this is?

-Frank


>>>>
>>>> This is interesting... is the goal to be able to report test status over
>>>> time by name?
>>
>> What about "/" instead?
> In my experience, / is used in a lot of test descriptions (when quoting
> file paths) (not in kselftest, but in lots of other tests).  Both Fuego
> and KernelCI use colons, and that's what kselftest already uses,
> so it seems like a good choice.
> 
>>
>>>>> Finally,
>>>>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
>>>>> See https://testanything.org/tap-version-13-specification.html
>>>>
>>>> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
>>>> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
>>>> relate? Neither SKIP nor XFAIL count toward failure, though, so both
>>>> should be "ok"? I guess we should change it to "ok".
>>
>> See above for XFAIL.
>>
>> I initially raised the issue with "SKIP" because I have a lot of tests
>> that depend on hardware availability---for example, a test that does not
>> run on some processor kinds (e.g. on AMD, or old Intel)---and for those
>> SKIP should be considered a success.
>>
>> Paolo
>>
>>> I have the same initial impression.  In my mind, a skip is "not ok", since
>>> the test didn't run. However, a SKIP should be treated differently
>>> from either "ok" or "not ok" by the results interpreter, so I don't think it
>>> matters.  Originally I was averse to changing the SKIP result to "ok"
>>> (as suggested by Paolo Bonzini [1]), but I checked and it's pretty trivial to
>>> change the parser in Fuego, and it would make the kernel results format
>>> match the TAP13 spec.  I don't see a strong reason for us to be different
>>> from TAP13 here.
>>>
>>> I raised this issue on our automated testing conference call last week
>>> (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
>>> so people should be chiming in if their parser will have a problem with this change.)
>>>
>>> [1]  https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
>>>
>>> Thanks very much for the feedback.
>>>  -- Tim
>>>
> 



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 23:58           ` Kees Cook
@ 2020-06-19 18:47             ` Frank Rowand
  2020-06-19 19:11               ` Kees Cook
  2020-06-19 22:58               ` Paolo Bonzini
  0 siblings, 2 replies; 43+ messages in thread
From: Frank Rowand @ 2020-06-19 18:47 UTC (permalink / raw)
  To: Kees Cook, Brendan Higgins
  Cc: Bird, Tim, Paolo Bonzini, shuah, linux-kselftest, linux-kernel,
	David Gow

On 2020-06-16 18:58, Kees Cook wrote:
> On Tue, Jun 16, 2020 at 12:44:28PM -0700, Brendan Higgins wrote:
>> On Tue, Jun 16, 2020 at 9:42 AM Bird, Tim <Tim.Bird@sony.com> wrote:
>>>> From: Paolo Bonzini <pbonzini@redhat.com>
>>>>
>>>> On 15/06/20 21:07, Bird, Tim wrote:
>>>>>> Note: making the plan line required differs from TAP13 and TAP14. I
>>>>>> think it's the right choice, but we should be clear.
>>>>
>>>> As an aside, where is TAP14?
>>> By TAP14, I was referring to the current, undocumented, KUnit
>>> conventions.
>>
>> Not so. TAP14 is the proposed next version of TAP13:
>>
>> https://github.com/TestAnything/testanything.github.io/pull/36
>> https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md
> 
> I was reading this (I haven't compared to the blob above):
> 
> https://github.com/TestAnything/Specification/blob/tap-14-specification/specification.md
> 
>> Based on the discussion, it seems like most of the things we wanted
>> from TAP14 would probably make it in if TAP ever accepts another pull
>> request.
> 
> Were our leading diagnostic lines part of their latest spec? I thought
> we were pretty far off in left field for that particular bit.
> 
>>> My personal preference is to have the dash.  I think it's more human readable.
>>> I note that the TAP spec has examples of result lines both with and without
>>> the dash, so even the spec is ambiguous on this.   I think not mandating it
>>> either way is probably best.  For regex parsers, it's easy to ignore with '[-]?'
>>> outside the pattern groups that grab the number and description.
>>
>> I don't think we care, because we don't use it.
> 
> Yeah, I'm in the same place. I don't care -- I would just like a
> determination. (The "implied" nature of it in TAP14 bothers me.)
> 
>>>> XFAIL/XPASS are different from SKIP.  I personally don't have a need for
>>>> them, but kselftests includes XFAIL/XPASS exit codes and they aren't
>>>> reflected into selftests/kselftest/runner.sh.
>>>>
>>>> Likewise, kselftest.h has ksft_inc_xfail_cnt but not
>>>> ksft_test_result_xfail/ksft_test_result_xpass.
> 
> I proposed fixing that recently[1]. seccomp uses XFAIL for "I have
> detected you lack the config to test this, so I can't say it's working
> or not, because it only looks like a failure without the config."

Based on that description, the case sounds like it should be a skip.

Or if the entire test depends on the missing config then Bail out might
be appropriate.

> 
> [1] https://lore.kernel.org/lkml/20200611224028.3275174-7-keescook@chromium.org/
> 



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 23:52     ` Kees Cook
@ 2020-06-19 18:52       ` Frank Rowand
  2020-06-19 19:50       ` Brendan Higgins
  1 sibling, 0 replies; 43+ messages in thread
From: Frank Rowand @ 2020-06-19 18:52 UTC (permalink / raw)
  To: Kees Cook, Bird, Tim
  Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel, David Gow,
	Paolo Bonzini, Frank Rowand

On 2020-06-16 18:52, Kees Cook wrote:
> On Mon, Jun 15, 2020 at 07:07:34PM +0000, Bird, Tim wrote:
>> From: Kees Cook <keescook@chromium.org>
>>> Note: making the plan line required differs from TAP13 and TAP14. I
>>> think it's the right choice, but we should be clear.
>>
>> [...]
>> With regards to making it optional or not, I don't have a strong
>> preference.  The extra info seems helpful in some circumstances.
>> I don't know if it's too onerous to make it a requirement or not.
>> I'd prefer if it was always there (either at the beginning or the end),
>> but if there is some situation where it's quite difficult to calculate,
>> then it would be best not to mandate it. I can't think of any impossible
>> situations at the moment.
> 
> I think we should require one of:
> 
> - starting plan line
> - ending plan line

> - ending with something that indicates "I'm done, but I have no idea how
>   many tests actually ran" (Maybe "1..?")

I understand the desire to be able to say "I don't know how many tests
actually ran", but the use of it should be discouraged, even if available.

-Frank

> 
> To me, the point of the plan line is to be able to say "this test did,
> in fact, finish". So even if some test can't even count how many tests
> it _ran_, it can at least say "I am now finished".
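> For illustration (hypothetical, since "1..?" is only a suggestion
> here), such output might end like:
> 
>         ok 1 first case
>         ok 2 second case
>         1..?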
> 
>>> TAP13/14 makes description optional, are we making it required (I think
>>> we should). There seems to be a TAP13/14 "convention" of starting
>>> <description> with "- ", which I'm on the fence about it. It does make
>>> parsing maybe a little easier.
>>
>> I would like the description to be required.
>> I don't have a strong opinion on the dash.  I'm OK with either one (dash
>> or no dash), but we should make kselftest and KUnit consistent.
> 
> I find the dash to be distracting -- it doesn't help me scan it, and it
> doesn't help a parser (which only needs to find "#").
> 
>>>> Differences between kernel test result format and TAP13:
>>>>  - in KTAP the "# SKIP" directive is placed after the description on
>>>>    the test result line
> 
> I sent a bunch of clean-ups for kselftest.h recently[1], but it looks
> like we'll need more for adding "description" to skip (right now it only
> prints the SKIP reason).
> 
> [1] https://lore.kernel.org/lkml/20200611224028.3275174-1-keescook@chromium.org/
> 
>>> Yes Documentation/*.rst Not sure on name yet, but where do kselftest
>>> docs live? :)
>> Documentation/dev-tools/kselftest.rst
>>
>> I'll put this at: Documentation/dev-tools/test-results-format.rst
> 
> Sounds good!
> 



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-19 18:47             ` Frank Rowand
@ 2020-06-19 19:11               ` Kees Cook
  2020-06-19 22:58               ` Paolo Bonzini
  1 sibling, 0 replies; 43+ messages in thread
From: Kees Cook @ 2020-06-19 19:11 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Brendan Higgins, Bird, Tim, Paolo Bonzini, shuah,
	linux-kselftest, linux-kernel, David Gow

On Fri, Jun 19, 2020 at 01:47:29PM -0500, Frank Rowand wrote:
> On 2020-06-16 18:58, Kees Cook wrote:
> > I proposed fixing that recently[1]. seccomp uses XFAIL for "I have
> > detected you lack the config to test this, so I can't say it's working
> > or not, because it only looks like a failure without the config."
> 
> Based on that description, the case sounds like it should be a skip.

hrm hrm. Yeah. Thinking more about this, I agree. I think it came about
this way because the kselftest_harness.h API (not TAP output) is different
from the kselftest.h API (TAP output), and so the tests were written with
what was available in kselftest_harness.h, which has no "SKIP" idea.

The series linked was mostly built to bring kselftest_harness.h into the
TAP output universe, so the XFAIL -> SKIP mapping needs to be included
as well.

> Or if the entire test depends on the missing config then Bail out might
> be appropriate.
> 
> > [1] https://lore.kernel.org/lkml/20200611224028.3275174-7-keescook@chromium.org/

I will rework this series.

-- 
Kees Cook


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 20:37       ` Bird, Tim
  2020-06-17  0:02         ` Kees Cook
@ 2020-06-19 19:32         ` Brendan Higgins
  1 sibling, 0 replies; 43+ messages in thread
From: Brendan Higgins @ 2020-06-19 19:32 UTC (permalink / raw)
  To: Bird, Tim
  Cc: David Gow, shuah, linux-kselftest, linux-kernel, Kees Cook,
	Paolo Bonzini

On Tue, Jun 16, 2020 at 1:37 PM Bird, Tim <Tim.Bird@sony.com> wrote:
>
> > -----Original Message-----
> > From: Brendan Higgins <brendanhiggins@google.com>
> >
> > On Mon, Jun 15, 2020 at 10:34 AM Bird, Tim <Tim.Bird@sony.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: David Gow <davidgow@google.com>
> > > >
> > > > On Thu, Jun 11, 2020 at 2:11 AM Bird, Tim <Tim.Bird@sony.com> wrote:
> > [...]
> > > > KUnit is currently outputting "TAP version 14", as we were hoping some
> > > > of our changes would get into the TAP14 spec. (Any comments, Brendan?)
> > > > Maybe this should end up saying "KTAP version 1" or something?
> > >
> > > I don't know if this will break any existing results parsers or not.
> > > I hesitate to use "TAP version 14", as TAP appears to be a dormant
> > > initiative at the moment, and there's no guarantee that the kernel's
> > > changes will get adopted into an official spec.
> >
> > We were using "TAP version 14" since the "extensions" we are using
> > were all proposed among the TAP people to go into the next version of
> > TAP. Based on discussions among them they seem to like the subtest
> > stuff:
> >
> > https://github.com/TestAnything/testanything.github.io/pull/36
> >
> > Anyway, I can still appreciate that they might change their minds.
> I don't know if there's anyone left around to change their mind.
>
> I have to agree with isaacs (who proposed TAP14 5 years ago),
> that the TAP specification is in a weird state.
>
> https://github.com/TestAnything/testanything.github.io/pull/36#issuecomment-566321633
>
> There have been commits to the github respository by a few different
> people recently (3 commits in the last 9 months).  But there's no body to approve or
> disapprove of a new spec.
>
> >
> > > If we are a strict super-set of TAP, then I suppose we could just
> > > start using TAP version 14, and unilaterally declare that our changes
> > > make a new spec.  But since we don't control the web site this feels
> > > like a hostile takeover.
> >
> > I just thought it was okay because it was already in their proposed
> > TAP14 spec, but yeah, if we just decide amongst ourselves to use it,
> > we should probably do something else.
> >
> > > I'm most comfortable with calling our thing KTAP, and just
> > > referencing TAP as inspiration.  I don't have a strong opinion on
> >
> > I am cool with that.
> I hate forks, but if we do go with declaring our own fork as KTAP,
> then we should probably pull in the TAP14 spec as a starting point.
> Since it has no home other than in a pull request, it seems a bit tentative.
> We may need to put the spec in the kernel source or something.

Yeah, I didn't really want to fork either, but I don't see any other
way around it given the current state of TAP. Seems like if we want an
updated maintained spec that doesn't live in a pull request, then it
is up to us to give it a new home.

> I think we're definitely not doing anything with the yaml blocks at
> the moment (or maybe ever), so there's one big point of departure.

Yeah, seeing that everyone seems amenable to the expectation blocks
that KUnit uses in our version of the spec, I don't see any other
reason to keep YAML, and I agree it is probably the least desirable
thing that they have in the spec. So yeah, I am on board with omitting
YAML from KTAP.

> > > KTAP vs TAP, but I do feel strongly that kselftest and kunit should use the
> > > same version line (if we can get them to use the same conventions).
> >
> > Yeah, I agree: it would be better if there was just one version of
> > (K)TAP in the kernel.
>
> One thing that needs to be rationalized between KUnit and selftest
> is the syntax for subtests.  KUnit follows the TAP14 spec and starts
> subtests with an indented "# Subtest: name of the child test" line,
> while selftests just indents the output from the child test, so it
> starts with an indented "TAP version 13".  One issue I have with the
> TAP14/KUnit approach is that the output from the child is different
> depending on whether the test is called in the context of another
> test or not.
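> 
> For illustration (hand-typed, roughly following the two conventions as
> I understand them), the nested portion begins differently in each case:
> 
>         KUnit / proposed TAP14 style:
>             # Subtest: example
>             1..1
>             ok 1 - example_simple_test
>         ok 1 - example
> 
>         kselftest style:
>             TAP version 13
>             1..1
>             ok 1 child test case
>         ok 1 selftests: area: child.sh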

I can see (in later emails) why you would say that. On the flip side,
I think that indenting and starting off with "# Subtest: name of the
child test" makes it a lot more readable for a human.

> In KUnit, is it the parent test or the child test that prints out the
> "# Subtest: ..." line?

Short answer: The child test.

Long answer: The answer is complicated. From KUnit's perspective the
"parent test" is always just the collection of all test suites. Right
now there is no loop that runs each test suite; each one is a
late_initcall. However, that is something that we are working on
fixing (I really need to get the next version of that patchset out).

Cheers


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 21:16   ` Bird, Tim
  2020-06-16 21:19     ` Bird, Tim
  2020-06-17  0:06     ` Kees Cook
@ 2020-06-19 19:39     ` Brendan Higgins
  2 siblings, 0 replies; 43+ messages in thread
From: Brendan Higgins @ 2020-06-19 19:39 UTC (permalink / raw)
  To: Bird, Tim, Alan Maguire
  Cc: shuah, linux-kselftest, linux-kernel, Kees Cook, Paolo Bonzini,
	David Gow

On Tue, Jun 16, 2020 at 2:16 PM Bird, Tim <Tim.Bird@sony.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Brendan Higgins
> >
> > On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
> > > Some months ago I started work on a document to formalize how
> > > kselftest implements the TAP specification.  However, I didn't finish
> > > that work.  Maybe it's time to do so now.
> > >
> > > kselftest has developed a few differences from the original
> > > TAP specification, and  some extensions that I believe are worth
> > > documenting.
> > >
> > > Essentially, we have created our own KTAP (kernel TAP)
> > > format.  I think it is worth documenting our conventions, in order to
> > > keep everyone on the same page.
> > >
> > > Below is a partially completed document on my understanding
> > > of KTAP, based on examination of some of the kselftest test
> > > output.  I have not reconciled this with the kunit output format,
> > > which I believe has some differences (which maybe we should
> > > resolve before we get too far into this).
> > >
> > > I submit the document now, before it is finished, because a patch
> > > was recently introduced to alter one of the result conventions
> > > (from SKIP='not ok' to SKIP='ok').
> > >
> > > See the document include inline below
> > >
> > > ====== start of ktap-doc-rfc.txt ======
> >
> > [...]
> >
> > > --- from here on is not-yet-organized material
> > >
> > > Tip:
> > >  - don't change the test plan based on skipped tests.
> > >    - it is better to report that a test case was skipped, than to
> > >      not report it
> > >    - that is, don't adjust the number of test cases based on skipped
> > >      tests
> > >
> > > Other things to mention:
> > > TAP13 elements not used:
> > >  - yaml for diagnostic messages
> >
> > We talked about this before, but I would like some way to get failed
> > expectation/assertion information in the test in a consistent machine
> > parsible way. Currently we do the following:
> >
> >     # Subtest: example
> >     1..1
> >     # example_simple_test: initializing
> >     # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
> >     Expected 1 + 1 == 3, but
> >         1 + 1 == 2
> >         3 == 3
> >     not ok 1 - example_simple_test
> > not ok 5 - example
> >
> > Technically not TAP compliant, but no one seems to mind. I am okay with
> > keeping it the way it is, but if we don't want it in the KTAP spec, we
> > will need some kind of recourse.
>
> So far, most of the CI systems don't parse out diagnostic data, so it doesn't
> really matter what the format is.  If it's useful for humans, it's valuable as is.
> However, it would be nice if that could change.  But without some formalization
> of the format of the diagnostic data, it's an intractable problem for CI systems
> to parse it.  So it's really a chicken and egg problem.  To solve it, we would have
> to determine what exactly needs to be provided on a consistent basis for diagnostic
> data across many tests.  I think that it's too big a problem to handle right now.
> I'm not opposed to migrating to some structure with yaml in the future, but free
> form text output seems OK for now.

Well as long as everyone is cool with it for now we can put the
problem for later.

> > >    - reason: try to keep things line-based, since output from other things
> > >    may be interspersed with messages from the test itself
> > >  - TODO directive
> >
> > Is this more of stating a fact or desire? We don't use TODO either, but
> > it looks like it could be useful.
> Just stating a fact.  I didn't find TODO in either KUnit or selftest in
> November when I initially wrote this up.  If TODO serves as a kind
> of XFAIL, it could be useful.  I have nothing against it.

Fair enough.

> > > KTAP Extensions beyond TAP13:
> > >  - nesting
> > >    - via indentation
> > >      - indentation makes it easier for humans to read
> > >  - test identifier
> > >     - multiple parts, separated by ':'
> >
> > Can you elaborate on this more? I am not sure what you mean.
> An individual test case can have a name that is scoped by a containing
> test or test suite.  For example: selftests: cpufreq: main.sh
> This test identifier consists of the test system (selftests), the test
> area (cpufreq), and the test case name (main.sh).  This one's a bit
> weird because the test case name is just the name of the program
> in that test area.  The program itself doesn't output data in TAP format,
> and the harness uses it's exit code to detect PASS/FAIL.  if main.sh had
> multiple test cases, it might produce test identifiers like this:
> selftests: cpufreq: main: check_change_afinity_mask
> selftests: cpufreq: main: check_permissions_for_mask_operation
> (Or it might just produce the last part of these strings, the
> testcase names, and the testcase id might be something generated
> by the harness or CI system.)

+Alan Maguire

Aha, that is something we (Alan, David, Kees, and myself) were talking
about on another thread:

https://lore.kernel.org/linux-kselftest/CABVgOSnjrzfFOMF0VE1-5Ks-e40fc0XZsNZ92jE60ZOhYmZWog@mail.gmail.com/T/#m682be9f9103f7b363b702e49c137d83a4833fcae

I think that makes a lot of sense if it isn't too hard in practice.

> The value of having a single string to identify the testcase (like a
> uniform resource locator) is that it's easier to use the string to
> correlate results produced from different CI system that are executing
> the same test.

Makes sense.

> > >  - summary lines
> > >    - can be skipped by CI systems that do their own calculations
> > >
> > > Other notes:
> > >  - automatic assignment of result status based on exit code
> > >
> > > Tips:
> > >  - do NOT describe the result in the test line
> > >    - the test case description should be the same whether the test
> > >      succeeds or fails
> > >    - use diagnostic lines to describe or explain results, if this is
> > >      desirable
> > >  - test numbers are considered harmful
> > >    - test harnesses should use the test description as the identifier
> > >    - test numbers change when testcases are added or removed
> > >      - which means that results can't be compared between different
> > >        versions of the test
> > >  - recommendations for diagnostic messages:
> > >    - reason for failure
> > >    - reason for skip
> > >    - diagnostic data should always preceding the result line
> > >      - problem: harness may emit result before test can do assessment
> > >        to determine reason for result
> > >      - this is what the kernel uses
> > >
> > > Differences between kernel test result format and TAP13:
> > >  - in KTAP the "# SKIP" directive is placed after the description on
> > >    the test result line
> > >
> > > ====== start of ktap-doc-rfc.txt ======
> > > OK - that's the end of the RFC doc.
> > >
> > > Here are a few questions:
> > >  - is this document desired or not?
> > >     - is it too long or too short?
> > >  - if the document is desired, where should it be placed?
> >
> > I like it. I don't think we can rely on the TAP people updating their
> > stuff based on my interactions with them. So having a spec which is
> > actually maintained would be nice.
> >
> > Maybe in Documentation/dev-tools/ ?
> I'm leaning towards Documentation/dev-tools/test-results-format.rst

SGTM.

> > >    I assume somewhere under Documentation, and put into
> > >    .rst format. Suggestions for a name and location are welcome.
> > >  - is this document accurate?
> > >    I think KUNIT does a few things differently than this description.
> > >    - is the intent to have kunit and kselftest have the same output format?
> > >       if so, then these should be rationalized.
> >
> > Yeah, I think it would be nice if all test frameworks/libraries for the
> > kernel output tests in the same language.
> Agreed.

Cheers


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-17  4:05           ` David Gow
@ 2020-06-19 19:44             ` Brendan Higgins
  2020-06-19 20:19             ` Frank Rowand
  1 sibling, 0 replies; 43+ messages in thread
From: Brendan Higgins @ 2020-06-19 19:44 UTC (permalink / raw)
  To: David Gow
  Cc: Kees Cook, Bird, Tim, shuah, linux-kselftest, linux-kernel,
	Paolo Bonzini

On Tue, Jun 16, 2020 at 9:06 PM David Gow <davidgow@google.com> wrote:
>
> On Wed, Jun 17, 2020 at 11:36 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
> > > Agreed.  You only need machine-parsable data if you expect the CI
> > > system to do something more with the data than just present it.
> > > What that would be, that would be common for all tests (or at least
> > > many test), is unclear.  Maybe there are patterns in the diagnostic
> > > data that could lead to higher-level analysis, or even automated
> > > fixes, that don't become apparent if the data is unstructured.  But
> > > it's hard to know until you have lots of data.  I think just getting
> > > the other things consistent is a good priority right now.
> >
> > Yeah. I think the main place for this is performance analysis, but I
> > think that's a separate system entirely. TAP is really strictly yes/no,
> > whereas performance analysis is a whole other thing. The only other thing
> > I can think of is some kind of feature analysis, but that would be built
> > out of the standard yes/no output. i.e. if I create a test that checks
> > for specific security mitigation features (*cough*LKDTM*cough*), having
> > a dashboard that shows features down one axis and architectures and/or
> > kernel versions on other axes, then I get a pretty picture. But it's
> > still being built out of the yes/no info.
> >
> > *shrug*
> >
> > I think diagnostic should be expressly non-machine-oriented.
>
> So from the KUnit side, we sort-of have three kinds of diagnostic lines:
> - Lines printed directly from tests (typically using kunit_info() or
> similar functions): as I understand it, these are basically the
> equivalent of what kselftest typically uses diagnostics for --
> test-specific, human-readable messages. I don't think we need/want to
> parse these much.
> - Kernel messages during test execution. If we get the results from
> scraping the kernel log (which is still the default for KUnit, though
> there is also a debugfs interface), other kernel logs can be interleaved
> with the results. Sometimes these are irrelevant things happening on
> another thread, sometimes they're something directly related to the
> test which we'd like to capture (KASAN errors, for instance). I don't
> think we want these to be machine oriented, but we may want to be able
> to filter them out.
> - Expectation failures: as Brendan mentioned, KUnit will print some
> diagnostic messages for individual assertion/expectation failures,
> including the expected and actual values. We'd ideally like to be able
> to identify and parse these, but keeping them human-readable is
> definitely also a goal.

Seems like a fair characterization to me.

> Now, to be honest, I doubt that the distinction here would be of much
> use to kselftest, but it could be nice to not go out of our way to
> make parsing some diagnostic lines possible. That being said,
> personally I'm all for avoiding the yaml for diagnostic messages stuff
> and sticking to something simple and line-based, possibly
> standardising the format of a few common diagnostic measurements
> (e.g., assertions/expected values/etc) in a way that's both
> human-readable and parsable if possible.

Agreed.

> I agree that there's a lot of analysis that is possible with just the
> yes/no data. There's probably some fancy correlation one could do even
> with unstructured diagnostic logs, so I don't think overstructuring
> things is a necessity by any means. Where we have different tests
> doing similar sorts of things, though, consistency in message
> formatting could help even if things are not explicitly parsed.
> Ensuring that helper functions that log and the like are spitting
> things out in the same format is probably a good starting step down
> that path.

Super agreed. I am sure that there are people who can do some very
clever things with what is being described here, but I would prefer
not to overcomplicate the spec right out of the gate. It feels like we
only have two TAP versions to reconcile here, and we are mostly in
agreement on how to do that.


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-15 19:07   ` Bird, Tim
  2020-06-16 12:08     ` Paolo Bonzini
  2020-06-16 23:52     ` Kees Cook
@ 2020-06-19 19:49     ` Frank Rowand
  2 siblings, 0 replies; 43+ messages in thread
From: Frank Rowand @ 2020-06-19 19:49 UTC (permalink / raw)
  To: Bird, Tim, Kees Cook
  Cc: shuah, linux-kselftest, Brendan Higgins, linux-kernel, David Gow,
	Paolo Bonzini, Frank Rowand

On 2020-06-15 14:07, Bird, Tim wrote:
> Kees,
> 
> Thanks for the great feedback.  See comments inline below.
> 
>> -----Original Message-----
>> From: Kees Cook <keescook@chromium.org>
>>
>> On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
>>> The kernel test result format consists of 5 major elements,
>>> 4 of which are line-based:
>>>  * the output version line
>>>  * the plan line
>>
>> Note: making the plan line required differs from TAP13 and TAP14. I
>> think it's the right choice, but we should be clear.
> 
> Noted.  In re-reading my doc, I've conflated my sections.  The first
> section is "single-line", and the next section is "optional".  ???
> I'll fix that.
> 
> With regards to making it optional or not, I don't have a strong
> preference.  The extra info seems helpful in some circumstances.
> I don't know if it's too onerous to make it a requirement or not.
> I'd prefer if it was always there (either at the beginning or the end),
> but if there is some situation where it's quite difficult to calculate,
> then it would be best not to mandate it. I can't think of any impossible
> situations at the moment.
> 
>>
>>>  * one or more test result lines (also called test result lines)
>>>  * a possible "Bail out!" line
>>
>> "Bail out!" to be moved to "optional" elements, since it may not appear.
>> And we should clarify TAP13 and TAP14's language to say it should only
>> appear when the test is aborting without running later tests -- for this
>> reason, I think the optional "description" following "Bail out!" should
>> be made required. I.e. it must be: "Bail out! $reason"
> 
> I'll make sure this is listed as optional.
> I like adding a mandatory reason.
>>
>>> optional elements:
>>>  * diagnostic data
>>
>> nit: diagnostic lines (not data)
> OK.
> 
>>
>>> The 5th element is diagnostic information, which is used to describe
>>> items running in the test, and possibly to explain test results.
>>> A sample test result is show below:
>>>
>>> Some other lines may be placed by the test harness, and are not emitted
>>> by individual test programs:
>>>  * one or more test identification lines
>>>  * a possible results summary line
>>>
>>> Here is an example:
>>>
>>> 	TAP version 13
>>> 	1..1
>>> 	# selftests: cpufreq: main.sh
>>> 	# pid 8101's current affinity mask: fff
>>> 	# pid 8101's new affinity mask: 1
>>> 	ok 1 selftests: cpufreq: main.sh
>>
>> Nit: for examples, I this should should show more than one test.
>> (Preferably, it should show all the possible cases, ok, not ok, SKIP,
>> etc.)
> Agree.  I will expand this.
> 
>>
>>> The output version line is: "TAP version 13"
>>>
>>> The test plan is "1..1".
>>>
>>> Element details
>>> ===============
>>>
>>> Output version line
>>> -------------------
>>> The output version line is always "TAP version 13".
>>>
>>> Although the kernel test result format has some additions
>>> to the TAP13 format, the version line reported by kselftest tests
>>> is (currently) always the exact string "TAP version 13"
>>>
>>> This is always the first line of test output.
>>>
>>> Test plan line
>>> --------------
>>> The test plan indicates the number of individual test cases intended to
>>> be executed by the test. It always starts with "1.." and is followed
>>> by the number of test cases.  In the example above, "1..1" indicates
>>> that this test reports only 1 test case.
>>>
>>> The test plan line can be placed in two locations:
>>>  * the second line of test output, when the number of test cases is known
>>>    in advance
>>>  * as the last line of test output, when the number of test cases is not
>>>    known in advance.
>>>
>>> Most often, the number of test cases is known in advance, and the test plan
>>> line appears as the second line of test output, immediately following
>>> the output version line.  The number of test cases might not be known
>>> in advance if the number of tests is calculated from runtime data.
>>> In this case, the test plan line is emitted as the last line of test
>>> output.
>>
>> "... must be ..." ?
>>
>>>
>>> Test result lines
>>> -----------------
>>> The test output consists of one or more test result lines that indicate
>>> the actual results for the test.  These have the format:
>>>
>>>   <result> <number> <description> [<directive>] [<diagnostic data>]
>>
>> This should be:
>>
>> <result> <number> <description> [# [<directive> ][<diagnostic data>]]
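
(Purely as an illustration, not text from the RFC: a line matching that
form might be

	not ok 4 frequency scaling check # SKIP needs root

with "SKIP" as the directive and "needs root" as diagnostic data.)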
>>
>>>
>>> The ''result'' must appear at the start of a line (except for when a
>>> test is nested, see below), and must consist of one of the following
>>> two phrases:
>>>   * ok
>>>   * not ok
>>>
>>> If the test passed, then the result is reported as "ok".  If the test
>>> failed, then the result is reported as "not ok".  These must be in
>>> lower case, exactly as shown.
>>>
>>> The ''number'' in the test result line represents the number of the
>>> test case being performed by the test program.  This is often used by
>>> test harnesses as a unique identifier for each test case.  The test
>>> number is a base-10 number, starting with 1.  It should increase by
>>> one for each new test result line emitted.  If possible the number
>>> for a test case should be kept the same over the lifetime of the test.
>>>
>>> The ''description'' is a short description of the test case.
>>> This can be any string of words, but should avoid using colons (':')
>>
>> Must also avoid "#".
> ok.
>>
>>> except as part of a fully qualified test case name (see below).
>>
>> TAP13/14 makes description optional, are we making it required (I think
>> we should). There seems to be a TAP13/14 "convention" of starting
>> <description> with "- ", which I'm on the fence about. It does make
>> parsing maybe a little easier.
> 
> I would like the description to be required.

Agree, description should be required.

-Frank

> I don't have a strong opinion on the dash.  I'm OK with either one (dash
> or no dash), but we should make kselftest and KUnit consistent.
> 
>>
>>> Finally, it is possible to use a test directive to indicate another
>>> possible outcome for a test: that it was skipped.  To report that
>>> a test case was skipped, the result line should start with the
>>> result "not ok", and the directive "# SKIP" should be placed after
>>> the test description. (Note that this deviates from the TAP13
>>> specification).
>>
>> This is what TAP14 changed, I think (i.e. directive follows description
>> now).
>>
>>>
>>> A test may be skipped for a variety of reasons, ranging from
>>> insufficient privileges to missing features or resources required
>>> to execute that test case.
>>>
>>> It is usually helpful if a diagnostic message is emitted to explain
>>> the reasons for the skip.  If the message is a single line and is
>>> short, the diagnostic message may be placed after the '# SKIP'
>>> directive on the same line as the test result.  However, if it is
>>> not on the test result line, it should precede the test line (see
>>> diagnostic data, next).
>>>
>>> Diagnostic data
>>> ---------------
>>> Diagnostic data is text that reports on test conditions or test
>>> operations, or that explains test results.  In the kernel test
>>> result format, diagnostic data is placed on lines that start with a
>>> hash sign, followed by a space ('# ').
>>>
>>> One special format of diagnostic data is a test identification line,
>>> which has the fully qualified name of a test case.  Such a test
>>> identification line marks the start of test output for a test case.
>>>
>>> In the example above, there are three lines that start with '#'
>>> which precede the test result line:
>>> 	# selftests: cpufreq: main.sh
>>> 	# pid 8101's current affinity mask: fff
>>> 	# pid 8101's new affinity mask: 1
>>> These are used to indicate diagnostic data for the test case
>>> 'selftests: cpufreq: main.sh'
>>>
>>> Material in comments between the identification line and the test
>>> result line is diagnostic data that can help to interpret the
>>> results of the test.
>>>
>>> The TAP specification indicates that automated test harnesses may
>>> ignore any line that is not one of the mandatory prescribed lines
>>> (that is, the output format version line, the plan line, a test
>>> result line, or a "Bail out!" line.)
>>>
>>> Bail out!
>>> ---------
>>> If a line in the test output starts with 'Bail out!', it indicates
>>> that the test was aborted for some reason.  The test is unable to
>>> proceed, and no additional tests will be performed.
>>>
>>> This can be used at the very beginning of a test, or anywhere in the
>>> middle of the test, to indicate that the test can not continue.
>>
>> I think the required syntax should be:
>>
>> Bail out! <reason>
>>
>> And to make it clear that this is optionally used to indicate an early
>> abort. (Though with a leading plan line, a parser should be able to
>> determine this on its own.)
>>
>>> --- from here on is not-yet-organized material
>>>
>>> Tip:
>>>  - don't change the test plan based on skipped tests.
>>>    - it is better to report that a test case was skipped, than to
>>>      not report it
>>>    - that is, don't adjust the number of test cases based on skipped
>>>      tests
>>
>> Yes, totally.
>>
>>> Other things to mention:
>>> TAP13 elements not used:
>>>  - yaml for diagnostic messages
>>>    - reason: try to keep things line-based, since output from other things
>>>    may be interspersed with messages from the test itself
>>
>> Agreed: the yaml extension is not sensible for our use.
>>
>>>  - TODO directive
>>
>> Agreed: SKIP should cover everything TODO does.
>>
>>> KTAP Extensions beyond TAP13:
>>>  - nesting
>>
>> (I would call this 'subtests')
> Sounds good.  Will do.
> 
>>
>>>    - via indentation
>>>      - indentation makes it easier for humans to read
>>
>> And allows for separable parsing of subtests.
> Agree.  I'll try to work that into the doc.
> 
>>
>>>  - test identifier
>>>     - multiple parts, separated by ':'
>>
>> This is interesting... is the goal to be able to report test status over
>> time by name?
> Yes.  KernelCI and Fuego have the notion of a testcase namespace
> hierarchy.  As the number of tests expands, it is helpful to have
> the namespace for a sub-test be limited, just like a filesystem hierarchy
> provides scope for the names of objects (directories and files) that
> it contains.
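
As a purely illustrative sketch (the names here are invented),
identification lines using that kind of scoping might look like:

	# selftests: timers: nanosleep
	# selftests: timers: posix_timers

where each ':'-separated part narrows the scope, much like a path
component in a filesystem.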
> 
>>
>>>  - summary lines
>>>    - can be skipped by CI systems that do their own calculations
>>
>> Right -- I think a per-test summary line should be included for the humans
>> reading a single test (which may scroll off the screen).
>>
>>> Other notes:
>>>  - automatic assignment of result status based on exit code
>>
>> This is, I think, a matter for the kselftest running infrastructure, not
>> the KTAP output?
> Agreed.  This doesn't have anything to do with the API between
> the tests and the results processor.  I'll take it out.
>>
>>> Tips:
>>>  - do NOT describe the result in the test line
>>>    - the test case description should be the same whether the test
>>>      succeeds or fails
>>>    - use diagnostic lines to describe or explain results, if this is
>>>      desirable
>>
>> Right.
>>
>>>  - test numbers are considered harmful
>>>    - test harnesses should use the test description as the identifier
>>>    - test numbers change when testcases are added or removed
>>>      - which means that results can't be compared between different
>>>        versions of the test
>>
>> Right.
>>
>>>  - recommendations for diagnostic messages:
>>>    - reason for failure
>>>    - reason for skip
>>>    - diagnostic data should always precede the result line
>>>      - problem: harness may emit result before test can do assessment
>>>        to determine reason for result
>>>      - this is what the kernel uses
>>
>> Right.
>>
>>> Differences between kernel test result format and TAP13:
>>>  - in KTAP the "# SKIP" directive is placed after the description on
>>>    the test result line
>>
>> Right, this is the same as TAP14, IIRC. KTAP's big deltas are the "#"
>> diagnostic lines and the subtest handling.
>>
>>> ====== end of ktap-doc-rfc.txt ======
>>> OK - that's the end of the RFC doc.
>>>
>>> Here are a few questions:
>>>  - is this document desired or not?
>>
>> Yes.
> Great.  I'll make this a priority to work on.
> 
>>
>>>     - is it too long or too short?
>>
>> Should be slightly longer: more examples.
>>
>>>  - if the document is desired, where should it be placed?
>>>    I assume somewhere under Documentation, and put into
>>>    .rst format. Suggestions for a name and location are welcome.
>>
>> Yes Documentation/*.rst Not sure on name yet, but where do kselftest
>> docs live? :)
> Documentation/dev-tools/kselftest.rst
> 
> I'll put this at: Documentation/dev-tools/test-results-format.rst
> 
>>
>>>  - is this document accurate?
>>>    I think KUNIT does a few things differently than this description.
>>
>> Let's fix it. :)
>>
>>>    - is the intent to have kunit and kselftest have the same output format?
>>>       if so, then these should be rationalized.
>>
>> Yes please.
>>
>>> Finally,
>>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
>>> See https://testanything.org/tap-version-13-specification.html
>>
>> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
>> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
>> relate? Neither SKIP nor XFAIL count toward failure, though, so both
>> should be "ok"? I guess we should change it to "ok".
> 
> I have the same initial impression.  In my mind, a skip is "not ok", since
> the test didn't run. However, a SKIP should be treated differently
> from either "ok" or "not ok" by the results interpreter, so I don't think it
> matters.  Originally I was averse to changing the SKIP result to "ok"
> (as suggested by Paulo Bonzini [1]), but I checked and it's pretty trivial to
> change the parser in Fuego, and it would make the kernel results format
> match the TAP13 spec.  I don't see a strong reason for us to be different
> from TAP13 here.
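
Concretely, the same hypothetical case would read
"not ok 3 hugepage check # SKIP not enough memory" under current kselftest
practice, versus "ok 3 hugepage check # SKIP not enough memory" under the
TAP13 wording; only the leading result word changes.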
> 
> I raised this issue on our automated testing conference call last week
> (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
> so people should be chiming in if their parser will have a problem with this change.
> 
> [1]  https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@redhat.com/T/
> 
> Thanks very much for the feedback.
>  -- Tim
> 



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-16 23:52     ` Kees Cook
  2020-06-19 18:52       ` Frank Rowand
@ 2020-06-19 19:50       ` Brendan Higgins
  1 sibling, 0 replies; 43+ messages in thread
From: Brendan Higgins @ 2020-06-19 19:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Bird, Tim, shuah, linux-kselftest, linux-kernel, David Gow,
	Paolo Bonzini

On Tue, Jun 16, 2020 at 4:52 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Mon, Jun 15, 2020 at 07:07:34PM +0000, Bird, Tim wrote:
> > From: Kees Cook <keescook@chromium.org>
> > > Note: making the plan line required differs from TAP13 and TAP14. I
> > > think it's the right choice, but we should be clear.
> >
> > [...]
> > With regards to making it optional or not, I don't have a strong
> > preference.  The extra info seems helpful in some circumstances.
> > I don't know if it's too onerous to make it a requirement or not.
> > I'd prefer if it was always there (either at the beginning or the end),
> > but if there is some situation where it's quite difficult to calculate,
> > then it would be best not to mandate it. I can't think of any impossible
> > situations at the moment.
>
> I think we should require one of:
>
> - starting plan line
> - ending plan line
> - ending with something that indicates "I'm done, but I have no idea how
>   many tests actually ran" (Maybe "1..?")
>
> To me, the point of the plan line is to be able to say "this test did,
> in fact, finish". So even if some test can't even count how many tests
> it _ran_, it can at least say "I am now finished".

So the counting is actually not the hard part for us, it's figuring
out when we have finished. Again, the change that I am working on (I
REALLY need to get that out) should fix that, but until we get that
upstream, KUnit doesn't actually know when it is done running tests.
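
For illustration only (this is not current KUnit output), a test that
only discovers its count at the end might emit:

	TAP version 13
	ok 1 case one
	ok 2 case two
	1..2

with the plan line last, so a parser still knows the run finished and
how many cases were reported.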

> > > TAP13/14 makes description optional, are we making it required (I think
> > > we should). There seems to be a TAP13/14 "convention" of starting
> > > <description> with "- ", which I'm on the fence about. It does make
> > > parsing maybe a little easier.
> >
> > I would like the description to be required.
> > I don't have a strong opinion on the dash.  I'm OK with either one (dash
> > or no dash), but we should make kselftest and KUnit consistent.
>
> I find the dash to be distracting -- it doesn't help me scan it, and it
> doesn't help a parser (which only needs to find "#").

Yeah, I also prefer spaces and/or "#". I am okay if spaces are
optional only to aid human readability. And honestly I don't care
about this point too much. Just offering my 2 cents.

> > > > Differences between kernel test result format and TAP13:
> > > >  - in KTAP the "# SKIP" directive is placed after the description on
> > > >    the test result line
>
> I sent a bunch of clean-ups for kselftest.h recently[1], but it looks
> like we'll need more for adding "description" to skip (right now it only
> prints the SKIP reason).
>
> [1] https://lore.kernel.org/lkml/20200611224028.3275174-1-keescook@chromium.org/
>
> > > Yes Documentation/*.rst Not sure on name yet, but where do kselftest
> > > docs live? :)
> > Documentation/dev-tools/kselftest.rst
> >
> > I'll put this at: Documentation/dev-tools/test-results-format.rst
>
> Sounds good!
>
> --
> Kees Cook


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-17  4:05           ` David Gow
  2020-06-19 19:44             ` Brendan Higgins
@ 2020-06-19 20:19             ` Frank Rowand
  2020-06-19 23:47               ` Bird, Tim
  1 sibling, 1 reply; 43+ messages in thread
From: Frank Rowand @ 2020-06-19 20:19 UTC (permalink / raw)
  To: David Gow, Kees Cook
  Cc: Bird, Tim, Brendan Higgins, shuah, linux-kselftest, linux-kernel,
	Paolo Bonzini

On 2020-06-16 23:05, David Gow wrote:
> On Wed, Jun 17, 2020 at 11:36 AM Kees Cook <keescook@chromium.org> wrote:
>>
>> On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
>>> Agreed.  You only need machine-parsable data if you expect the CI
>>> system to do something more with the data than just present it.
>>> What that would be, that would be common for all tests (or at least
>>> many test), is unclear.  Maybe there are patterns in the diagnostic
>>> data that could lead to higher-level analysis, or even automated
>>> fixes, that don't become apparent if the data is unstructured.  But
>>> it's hard to know until you have lots of data.  I think just getting
>>> the other things consistent is a good priority right now.
>>
>> Yeah. I think the main place for this is performance analysis, but I
>> think that's a separate system entirely. TAP is really strictly yes/no,
>> whereas performance analysis is a whole other thing. The only other thing
>> I can think of is some kind of feature analysis, but that would be built
>> out of the standard yes/no output. i.e. if I create a test that checks
>> for specific security mitigation features (*cough*LKDTM*cough*), having
>> a dashboard that shows features down one axis and architectures and/or
>> kernel versions on other axes, then I get a pretty picture. But it's
>> still being built out of the yes/no info.
>>
>> *shrug*
>>
>> I think diagnostics should be expressly non-machine-oriented.
> 
> So from the KUnit side, we sort-of have three kinds of diagnostic lines:
> - Lines printed directly from tests (typically using kunit_info() or
> similar functions): as I understand it, these are basically the
> equivalent of what kselftest typically uses diagnostics for --
> test-specific, human-readable messages. I don't think we need/want to
> parse these much.


> - Kernel messages during test execution. If we get the results from
> scraping the kernel log (which is still the default for KUnit, though
> there is also a debugfs info), other kernel logs can be interleaved
> with the results. Sometimes these are irrelevant things happening on
> another thread, sometimes they're something directly related to the
> test which we'd like to capture (KASAN errors, for instance). I don't
> think we want these to be machine oriented, but we may want to be able
> to filter them out.

This is an important conceptual difference between testing a user
space program (which is the environment that TAP initially was
created for) and testing kernel code.  This difference should be
addressed in the KTAP standard.  As noted above, a kernel test
case may call into other kernel code, where the other kernel code
generates messages that get into the test output.

One issue with these kernel messages is that they may be warnings or
errors, and to anyone other than the test creator it is probably
hard to determine whether the warnings and errors are reporting
bugs or whether they are expected results triggered by the test.

I created a solution to report what error(s) were expected for a
test, and a tool to validate whether the error(s) occurred or not.
This is currently in the devicetree unittests, but the exact
implementation should be discussed in the KUnit context, and it
should be included in the KTAP spec.

I can describe the current implementation and start a discussion
of any issues in this thread or I can start a new thread.  Whichever
seems appropriate to everyone.

-Frank


> - Expectation failures: as Brendan mentioned, KUnit will print some
> diagnostic messages for individual assertion/expectation failures,
> including the expected and actual values. We'd ideally like to be able
> to identify and parse these, but keeping them human-readable is
> definitely also a goal.
> 
> Now, to be honest, I doubt that the distinction here would be of much
> use to kselftest, but it could be nice to not go out of our way to
> make parsing some diagnostic lines possible. That being said,
> personally I'm all for avoiding the yaml for diagnostic messages stuff
> and sticking to something simple and line-based, possibly
> standardising the format of a few common diagnostic measurements
> (e.g., assertions/expected values/etc) in a way that's both
> human-readable and parsable if possible.
> 
> I agree that there's a lot of analysis that is possible with just the
> yes/no data. There's probably some fancy correlation one could do even
> with unstructured diagnostic logs, so I don't think overstructuring
> things is a necessity by any means. Where we have different tests
> doing similar sorts of things, though, consistency in message
> formatting could help even if things are not explicitly parsed.
> Ensuring that helper functions that log and the like are spitting
> things out in the same format is probably a good starting step down
> that path.
> 
> Cheers,
> -- David
> 



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-19 18:47             ` Frank Rowand
  2020-06-19 19:11               ` Kees Cook
@ 2020-06-19 22:58               ` Paolo Bonzini
  2020-06-20 14:51                 ` Frank Rowand
  1 sibling, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2020-06-19 22:58 UTC (permalink / raw)
  To: Frank Rowand, Kees Cook, Brendan Higgins
  Cc: Bird, Tim, shuah, linux-kselftest, linux-kernel, David Gow

On 19/06/20 20:47, Frank Rowand wrote:
> Or if the entire test depends on the missing config then Bail out might
> be appropriate.

No, in that case you want

	1..0 # SKIP: unsupported configuration

The spec is not clear if "Bail out!" is an error condition or just a
warning that only part of the test was run, but prove(1) and Automake
both treat it as the former, for example.

For example, an ENOSPC error creating a temporary file could be turned
into a bail-out, while an ENOSYS would be a skip.
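
As a rough sketch (test names and messages invented for illustration),
the two cases might look like:

	TAP version 13
	1..0 # SKIP: unsupported configuration

versus

	TAP version 13
	1..3
	ok 1 create scratch file
	Bail out! ENOSPC while creating temporary file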

Paolo



* RE: RFC - kernel selftest result documentation (KTAP)
  2020-06-19 20:19             ` Frank Rowand
@ 2020-06-19 23:47               ` Bird, Tim
  0 siblings, 0 replies; 43+ messages in thread
From: Bird, Tim @ 2020-06-19 23:47 UTC (permalink / raw)
  To: Frank Rowand, David Gow, Kees Cook
  Cc: Brendan Higgins, shuah, linux-kselftest, linux-kernel, Paolo Bonzini

Just a quick note that there's been a lot of good discussion.

I have an updated draft of the document, but I need to review
the flurry of comments today, and I'm busy getting my slides
ready for a conference.  So I just wanted to give a heads up
that I'll be working on this (responding to comments and
hopefully posting an updated draft version) early next week.

Thanks for the feedback.
 -- Tim



> -----Original Message-----
> From: Frank Rowand <frowand.list@gmail.com>
> 
> On 2020-06-16 23:05, David Gow wrote:
> > On Wed, Jun 17, 2020 at 11:36 AM Kees Cook <keescook@chromium.org> wrote:
> >>
> >> On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
> >>> Agreed.  You only need machine-parsable data if you expect the CI
> >>> system to do something more with the data than just present it.
> >>> What that would be, that would be common for all tests (or at least
> >>> many test), is unclear.  Maybe there are patterns in the diagnostic
> >>> data that could lead to higher-level analysis, or even automated
> >>> fixes, that don't become apparent if the data is unstructured.  But
> >>> it's hard to know until you have lots of data.  I think just getting
> >>> the other things consistent is a good priority right now.
> >>
> >> Yeah. I think the main place for this is performance analysis, but I
> >> think that's a separate system entirely. TAP is really strictly yes/no,
> >> whereas performance analysis is a whole other thing. The only other thing
> >> I can think of is some kind of feature analysis, but that would be built
> >> out of the standard yes/no output. i.e. if I create a test that checks
> >> for specific security mitigation features (*cough*LKDTM*cough*), having
> >> a dashboard that shows features down one axis and architectures and/or
> >> kernel versions on other axes, then I get a pretty picture. But it's
> >> still being built out of the yes/no info.
> >>
> >> *shrug*
> >>
> >> I think diagnostics should be expressly non-machine-oriented.
> >
> > So from the KUnit side, we sort-of have three kinds of diagnostic lines:
> > - Lines printed directly from tests (typically using kunit_info() or
> > similar functions): as I understand it, these are basically the
> > equivalent of what kselftest typically uses diagnostics for --
> > test-specific, human-readable messages. I don't think we need/want to
> > parse these much.
> 
> 
> > - Kernel messages during test execution. If we get the results from
> > scraping the kernel log (which is still the default for KUnit, though
> > there is also a debugfs info), other kernel logs can be interleaved
> > with the results. Sometimes these are irrelevant things happening on
> > another thread, sometimes they're something directly related to the
> > test which we'd like to capture (KASAN errors, for instance). I don't
> > think we want these to be machine oriented, but we may want to be able
> > to filter them out.
> 
> This is an important conceptual difference between testing a user
> space program (which is the environment that TAP initially was
> created for) and testing kernel code.  This difference should be
> addressed in the KTAP standard.  As noted above, a kernel test
> case may call into other kernel code, where the other kernel code
> generates messages that get into the test output.
> 
> One issue with these kernel messages is that they may be warnings or
> errors, and to anyone other than the test creator it is probably
> hard to determine whether the warnings and errors are reporting
> bugs or whether they are expected results triggered by the test.
> 
> I created a solution to report what error(s) were expected for a
> test, and a tool to validate whether the error(s) occurred or not.
> This is currently in the devicetree unittests, but the exact
> implementation should be discussed in the KUnit context, and it
> should be included in the KTAP spec.
> 
> I can describe the current implementation and start a discussion
> of any issues in this thread or I can start a new thread.  Whichever
> seems appropriate to everyone.
> 
> -Frank
> 
> 
> > - Expectation failures: as Brendan mentioned, KUnit will print some
> > diagnostic messages for individual assertion/expectation failures,
> > including the expected and actual values. We'd ideally like to be able
> > to identify and parse these, but keeping them human-readable is
> > definitely also a goal.
> >
> > Now, to be honest, I doubt that the distinction here would be of much
> > use to kselftest, but it could be nice to not go out of our way to
> > make parsing some diagnostic lines possible. That being said,
> > personally I'm all for avoiding the yaml for diagnostic messages stuff
> > and sticking to something simple and line-based, possibly
> > standardising a the format of a few common diagnostic measurements
> > (e.g., assertions/expected values/etc) in a way that's both
> > human-readable and parsable if possible.
> >
> > I agree that there's a lot of analysis that is possible with just the
> > yes/no data. There's probably some fancy correlation one could do even
> > with unstructured diagnostic logs, so I don't think overstructuring
> > things is a necessity by any means. Where we have different tests
> > doing similar sorts of things, though, consistency in message
> > formatting could help even if things are not explicitly parsed.
> > Ensuring that helper functions that log and the like are spitting
> > things out in the same format is probably a good starting step down
> > that path.
> >
> > Cheers,
> > -- David
> >



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-19 17:58       ` Frank Rowand
@ 2020-06-20  6:44         ` David Gow
  2020-06-20 15:03           ` Frank Rowand
  0 siblings, 1 reply; 43+ messages in thread
From: David Gow @ 2020-06-20  6:44 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Paolo Bonzini, Bird, Tim, Kees Cook, shuah, linux-kselftest,
	Brendan Higgins, linux-kernel

On Sat, Jun 20, 2020 at 1:58 AM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 2020-06-16 07:08, Paolo Bonzini wrote:
> > On 15/06/20 21:07, Bird, Tim wrote:

> >>>> Finally,
> >>>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> >>>> See https://testanything.org/tap-version-13-specification.html
> >>>
> >>> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
> >>> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
> >>> relate? Neither SKIP nor XFAIL count toward failure, though, so both
> >>> should be "ok"? I guess we should change it to "ok".
> >
> > See above for XFAIL.
> >
> > I initially raised the issue with "SKIP" because I have a lot of tests
> > that depend on hardware availability---for example, a test that does not
> > run on some processor kinds (e.g. on AMD, or old Intel)---and for those
> > SKIP should be considered a success.
>
> No, SKIP should not be considered a success.  It should also not be considered
> a failure.  Please do not blur the lines between success, failure, and
> skipped.

I agree that skipped tests should be their own thing, separate from
success and failure, but the way they tend to behave tends to be
closer to a success than a failure.

I guess the important note here is that a suite of tests, some of
which are SKIPped, can be listed as having passed, so long as none of
them failed. So, the rule for "bubbling up" test results is that any
failures cause the parent to fail, the parent is marked as skipped if
_all_ subtests are skipped, and otherwise is marked as having
succeeded. (Reversing the last part: having a suite be marked as
skipped if _any_ of the subtests are skipped also makes sense, and has
its advantages, but anecdotally seems less common in other systems.)
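
A minimal sketch of that rule, in Python and purely illustrative (this
is not the actual kunit.py code):

	def bubble_up(subtest_results):
	    # subtest_results: list of "ok", "not ok", or "skip" strings
	    if any(r == "not ok" for r in subtest_results):
	        return "not ok"  # any failure fails the parent
	    if subtest_results and all(r == "skip" for r in subtest_results):
	        return "skip"    # skipped only if every subtest was skipped
	    return "ok"          # otherwise the parent passes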

The other really brave thing one could do to break from the TAP
specification would be to add a "skipped" value alongside "ok" and
"not ok", and get rid of the whole "SKIP" directive/comment stuff.
Possibly not worth the departure from the spec, but it would sidestep
part of the problem.
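(Purely as an illustration of that idea, a result line would then read
something like "skipped 3 hugepage check" rather than
"not ok 3 hugepage check # SKIP".)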


Cheers,
-- David


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-19 22:58               ` Paolo Bonzini
@ 2020-06-20 14:51                 ` Frank Rowand
  0 siblings, 0 replies; 43+ messages in thread
From: Frank Rowand @ 2020-06-20 14:51 UTC (permalink / raw)
  To: Paolo Bonzini, Kees Cook, Brendan Higgins
  Cc: Bird, Tim, shuah, linux-kselftest, linux-kernel, David Gow, Frank Rowand

On 2020-06-19 17:58, Paolo Bonzini wrote:
> On 19/06/20 20:47, Frank Rowand wrote:
>> Or if the entire test depends on the missing config then Bail out might
>> be appropriate.
> 
> No, in that case you want
> 
> 	1..0 # SKIP: unsupported configuration
> 
> The spec is not clear if "Bail out!" is an error condition or just a
> warning that only part of the test was run, but prove(1) and Automake
> both treat it as the former, for example.
> 
> For example, an ENOSPC error creating a temporary file could be turned
> into a bail-out, while an ENOSYS would be a skip.
> 
> Paolo
> 

I think that "Bail out!" needs to be more clearly defined by the spec,
and we should agree on what it is intended to mean.

The TAP 13 spec gives a "Bail out!" example of "MySQL is not running."
The spec gives this example after the general guidance of (among other
things) "missing dependencies".  A missing config _could_ be thought
of as a missing dependency.

To me, the key differentiator for "Bail out!" is "that further tests
are useless".  In other words, the detected problem means that
every subsequent part of the tests will fail for the same reason.

If _some_ subsequent parts of the tests will NOT fail for the same
reason then SKIP becomes more reasonable.  What percent of subsequent
parts of the tests is large enough to comprise "_some_" is probably
a value judgement.

For the case that I was commenting on, with the limited amount of
description provided, my preference is also SKIP, which is what I
was suggesting.


* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-20  6:44         ` David Gow
@ 2020-06-20 15:03           ` Frank Rowand
  2020-06-23  2:58             ` David Gow
  0 siblings, 1 reply; 43+ messages in thread
From: Frank Rowand @ 2020-06-20 15:03 UTC (permalink / raw)
  To: David Gow
  Cc: Paolo Bonzini, Bird, Tim, Kees Cook, shuah, linux-kselftest,
	Brendan Higgins, linux-kernel

On 2020-06-20 01:44, David Gow wrote:
> On Sat, Jun 20, 2020 at 1:58 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 2020-06-16 07:08, Paolo Bonzini wrote:
>>> On 15/06/20 21:07, Bird, Tim wrote:
> 
>>>>>> Finally,
>>>>>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
>>>>>> See https://testanything.org/tap-version-13-specification.html
>>>>>
>>>>> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
>>>>> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
>>>>> relate? Neither SKIP nor XFAIL count toward failure, though, so both
>>>>> should be "ok"? I guess we should change it to "ok".
>>>
>>> See above for XFAIL.
>>>
>>> I initially raised the issue with "SKIP" because I have a lot of tests
>>> that depend on hardware availability---for example, a test that does not
>>> run on some processor kinds (e.g. on AMD, or old Intel)---and for those
>>> SKIP should be considered a success.
>>
>> No, SKIP should not be considered a success.  It should also not be considered
>> a failure.  Please do not blur the lines between success, failure, and
>> skipped.
> 


> I agree that skipped tests should be their own thing, separate from
> success and failure, but the way they tend to behave tends to be
> closer to a success than a failure.
> 
> I guess the important note here is that a suite of tests, some of
> which are SKIPped, can be listed as having passed, so long as none of
> them failed. So, the rule for "bubbling up" test results is that any
> failures cause the parent to fail, the parent is marked as skipped if
> _all_ subtests are skipped, and otherwise is marked as having
> succeeded. (Reversing the last part: having a suite be marked as
> skipped if _any_ of the subtests are skipped also makes sense, and has
> its advantages, but anecdotally seems less common in other systems.)

That really caught my attention as something to be captured in the spec.

My initial response was that bubbling up results is the domain of the
test analysis tools, not the test code.

If I were writing a test analysis tool, I would want the user to have
the ability to configure the bubble up rules.  Different use cases
would desire different rules.

My second response was to start thinking about whether the tests
themselves should have any sort of bubble up implemented.  I think
it is a very interesting question.  My current mindset is that
each test is independent, and there is not a concept of an umbrella
test that is the union of a set of subtests.  But maybe there is
value to umbrella tests.  If there is a concept of umbrella tests
then I think the spec should define how skip bubbles up.


> 
> The other really brave thing one could do to break from the TAP
> specification would be to add a "skipped" value alongside "ok" and
> "not ok", and get rid of the whole "SKIP" directive/comment stuff.
> Possibly not worth the departure from the spec, but it would sidestep
> part of the problem.

I like being brave in this case.  Elevating SKIP to be a peer of
"ok" and "not ok" provides a more clear model that SKIP is a first
class citizen.  It also removes the muddled thinking that the
current model promotes.

> 
> 
> Cheers,
> -- David
> 



* Re: RFC - kernel selftest result documentation (KTAP)
  2020-06-20 15:03           ` Frank Rowand
@ 2020-06-23  2:58             ` David Gow
  0 siblings, 0 replies; 43+ messages in thread
From: David Gow @ 2020-06-23  2:58 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Paolo Bonzini, Bird, Tim, Kees Cook, shuah, linux-kselftest,
	Brendan Higgins, linux-kernel

On Sat, Jun 20, 2020 at 11:03 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 2020-06-20 01:44, David Gow wrote:
> > On Sat, Jun 20, 2020 at 1:58 AM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 2020-06-16 07:08, Paolo Bonzini wrote:
> >>> On 15/06/20 21:07, Bird, Tim wrote:
> >
> >>>>>> Finally,
> >>>>>>   - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> >>>>>> See https://testanything.org/tap-version-13-specification.html
> >>>>>
> >>>>> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
> >>>>> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
> >>>>> relate? Neither SKIP nor XFAIL count toward failure, though, so both
> >>>>> should be "ok"? I guess we should change it to "ok".
> >>>
> >>> See above for XFAIL.
> >>>
> >>> I initially raised the issue with "SKIP" because I have a lot of tests
> >>> that depend on hardware availability---for example, a test that does not
> >>> run on some processor kinds (e.g. on AMD, or old Intel)---and for those
> >>> SKIP should be considered a success.
> >>
> >> No, SKIP should not be considered a success.  It should also not be considered
> >> a failure.  Please do not blur the lines between success, failure, and
> >> skipped.
> >
>
>
> > I agree that skipped tests should be their own thing, separate from
> > success and failure, but the way they tend to behave tends to be
> > closer to a success than a failure.
> >
> > I guess the important note here is that a suite of tests, some of
> > which are SKIPped, can be listed as having passed, so long as none of
> > them failed. So, the rule for "bubbling up" test results is that any
> > failures cause the parent to fail, the parent is marked as skipped if
> > _all_ subtests are skipped, and otherwise is marked as having
> > succeeded. (Reversing the last part: having a suite be marked as
> > skipped if _any_ of the subtests are skipped also makes sense, and has
> > its advantages, but anecdotally seems less common in other systems.)
>
> That really caught my attention as something to be captured in the spec.
>
> My initial response was that bubbling up results is the domain of the
> test analysis tools, not the test code.

KUnit is actually sitting in the middle. Results are bubbled up from
individual tests to the test suites in-kernel (by the common KUnit
code), as the suites are TAP tests (individual test cases being
subtests), and so need to provide results. The kunit.py script then
bubbles those results up (using the same rules) to print a summary.
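
For illustration, nested/subtest output under this model might look
roughly like the following (the exact indentation and the "# Subtest:"
identification line are illustrative, not a settled format):

	TAP version 13
	1..1
	    # Subtest: example-suite
	    1..2
	    ok 1 first-test-case
	    ok 2 second-test-case
	ok 1 example-suite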

> If I were writing a test analysis tool, I would want the user to have
> the ability to configure the bubble up rules.  Different use cases
> would desire different rules.

I tend to agree: it'd be nice if test analysis tools could implement
different rules here. If we're using TAP subtests, though, the parent
tests do need to return a result in the test code, so either that
needs to be test-specific (if the parent test is not just a simple
union of its subtests), or it could be ignored by an analysis tool
which would follow its own rules. (In either case, it may make sense
to be able to configure a test analysis tool to always fail or mark
tests with failed or skipped subtests, even if its result is "ok", but
not vice-versa -- a test which failed would stay failed, even if all
its subtests passed.)

> My second response was to start thinking about whether the tests
> themselves should have any sort of bubble up implemented.  I think
> it is a very interesting question.  My current mindset is that
> each test is independent, and there is not a concept of an umbrella
> test that is the union of a set of subtests.  But maybe there is
> value to umbrella tests.  If there is a concept of umbrella tests
> then I think the spec should define how skip bubbles up.
>

KUnit suites are definitely that kind of "umbrella test" at the moment.

> >
> > The other really brave thing one could do to break from the TAP
> > specification would be to add a "skipped" value alongside "ok" and
> > "not ok", and get rid of the whole "SKIP" directive/comment stuff.
> > Possibly not worth the departure from the spec, but it would sidestep
> > part of the problem.
>
> I like being brave in this case.  Elevating SKIP to be a peer of
> "ok" and "not ok" provides a more clear model that SKIP is a first
> class citizen.  It also removes the muddled thinking that the
> current model promotes.
>
> >
> >
> > Cheers,
> > -- David
> >
>


end of thread, other threads:[~2020-06-23  2:58 UTC | newest]

Thread overview: 43+ messages
2020-06-10 18:11 RFC - kernel selftest result documentation (KTAP) Bird, Tim
2020-06-13  5:07 ` David Gow
2020-06-15 17:34   ` Bird, Tim
2020-06-16 20:03     ` Brendan Higgins
2020-06-16 20:37       ` Bird, Tim
2020-06-17  0:02         ` Kees Cook
2020-06-19 19:32         ` Brendan Higgins
2020-06-19 18:17       ` Frank Rowand
2020-06-14 18:17 ` Kees Cook
2020-06-15 17:45   ` Bird, Tim
2020-06-15 18:44     ` Kees Cook
2020-06-14 18:39 ` Kees Cook
2020-06-15 19:07   ` Bird, Tim
2020-06-16 12:08     ` Paolo Bonzini
2020-06-16 16:42       ` Bird, Tim
2020-06-16 19:44         ` Brendan Higgins
2020-06-16 20:30           ` Bird, Tim
2020-06-16 23:58           ` Kees Cook
2020-06-19 18:47             ` Frank Rowand
2020-06-19 19:11               ` Kees Cook
2020-06-19 22:58               ` Paolo Bonzini
2020-06-20 14:51                 ` Frank Rowand
2020-06-19 18:33         ` Frank Rowand
2020-06-19 17:58       ` Frank Rowand
2020-06-20  6:44         ` David Gow
2020-06-20 15:03           ` Frank Rowand
2020-06-23  2:58             ` David Gow
2020-06-16 23:52     ` Kees Cook
2020-06-19 18:52       ` Frank Rowand
2020-06-19 19:50       ` Brendan Higgins
2020-06-19 19:49     ` Frank Rowand
2020-06-16 20:48 ` Brendan Higgins
2020-06-16 21:16   ` Bird, Tim
2020-06-16 21:19     ` Bird, Tim
2020-06-17  0:06     ` Kees Cook
2020-06-17  2:30       ` Bird, Tim
2020-06-17  3:36         ` Kees Cook
2020-06-17  4:05           ` David Gow
2020-06-19 19:44             ` Brendan Higgins
2020-06-19 20:19             ` Frank Rowand
2020-06-19 23:47               ` Bird, Tim
2020-06-19 19:39     ` Brendan Higgins
2020-06-19 17:13 ` Frank Rowand
