* [Fuego] status update
@ 2017-06-27  6:04 Daniel Sangorrin
  2017-06-28  0:47 ` Bird, Timothy
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Sangorrin @ 2017-06-27  6:04 UTC (permalink / raw)
  To: fuego

Hi all,

I've been working on the parsing code. Here is a list of things that I managed to complete
and things that need some discussion:

Completed tasks:
- Output a single run.json file for each job's run/build that captures both the metadata and the test results.
- Merge run.json files for a test_suite into a single results.json file that flot can visualize (a simplified sketch follows this list).
   + I have added concurrency locks to protect results.json from concurrent writes.
- Add HTML output support to flot (similar to, but not exactly the same yet as, the one in AGL JTA).
- Fixed the test Functional.tiff (AGL test) and confirmed that it works on Docker and BeagleBone Black.
   + There are many more AGL tests to fix.
- Fixed several bugs that occurred when a test fails in an unexpected way.
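
For reference, the merge-with-locking step is conceptually something like this (a simplified
sketch, not the actual code; the run_id field, the glob pattern and the assumption that
results.json is a JSON array are all placeholders):

import fcntl, glob, json

def merge_runs(results_path, run_glob):
    # Merge per-run run.json files into results.json while holding an
    # exclusive lock, so concurrent job builds don't corrupt the file.
    # Assumes results.json is a JSON array of run objects with a "run_id".
    with open(results_path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            text = f.read()
            results = json.loads(text) if text else []
            known = set(r.get("run_id") for r in results)
            for path in glob.glob(run_glob):
                with open(path) as rf:
                    run = json.load(rf)
                if run.get("run_id") not in known:
                    results.append(run)
            f.seek(0)
            f.truncate()
            json.dump(results, f, indent=2)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)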

Discarded tasks:
- Output a jUnit XML file so that the "Test Results Analyzer" plugin can display it.
  + This is working, but it isn't as flexible as I'd like. The HTML output support that I added to flot should supersede it.
- Ability to download a PNG/SVG from the flot plugin directly.
  + I managed to get this working by using the canvas2image library or the canvas.toDataURL interface. Unfortunately, flot doesn't store the axes' information in the canvas, so only the plotting space is saved. There is a library to accomplish this task [1], but there seems to be a version mismatch with the JavaScript libraries in Fuego and it didn't work. I decided to postpone this.

Pending tasks:
- Add report functionality
   + I have removed the generation of plot.png files. In part because I want to do that directly from the
       flot plugin, and in part because I think it is more useful if we integrate it into the future ftc report command.
- Add more information to the run.json files.
   + I am trying to produce a schema that is very close to the one used in Kernel.CI. Probably I can make it compatible.
   + There is some information that needs to be added to Fuego. Unfortunately, I will probably have to fix a lot of files (see the schema sketch after this list):
       1) Each test_case (remember test_suite > test_set > test_case) should be able to store a list of
            measurements. Each measurement would consist of a name, value, units, duration and maybe more
            such as error messages specific to the test_case or expected values (e.g. expected to fail, or expected to
            be greater than 50).
       2) Each test_case should have a status (PASS, FAIL..) that is not necessarily the same as the test_set/suite status.
   + Add vcs_commit information (git or tarball information)
- Handle failed runs better. Sometimes the test fails very early, before even building it.
- I am not sure what to do with the "reference.log" files
   + Currently they are used to store thresholds, but these are probably board dependent.
   + This is probably related to the discussion with Rafael about parameterized builds. We should
       be able to define the threshold for a specific test_case's measure.
- Remove testplans?
   + I was thinking that we could substitute testplans with custom scripts that call ftc add-jobs
- Create a staging folder for tests that do not work or files that are not used.
   + Or maybe at least list them on the wiki.
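
To make the run.json items above concrete (the per-test_case status, the list of measurements,
and vcs_commit), here is the rough shape I have in mind, written as a Python dict; all field
names and values are illustrative, not a final schema:

run = {
    "test_name": "Benchmark.bonnie",
    "test_spec": "default",
    "board": "beaglebone-black",
    "toolchain": "debian-armhf",
    "kernel_version": "4.4.0",
    "vcs_commit": "805adb0",            # or a tarball name/checksum
    "status": "PASS",                   # overall suite status
    "test_sets": [
        {"name": "Sequential_Output",
         "test_cases": [
             {"name": "Block",
              "status": "PASS",         # per-test_case status (item 2 above)
              "measurements": [         # list of measurements (item 1 above)
                  {"name": "throughput", "value": 12345,
                   "units": "KB/s", "duration": 3.2}]}]}]}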

Thanks,
Daniel

[1] https://github.com/markrcote/flot-axislabels




* Re: [Fuego] status update
  2017-06-27  6:04 [Fuego] status update Daniel Sangorrin
@ 2017-06-28  0:47 ` Bird, Timothy
  2017-06-28  1:58   ` Daniel Sangorrin
  0 siblings, 1 reply; 8+ messages in thread
From: Bird, Timothy @ 2017-06-28  0:47 UTC (permalink / raw)
  To: Daniel Sangorrin, fuego



> -----Original Message-----
> From: Daniel Sangorrin on Monday, June 26, 2017 11:04 PM
>
> I've working on the parsing code. Here is a list of things that I managed to
> complete
> and things that need some discussion:
> 
> Completed tasks:
> - Output a single run.json file for each job's run/build that captures both the
> metadata and the test results.

Sounds good.   This is probably the right approach.

> - Merge run.json files for a test_suite into a single results.json file that flot
> can visualize.

So, just to clarify: the run.json file has the results for multiple metrics from
a single execution (or 'run') of a test, and the results.json file has
results for multiple runs of the test?  That is, results.json has results
for a single Fuego host, with results from different runs and on different boards?

>    + I have added concurrency locks to protect results.json from concurrent
> writes.
> - Add HTML output support to flot (similar, but not exactly the same yet, to
> the one in AGL JTA).
> - Fixed the test Functional.tiff (AGL test) and confirmed that it works on
> docker and beaglebone black.
>    + There are many more AGL tests to fix.
> - Fixed several bug fixes that occurred when a test fails in an unexpected
> way.

Sounds good.  Thanks!
 
> Discarded tasks:
> - Output a jUnit XML file so that the plugin "Test Results Analyzer" can
> display.
>   + This is working but it isn't as flexible as I'd like. The new flot's HTML output
> support that I added should deprecate it.
OK.

> - Ability to download a PNG/SVG from the flot plugin directly.
>   + I managed to get this working by using the canvas2image library or using
> the canvas.toDataURL interface. Unfortunately, flot  doesn't store the axes'
> information in the canvas so only the plotting space is saved. There is a library
> to accomplish this task [1] that but there seems to be a version mismatch
> with the javascript libraries in fuego and it didn't work. I decided to postpone
> this.
OK.

> 
> Pending tasks:
> - Add report functionality
>    + I have removed the generation of plot.png files. In part because I want to
> do that directly from the
>        flot plugin, and in part because I think it is more useful if we integrate it in
> the future ftc report command,
> - Add more information to the run.json files.
>    + I am trying to produce a schema that is very close to the one used in
> Kernel.CI. Probably I can make it compatible.

That sounds good.  I'm very interested in the schema.  I believe
that Milo Casagrande mentioned something about groups, which I don't think
we have yet.  Everything else in your analysis from April
(https://lists.linuxfoundation.org/pipermail/fuego/2017-April/000448.html)
shows, I think, some analogy between Fuego and KernelCI fields.

>    + There is some information that needs to be added to Fuego.
> Unfortunately, I will probably have to fix a lot of files:
>        1) Each test_case (remember test_suite > test_set > test_case) should
> be able to store a list of
>             measurements. Each measurement would consist of a name, value,
> units, duration and maybe more
>             such as error messages specific to the test_case or expected values
> (e.g. expected to fail, or expected to
>             be greater than 50).
I think this needs to be somewhere, but possibly not in the results schema.
For example, I don't want every listing of dbench results to have to report
the units for each benchmark metric.  These should be fairly static
and we should be able to define them per-test, not per-run.  Things
like thresholds are a bit different, and we may need to record them
per-run, since the success/failure threshold could be different depending
on the board, or even changed by the user to fine-tune the regression-checking.

>        2) Each test_case should have a status (PASS, FAIL..) that is not
> necessarily the same as the test_set/suite status.
Agreed.

>    + Add vcs_commit information (git or tarball information)
This is expected to be captured by the test version (in test.yaml),
and the test version should be saved in the run.json file.
But I could see saving the test source code version in the run.json file also.

> - Handle failed runs better. Sometimes the test fails very early, before even
> building it.
> - I am not sure what to do with the "reference.log" files
>    + Currently they are used to store thresholds, but these are probably board
> dependent.
>    + This is probably related to the discussion with Rafael about parameterized
> builds. We should
>        be able to define the threshold for a specific test_case's measure.

Reference logs should be savable by the user, to compare with future runs.
The system we have now uses parsed testlogs, which are generated using
log_compare and very simple line-based regular-expression filters (using 'grep').
It will be much more flexible and robust to compare run.json files instead
of a parsed log and a reference log.

The purpose of these is to save the data from a "known good" run, so that
regressions can be detected when the data from a current run differs from that.
This can include sub-test failures that we have decided to ignore or postpone
resolution of.

I think once we have in place a system to save all the sub-test metric data
from the testlogs (using a parser) in json format, then we can eliminate these.
We should be able to replace reference.log with reference.json (which is just
a saved run.json file).  This is a key thing that I would like Fuego to be able to
share easily between developers (and sites).

I have already started working on a json difference program (called 'jdiff')
to compare 2 json files and report the differences between the two.
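
To give an idea of the kind of comparison I mean (this is just a sketch of the concept, not
the actual jdiff code; the flattened-key output format is something I made up for illustration):

import json, sys

def flatten(obj, prefix=""):
    # Flatten nested dicts/lists into {"a.b[0].c": value} form.
    items = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            items.update(flatten(v, prefix + "." + k if prefix else k))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            items.update(flatten(v, "%s[%d]" % (prefix, i)))
    else:
        items[prefix] = obj
    return items

def jdiff(path_a, path_b):
    # Report every key whose value differs between the two json files.
    with open(path_a) as fa, open(path_b) as fb:
        a, b = flatten(json.load(fa)), flatten(json.load(fb))
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            print("%s: %r -> %r" % (key, a.get(key), b.get(key)))

if __name__ == "__main__":
    jdiff(sys.argv[1], sys.argv[2])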

On the issue of where to save them, currently they should be saved somehow
at the 'board' level.  That is, tests will definitely have different results per-board.
But there may be such a thing as a reference file that is dependent on the kernel
version, or the distribution, or some other parameter.  We should discuss the
naming and storage of these.

> - Remove testplans?
>    + I was thinking that we can substitute testplans by custom scripts that call
> ftc add-jobs

Why do we want to remove them?  I think they serve a useful function - expressing
a set of tests (with their specs) to execute in sequence.

> - Create a staging folder for tests that do not work or files that are not used.
>    + Or maybe at least list them up on the wiki.

Currently, if they are not listed in a testplan, the tests are functionally 'dead'.
(Although a user can create a job for a single test and try it out).
Maybe it would be good to have a 'staging' folder for tests that are
under development or conversion (like a lot of the AGL tests). I agree that
we should have some notion of the "approved and likely to work" tests,
and that should be expressed somehow in the test placement or in
documentation.
 -- Tim




* Re: [Fuego] status update
  2017-06-28  0:47 ` Bird, Timothy
@ 2017-06-28  1:58   ` Daniel Sangorrin
  2017-06-28  4:55     ` Bird, Timothy
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Sangorrin @ 2017-06-28  1:58 UTC (permalink / raw)
  To: 'Bird, Timothy', fuego

Hi Tim,

> -----Original Message-----
> From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> Sent: Wednesday, June 28, 2017 9:47 AM
> > - Merge run.json files for a test_suite into a single results.json file that flot
> > can visualize.
> 
> So, just to clarify, the run.json file has the results for a multiple metrics for
> a single execution (or 'run') of a test, and the results.json file has
> results for multiple runs of the test?  That is, results.json has results
> for a single Fuego host with results from different runs and on different boards?

Yes, that's correct. It's kind of a text "database". The size of this file will increase
with the number of runs, so it may not scale. We will probably need to implement
logrotate-like functionality once we can submit all runs to a proper database
on a centralized server (maybe one running kernelci or something similar).
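
Something as simple as this might be enough for the logrotate-like step (the size threshold
and the file naming are just assumptions, not anything implemented today):

import os, time

def rotate_results(path="results.json", max_bytes=10 * 1024 * 1024):
    # If results.json grows past the (arbitrary) threshold, rename it with a
    # timestamp suffix; the next run then starts a fresh file.
    if os.path.exists(path) and os.path.getsize(path) > max_bytes:
        os.rename(path, "%s.%s" % (path, time.strftime("%Y%m%d%H%M%S")))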
 
> > Pending tasks:
> > - Add report functionality
> >    + I have removed the generation of plot.png files. In part because I want to
> > do that directly from the
> >        flot plugin, and in part because I think it is more useful if we integrate it in
> > the future ftc report command,
> > - Add more information to the run.json files.
> >    + I am trying to produce a schema that is very close to the one used in
> > Kernel.CI. Probably I can make it compatible.
> 
> That sounds good.  I'm very interested in the schema.  I believe
> that Milo Casagrande mentioned something about groups, that I don't think
> we have yet.  Everything else in your analysis from April
> (https://lists.linuxfoundation.org/pipermail/fuego/2017-April/000448.html)
> I think shows some analog between Fuego and KernelCI fields.

Groups are called "test_sets" in Kernel CI and can contain an array of "test_cases".
# I'm thinking about using Kernel CI's nomenclature for this. What do you think?
# I also want to rename platform to toolchain, fwver to kernel_version, spec to
# test_spec, and testplan to test_plan...

In Fuego, we have a similar concept. If you look at bonnie's reference.log [1], you will
notice that there are several groups/test_sets and multiple tests/test_cases inside
each group:

test_set: Sequential_Output
test_cases: Block, PerChr, Rewrite

I am making a schema that is compatible with Kernel CI but that will also allow
having test_sets inside test_sets. For example, "LTP > RT tests > func > sched_jitter"
contains two levels of test_sets (RT tests and func).
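
As an illustration of the nesting (the structure and field names here are only tentative,
not the final schema), walking such a schema recursively is straightforward:

def walk_test_sets(node, path=()):
    # Recursively yield (path_tuple, test_case) pairs; a test_set may contain
    # both nested "test_sets" and leaf "test_cases".
    for ts in node.get("test_sets", []):
        for item in walk_test_sets(ts, path + (ts["name"],)):
            yield item
    for tc in node.get("test_cases", []):
        yield (path + (tc["name"],), tc)

# Example: LTP > RT tests > func > sched_jitter
ltp = {"name": "LTP",
       "test_sets": [
           {"name": "RT tests",
            "test_sets": [
                {"name": "func",
                 "test_cases": [{"name": "sched_jitter", "status": "PASS"}]}]}]}

for path, tc in walk_test_sets(ltp, (ltp["name"],)):
    print("%s: %s" % (".".join(path), tc["status"]))
    # -> LTP.RT tests.func.sched_jitter: PASS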

[1] https://bitbucket.org/tbird20d/fuego-core/src/805adb067afc492382ee23bc9178c059b90c043e/engine/tests/Benchmark.bonnie/reference.log?at=next&fileviewer=file-view-default

[Note] In the past, tests.info used to store this test_set > test_case information. Now this information
is actually provided by the test's parser.py and reference.log. The information from parser.py covers
the test_sets/test_cases that were actually executed (depending on the spec), whereas reference.log
contains all possible test_sets/test_cases for that test (including those that were skipped because
of the selected test_spec). We need to decouple the reference thresholds from this information, though.

> >    + There is some information that needs to be added to Fuego.
> > Unfortunately, I will probably have to fix a lot of files:
> >        1) Each test_case (remember test_suite > test_set > test_case) should
> > be able to store a list of
> >             measurements. Each measurement would consist of a name, value,
> > units, duration and maybe more
> >             such as error messages specific to the test_case or expected values
> > (e.g. expected to fail, or expected to
> >             be greater than 50).
> I think this needs to be somewhere, but possibly not in the results schema.
> For example, I don't want every listing of dbench results to have to report
> the units for each benchmark metric.  These should be fairly static
> and we should be able to define them per-test, not per-run.  Things
> like thresholds are a bit different, and we may need to record them
> per-run, since the success/failure threshold could be different depending
> on the board, or even changed by the user to fine-tune the regression-checking.

The reasons I wanted to add units to the schema were:
  - AGL was using them in their HTML output reports, and it does make reports more readable.
    If we don't have this information in the results.json file, flot will need to get it
    from somewhere else (e.g. a json file). The problem is that we will need to update
    that file every time we add a new test. I'd rather have that information inside the test directory.
    What do you think?
  - The Kernel CI format [2][3] also allows including units in its measurements (though it is not strictly required).

[2] https://api.kernelci.org/schema-test-case.html
[3] https://api.kernelci.org/json-schema/1.0/measurement.json

> > - Handle failed runs better. Sometimes the test fails very early, before even
> > building it.
> > - I am not sure what to do with the "reference.log" files
> >    + Currently they are used to store thresholds, but these are probably board
> > dependent.
> >    + This is probably related to the discussion with Rafael about parameterized
> > builds. We should
> >        be able to define the threshold for a specific test_case's measure.
> 
> reference logs should be savable by the user, to compare with future runs.
> The system we have now uses parsed testlogs, which are generated using
> log_compare and very simple line-based regular-expresison filters (using 'grep').
> It will be much more flexible and robust to compare run.json files instead
> of a parsed log and a reference log.
> 
> The purpose of these is to save the data from a "known good" run, so that
> regressions can be detected when the data from a current run differs from that.
> This can include sub-test failures, that we have decided to ignore or postpone
> resolution of.
> 
> I think once we have in place a system to save all the sub-test metric data
> from the testlogs (using a parser) in json format, then we can eliminate these.
> We should be able to replace reference.log with reference.json (which is just
> a saved run.json file).  This is a key thing that I would like Fuego to be able to
> share easily between developers (and sites).
> 
> I have already started working on a json difference program (called 'jdiff')
> to compare 2 json files and report the differences between the two.
> 
> On the issue of where to save them, currently they should be saved somehow
> at the 'board' level.  That is, tests will definitely have different results per-board.
> But there may be such a thing as a reference file that is dependent on the kernel
> version, or the distribution, or some other parameter.  We should discuss the
> naming and storage of these.

How about saving them in the board's testplan, and also allowing users to try
different ones through the ftc run-test interface?

> > - Remove testplans?
> >    + I was thinking that we can substitute testplans by custom scripts that call
> > ftc add-jobs
> 
> Why do we want to remove them?  I think they serve a useful function - expressing
> a set of tests (with their specs) to execute in sequence.

OK. I just wanted to mention that they are redundant. But I agree that they are useful.

> > - Create a staging folder for tests that do not work or files that are not used.
> >    + Or maybe at least list them up on the wiki.
> 
> Currently, if they are not listed in a testplan, the tests are functionally 'dead'.
> (Although a user can create a job for a single test and try it out).
> Maybe it would be good to have a 'staging' folder for tests that are
> under development or conversion (like a lot of the AGL tests). I agree that
> we should have some notion of the "approved and likely to work" tests,
> and that should be expressed somehow in the test placement or in
> documentation.

Maybe we can write this information in the test.yaml file (you mentioned something
about rating tests with stars). Actually, I would like to know your plans for the
test.yaml files in general. I haven't looked into them deeply yet.

Thanks,
Daniel





* Re: [Fuego] status update
  2017-06-28  1:58   ` Daniel Sangorrin
@ 2017-06-28  4:55     ` Bird, Timothy
  2017-06-28  5:23       ` Bird, Timothy
  2017-06-28  8:43       ` Daniel Sangorrin
  0 siblings, 2 replies; 8+ messages in thread
From: Bird, Timothy @ 2017-06-28  4:55 UTC (permalink / raw)
  To: Daniel Sangorrin, fuego



> -----Original Message-----
> From: Daniel Sangorrin on Tuesday, June 27, 2017 6:58 PM
> > -----Original Message-----
> > From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> > Sent: Wednesday, June 28, 2017 9:47 AM
> > > - Merge run.json files for a test_suite into a single results.json file that
> flot
> > > can visualize.
> >
> > So, just to clarify, the run.json file has the results for a multiple metrics for
> > a single execution (or 'run') of a test, and the results.json file has
> > results for multiple runs of the test?  That is, results.json has results
> > for a single Fuego host with results from different runs and on different
> boards?
> 
> Yes, that's correct. It's kind of a text "database". The size of this file will
> increase
> with the number of runs so it may not scale. Probably we will need to
> implement
> a logrotate-like functionality once we can submit all runs to a proper
> database
> in a centralized server (maybe one running kernelci or something similar).
For a central server, I think we'll definitely have to use a database.
If we end up using the same (or a 'similar-enough') schema, we may be
able to reuse parts of the kernelci setup for our own Fuego central server.
That is, copy their database configuration and maybe reuse their server-side
web front-end.  Maybe we could end up sharing code and helping each other
out. It would be nice not to reinvent that wheel.

However, I don't want to require a database setup for the average developer
(with one host and one or a few boards).

> 
> > > Pending tasks:
> > > - Add report functionality
> > >    + I have removed the generation of plot.png files. In part because I
> want to
> > > do that directly from the
> > >        flot plugin, and in part because I think it is more useful if we integrate
> it in
> > > the future ftc report command,
> > > - Add more information to the run.json files.
> > >    + I am trying to produce a schema that is very close to the one used in
> > > Kernel.CI. Probably I can make it compatible.
> >
> > That sounds good.  I'm very interested in the schema.  I believe
> > that Milo Casagrande mentioned something about groups, that I don't
> think
> > we have yet.  Everything else in your analysis from April
> > (https://lists.linuxfoundation.org/pipermail/fuego/2017-April/000448.html)
> > I think shows some analog between Fuego and KernelCI fields.
> 
> Groups are called "test_sets" in kernel CI and can contain an array of
> "test_cases".
> # I'm thinking about using kernel ci's nomenclature for this. What do you
> think?
Did they ever give us an example of what their LTP results look like?
I think Milo described them, but I don't recall actually seeing a json file.
LTP is probably the most complicated test we'll need to handle.

I'm not sure I like their nomenclature.  They have 3 things:
test_suite, test_set and test_case.
I guess these are roughly the same as our:
test_plan, test, and (unnamed by us) individual sub-test case.
(but it's unclear whether these are exact analogs).

I find these three levels confusing, particularly because a test_suite
in KernelCI can point to both test_sets and test_cases.

> # I also want to rename platform to toolchain, fwver to kernel_version, spec
> to
> # test_spec, testplan to test_plan...
I agree on renaming platform to toolchain and fwver to kernel_version.

Our 'spec' is essentially the same as their test_set 'parameters' object.
Note that their 'test_case' can have a 'parameters' object as well.

I'm thinking of test_case as something like: LTP.syscall.kill10
where, given that they support multiple measurements per test_case, maybe
they would classify this as:
 - test_suite: LTP
 - test_set: syscall
 - test_case: kill10
 - measurement: ? (does kill10 do more than one measurement?)

> 
> In Fuego, we have a similar concept. If you see bonnie's reference.log [1] you
> will
> notice that there are several groups/test_sets and multiple tests/test_cases
> inside
> each group:
> 
> test_set: Sequential_Output
> test_cases: Block, PerChr, Rewrite

In this case, is Block a set of measurements, or a single measurement?

> 
> I am making an schema that is compatible with Kernel CI but that it will also
> allow
> having test_sets inside test_sets. For example: "LTP > RT tests > func >
> sched_jitter"
> contains 2-levels of test_sets (RT tests and func).
2 levels: RT tests and func?
or just 2 test_sets?

I'm not that familiar with LTP, so is 'func' actually nested under 'RT tests'?

> 
> [1] https://bitbucket.org/tbird20d/fuego-
> core/src/805adb067afc492382ee23bc9178c059b90c043e/engine/tests/Bench
> mark.bonnie/reference.log?at=next&fileviewer=file-view-default
> 
> [Note] In the past, tests.info used to store this test_set > test_case
> information. Now this information
> is actually provided by the test's parser.py and reference.log. The parser.py's
> information includes
> information for the test_sets/test_cases that were actually executed
> (depends on the spec), whereas reference.log
> contains all possible test_sets/test_cases for that test (including those that
> were skipped somehow because
> of the selected test_spec). We need to decouple the reference thresholds
> from this information though.
> 
> > >    + There is some information that needs to be added to Fuego.
> > > Unfortunately, I will probably have to fix a lot of files:
> > >        1) Each test_case (remember test_suite > test_set > test_case)
> should
> > > be able to store a list of
> > >             measurements. Each measurement would consist of a name, value,
> > > units, duration and maybe more
> > >             such as error messages specific to the test_case or expected values
> > > (e.g. expected to fail, or expected to
> > >             be greater than 50).
> > I think this needs to be somewhere, but possibly not in the results schema.
> > For example, I don't want every listing of dbench results to have to report
> > the units for each benchmark metric.  These should be fairly static
> > and we should be able to define them per-test, not per-run.  Things
> > like thresholds are a bit different, and we may need to record them
> > per-run, since the success/failure threshold could be different depending
> > on the board, or even changed by the user to fine-tune the regression-
> checking.
> 
> The reasons I wanted to add units to the schema were:
>   - AGL was using them on their HTML output reports, and it does make
> reports more readable.
>     If we don't have this information in the results.json file, flot will need to get
> it
>     from somewhere else (e.g. a json file). The problem is that we will need to
> update
>     that file everytime we add a new test. I'd rather have that information
> inside the test directory.
>     What do you think?
OK - that makes sense.

>   - Kernel CI format [2][3] allows including units in their measures (not strictly
> required) as well.
> 
> [2] https://api.kernelci.org/schema-test-case.html
> [3] https://api.kernelci.org/json-schema/1.0/measurement.json
> 
> > > - Handle failed runs better. Sometimes the test fails very early, before
> even
> > > building it.
> > > - I am not sure what to do with the "reference.log" files
> > >    + Currently they are used to store thresholds, but these are probably
> board
> > > dependent.
> > >    + This is probably related to the discussion with Rafael about
> parameterized
> > > builds. We should
> > >        be able to define the threshold for a specific test_case's measure.
> >
> > reference logs should be savable by the user, to compare with future runs.
> > The system we have now uses parsed testlogs, which are generated using
> > log_compare and very simple line-based regular-expresison filters (using
> 'grep').
> > It will be much more flexible and robust to compare run.json files instead
> > of a parsed log and a reference log.
> >
> > The purpose of these is to save the data from a "known good" run, so that
> > regressions can be detected when the data from a current run differs from
> that.
> > This can include sub-test failures, that we have decided to ignore or
> postpone
> > resolution of.
> >
> > I think once we have in place a system to save all the sub-test metric data
> > from the testlogs (using a parser) in json format, then we can eliminate
> these.
> > We should be able to replace reference.log with reference.json (which is
> just
> > a saved run.json file).  This is a key thing that I would like Fuego to be able
> to
> > share easily between developers (and sites).
> >
> > I have already started working on a json difference program (called 'jdiff')
> > to compare 2 json files and report the differences between the two.
> >
> > On the issue of where to save them, currently they should be saved
> somehow
> > at the 'board' level.  That is, tests will definitely have different results per-
> board.
> > But there may be such a thing as a reference file that is dependent on the
> kernel
> > version, or the distribution, or some other parameter.  We should discuss
> the
> > naming and storage of these.
> 
> How about saving them at the board's testplan, and also allow users to try
> different ones through the ftc run-test interface?
I don't follow this.  In /fuego-ro/boards?

They are definitely test-specific, but I'm not sure I want them in /fuego-core/engine/tests/<testname>.
That will pollute the fuego-core directory with site-specific data.

I think they need to go into /fuego-rw somewhere, as they can be user-generated (and possibly
user-downloaded).

> 
> > > - Remove testplans?
> > >    + I was thinking that we can substitute testplans by custom scripts that
> call
> > > ftc add-jobs
> >
> > Why do we want to remove them?  I think they serve a useful function -
> expressing
> > a set of tests (with their specs) to execute in sequence.
> 
> OK. I just wanted to mention that is redundant. But I agree that they are
> useful.
> 
> > > - Create a staging folder for tests that do not work or files that are not
> used.
> > >    + Or maybe at least list them up on the wiki.
> >
> > Currently, if they are not listed in a testplan, the tests are functionally
> 'dead'.
> > (Although a user can create a job for a single test and try it out).
> > Maybe it would be good to have a 'staging' folder for tests that are
> > under development or conversion (like a lot of the AGL tests). I agree that
> > we should have some notion of the "approved and likely to work" tests,
> > and that should be expressed somehow in the test placement or in
> > documentation.
> 
> Maybe we can write this information on the test.yaml file (you mentioned
> something
> about evaluating tests with stars).
I hadn't considered putting the rating information into the test.yaml file,
but an indicator of test 'readiness' might work there.  I have a test version,
which, if the number is less than 1.0, is a proxy for indicating that the
test is not really considered valid yet (i.e., it's pre-release quality).
Note that the version field in test.yaml is the version of the fuego test, not
the version of the test program used in the test.

> Actually, I would like to know your plans
> with the
> test.yaml files in general. I haven't looked into them deeply yet.
They were created as the place to hold data used for packaging a test
(for test distribution outside of the fuego-core repository).  Currently,
they have a manifest of files and some extra information related to 
packaging a test (author, license, version, etc.).  I will formally define
these, clean them up, and add them for all the tests in the repository
when we roll out "test packages" as an officially supported feature. 

Right now, test packages are implemented as just a proof of concept.  More work
needs to be done server side (with a rating system, security to prevent
malware, etc.) before this feature is ready (i.e., not this release).
You can think of them like rpm .spec files, or debian control files.
An example of one is in fuego-core/engine/tests/Functional.bc/test.yaml
(Now that I look at it, it's out of date.  It lists the base_script name, and that
is no longer needed, as the base_script is now always 'fuego_test.sh').
 -- Tim



* Re: [Fuego] status update
  2017-06-28  4:55     ` Bird, Timothy
@ 2017-06-28  5:23       ` Bird, Timothy
  2017-06-28  9:02         ` Daniel Sangorrin
  2017-06-28  8:43       ` Daniel Sangorrin
  1 sibling, 1 reply; 8+ messages in thread
From: Bird, Timothy @ 2017-06-28  5:23 UTC (permalink / raw)
  To: Bird, Timothy, Daniel Sangorrin, fuego



> -----Original Message-----
> From: Bird, Timothy on Tuesday, June 27, 2017 9:56 PM
> > -----Original Message-----
> > From: Daniel Sangorrin on Tuesday, June 27, 2017 6:58 PM
> > > -----Original Message-----
> > > From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> > > Sent: Wednesday, June 28, 2017 9:47 AM
> > In Fuego, we have a similar concept. If you see bonnie's reference.log [1]
> you
> > will
> > notice that there are several groups/test_sets and multiple
> tests/test_cases
> > inside
> > each group:
> >
> > test_set: Sequential_Output
> > test_cases: Block, PerChr, Rewrite
> 
> In this case, is Block a set of measurements, or a single measurement?

OK - it's a single measurement.

Ugh.  I was confused.  reference.log is not the file I was thinking of. 
reference.log is the file that holds thresholds for all of the benchmark
test measurements.  I was thinking of the 'pn log'.

All the stuff I said about the placement of reference.log (well, most of it), I meant
in regard to the pn log files.

OK, getting back to reference.log...
First, it's terribly named.  It's not a 'log' by any stretch of the imagination.
It's also terribly formatted.  It's some weird custom format created just to be
easy to read and write with 'awk'.  It does need to specify all possible measurement
names that are relevant for determining benchmark pass or fail status.  This
is usually a small subset of all the measurements made by a benchmark.
And, I'm not sure it belongs in the test directory, as it should be writable by the
user to reflect their own notion of values that indicate a regression threshold
for a measurement.

However, having said that, I'm not sure I want to change it this release.

Please note one other thing, though, related to test schema and test_cases and
nomenclature...

I strongly believe that every test case/measurement in the system should have
something that I call a TGUID - a test globally unique identifier (something like a URL in
World Wide Web parlance).  This is a single string that can be used to
distinguish a particular measurement from all other measurements.  It should be
possible to construct it using a path-like syntax.  Personally, I would use dots
as delimiters, like bonnie's reference log does (rather than slashes, like URLs or
filepaths use).

It is important, IMO, to have a static test identifier, for referring to the test
independent of host, board, test_plan or spec:
e.g. "LTP.syscall.kill10"

We should be able to construct these fully-qualified test identifiers using values
in the json schemas for a run.
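
As a sketch of what I mean (the helper name and argument list are just for illustration,
not an existing Fuego function):

def tguid(test_name, test_set, test_case, measure=None):
    # Build a dot-delimited identifier, e.g.
    # tguid("LTP", "syscall", "kill10") -> "LTP.syscall.kill10"
    parts = [test_name, test_set, test_case]
    if measure is not None:
        parts.append(measure)
    return ".".join(parts)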

The reason for having TGUIDs is that I would like for people to be able to share
meta-data about individual measurements, in a way that is independent of other
test infrastructure or layout.

 -- Tim



* Re: [Fuego] status update
  2017-06-28  4:55     ` Bird, Timothy
  2017-06-28  5:23       ` Bird, Timothy
@ 2017-06-28  8:43       ` Daniel Sangorrin
  2017-06-29  7:15         ` Milo Casagrande
  1 sibling, 1 reply; 8+ messages in thread
From: Daniel Sangorrin @ 2017-06-28  8:43 UTC (permalink / raw)
  To: 'Bird, Timothy', fuego

Hi Tim,

# I've added Milo to the Cc.

> -----Original Message-----
> From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> Sent: Wednesday, June 28, 2017 1:56 PM
...
> > > > Pending tasks:
> > > > - Add report functionality
> > > >    + I have removed the generation of plot.png files. In part because I
> > want to
> > > > do that directly from the
> > > >        flot plugin, and in part because I think it is more useful if we integrate
> > it in
> > > > the future ftc report command,
> > > > - Add more information to the run.json files.
> > > >    + I am trying to produce a schema that is very close to the one used in
> > > > Kernel.CI. Probably I can make it compatible.
> > >
> > > That sounds good.  I'm very interested in the schema.  I believe
> > > that Milo Casagrande mentioned something about groups, that I don't
> > think
> > > we have yet.  Everything else in your analysis from April
> > > (https://lists.linuxfoundation.org/pipermail/fuego/2017-April/000448.html)
> > > I think shows some analog between Fuego and KernelCI fields.
> >
> > Groups are called "test_sets" in kernel CI and can contain an array of
> > "test_cases".
> > # I'm thinking about using kernel ci's nomenclature for this. What do you
> > think?
> Did they ever give us an example of what their LTP results look like?
> I think Milo described them, but I don't recall actually seeing a json file.
> LTP is probably the most complicated test we'll need to handle.

Milo sent 3 hackbench json files but not an LTP one. LTP can normally work with 3 levels
(e.g. LTP > syscalls > kill01), so there is no problem there.
However, LTP now also includes 2 more test suites inside (the Open POSIX Test Suite
and the real-time test suite) with their own test sets and test cases.
For that reason, you could end up with 4 nested levels (unless you create 3 test suites
from the same source code).

In any case, I don't see the need to restrict ourselves to only 3 nesting levels.
 
> I'm not sure I like their nomenclature.  They have 3 things:
> test_suite, test_set and test_case.
> I guess these are roughly the same as our:
> test_plan, test, and (unnamed by us) individual sub-test case.
> (but it's unclear these are exact analogs).

I think this is a better analogy:
   test_suite, test_set, test_case == test_name, groupname, test == bonnie, sequential_output, rewrite

The concept of a test_plan is not in Kernel CI, AFAIK (maybe it is in LAVA under a different name, such as batch jobs?).

> I find these three levels confusing  - particularly because a test_suite
> in kernelCI can point to both test_sets and test_cases.

Actually, that makes sense. For example, suppose you have a simple
test suite (hello world) with a single test case. Then you don't really
need to define a test_set.

> > # I also want to rename platform to toolchain, fwver to kernel_version, spec
> > to
> > # test_spec, testplan to test_plan...
> Agree on rename of platform to toolchain, fwver to kernel_version.
> 
> Our 'spec' is essentially the same as their test_set 'parameters' object.
> Note that their 'test_case' can have a 'parameters' object as well.

It's something like that. But I think we should write the name of the test_spec 
at the test_suite level in the schema because we do not support per-test_set 
parameters at the moment.

> I'm thinking of test_case as something like: LTP.syscall.kill10
> where, given that they support multiple measurements per test_case, maybe
> they would classify this as:
>  - test_suite LTP
>  - test_set syscall
>  - test_case: kill10
>  - measurement: ? (does kill10 do more than one measurement?)

Functional test cases, such as LTP test cases, normally just finish with a return value (TPASS, TFAIL, TBROK, etc.),
so you don't really need measurements for them (unless you want to store each checkpoint/assertion inside
the test case).

Benchmark test cases, on the other hand, may have more than one measurement. For example, netperf
returns the network throughput as well as the CPU utilization. Fuego's Benchmark.netperf is currently
doing this:

test_suite: netperf
  test_set: MIGRATED_TCP_MAERTS
    test_cases: cpu, net
  test_set: MIGRATED_TCP_STREAM
    test_cases: cpu, net

I think that what we want to achieve is actually:

test_suite: netperf
  test_set: TCP
     test_case: MIGRATED_TCP_MAERTS
       measurements: [cpu, net]
     test_case: MIGRATED_TCP_STREAM
       measurements: [cpu, net]
  test_set: UDP
     test_case: whatever..
       measurements: [cpu, net]
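
Rendered in the schema I'm working on, that would look roughly like the following
(field names are tentative and the numbers/units are made-up examples, not values
Fuego records today):

netperf_run = {
    "test_suite": "netperf",
    "test_sets": [
        {"name": "TCP",
         "test_cases": [
             {"name": "MIGRATED_TCP_MAERTS",
              "measurements": [
                  {"name": "net", "value": 94.1, "units": "Mbits/s"},
                  {"name": "cpu", "value": 12.3, "units": "%"}]},
             {"name": "MIGRATED_TCP_STREAM",
              "measurements": [
                  {"name": "net", "value": 93.8, "units": "Mbits/s"},
                  {"name": "cpu", "value": 11.9, "units": "%"}]}]}]}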

> > In Fuego, we have a similar concept. If you see bonnie's reference.log [1] you
> > will
> > notice that there are several groups/test_sets and multiple tests/test_cases
> > inside
> > each group:
> >
> > test_set: Sequential_Output
> > test_cases: Block, PerChr, Rewrite
> 
> In this case, is Block a set of measurements, or a single measurement?

A single one, but it could be multiple if, for example, we did it for different sector sizes.

> > I am making an schema that is compatible with Kernel CI but that it will also
> > allow
> > having test_sets inside test_sets. For example: "LTP > RT tests > func >
> > sched_jitter"
> > contains 2-levels of test_sets (RT tests and func).
> 2 levels: RT tests and func?
> or just 2 test_sets?
> 
> I'm not that familiar with LTP, so is 'func' actually nested under 'RT tests'?

Yes. RT tests have "func", "stress" and "perf" test sets.
Ref: https://github.com/linux-test-project/ltp/tree/master/testcases/realtime

> >
> > [1] https://bitbucket.org/tbird20d/fuego-
> > core/src/805adb067afc492382ee23bc9178c059b90c043e/engine/tests/Bench
> > mark.bonnie/reference.log?at=next&fileviewer=file-view-default
> >
> > [Note] In the past, tests.info used to store this test_set > test_case
> > information. Now this information
> > is actually provided by the test's parser.py and reference.log. The parser.py's
> > information includes
> > information for the test_sets/test_cases that were actually executed
> > (depends on the spec), whereas reference.log
> > contains all possible test_sets/test_cases for that test (including those that
> > were skipped somehow because
> > of the selected test_spec). We need to decouple the reference thresholds
> > from this information though.
> >
> > > >    + There is some information that needs to be added to Fuego.
> > > > Unfortunately, I will probably have to fix a lot of files:
> > > >        1) Each test_case (remember test_suite > test_set > test_case)
> > should
> > > > be able to store a list of
> > > >             measurements. Each measurement would consist of a name, value,
> > > > units, duration and maybe more
> > > >             such as error messages specific to the test_case or expected values
> > > > (e.g. expected to fail, or expected to
> > > >             be greater than 50).
> > > I think this needs to be somewhere, but possibly not in the results schema.
> > > For example, I don't want every listing of dbench results to have to report
> > > the units for each benchmark metric.  These should be fairly static
> > > and we should be able to define them per-test, not per-run.  Things
> > > like thresholds are a bit different, and we may need to record them
> > > per-run, since the success/failure threshold could be different depending
> > > on the board, or even changed by the user to fine-tune the regression-
> > checking.
> >
> > The reasons I wanted to add units to the schema were:
> >   - AGL was using them on their HTML output reports, and it does make
> > reports more readable.
> >     If we don't have this information in the results.json file, flot will need to get
> > it
> >     from somewhere else (e.g. a json file). The problem is that we will need to
> > update
> >     that file everytime we add a new test. I'd rather have that information
> > inside the test directory.
> >     What do you think?
> OK - that makes sense.
> 
> >   - Kernel CI format [2][3] allows including units in their measures (not strictly
> > required) as well.
> >
> > [2] https://api.kernelci.org/schema-test-case.html
> > [3] https://api.kernelci.org/json-schema/1.0/measurement.json
> >
> > > > - Handle failed runs better. Sometimes the test fails very early, before
> > even
> > > > building it.
> > > > - I am not sure what to do with the "reference.log" files
> > > >    + Currently they are used to store thresholds, but these are probably
> > board
> > > > dependent.
> > > >    + This is probably related to the discussion with Rafael about
> > parameterized
> > > > builds. We should
> > > >        be able to define the threshold for a specific test_case's measure.
> > >
> > > reference logs should be savable by the user, to compare with future runs.
> > > The system we have now uses parsed testlogs, which are generated using
> > > log_compare and very simple line-based regular-expresison filters (using
> > 'grep').
> > > It will be much more flexible and robust to compare run.json files instead
> > > of a parsed log and a reference log.
> > >
> > > The purpose of these is to save the data from a "known good" run, so that
> > > regressions can be detected when the data from a current run differs from
> > that.
> > > This can include sub-test failures, that we have decided to ignore or
> > postpone
> > > resolution of.
> > >
> > > I think once we have in place a system to save all the sub-test metric data
> > > from the testlogs (using a parser) in json format, then we can eliminate
> > these.
> > > We should be able to replace reference.log with reference.json (which is
> > just
> > > a saved run.json file).  This is a key thing that I would like Fuego to be able
> > to
> > > share easily between developers (and sites).
> > >
> > > I have already started working on a json difference program (called 'jdiff')
> > > to compare 2 json files and report the differences between the two.
> > >
> > > On the issue of where to save them, currently they should be saved
> > somehow
> > > at the 'board' level.  That is, tests will definitely have different results per-
> > board.
> > > But there may be such a thing as a reference file that is dependent on the
> > kernel
> > > version, or the distribution, or some other parameter.  We should discuss
> > the
> > > naming and storage of these.
> >
> > How about saving them at the board's testplan, and also allow users to try
> > different ones through the ftc run-test interface?
> I don't follow this.  In /fuego-ro/boards?

Sorry, I forgot to say that testplans should also be per-board/user-generated and therefore not in fuego-core.
 
> They are definitely test-specific, but I'm not sure I want them in /fuego-core/engine/tests/<testname>.
> That will pollute the fuego-core directory with site-specific data.
> 
> I think they need to go into /fuego-rw somewhere, as they can be user-generated (and possibly
> user-downloaded).

I think that we should add a configuration file for the user to specify a path containing their boards/testplans, etc.
# I think Rafael mentioned something about this.

Thanks,
Daniel




* Re: [Fuego] status update
  2017-06-28  5:23       ` Bird, Timothy
@ 2017-06-28  9:02         ` Daniel Sangorrin
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Sangorrin @ 2017-06-28  9:02 UTC (permalink / raw)
  To: 'Bird, Timothy', fuego



> -----Original Message-----
> From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> Sent: Wednesday, June 28, 2017 2:23 PM
> To: Bird, Timothy; Daniel Sangorrin; fuego@lists.linuxfoundation.org
> Subject: RE: [Fuego] status update
> 
> 
> 
> > -----Original Message-----
> > From: Bird, Timothy on Tuesday, June 27, 2017 9:56 PM
> > > -----Original Message-----
> > > From: Daniel Sangorrin on Tuesday, June 27, 2017 6:58 PM
> > > > -----Original Message-----
> > > > From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> > > > Sent: Wednesday, June 28, 2017 9:47 AM
> > > In Fuego, we have a similar concept. If you see bonnie's reference.log [1]
> > you
> > > will
> > > notice that there are several groups/test_sets and multiple
> > tests/test_cases
> > > inside
> > > each group:
> > >
> > > test_set: Sequential_Output
> > > test_cases: Block, PerChr, Rewrite
> >
> > In this case, is Block a set of measurements, or a single measurement?
> 
> OK - it's a single measurement.
> 
> Ugh.  I was confused.  reference.log is not the file I was thinking of.
> reference.log is the file that holds thresholds for all of the benchmark
> test measurements.  I was thinking of the 'pn log'.

Ouch, I had already forgotten about these guys.
 
> All the stuff I said about placement of reference.log (well most of it), I meant
> in regard to pn log files.
> 
> OK, getting back to reference.log...
> First, it's terribly named.  It's not a 'log' by any stretch of the imagination.
> It's also terribly formatted.  It's some weird custom format created just to be
> easy to read and write with 'awk'.  It does need to specify all possible measurement
> names (that are relevant for determining benchmark pass or fail status)  This
> is usually a small subset of all the measurements made by a benchmark.
> And, I'm not sure it belongs in the test directory, as it should be writable by the
> user to reflect their own notion of values that indicate a regression threshold
> for a measurement.
> 
> However, having said that, I'm not sure I want to change it this release.

OK, that's what I wanted to know. For now, I will stick to what we have.
 
> Please note one other thing, though, related to test schema and test_cases and
> nomenclature...
> 
> I strongly believe that every test case/measurement in the system should have
> something that I call a TGUID - a test globally unique identifier (something like an URL in
> World-wide-web parlance).  This is a single string that can be used to
> identify a particular measurement from all other measurements.  This should be
> able to be constructed using a path-like syntax.  Personally, I would use dots
> as delimiters, like bonnie's reference log does (rather than slashes, like URLs or
> filepaths use).
> 
> It is important, IMO, to have a static test identifier, for referring to the test
> independent of host, board, test_plan or spec:
> e.g. "LTP.syscall.kill10"
> 
> We should be able to construct these fully-qualified test identifiers using values
> in the json schemas for a run.
> 
> The reason for having TGUIDs is that I would like for people to be able to share
> meta-data about individual measurements, in a way that is independent of other
> test infrastructure or layout.

In Kernel CI, they request this ID from the centralized server before submitting the json files.
We can do the same, or create our own identifiers by combining several words.

Thanks,
Daniel





* Re: [Fuego] status update
  2017-06-28  8:43       ` Daniel Sangorrin
@ 2017-06-29  7:15         ` Milo Casagrande
  0 siblings, 0 replies; 8+ messages in thread
From: Milo Casagrande @ 2017-06-29  7:15 UTC (permalink / raw)
  To: fuego

Hello,

On Wed, Jun 28, 2017 at 10:43 AM, Daniel Sangorrin
<daniel.sangorrin@toshiba.co.jp> wrote:
>
> Milo sent 3 hackbench json files but not an LTP one. LTP normally can work with 3 levels
> (e.g.: LTP > syscalls > kill01) so there is no problem about that.

We don't have any actual LTP runs in our dataset, just some kselftest
and hackbench.

> However, LTP now also includes 2 more test suites inside (Posix
> open testsuite, and the real-time testsuite) with their own test sets and test cases.
> For that reason, you could end up with 4 nested levels (unless you create 3 test suites
> from the same source code).

In this case the kernelci test schemas might pose a problem; I don't
think we ever considered this scenario.

> I think this is a better analogy:
>    test_suite, test_set , test_case == test_name, groupname, test == bonnie, sequential_output, rewrite
>
> The concept of test_plan is not in kernel ci afaik (maybe it is in LAVA with a different name such batch jobs?).

No, kernelci doesn't hold any "test plan"; we just keep the results
(but it also depends on the definition you are giving to "test plan":
for us it is "a document describing the test workflow").
I'm not even sure whether those are stored in LAVA, though.

-- 
Milo Casagrande
Linaro.org <www.linaro.org> │ Open source software for ARM SoCs

