* [Fuego] UOF - more thoughts after the call
@ 2017-07-18 16:02 Bird, Timothy
  2017-07-20  0:56 ` Daniel Sangorrin
  0 siblings, 1 reply; 5+ messages in thread
From: Bird, Timothy @ 2017-07-18 16:02 UTC (permalink / raw)
  To: fuego

Daniel,

Thanks for sending the information about the UOF work so far for the AGL-CIAT/Fuego
conference call. 

I wanted to clarify a few thoughts I had in the call, and continue some discussion points on
the mailing list.

Right now we have:
 reference.json - 
   - this describes the test measurements (name, units, and pass criteria for each measure)
   - it can also have group pass criteria (fail_count, etc.) at  different levels (test_suite, test_set, test_case?)
 run.json  - the results from a single run
  -  this includes test data (time of test, device under test, kernel, host, etc.)
  - it also has the results of the test
     - including individual sub-test results
 results.json - an aggregate of results from multiple runs
  - currently has test data, as well as results for multiple runs (organized by measure)

I think what's throwing me off about reference.json is that it has some invariant data
(name and units) mixed in with data that may need to be customized (pass criteria).
A pass criterion consists of an operation and a value for each measure (with '=' and
'PASS' as the inferred criterion for individual functional test measures).
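
(A rough, made-up sketch of that mix -- the field names here are invented for
illustration and are not the actual schema -- with the invariant units sitting
next to a criterion a tester may want to tune:)

  "bonnie.Sequential_Output.Block": {
      "units": "KB/s",
      "criterion": { "op": "ge", "value": 1024 }
  }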

A tester may want to alter the pass criteria, for the following types of scenarios:
 * some tests should be ignored on a 64-bit system, but used on a 32-bit system
 * some test results may differ based on whether someone uses a Debian or a Yocto-based
 distribution on their board
 * benchmark thresholds may need to change based on the location of the file system
 (rotating media vs. flash-based)
 * benchmark thresholds may need to change based on the board (high-end board vs. low-end board)

It seems redundant to have the units repeated in lots of different files, when they are
invariant for the test. 

For functional tests, we never had pass-criteria explicitly listed on a per-measure basis.
All we had were: 1) pass/fail counts, and 2) p/n log comparisons.  Our new system will
actually be much more explicit, and flexible.  So I very much like the direction we
are heading.

A few questions came up in the call:
 - the format for alternate pass-criteria?
   - you suggested json diffs, and I suggested an alternate full json file
 - where to store the alternate pass-criteria?
   - they definitely need to be recorded for a run, so either the diff or the json file
   should be copied to the log directory, or the criteria should be put into the run.json file.
   - they also need to be stored per host, for use with each test
      - they should be outside the test directory - their lifecycle is not conducive to being
      in the fuego-core repository
   - I think we should put them in /fuego-rw/test-conf/<test-name>/<criteria-name>.[json|jdiff]
    where criteria-name is something like: beaglebone-debian
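
   (Purely illustrative file names, reusing examples from elsewhere in this mail:)
     /fuego-rw/test-conf/Benchmark.bonnie/beaglebone-debian.json
     /fuego-rw/test-conf/Functional.LTP/my-ltp-ignore-list.json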
   
I have other data I'd like to associate with each measure in the test framework (keyed
off the test guid).  This includes things like test descriptions, and information about
results interpretation.  This data should be invariant across runs (similar to the units);
I guess that, in a way, units are themselves one element of interpreting the results data.
So it seems like the units would be better co-located with this extra data I've had in
the back of my mind than with the pass-criteria.

A made-up example of this extra data would be:
'Cyclictest.Thread0.Max': 'description': 'This represents the maximum latency observed
for an interrupt during the test.  If hard real-time is required, the threshold for this
should be set to 50% less than the longest acceptable latency required by your real-time
application.'
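
(In file form, that invariant data might be grouped per tguid something like this --
again, made-up field names, not a real schema:)

  "Cyclictest.Thread0.Max": {
      "units": "us",
      "description": "The maximum latency observed for an interrupt during the test.",
      "interpretation": "For hard real-time, set the threshold to 50% less than the longest acceptable latency."
  }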

Arguably, someone may want to make their own
custom notes about a test (their own interpretation), and share those with the world.

For example, this tidbit of knowledge would be good to share:
'LTP.syscall.creat05': 'note': 'creat05 takes a very long time (40 minutes) in docker because
it creates and removes 1 million files, which is slow on the docker stacked filesystem'.
Or someone may want to write:
'LTP.syscall.kill10': 'note': 'kill10 is hanging on various machines, for reasons I haven't been
able to determine yet.'

We can't create an economy of sharing pass-criteria, descriptions, interpretation and notes
until we formalize how they are formatted and shared.  We don't need to formalize
this other material this release, but it may influence how we want to organize the material
we *are* working on in this release - namely the schemas for the run, results, reference
(and pass-criteria - in there somewhere).

What do you think of /fuego-rw/test-conf ?  I'm open to other names for this. I think
it does need to be in the rw directory, because users (testers) are expected to customize it.
Anything that needs to be written to from inside the docker container needs to
be in the rw directory.

I can imagine creating new pass-criteria with something like this:
(in the future, not for this release)

 * ftc set-thresholds --tguid bonnie.*.Read --run beagle-bone.run.5  --sigma 5% -o my-fs-criteria
This would take the values for bonnie measurements ending in 'Read', from the data in run 5 for
the beaglebone board, adjust them by 5 % (either up or down depending on the operation), and
store them as new pass-criteria in a file called 'my-fs-criteria'.
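
(The resulting 'my-fs-criteria' file might then hold entries like this -- the measure
name and value are made up for illustration, not real data:)

  "bonnie.Sequential_Input.Read": {
      "criterion": { "op": "ge", "value": 9500 }
  }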

 * ftc set-ignore --tguid LTP.syscall.kill10 --description "I don't care about this right now because, blah, blah" \
-o my-ltp-ignore-list

Then using them something like this:
 * ftc run-test -b beagle-bone -t Benchmark.bonnie --pass-criteria my-fs-criteria
 * ftc run-test -b beagle-bone -t Functional.LTP --pass-criteria my-ltp-ignore-list

Or, in a spec:
 {
    "specs": {
        "default": {
            "VAR1": "-f foo"
        },
        "alternate": {
            "VAR1": "-f bar",
            "pass-criteria": "my-criteria"
        }
    }
 }
   * ftc run-test -b beagle-bone -t Functional.some_test --spec "alternate"

I have more thoughts on other topics that I'll put in other e-mails; otherwise the discussion
threading gets too hard to follow.

Regards,
 -- Tim



* Re: [Fuego] UOF - more thoughts after the call
  2017-07-18 16:02 [Fuego] UOF - more thoughts after the call Bird, Timothy
@ 2017-07-20  0:56 ` Daniel Sangorrin
  2017-07-24 22:52   ` Bird, Timothy
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Sangorrin @ 2017-07-20  0:56 UTC (permalink / raw)
  To: 'Bird, Timothy', fuego

> -----Original Message-----
> From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> Sent: Wednesday, July 19, 2017 1:02 AM
> To: fuego@lists.linuxfoundation.org
> Cc: Daniel Sangorrin
> Subject: UOF - more thoughts after the call
> 
> Daniel,
> 
> Thanks for sending the information about the UOF work so far for the AGL-CIAT/Fuego
> conference call.

BTW: today I managed to get Functional tests working (except LTP).

> I wanted to clarify a few thoughts I had in the call, and continue some discussion points on
> the mailing list.
> 
> Right now we have:
>  reference.json -
>    - this describes the test measurements (name, units, and pass criteria for each measure)
>    - it can also have group pass criteria (fail_count, etc.) at  different levels (test_suite, test_set, test_case?)
>  run.json  - the results from a single run
>   -  this includes test data (time of test, device under test, kernel, host, etc.)
>   - it also has the results of the test
>      - including individual sub-test results
>  results.json - an aggregate of results from multiple runs
>   - currently has test data, as well as results for multiple runs (organized by measure)
> 
> I think what's throwing me off about reference.json is that it has some invariant data:
> (name and units) mixed in with data that may need to be customized  (pass criteria).
> Pass criteria consists of an operation and a value for each measure (with '=' and 'PASS')
> as an inferred pass criteria for individual functional test measures.
> 
> A tester may want to alter the pass criteria, for the following types of scenarios:
>  * some tests should be ignored on a  64-bit system, but used on a 32-bit system
>  * some test results may differ based on whether someone uses a Debian or a Yocto-based
>  distribution on their board
>  * benchmark thresholds may need to change based on the location of the file system
>  (rotating media vs. flash-based)
>  * benchmark thresholds may need to change based on the board (high-end board vs. low-end board)
> 
> It seems redundant to have the units repeated in lots of different files, when they are
> invariant for the test.
> 
> For functional tests, we never had pass-criteria explicitly listed on a per-measure basis.
> All we had were: 1) pass/fail counts, and 2) p/n log comparisons.  Our new system will
> actually be much more explicit, and flexible.  So I very much like the direction we
> are heading.
> 
> A few questions came up in the call:
>  - the format for alternate pass-criteria?
>    - you suggested json diffs, and I suggested an alternate full json file

Full json is ok.

By the way, do we want to be able to set the criteria/specs through a parameterized build? (e.g. using ftc)
[Note] I think it is useful, but I'm worried about adding lots of "useful" things that will make
Fuego more complicated and unstable for a while. Probably we should focus on a rock-solid, stable
Fuego instead of adding too many fancy features for now.

>  - where to store the alternate pass-criteria?
>    - they definitely need to be recorded for a run, so either the diff or the json file
>    should be copied to the log directory, or the criteria should be put into the run.json file.

We can just save the criteria.json file. The run.json doesn't need it.

>    - they also need to be stored per host, for application with each test
>       - they should be outside the test directory - their lifecycle is not conducive to being
>       in the fuego-core repository
>   - I think we should put them in /fuego-rw/test-conf/<test-name>/<criteria-name>.[json|jdiff]
>    where criteria-name is something like: beaglebone-debian
>
> I have other data I'd like to associate with each measure in the test framework (keyed
> off the test guid).  This includes things like test descriptions, and information about results
> interpretation.  This data should be invariant to a run (similar to the units) (I guess in a way
> units is one element of interpreting the results data).  So, the extra data that I've had in
> the back of my mind seems like the units would be better co-located with this extra
> data, than with the pass-criteria.
> 
> A made-up example of this extra data would be:
> 'Cyclictest.Thread0.Max': 'description': 'This represents the maximum latency observed
> for an interrupt during the test.  If hard real-time is required, the threshold for this
> should be set to 50% less than the longest acceptable latency required by your real-time
> application.'
> 
> Arguably, someone may want to make their own
> custom notes about a test (their own interpretation), and share those with the world.
> 
> For example, this tidbit of knowledge would be good to share:
> 'LTP.syscall.creat05': 'note': 'creat05 takes a very long time (40 minutes) in docker because
> it creates and removes 1 million files, which takes a long time on the docker stacked filesystem'.
> Or someone may want to write:
> 'LTP.syscall.kill10': 'note': 'kill10 is hanging on various machines, for reasons I haven't been
> able to determine yet.'
> 
> We can't create an economy of sharing pass-criteria, descriptions, interpretation and notes
> until we formalize how they are formatted and shared.  We don't need to formalize
> this other material this release, but it may influence how we want to organize the material
> we *are* working on in this release - namely the schemas for the run, results, reference
> (and pass-criteria - in there somewhere).

Agree. We need a schema for the criteria.json files. But it should be just a subschema
of fuego-schema.json since I already put the criteria schema in it.
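
(Just to illustrate the kind of thing I mean -- this is not the actual content of
fuego-schema.json, only a rough sketch of how a criterion entry could be described
in JSON Schema terms:)

  "criterion": {
      "type": "object",
      "properties": {
          "op":    { "type": "string" },
          "value": {}
      },
      "required": ["op", "value"]
  }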
 
> What do you think of /fuego-rw/test-conf ?  I'm open to other names for this. I think
> it does need to be in the rw directory, because users (testers) are expected to customize it.

Why not in /fuego-ro?
Testers can customize it from the host, and Jenkins cannot overwrite/delete them by chance.

> Anything that needs to be written to from inside the docker container needs to
> be in the rw directory.

But I was assuming that the user-provided data would just be read, not written, by
Jenkins. Could you elaborate a bit on that?

> I can imagine creating new pass-criteria with something like this:
> (in the future, not for this release)
> 
>  * ftc set-thresholds --tguid bonnie.*.Read --run beagle-bone.run.5  --sigma 5% -o my-fs-criteria
> This would take the values for bonnie measurements ending in 'Read', from the data in run 5 for
> the beaglebone board, adjust them by 5 % (either up or down depending on the operation), and
> store them as new pass-criteria in a file called 'my-fs-criteria'.

This looks a lot like the parameterized builds for modifying the parameters in a spec. 
But I was thinking about something like ftc run --thresholds ... --criteria .... Then, the "my-fs-criteria" file 
would be stored directly in the log directory for that run.

>  * ftc set-ignore --tguid LTP.syscall.kill10 --description 'I don't care about this right now because, blah, blah' \
> -o my-ltp-ignore-list
> 
> Then using them something like this:
>  * ftc run-test -b beagle-bone -t Benchmark.bonnie --pass-criteria my-fs-criteria
>  * ftc run-test -b beagle-bone -t Functional.LTP --pass-criteria my-ltp-ignore-list
> 
> Or, in a spec:
>  {
>     "specs": {
>         "default": {
>             "VAR1": "-f foo"
>         },
>         "alternate": {
>             "VAR1": "-f bar",
>             "pass-criteria": "my-criteria"
>         }
>     }
>  }
>    * ftc run-test -b beagle-bone -t Functional.some_test --spec "alternate"
> 
> I have more thoughts on other topics, that I'll put in other e-mails.  Otherwise the discussion
> threading gets too hard to follow.

Thanks,
Daniel




* Re: [Fuego] UOF - more thoughts after the call
  2017-07-20  0:56 ` Daniel Sangorrin
@ 2017-07-24 22:52   ` Bird, Timothy
  2017-07-31  5:56     ` Daniel Sangorrin
  0 siblings, 1 reply; 5+ messages in thread
From: Bird, Timothy @ 2017-07-24 22:52 UTC (permalink / raw)
  To: Daniel Sangorrin, fuego



> -----Original Message-----
> From: Daniel Sangorrin [mailto:daniel.sangorrin@toshiba.co.jp]
> Sent: Wednesday, July 19, 2017 5:56 PM
> > -----Original Message-----
> > From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> > Sent: Wednesday, July 19, 2017 1:02 AM
...
> By the way, do we want to be able to set the criteria/specs through a
> parameterized build? (e.g. using ftc)
> [Note] I think it is useful, but I'm worrying about adding lots of "useful" things
> that will make
> Fuego more complicated and unstable for a while. Probably we should focus
> on a rock-solid stable
> Fuego instead of adding too many fancy features for now.

I agree.  It's taking a bit more time than I expected to complete the UOF.
It seems to be a victim of second-system syndrome, where lots of features
are getting added.  We need to halt at some point and stabilize what we have.

 
> >  - where to store the alternate pass-criteria?
> >    - they definitely need to be recorded for a run, so either the diff or the
> json file
> >    should be copied to the log directory, or the criteria should be put into the
> run.json file.
> 
> We can just save the criteria.json file. The run.json doesn't need it.
OK.

> 
> >    - they also need to be stored per host, for application with each test
> >       - they should be outside the test directory - their lifecycle is not
> conducive to being
> >       in the fuego-core repository
> >   - I think we should put them in /fuego-rw/test-conf/<test-
> name>/<criteria-name>.[json|jdiff]
> >    where criteria-name is something like: beaglebone-debian
> >
> > I have other data I'd like to associate with each measure in the test
> framework (keyed
> > off the test guid).  This includes things like test descriptions, and
> information about results
> > interpretation.  This data should be invariant to a run (similar to the units) (I
> guess in a way
> > units is one element of interpreting the results data).  So, the extra data
> that I've had in
> > the back of my mind seems like the units would be better co-located with
> this extra
> > data, than with the pass-criteria.
> >
> > A made-up example of this extra data would be:
> > 'Cyclictest.Thread0.Max': 'description': 'This represents the maximum
> latency observed
> > for an interrupt during the test.  If hard real-time is required, the threshold
> for this
> > should be set to 50% less than the longest acceptable latency required by
> your real-time
> > application.'
> >
> > Arguably, someone may want to make their own
> > custom notes about a test (their own interpretation), and share those with
> the world.
> >
> > For example, this tidbit of knowledge would be good to share:
> > 'LTP.syscall.creat05': 'note': 'creat05 takes a very long time (40 minutes) in
> docker because
> > it creates and removes 1 million files, which takes a long time on the docker
> stacked filesystem'.
> > Or someone may want to write:
> > 'LTP.syscall.kill10': 'note': 'kill10 is hanging on various machines, for reasons
> I haven't been
> > able to determine yet.'
> >
> > We can't create an economy of sharing pass-criteria, descriptions,
> interpretation and notes
> > until we formalize how they are formatted and shared.  We don't need to
> formalize
> > this other material this release, but it may influence how we want to
> organize the material
> > we *are* working on in this release - namely the schemas for the run,
> results, reference
> > (and pass-criteria - in there somewhere).
> 
> Agree. We need a schema for the criteria.json files. But it should be just a
> subschema
> of fuego-schema.json since I already put the criteria schema in it.
> 
> > What do you think of /fuego-rw/test-conf ?  I'm open to other names for
> this. I think
> > it does need to be in the rw directory, because users (testers) are
> expected to customize it.
> 
> Why not in /fuego-ro?
> Testers can customize it from the host, and Jenkins cannot overwrite/delete
> them by chance.
> 
> > Anything that needs to be written to from inside the docker container
> needs to
> > be in the rw directory.
> 
> But I was assuming that the user-provided data would be just read not
> written by
> jenkins. Could you elaborate a bit on that?

My expectation is that, in the long run, (updated) pass criteria will mostly be
generated by Fuego tools from existing log data - then changed with some
manual tweaks.  If the tools are normally run inside the container, having
the data in fuego-ro will add some extra complexity to generating or
replacing them.

> 
> > I can imagine creating new pass-criteria with something like this:
> > (in the future, not for this release)
> >
> >  * ftc set-thresholds --tguid bonnie.*.Read --run beagle-bone.run.5  --
> sigma 5% -o my-fs-criteria
> > This would take the values for bonnie measurements ending in 'Read',
> from the data in run 5 for
> > the beaglebone board, adjust them by 5 % (either up or down depending
> on the operation), and
> > store them as new pass-criteria in a file called 'my-fs-criteria'.
> 
> This looks a lot like the parameterized builds for modifying the parameters in
> a spec.
> But I was thinking about something like ftc run --thresholds ... --criteria ....
> Then, the "my-fs-criteria" file
> would be stored directly in the log directory for that run.

Well, you could override specific values as you indicate, and they should
be stored in the log directory.  But you could also specify a complete
replacement file.  Either way, the log should record the pass-criteria
for that run.

 -- Tim



* Re: [Fuego] UOF - more thoughts after the call
  2017-07-24 22:52   ` Bird, Timothy
@ 2017-07-31  5:56     ` Daniel Sangorrin
  2017-07-31 23:48       ` Bird, Timothy
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Sangorrin @ 2017-07-31  5:56 UTC (permalink / raw)
  To: 'Bird, Timothy', fuego

> -----Original Message-----
> From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> Sent: Tuesday, July 25, 2017 7:53 AM
> To: Daniel Sangorrin; fuego@lists.linuxfoundation.org
> Subject: RE: UOF - more thoughts after the call
....
> > Agree. We need a schema for the criteria.json files. But it should be just a
> > subschema
> > of fuego-schema.json since I already put the criteria schema in it.
> >
> > > What do you think of /fuego-rw/test-conf ?  I'm open to other names for
> > this. I think
> > > it does need to be in the rw directory, because users (testers) are
> > expected to customize it.
> >
> > Why not in /fuego-ro?
> > Testers can customize it from the host, and Jenkins cannot overwrite/delete
> > them by chance.
> >
> > > Anything that needs to be written to from inside the docker container
> > needs to
> > > be in the rw directory.
> >
> > But I was assuming that the user-provided data would be just read not
> > written by
> > jenkins. Could you elaborate a bit on that?
> 
> My expectation is that in the long run (updated) pass criteria will mostly be
> generated by Fuego tools, from existing log data - then changed with some
> manual tweaks.   If the tools are normally run inside the container, having
> the data in fuego-ro will add some extra complexity generating them
> or replacing them.

OK. We can then point to a folder in fuego-rw and also support absolute paths.
  - Relative path: ftc add-job --criteria mycriteria.json (would use /fuego-rw/test-conf/mycriteria.json)
  - Absolute path: ftc add-job --criteria /fuego-ro/criteria/bonnie_criteria.json

Thanks,
Daniel




* Re: [Fuego] UOF - more thoughts after the call
  2017-07-31  5:56     ` Daniel Sangorrin
@ 2017-07-31 23:48       ` Bird, Timothy
  0 siblings, 0 replies; 5+ messages in thread
From: Bird, Timothy @ 2017-07-31 23:48 UTC (permalink / raw)
  To: Daniel Sangorrin, fuego



> -----Original Message-----
> From: Daniel Sangorrin [mailto:daniel.sangorrin@toshiba.co.jp]
> Sent: Sunday, July 30, 2017 10:57 PM
> To: Bird, Timothy <Tim.Bird@sony.com>; fuego@lists.linuxfoundation.org
> Subject: RE: UOF - more thoughts after the call
> 
> > -----Original Message-----
> > From: Bird, Timothy [mailto:Tim.Bird@sony.com]
> > Sent: Tuesday, July 25, 2017 7:53 AM
> > To: Daniel Sangorrin; fuego@lists.linuxfoundation.org
> > Subject: RE: UOF - more thoughts after the call
> ....
> > > Agree. We need a schema for the criteria.json files. But it should be just a
> > > subschema
> > > of fuego-schema.json since I already put the criteria schema in it.
> > >
> > > > What do you think of /fuego-rw/test-conf ?  I'm open to other names
> for
> > > this. I think
> > > > it does need to be in the rw directory, because users (testers) are
> > > expected to customize it.
> > >
> > > Why not in /fuego-ro?
> > > Testers can customize it from the host, and Jenkins cannot
> overwrite/delete
> > > them by chance.
> > >
> > > > Anything that needs to be written to from inside the docker container
> > > needs to
> > > > be in the rw directory.
> > >
> > > But I was assuming that the user-provided data would be just read not
> > > written by
> > > jenkins. Could you elaborate a bit on that?
> >
> > My expectation is that in the long run (updated) pass criteria will mostly be
> > generated by Fuego tools, from existing log data - then changed with some
> > manual tweaks.   If the tools are normally run inside the container, having
> > the data in fuego-ro will add some extra complexity generating them
> > or replacing them.
> 
> OK. We can then point to a folder in fuego-rw and also support absolute
> paths.
>   - Relative path: ftc add-job --criteria mycriteria.json (would use /fuego-
> rw/test-conf/mycriteria.json)
>   - Absolute path: ftc add-job --criteria /fuego-ro/criteria/bonnie_criteria.json

That sounds good.  Another idea (not to implement now) is to add
the criteria reference in the spec file.  That would work for alternate
criteria stored with the test itself.  But this is getting too complicated.
Let's add support for what you've outlined, and shake the issues
out of that implementation first.
 -- Tim


