* [Fuego] [PATCH] LTP: put fanotify07 in the skiplist
@ 2018-05-10  7:36 Daniel Sangorrin
       [not found] ` <CGME20180510073701epcas2p3fdbf9e1899d22350d9a29a67f3637d73@epcms5p5>
  2018-05-10 20:50 ` Tim.Bird
  0 siblings, 2 replies; 7+ messages in thread
From: Daniel Sangorrin @ 2018-05-10  7:36 UTC
  To: fuego

This test seems problematic. I need to investigate it
further, but for now it may be a good idea to put it in the
skip list.

tst_test.c:1015: INFO: Timeout per run is 0h 05m 00s
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Cannot kill test processes!
Congratulation, likely test hit a kernel bug.
Exitting uncleanly...

Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
---
 engine/tests/Functional.LTP/spec.json | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/engine/tests/Functional.LTP/spec.json b/engine/tests/Functional.LTP/spec.json
index 3b48bdc..1f78afb 100644
--- a/engine/tests/Functional.LTP/spec.json
+++ b/engine/tests/Functional.LTP/spec.json
@@ -3,6 +3,7 @@
     "specs": {
         "default": {
             "tests": "syscalls SEM",
+            "skiplist": "fanotify07",
             "extra_success_links": {"xlsx": "results.xlsx", "skiplist": "skiplist.txt"},
             "extra_fail_links": {"xlsx": "results.xlsx", "skiplist": "skiplist.txt"}
         },
@@ -14,6 +15,7 @@
         },
         "selection": {
             "tests": "syscalls fs pipes sched timers dio mm ipc pty AIO MSG SEM SIG THR TMR TPS",
+            "skiplist": "fanotify07",
             "extra_success_links": {"xlsx": "results.xlsx", "skiplist": "skiplist.txt"},
             "extra_fail_links": {"xlsx": "results.xlsx", "skiplist": "skiplist.txt"}
         },
-- 
2.7.4




* Re: [Fuego] [PATCH] LTP: put fanotify07 in the skiplist
       [not found] ` <CGME20180510073701epcas2p3fdbf9e1899d22350d9a29a67f3637d73@epcms5p5>
@ 2018-05-10 16:13   ` Dhinakar Kalyanasundaram
  2018-05-10 21:50     ` Tim.Bird
       [not found]     ` <CGME20180510073701epcas2p3fdbf9e1899d22350d9a29a67f3637d73@epcms5p7>
  0 siblings, 2 replies; 7+ messages in thread
From: Dhinakar Kalyanasundaram @ 2018-05-10 16:13 UTC
  To: Daniel Sangorrin; +Cc: fuego

[-- Attachment #1: Type: text/html, Size: 4325 bytes --]

[-- Attachment #2: 201602111742151_N3WZA6X7.png --]
[-- Type: image/png, Size: 33527 bytes --]


* Re: [Fuego] [PATCH] LTP: put fanotify07 in the skiplist
  2018-05-10  7:36 [Fuego] [PATCH] LTP: put fanotify07 in the skiplist Daniel Sangorrin
       [not found] ` <CGME20180510073701epcas2p3fdbf9e1899d22350d9a29a67f3637d73@epcms5p5>
@ 2018-05-10 20:50 ` Tim.Bird
  1 sibling, 0 replies; 7+ messages in thread
From: Tim.Bird @ 2018-05-10 20:50 UTC
  To: daniel.sangorrin, fuego



> -----Original Message-----
> From: Daniel Sangorrin
> This test seems problematic. I need to investigate it
> further, but for now it may be a good idea to put it in the
> skip list.

Daniel,

I think a better solution is to put this in the board file.  I recently added
support for FUNCTIONAL_LTP_BOARD_SKIPLIST to allow adding an
LTP skiplist item at the board level.

You can put this in your board file:
FUNCTIONAL_LTP_BOARD_SKIPLIST="fanotify07"

I don't think turning off a test for all users of a test suite is good, unless
the test itself is known to be broken.  This test raises interesting
issues with Fuego, which I'll discuss in my response to Dhinakar's
e-mail on this issue.  Please see that and let me know what you think.
 -- Tim


* Re: [Fuego] [PATCH] LTP: put fanotify07 in the skiplist
  2018-05-10 16:13   ` Dhinakar Kalyanasundaram
@ 2018-05-10 21:50     ` Tim.Bird
  2018-05-14  2:31       ` Daniel Sangorrin
       [not found]     ` <CGME20180510073701epcas2p3fdbf9e1899d22350d9a29a67f3637d73@epcms5p7>
  1 sibling, 1 reply; 7+ messages in thread
From: Tim.Bird @ 2018-05-10 21:50 UTC
  To: dhinakar.k, daniel.sangorrin; +Cc: fuego

> -----Original Message-----
> From: Dhinakar Kalyanasundaram
> 
> Yes, the fanotify07 test is problematic if you use a kernel older than 4.14.
>
> We used kernel 4.9, and running fanotify07 resulted in a kernel panic.
>
> This issue has been fixed in kernel version 4.14.
>
> Please refer to this link: https://www.spinics.net/lists/linux-fsdevel/msg109131.html
>
> fanotify07 executes without any issue in kernel 4.14.
> 

Dhinakar,

Thanks very much for this information.  I had been meaning to investigate
the issue with fanotify07 (which also fails on some of my boards here), but had
not gotten around to it yet.

Sorry for the long message below, but I'm now going to start talking about
more general Fuego issues...

This test raises some interesting issues that I'd like to make Fuego better at
handling.

This test causes a hang in the middle of an LTP test, which is really a pain.
Fuego doesn't handle this kind of thing very well.  In the case of other tests,
if the test hangs in the middle you can see from the console log where the
test left off.  The LTP output is too quiet, in my judgement, as it takes a lot
of effort to see where the test got to before the machine hung.  This is
something that would be good to change.

Also - once the machine hangs there is no mechanism to continue
where the test left off.  The result is that many LTP sub-testcases don't get run.
Right now the only mechanism we have to deal with this is to skip
the individual testcase (the fanotify07 program).
Luckily, Daniel has added the skiplist feature to LTP to support this,
which is nice. We might want to think about also implementing a test
re-start mechanism, or making LTP more fine-grained in its execution.
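
One way a re-start could reuse the existing skiplist feature is sketched below. This is a minimal sketch, assuming the run order of the LTP test programs is known in advance and that the last program started before the hang can be recovered from the logs; `split_at_hang` is a hypothetical helper, not an existing Fuego function.

```python
def split_at_hang(planned: list[str], hung: str) -> tuple[list[str], list[str]]:
    """Split an ordered LTP run at the test program that hung the board.

    Returns (attempted, remaining): `attempted` ends with the hung test and
    could be given to the skiplist of a re-run; `remaining` never got to run.
    """
    if hung not in planned:
        raise ValueError(f"{hung!r} is not in the planned run")
    cut = planned.index(hung) + 1
    return planned[:cut], planned[cut:]


planned = ["fanotify06", "fanotify07", "fcntl01", "fcntl02"]
attempted, remaining = split_at_hang(planned, "fanotify07")
print(attempted)  # ['fanotify06', 'fanotify07']
print(remaining)  # ['fcntl01', 'fcntl02']
```

Skipping everything in `attempted` on the second run means only `remaining` executes; whether the hung program should then be retried on its own is a separate policy choice.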

In general, LTP is a "big" test, and Fuego is really architected around
running "small" tests, that don't hang the DUT while they are in progress.
It might make more sense to structure LTP a little differently, with
testplans that use a series of specs that run smaller test sets from LTP,
rather than huge test sets (syscalls is notoriously long).

I'm not sure of all the pros and cons of having lots of specs, but specs
that ran smaller sets of programs would lose less data when the
machine hung than the current configuration does now.
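
A minimal sketch of splitting one long run into fixed-size chunks, one spec per chunk, so that a hang loses at most one chunk's results. It assumes a spec could list individual test programs; the `chunk_specs` helper and the `syscalls_NN` spec names are hypothetical and are not part of the current Functional.LTP spec.json.

```python
def chunk_specs(programs: list[str], size: int, prefix: str = "chunk") -> dict[str, list[str]]:
    """Split an ordered list of LTP programs into specs of at most `size` programs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return {
        f"{prefix}{i // size + 1:02d}": programs[i:i + size]
        for i in range(0, len(programs), size)
    }


specs = chunk_specs(["abort01", "accept01", "access01", "access02", "acct01"], 2, "syscalls_")
print(specs)
# {'syscalls_01': ['abort01', 'accept01'], 'syscalls_02': ['access01', 'access02'], 'syscalls_03': ['acct01']}
```

If the board hung during `syscalls_02`, the results of `syscalls_01` would already have been collected; the cost is more jobs and more result files per full LTP pass.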

Fuego should be able to reboot a hung machine and continue to the
next scheduled Fuego test, when the BOARD_CONTROL feature is
being used. However, the BOARD_CONTROL feature needs more
work to support LAVA and other DUT control systems.  We might
want to prioritize adding BOARD_CONTROL support for more board
management systems in the near future.

Also, with regard to fanotify07:
This is one of the "active bugs", which some people will see and
some won't, depending on what kernel version they're running.  It
definitely indicates a bug (I think), and it's quite possible that an
LTS bugfix backport might fix it, so it's not good IMHO to remove it
from the test pool.  However, it is really problematic since it takes
down the test machine, and (under certain circumstances) causes
a Fuego failure cascade where all remaining queued tests in Jenkins
for that board fail as well.  Yuk.

Tests like these are why I wrote LTP_one_test, to isolate
these into individual units that could be tested independently from
the main LTP set of tests.  I can imagine making a board-specific
testplan_ltp_problem_tests file, listing LTP_one_test with
multiple different specs (indicating multiple different single LTP
test programs).  This could be run at a different frequency than
the full LTP, to check the status of these testcases.
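
The board-specific testplan described above could be generated from a list of problem tests. This sketch assumes the testplan JSON layout uses `testPlanName` and a `tests` list of `testName`/`spec` entries, and that LTP_one_test would have one spec per problem program; the board name, spec names and the `problem_testplan` helper are illustrative only.

```python
import json


def problem_testplan(board: str, specs: list[str]) -> str:
    """Build a testplan that runs Functional.LTP_one_test once per spec.

    Each spec is assumed to select a single LTP program (e.g. fanotify07).
    """
    plan = {
        "testPlanName": f"testplan_ltp_problem_tests_{board}",
        "tests": [
            {"testName": "Functional.LTP_one_test", "spec": spec}
            for spec in specs
        ],
    }
    return json.dumps(plan, indent=4)


print(problem_testplan("bbb", ["fanotify07"]))
```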

I don't know what most QA departments do with tests that fail, but
are known to work in future versions of the software.  Do you just
always skip them?  Do you just remember to ignore their failure
results in your log?  We can use criteria files to ignore results but there's
a similar issue there.  How do you know to turn a test back on (stop skipping
it) when your software upgrades?  Or how do you know when to stop
ignoring a test's fail status, in the criteria file?
It seems like if a tester is not careful, they will continue skipping or ignoring
tests indefinitely, when it would be better to occasionally re-check the
status of failing tests to see if they've been fixed.
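
One possible answer to the "when do we turn it back on" question is to give every skip entry a reason and a re-check date, and have the tooling list the entries that are due. This is a minimal sketch; the `SkipEntry` record and its `recheck_by` field are hypothetical, since the skiplist shown in this thread is just a list of test names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SkipEntry:
    test: str
    reason: str
    recheck_by: date  # hypothetical: date after which the skip should be reviewed


def due_for_recheck(entries: list[SkipEntry], today: date) -> list[str]:
    """Return the names of skipped tests whose review date has arrived."""
    return [entry.test for entry in entries if entry.recheck_by <= today]


entries = [
    SkipEntry("fanotify07", "kernel panic on 4.9; reported fixed in 4.14", date(2018, 6, 1)),
    SkipEntry("example01", "hypothetical entry for illustration", date(2019, 1, 1)),
]
print(due_for_recheck(entries, date(2018, 7, 1)))  # ['fanotify07']
```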

I assume that if a kernel is 4.14 or later, we really want to run fanotify07,
to catch possible regressions.  That's why I declined Daniel's patch
to put fanotify07 in the skiplist for the default spec for LTP.

Finally, I wrote the per-testcase-documentation system to capture
detailed information, like what Dhinakar has provided, so that other
users could avoid having to do the same research to figure out
what is going on with a test that fails or errors out.
Note that one of the files I already created was
Functional.LTP/docs/Functional.LTP.syscalls.fanotify07.ftmp, so
I could gather this information.  I've had problems with this test
myself, that I hadn't reported yet, because I didn't have time
to do the research into why it was misbehaving.  But hey, I made
the file to hold the information I wanted to gather!  :-)
I'll put Dhinakar's information there, but probably people won't
notice, because the system is not well-known or visible yet.
But it's a start.

The per-testcase-documentation system is not completely
implemented, but is intended to eventually provide a combination
of static and dynamic information on a test, testset or testcase, accessible
from the Jenkins interface.

OK - I've talked far too long.  But feedback or thoughts on these
issues are welcome.  I'd like Fuego to handle these types of situations
better, and any ideas are appreciated.

Regards,
 -- Tim


> 
> Regards,
> 
> Dhinakar
> _______________________________________________
> Fuego mailing list
> Fuego@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/fuego


* Re: [Fuego] [PATCH] LTP: put fanotify07 in the skiplist
       [not found]     ` <CGME20180510073701epcas2p3fdbf9e1899d22350d9a29a67f3637d73@epcms5p7>
@ 2018-05-11  6:00       ` Dhinakar Kalyanasundaram
  0 siblings, 0 replies; 7+ messages in thread
From: Dhinakar Kalyanasundaram @ 2018-05-11  6:00 UTC
  To: Tim.Bird; +Cc: fuego

[-- Attachment #1: Type: text/html, Size: 12192 bytes --]

[-- Attachment #2: 201602111742151_N3WZA6X7.png --]
[-- Type: image/png, Size: 33527 bytes --]


* Re: [Fuego] [PATCH] LTP: put fanotify07 in the skiplist
  2018-05-10 21:50     ` Tim.Bird
@ 2018-05-14  2:31       ` Daniel Sangorrin
  2018-05-14 23:25         ` Tim.Bird
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Sangorrin @ 2018-05-14  2:31 UTC
  To: Tim.Bird, dhinakar.k; +Cc: fuego

> -----Original Message-----
> From: Tim.Bird@sony.com [mailto:Tim.Bird@sony.com]

> > -----Original Message-----
> > From: Dhinakar Kalyanasundaram
> >
> > Yes, the fanotify07 test is problematic if you use a kernel older than 4.14.
> >
> > We used kernel 4.9, and running fanotify07 resulted in a kernel panic.
> >
> > This issue has been fixed in kernel version 4.14.
> >
> > Please refer to this link: https://www.spinics.net/lists/linux-fsdevel/msg109131.html
> >
> > fanotify07 executes without any issue in kernel 4.14.
> >
> 
> Dhinakar,
> 
> Thanks very much for this information.  I had been meaning to investigate
> the issue with fanotify07 (which also fails on some of my boards here), but had
> not gotten around to it yet.

Thanks, Dhinakar, for the info!

Tim, the user can set FUNCTIONAL_LTP_BOARD_SKIPLIST="fanotify07", but only after
the test has already failed. How about adding fanotify07 to the skiplist automatically after checking
the kernel version?
# The downside is that LTS kernels have lots of backports, which makes kernel version checking ineffective.
# I will add a flag to skip the "automatic checks" for users who know what they are doing.
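
A minimal sketch of the kernel-version check proposed above, assuming 4.14 as the cutoff (taken from Dhinakar's report) and a hypothetical `force_run` override for users whose kernel already carries a backported fix. As noted, a plain version comparison cannot see LTS backports; `auto_skiplist` and `parse_kernel_release` are illustrative names, not existing Fuego functions.

```python
import re

# Assumption from this thread: fanotify07 misbehaves before 4.14 and is fine
# from 4.14 on. LTS backports can make a plain version check wrong.
FIXED_IN = (4, 14)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a `uname -r` string such as '4.9.88-rt62'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def auto_skiplist(release: str, skiplist: list[str], force_run: bool = False) -> list[str]:
    """Return a copy of `skiplist`, adding fanotify07 on kernels older than FIXED_IN.

    `force_run` is a hypothetical override for users who know their kernel
    already carries the fix.
    """
    result = list(skiplist)
    too_old = parse_kernel_release(release) < FIXED_IN
    if too_old and not force_run and "fanotify07" not in result:
        result.append("fanotify07")
    return result


print(auto_skiplist("4.9.88-rt62", []))             # ['fanotify07']
print(auto_skiplist("4.14.40", ["example01"]))      # ['example01']
print(auto_skiplist("4.9.88", [], force_run=True))  # []
```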

> This test causes a hang in the middle of an LTP test, which is really a pain.
> Fuego doesn't handle this kind of thing very well.  In the case of other tests,
> if the test hangs in the middle you can see from the console log where the
> test left off.  The LTP output is too quiet, in my judgement, as it takes a lot
> of effort to see where the test got to before the machine hung.  This is
> something that would be good to change.

Yes, I agree. I think there must be a verbose flag; I will try to find it.

> Also - once the machine hangs there is no mechanism to continue
> where the test left off.  The result is that many LTP sub-testcases don't get run.
> Right now the only mechanism we have to deal with this is to skip
> the individual testcase (the fanotify07 program).
> Luckily, Daniel has added the skiplist feature to LTP to support this,
> which is nice. We might want to think about also implementing a test
> re-start mechanism, or making LTP more fine-grained in its execution.

We have the timeout mechanism. If we run LTP in chunks, as you propose below,
it would be easier to use. LTP also has a per-testcase timeout that we are not
exploiting at the moment.

> In general, LTP is a "big" test, and Fuego is really architected around
> running "small" tests, that don't hang the DUT while they are in progress.
> It might make more sense to structure LTP a little differently, with
> testplans that use a series of specs that run smaller test sets from LTP,
> rather than huge test sets (syscalls is notoriously long).

I like the idea of a testplan to run LTP in small chunks.

> I'm not sure of all the pros and cons to having lots of specs, but specs
> that ran smaller sets of programs would lose less data when the
> machine hung than the current configuration does now.

The downside would be that after parsing we would have multiple spreadsheets.
But we can probably use ftc to generate a report that puts all the results together.
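
Combining per-chunk results is mostly a dictionary merge. This sketch assumes each chunk's results have already been parsed into a mapping of test name to status string; it does not use ftc, and `merge_results` and the status values are illustrative.

```python
def merge_results(chunks: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge per-spec results into one report, rejecting conflicting duplicates."""
    merged: dict[str, str] = {}
    for spec, results in chunks.items():
        for test, status in results.items():
            if test in merged and merged[test] != status:
                raise ValueError(
                    f"{test} is {merged[test]} in an earlier chunk but {status} in {spec}"
                )
            merged[test] = status
    return merged


chunks = {
    "syscalls_01": {"abort01": "PASS", "accept01": "PASS"},
    "syscalls_02": {"fanotify07": "SKIP"},
}
print(merge_results(chunks))
# {'abort01': 'PASS', 'accept01': 'PASS', 'fanotify07': 'SKIP'}
```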

> Fuego should be able to reboot a hung machine and continue to the
> next scheduled Fuego test, when the BOARD_CONTROL feature is
> being used. However, the BOARD_CONTROL feature needs more
> work to support LAVA and other DUT control systems.  We might
> want to prioritize adding BOARD_CONTROL support for more board
> management systems in the near future.

We need to add support for different remote power switches, to be
able to forcefully reboot a machine whose kernel has crashed. But also
support a simple "reboot" command for those who haven't.
# or a "thumb finger" notification[1] asking the user to reboot the machine manually.
 
> Also, with regards to fanotify07;
> This is one of the "active bugs", which some people will see and
> some won't, depending on what kernel version they're running.  It
> definitely indicates a bug (I think), and it's quite possible that an
> LTS bugfix backport might fix it, so it's not good IMHO to remove it
> from the test pool.  However, it is really problematic since it takes
> down the test machine, and (under certain circumstances) causes
> a Fuego failure cascade where all remaining queued tests in Jenkins
> for that board fail as well.  Yuk.

The first step will be to investigate which commits are needed to fix this, and then
backport them to LTS if necessary.

> Tests like these are why I wrote LTP_one_test, to isolate
> these into individual units that could be tested independently from
> the main LTP set of tests.  I can imagine making a board-specific
> testplan_ltp_problem_tests file, listing LTP_one_test with
> multiple different specs (indicating multiple different single LTP
> test programs).  This could be run at a different frequency than
> the full LTP, to check the status of these testcases.

Thanks for LTP_one_test. 
In my case, I usually log in to the board and run the test manually, often under strace.
# I set the clean target flag off so that the binaries remain on the board.

It would be nice to have an option that adds strace to the tests. Maybe it can be added
to LTP upstream.

> I don't know what most QA departments do with tests that fail, but
> are known to work in future versions of the software.  Do you just
> always skip them?  Do you just remember to ignore their failure
> results in your log?  We can use criteria files to ignore results but there's
> a similar issue there.  How do you know to turn a test back on (stop skipping
> it) when your software upgrades?  Or how do you know when to stop
> ignoring a test's fail status, in the criteria file?
> It seems like if a tester is not careful, they will continue skipping or ignoring
> tests indefinitely, when it would be better to occasionally re-check the
> status of failing tests to see if they've been fixed.
> 
> I assume that if a kernel is 4.14 or later, we really want to run fanotify07,
> to catch possible regressions.  That's why I declined Daniel's patch
> to put fanotify07 in the skiplist for the default spec for LTP.
> 
> Finally, I wrote the per-testcase-documentation system to capture
> detailed information, like what Dhinakar has provided, so that other
> users could avoid having to do the same research to figure out
> what is going on with a test that fails or errors out.
> Note that one of the files I already created was
> Functional.LTP/docs/Functional.LTP.syscalls.fanotify07.ftmp, so
> I could gather this information.  I've had problems with this test
> myself, that I hadn't reported yet, because I didn't have time
> to do the research into why it was misbehaving.  But hey, I made
> the file to hold the information I wanted to gather!  :-)
> I'll put Dhinakar's information there, but probably people won't
> notice, because the system is not well-known or visible yet.
> But it's a start.
> 
> The per-testcase-documentation system is not completely
> implemented, but is intended to eventually provide a combination
> of static and dynamic information on a test, testset or testcase -  accessible
> from the Jenkins interface.
> 
> OK - I've talked far too long.  But feedback or thoughts on these
> issues are welcome.  I'd like Fuego to handle these types of situations
> better, and any ideas are appreciated.

Thanks,
Daniel

[1] https://elinux.org/images/f/fd/Automated_Testing_with_ktest.pl_%28Embedded_Edition%29.pdf

> >
> > Regards,
> >
> > Dhinakar




* Re: [Fuego] [PATCH] LTP: put fanotify07 in the skiplist
  2018-05-14  2:31       ` Daniel Sangorrin
@ 2018-05-14 23:25         ` Tim.Bird
  0 siblings, 0 replies; 7+ messages in thread
From: Tim.Bird @ 2018-05-14 23:25 UTC
  To: daniel.sangorrin, dhinakar.k; +Cc: fuego



> -----Original Message-----
> From: Daniel Sangorrin 
> 
> > -----Original Message-----
> > From: Tim.Bird@sony.com [mailto:Tim.Bird@sony.com]
> 
> > > -----Original Message-----
> > > From: Dhinakar Kalyanasundaram
> > >
> > > Yes, the fanotify07 test is problematic if you use a kernel older than 4.14.
> > >
> > > We used kernel 4.9, and running fanotify07 resulted in a kernel panic.
> > >
> > > This issue has been fixed in kernel version 4.14.
> > >
> > > Please refer to this link: https://www.spinics.net/lists/linux-fsdevel/msg109131.html
> > >
> > > fanotify07 executes without any issue in kernel 4.14.
> > >
> >
> > Dhinakar,
> >
> > Thanks very much for this information.  I had been meaning to investigate
> > the issue with fanotify07 (which also fails on some of my boards here), but had
> > not gotten around to it yet.
> 
> Thanks, Dhinakar, for the info!
> 
> Tim, the user can set FUNCTIONAL_LTP_BOARD_SKIPLIST="fanotify07", but only after
> the test has already failed. How about adding fanotify07 to the skiplist automatically
> after checking the kernel version?
> # The downside is that LTS kernels have lots of backports, which makes kernel version checking ineffective.
> # I will add a flag to skip the "automatic checks" for users who know what they are doing.

I was considering how to deal with this and utilize the skiplist feature
in some intelligent way.  I think it is appropriate to add fanotify07 to a skiplist
based on the kernel version, but there should be some mechanism, like
you describe, to override the skiplist for experts.  That doesn't have to
be an additional skiplist mechanism, though.  We could make an LTP scenario, or
an LTP Fuego spec, that explicitly tests weird corner cases like this, add fanotify07
and other problematic tests to it, and leave it as an exercise for the user to decide when
to run them.

> 
> > This test causes a hang in the middle of an LTP test, which is really a pain.
> > Fuego doesn't handle this kind of thing very well.  In the case of other tests,
> > if the test hangs in the middle you can see from the console log where the
> > test left off.  The LTP output is too quiet, in my judgement, as it takes a lot
> > of effort to see where the test got to before the machine hung.  This is
> > something that would be good to change.
> 
> Yes, I agree. I think there must be a verbose flag; I will try to find it.

There are so many layers in LTP that it's hard to figure out what's going on.
In ltp_target_run.sh, we output the "main" stream to output.log
(and all the different parts of the log are split into different files).

One solution might be just to tail that, filtering through 'sed' for the lines
indicating each new test program, and have that go back to the Fuego 'report'
function.
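
The tail-and-filter idea could look like the sketch below (in Python rather than sed). It assumes output.log contains ltp-pan style `tag=<name> stime=<seconds>` lines at the start of each test program; check the actual log format before relying on the pattern. `test_starts` and `last_started` are hypothetical helpers.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumption: each test in output.log starts with a line such as
# "tag=fanotify07 stime=1525938411". Adjust the pattern if your log differs.
TEST_START = re.compile(r"^tag=(\S+)\s+stime=")


def test_starts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the name of each test program as it starts (what 'report' could echo)."""
    for line in lines:
        match = TEST_START.match(line)
        if match:
            yield match.group(1)


def last_started(lines: Iterable[str]) -> Optional[str]:
    """Return the last test that started, i.e. the likely culprit after a hang."""
    last = None
    for name in test_starts(lines):
        last = name
    return last


log = [
    "<<<test_start>>>",
    "tag=fanotify06 stime=1525938400",
    "<<<test_end>>>",
    "<<<test_start>>>",
    "tag=fanotify07 stime=1525938411",
    "tst_test.c:1015: INFO: Timeout per run is 0h 05m 00s",
]
print(last_started(log))  # fanotify07
```

After a hang, `last_started(open("output.log"))` would name the program that was running, which is the information that is currently hard to dig out.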

> 
> > Also - once the machine hangs there is no mechanism to continue
> > where the test left off.  The result is that many LTP sub-testcases don't get run.
> > Right now the only mechanism we have to deal with this is to skip
> > the individual testcase (the fanotify07 program).
> > Luckily, Daniel has added the skiplist feature to LTP to support this,
> > which is nice. We might want to think about also implementing a test
> > re-start mechanism, or making LTP more fine-grained in its execution.
> 
> We have the timeout mechanism. If we run LTP in chunks, as you propose below,
> it would be easier to use. LTP also has a per-testcase timeout that we are not
> exploiting at the moment.

Indeed.

> 
> > In general, LTP is a "big" test, and Fuego is really architected around
> > running "small" tests, that don't hang the DUT while they are in progress.
> > It might make more sense to structure LTP a little differently, with
> > testplans that use a series of specs that run smaller test sets from LTP,
> > rather than huge test sets (syscalls is notoriously long).
> 
> I like the idea of a testplan to run LTP in small chunks.
> 
> > I'm not sure of all the pros and cons to having lots of specs, but specs
> > that ran smaller sets of programs would lose less data when the
> > machine hung than the current configuration does now.
> 
> The downside would be that after parsing we would have multiple spreadsheets.
> But we can probably use ftc to generate a report that puts all the results together.
> 
> > Fuego should be able to reboot a hung machine and continue to the
> > next scheduled Fuego test, when the BOARD_CONTROL feature is
> > being used. However, the BOARD_CONTROL feature needs more
> > work to support LAVA and other DUT control systems.  We might
> > want to prioritize adding BOARD_CONTROL support for more board
> > management systems in the near future.
> 
> We need to add support for different remote power switches, to be
> able to forcefully reboot a machine whose kernel has crashed. But also
> support a simple "reboot" command for those who haven't.
> # or a "thumb finger" notification[1] asking the user to reboot the machine manually.

Agreed.  That's what I'd like to add to BOARD_CONTROL.

I haven't quite figured out how to notify the user to ask them to reboot
the machine.  If we're on the command line, we can stop and ask.  But
if we're executing a Jenkins job, it would be nice if there were some
dialog box popup.  Probably this needs to be abstracted so that users
could select the notification method they want (e.g. a text message to
their mobile phone).
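
A minimal sketch of that abstraction: notification methods register under a name, and whatever needs the manual reboot looks up the method the user selected. All names here are hypothetical; a Jenkins popup or an SMS gateway would be registered the same way as the console method shown.

```python
from typing import Callable

# A notifier takes (board, message) and delivers the message somehow.
Notifier = Callable[[str, str], None]
NOTIFIERS: dict[str, Notifier] = {}


def notifier(name: str) -> Callable[[Notifier], Notifier]:
    """Decorator that registers a notification method under `name`."""
    def register(func: Notifier) -> Notifier:
        NOTIFIERS[name] = func
        return func
    return register


@notifier("console")
def console_notify(board: str, message: str) -> None:
    print(f"[{board}] {message}")


def request_manual_reboot(board: str, method: str) -> None:
    """Ask a human to reboot `board`, using the method the user selected."""
    try:
        notify = NOTIFIERS[method]
    except KeyError:
        raise ValueError(f"unknown notification method: {method!r}") from None
    notify(board, "board is not responding; please power-cycle it")


request_manual_reboot("bbb", "console")  # prints: [bbb] board is not responding; please power-cycle it
```

The selected method name could come from the board file, so each lab picks its own channel without changing the reboot logic.
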
> 
> > Also, with regards to fanotify07;
> > This is one of the "active bugs", which some people will see and
> > some won't, depending on what kernel version they're running.  It
> > definitely indicates a bug (I think), and it's quite possible that an
> > LTS bugfix backport might fix it, so it's not good IMHO to remove it
> > from the test pool.  However, it is really problematic since it takes
> > down the test machine, and (under certain circumstances) causes
> > a Fuego failure cascade where all remaining queued tests in Jenkins
> > for that board fail as well.  Yuk.
> 
> The first step will be to investigate which commits are needed to fix this, and then
> backport them to LTS if necessary.
> 
> > Tests like these are why I wrote LTP_one_test, to isolate
> > these into individual units that could be tested independently from
> > the main LTP set of tests.  I can imagine making a board-specific
> > testplan_ltp_problem_tests file, listing LTP_one_test with
> > multiple different specs (indicating multiple different single LTP
> > test programs).  This could be run at a different frequency than
> > the full LTP, to check the status of these testcases.
> 
> Thanks for LTP_one_test.
> In my case, I usually log in to the board and run the test manually, often under strace.
> # I set the clean target flag off so that the binaries remain on the board.
>
> It would be nice to have an option that adds strace to the tests. Maybe it can be added
> to LTP upstream.

Do we want to add some kind of "use strace" option to Fuego that would
be applied to programs run by the "report" function?  We would try to make
this transparent to the fuego_test.sh script, but controllable via an environment
variable (maybe via FUEGO_DEBUG, which, I might remind
everyone, supports bitwise flags).

That might be cool.
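
A minimal sketch of how the wrapping could work, assuming the report function passes the target command line through a hook like this before running it. The bit value, `debug_flags` and `wrap_command` are hypothetical; `strace -f` (follow child processes) and `-o FILE` (write the trace to a file) are standard strace options.

```python
import os
import shlex
from typing import Mapping, Optional

# Hypothetical bit: FUEGO_DEBUG takes bitwise flags, but the value chosen
# here for "run under strace" is only an illustration.
STRACE_BIT = 0x40


def debug_flags(env: Optional[Mapping[str, str]] = None) -> int:
    """Read FUEGO_DEBUG as an integer ("64" or "0x40"); treat bad values as 0."""
    env = os.environ if env is None else env
    try:
        return int(env.get("FUEGO_DEBUG", "0"), 0)
    except ValueError:
        return 0


def wrap_command(cmd: str, trace_file: str, flags: int) -> str:
    """Prefix a shell command with strace when the strace bit is set."""
    if flags & STRACE_BIT:
        # -f follows forked children; -o writes the trace to a file.
        return f"strace -f -o {shlex.quote(trace_file)} {cmd}"
    return cmd


flags = debug_flags({"FUEGO_DEBUG": "0x40"})
print(wrap_command("./fanotify07", "/tmp/fanotify07.strace", flags))
# strace -f -o /tmp/fanotify07.strace ./fanotify07
```
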
 -- Tim


end of thread

Thread overview: 7+ messages
2018-05-10  7:36 [Fuego] [PATCH] LTP: put fanotify07 in the skiplist Daniel Sangorrin
     [not found] ` <CGME20180510073701epcas2p3fdbf9e1899d22350d9a29a67f3637d73@epcms5p5>
2018-05-10 16:13   ` Dhinakar Kalyanasundaram
2018-05-10 21:50     ` Tim.Bird
2018-05-14  2:31       ` Daniel Sangorrin
2018-05-14 23:25         ` Tim.Bird
     [not found]     ` <CGME20180510073701epcas2p3fdbf9e1899d22350d9a29a67f3637d73@epcms5p7>
2018-05-11  6:00       ` Dhinakar Kalyanasundaram
2018-05-10 20:50 ` Tim.Bird
