* [Fuego] target_reboot during test_run
@ 2017-01-12 22:44 Maciej Pijanowski
  2017-01-13  0:18 ` Bird, Timothy
  0 siblings, 1 reply; 6+ messages in thread
From: Maciej Pijanowski @ 2017-01-12 22:44 UTC (permalink / raw)
  To: fuego; +Cc: Jakub Ruczyński, Piotr Król

Hello,

Recently we've started to use Fuego as a test framework for our project.
We had some success with the first tests, but then we began to struggle
when it came to testing our system upgrade process, which requires us to
reboot the target during a test.
For example, one part of a test (or one of the tests) would be to check
which rootfs partition is currently mounted, reboot the target, and after
the reboot check again which of the rootfs partitions has been mounted.
A few issues have been spotted there:

   - systemlogs.before was missing - I understand the reason is that the
     system logs are stored under /tmp and transferred to the host in the
     post_test function; if a reboot occurs, /tmp is cleared and the logs
     are gone
   - there are no testlogs on the host
   - the overall test fails due to a connection loss; after the reboot it
     enters the post_test function immediately - commands inside the
     test_deploy function located after the target_reboot command are
     never executed

The same failure happens when trying to run the default Benchmark.Reboot
test with the default testplan. To be more specific: after the reboot it
enters the post_test function, and the report function is never executed.

function test_run {
     # MAX_REBOOT_RETRIES can be defined in the board's file.
     # Otherwise, the default is 20 retries
     retries=${MAX_REBOOT_RETRIES:-20}
     target_reboot $retries
     report "cd $FUEGO_HOME/fuego.$TESTDIR; ./reboot"
}

From what I've found in [1], it looks like Benchmark.Reboot works (or used
to work) properly.

I've also found a FIXME note in the documentation for the report function
[2]. Does it apply in my case?

My question is: what is the current state of target_reboot during
test_run phase? Maybe I'm missing something or not running Benchmark.Reboot
properly - if so, what would be the correct way?

Thanks for any advice.

[1] https://lists.linuxfoundation.org/pipermail/fuego/2016-November/000100.html
[2] http://bird.org/fuego/function_report


-- 
Maciej Pijanowski
Embedded Systems Engineer
http://3mdeb.com | @3mdeb_com



* Re: [Fuego] target_reboot during test_run
  2017-01-12 22:44 [Fuego] target_reboot during test_run Maciej Pijanowski
@ 2017-01-13  0:18 ` Bird, Timothy
  2017-01-13 18:29   ` Maciej Pijanowski
  0 siblings, 1 reply; 6+ messages in thread
From: Bird, Timothy @ 2017-01-13  0:18 UTC (permalink / raw)
  To: Maciej Pijanowski, fuego; +Cc: Jakub Ruczyński, Piotr Król



> -----Original Message-----
> From Maciej Pijanowski on  Thursday, January 12, 2017 2:44 PM
>
> Recently we've started to use Fuego as a test framework for our project.
> We had some success with the first tests, but then we began to struggle
> when it came to testing our system upgrade process, which requires us to
> reboot the target during a test.
> For example, one part of a test (or one of the tests) would be to check
> which rootfs partition is currently mounted, reboot the target, and after
> the reboot check again which of the rootfs partitions has been mounted.
> A few issues have been spotted there:

Thanks for reporting these issues.  Reboot is an area that could use some
improvement and this is helpful.
> 
>    - systemlogs.before was missing - I understand the reason is that the
>      system logs are stored under /tmp and transferred to the host in the
>      post_test function; if a reboot occurs, /tmp is cleared and the logs
>      are gone

There are two ways to get around this.  One would be to make the
log file location configurable, i.e., put a setting in the board file
and put the logs in a more persistent area of the file system.
This seems like the best way to handle it, and it shouldn't be too hard.

I think this could be done by modifying the function ov_rootfs_logread()
in overlays/base/base-distrib.sh.  That would at least put the 'before'
system log somewhere else.  We could have this function check
FUEGO_TARGET_LOG_DIR or some such variable, and use
that instead of /tmp (if it were defined) as the location for system
logs.
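A minimal sketch of the configurable-location idea, assuming a board-file variable named FUEGO_TARGET_LOG_DIR as proposed above (a proposal, not an existing Fuego setting) and two hypothetical helpers, target_log_dir and before_log_path; the real ov_rootfs_logread() may name and place its files differently:

```sh
# Hypothetical helper: pick the target-side directory for system logs.
# FUEGO_TARGET_LOG_DIR is the proposed board-file variable; fall back to /tmp.
target_log_dir() {
    echo "${FUEGO_TARGET_LOG_DIR:-/tmp}"
}

# Hypothetical path of the 'before' system log for a test ($1 = test name).
before_log_path() {
    echo "$(target_log_dir)/$1.before"
}
```

With FUEGO_TARGET_LOG_DIR unset, `before_log_path Benchmark.Reboot` prints /tmp/Benchmark.Reboot.before; with FUEGO_TARGET_LOG_DIR=/home/fuego set in the board file, it prints /home/fuego/Benchmark.Reboot.before.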

The other way around this problem would be to just ignore the error.
Likely, the system is dropping out of post_test due to the missing
log (failure of the get operation).  The system logs are not essential
for test operation.  They are just used to check for Oopses during the
test.  We could probably wrap the retrieval of the test logs in a
set +e / set -e pair (to avoid erroring out if there's a problem).
However, this seems like papering over the problem.
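The second option could look roughly like the sketch below. fetch_before_log is a hypothetical stand-in for however Fuego copies the log from the target (plain cp is used so the sketch runs locally), and get_syslog_tolerant is likewise an illustrative name; the point is only the set +e / set -e bracket around the step that is allowed to fail:

```sh
# Hypothetical stand-in for the real "get" of the log from the target.
fetch_before_log() {
    cp "$1" "$2"
}

# Fetch a system log without letting a missing file abort a 'set -e' script.
# $1 = source log, $2 = destination on the host
get_syslog_tolerant() {
    local rc
    set +e                      # allow the fetch to fail
    fetch_before_log "$1" "$2"
    rc=$?
    set -e                      # re-enable fail-fast (assumes caller runs under set -e)
    if [ "$rc" -ne 0 ]; then
        echo "WARNING: could not fetch system log $1, using an empty one" >&2
        : > "$2"                # substitute an empty log
    fi
    return 0
}
```

A missing log then produces a warning and an empty destination file instead of ending the run.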

>    - there are no testlogs on host
Probably this is an artifact of post_test not completing (and the
logs getting wiped out on a reboot).  We probably need a configurable
log directory on the target to keep the testlog from disappearing
mid-test as well.  'report_append' expects the testlog to persist on the
target, and clearly won't work across reboots in configurations like yours.

>    - the overall test fails due to a connection loss; after the reboot it
>      enters the post_test function immediately - commands inside the
>      test_deploy function located after the target_reboot command are
>      never executed

Hmmm.  I'm not sure that target_reboot should be in test_deploy.
It seems like it should be in test_run instead, if you're actually doing
testing of the reboot itself, or the reboot is an integral part of the test.

> 
> The same failure happens when trying to run the default Benchmark.Reboot
> test with the default testplan. To be more specific: after the reboot it
> enters the post_test function, and the report function is never executed.
> 
> function test_run {
>      # MAX_REBOOT_RETRIES can be defined in the board's file.
>      # Otherwise, the default is 20 retries
>      retries=${MAX_REBOOT_RETRIES:-20}
>      target_reboot $retries
>      report "cd $FUEGO_HOME/fuego.$TESTDIR; ./reboot"
> }
> 
> From what I've found in [1], it looks like Benchmark.Reboot works (or used
> to work) properly.
>
> I've also found a FIXME note in the documentation for the report function
> [2]. Does it apply in my case?

I hadn't seen this note, which someone else added.  Thanks for bringing
it to my attention!

The 'report' operation should finish, and the testlog should be on the
target in /tmp, before a target_reboot operation proceeds in test_run.
Looking at the code in report, it should finish and the log should be
created before the next operation in the test script.

I think the note refers to a test program (which could be a script)
that has an embedded 'reboot' in it.  The FIXTHIS comment would
be correct in such a case: the connection to the host during the command
will be severed, and if the script has more activity following the reboot,
it won't be recorded in the log (or even started, for that matter).
But I'm not sure how this would work anyway.  I don't know of any
scripts that run on the target and expect to continue at the next
command after a reboot.  We can do these types of reboots in Fuego
because the scripts run on the host.
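Because the test script runs on the host, it can simply wait for the board to come back before issuing the next command. The sketch below shows that idea as a bounded retry loop, in the spirit of the retries argument Benchmark.Reboot passes to target_reboot. It is not Fuego's target_reboot: wait_for_target is a hypothetical name, and probe_target is a hypothetical hook the caller must define (for example, a function that runs 'true' on the board over ssh).

```sh
# Poll the target until it answers or the retry budget runs out.
# $1 = number of attempts, $2 = seconds to sleep between attempts (default 1).
# probe_target must be defined by the caller and return 0 once the board is up.
wait_for_target() {
    local retries=$1 delay=${2:-1} i=0
    while [ "$i" -lt "$retries" ]; do
        if probe_target; then
            return 0            # target is reachable again
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1                    # gave up after $retries attempts
}
```

A host-side test can then run a command, trigger the reboot, call `wait_for_target 20`, and run the post-reboot check: the sequence a script running on the target itself cannot survive.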

> 
> My question is: what is the current state of target_reboot during
> test_run phase? Maybe I'm missing something or not running
> Benchmark.Reboot properly - if so, what would be the correct way?

One quick thing you could do is modify all uses of
/tmp in fuego-core/engine/scripts/functions.sh
and fuego-core/engine/overlays/base/base-distrib.fuegoclass
and replace them with something else (maybe /usr/tmp or /home/tmp):
something that doesn't get wiped clean during your board's startup.

I count 9 uses of /tmp in functions.sh (not counting comments)
and 2 in base-distrib.fuegoclass, so it shouldn't be too
hard to just do a search and replace and see if that solves the problem.
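A rough sketch of that experiment with standard grep and sed, assuming it is run from the directory containing the fuego-core checkout. NEW_TMP=/home/tmp is only an example value, and show_tmp_uses and replace_tmp are illustrative names. The plain substitution is deliberately crude: it also rewrites /tmp inside comments and inside longer paths such as /usr/tmp, so review the result against the .bak copy afterwards.

```sh
# Example replacement directory; choose one your board does not wipe at boot.
NEW_TMP=/home/tmp

# List the lines of a file that mention /tmp, skipping whole-line comments.
show_tmp_uses() {
    grep -n '/tmp' "$1" | grep -v '^[0-9]*:[[:space:]]*#'
}

# Replace every occurrence of /tmp in place, keeping the original as FILE.bak.
replace_tmp() {
    sed -i.bak "s|/tmp|$NEW_TMP|g" "$1"
}

# Usage:
#   show_tmp_uses fuego-core/engine/scripts/functions.sh
#   replace_tmp fuego-core/engine/scripts/functions.sh
#   replace_tmp fuego-core/engine/overlays/base/base-distrib.fuegoclass
```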

Can you do that and get back to me with the results?

If it works, I'll work up a proper solution that involves using
a default of /tmp and allowing the board file to override
the log file directory.

Thanks.
 -- Tim

P.S. While looking through the code, I found what looks like a bug in
target_cleanup anyway.  It does a "rm -rf /tmp/*", which
seems really aggressive and potentially dangerous.  It should
only remove the fuego-related files from /tmp, and not everything.
Thanks for reporting this issue and giving us a chance to review this
part of Fuego.
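A narrower cleanup could look like the sketch below. It assumes Fuego's files in the temporary directory share a common prefix; the fuego.* pattern is borrowed from the fuego.$TESTDIR naming in the Benchmark.Reboot snippet earlier in the thread and is an assumption, as is the function name cleanup_fuego_tmp. On a real board this would run on the target rather than on the host.

```sh
# Hypothetical replacement for "rm -rf /tmp/*": remove only Fuego's own entries.
# $1 = target tmp directory (default /tmp); fuego.* is an assumed naming pattern.
cleanup_fuego_tmp() {
    local dir=${1:-/tmp}
    case "$dir" in
        /) return 1 ;;          # refuse to clean the filesystem root
    esac
    # If nothing matches, the literal pattern is passed on and rm -f ignores it.
    rm -rf -- "$dir"/fuego.*
}
```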


* Re: [Fuego] target_reboot during test_run
  2017-01-13  0:18 ` Bird, Timothy
@ 2017-01-13 18:29   ` Maciej Pijanowski
  2017-01-19  5:30     ` Bird, Timothy
  0 siblings, 1 reply; 6+ messages in thread
From: Maciej Pijanowski @ 2017-01-13 18:29 UTC (permalink / raw)
  To: Bird, Timothy, fuego; +Cc: Jakub Ruczyński, Piotr Król, kamil.wcislo




On 13.01.2017 01:18, Bird, Timothy wrote:

Thank you for such a detailed answer.
>
>> -----Original Message-----
>>  From Maciej Pijanowski on  Thursday, January 12, 2017 2:44 PM
>>
>> Recently we've started to use Fuego as a test framework for our project.
>> We had some success with the first tests, but then we began to struggle
>> when it came to testing our system upgrade process, which requires us to
>> reboot the target during a test.
>> For example, one part of a test (or one of the tests) would be to check
>> which rootfs partition is currently mounted, reboot the target, and after
>> the reboot check again which of the rootfs partitions has been mounted.
>> A few issues have been spotted there:
> Thanks for reporting these issues.  Reboot is an area that could use some
> improvement and this is helpful.
>>     - systemlogs.before was missing - I understand the reason is that the
>>       system logs are stored under /tmp and transferred to the host in the
>>       post_test function; if a reboot occurs, /tmp is cleared and the logs
>>       are gone
> There are two ways to get around this.  One would be to make the
> log file location configurable, i.e., put a setting in the board file
> and put the logs in a more persistent area of the file system.
> This seems like the best way to handle it, and it shouldn't be too hard.
>
> I think this could be done by modifying the function ov_rootfs_logread()
> in overlays/base/base-distrib.sh.  That would at least put the 'before'
> system log somewhere else.  We could have this function check
> FUEGO_TARGET_LOG_DIR or some such variable, and use
> that instead of /tmp (if it were defined) as the location for system
> logs.
>
> The other way around this problem would be to just ignore the error.
> Likely, the system is dropping out of post_test due to the missing
> log (failure of the get operation).  The system logs are not essential
> for test operation.  They are just used to check for Oopses during the
> test.  We could probably wrap the retrieval of the test logs in a
> set +e / set -e pair (to avoid erroring out if there's a problem).
> However, this seems like papering over the problem.
>
>>     - there are no testlogs on host
> Probably this is an artifact of post_test not completing (and the
> logs getting wiped out on a reboot).  We probably need a configurable
> log directory on the target to keep the testlog from disappearing
> mid-test as well.  'report_append' expects the testlog to persist on the
> target, and clearly won't work across reboots in configurations like yours.
>
>>     - the overall test fails due to a connection loss; after the reboot it
>>       enters the post_test function immediately - commands inside the
>>       test_deploy function located after the target_reboot command are
>>       never executed
> Hmmm.  I'm not sure that target_reboot should be in test_deploy.
> It seems like it should be in test_run instead, if you're actually doing
> testing of the reboot itself, or the reboot is an integral part of the test.
You are right. I meant 'test_run' instead of 'test_deploy'; sorry for the
confusion.
>
>> The same failure happens when trying to run the default Benchmark.Reboot
>> test with the default testplan. To be more specific: after the reboot it
>> enters the post_test function, and the report function is never executed.
>>
>> function test_run {
>>       # MAX_REBOOT_RETRIES can be defined in the board's file.
>>       # Otherwise, the default is 20 retries
>>       retries=${MAX_REBOOT_RETRIES:-20}
>>       target_reboot $retries
>>       report "cd $FUEGO_HOME/fuego.$TESTDIR; ./reboot"
>> }
>>
>> From what I've found in [1], it looks like Benchmark.Reboot works (or used
>> to work) properly.
>>
>> I've also found a FIXME note in the documentation for the report function
>> [2]. Does it apply in my case?
> I hadn't seen this note, which someone else added.  Thanks for bringing
> it to my attention!
>
> The 'report' operation should finish, and the testlog should be on the
> target in /tmp, before a target_reboot operation proceeds in test_run.
> Looking at the code in report, it should finish and the log should be
> created before the next operation in the test script.
>
> I think the note refers to a test program (which could be a script)
> that has an embedded 'reboot' in it.  The FIXTHIS comment would
> be correct in such a case: the connection to the host during the command
> will be severed, and if the script has more activity following the reboot,
> it won't be recorded in the log (or even started, for that matter).
> But I'm not sure how this would work anyway.  I don't know of any
> scripts that run on the target and expect to continue at the next
> command after a reboot.  We can do these types of reboots in Fuego
> because the scripts run on the host.
>
>> My question is: what is the current state of target_reboot during
>> test_run phase? Maybe I'm missing something or not running
>> Benchmark.Reboot properly - if so, what would be the correct way?
> One quick thing you could do is modify all uses of
> /tmp in fuego-core/engine/scripts/functions.sh
> and fuego-core/engine/overlays/base/base-distrib.fuegoclass
> and replace them with something else (maybe /usr/tmp or /home/tmp):
> something that doesn't get wiped clean during your board's startup.
>
> I count 9 uses of /tmp in functions.sh (not counting comments)
> and 2 in base-distrib.fuegoclass, so it shouldn't be too
> hard to just do a search and replace and see if that solves the problem.
>
> Can you do that and get back to me with the results?
I have changed all occurrences of '/tmp' in the above files to persistent
storage. At this moment 'target_reboot' still causes the test to fail. It
seems to me that this is caused by:

```

Write failed: Broken pipe
Build step 'Execute shell' marked build as failure
```

I was able to overcome this by moving 'set +e' in 'target_reboot' to the
top of this function [3]. Now both the systemlogs and the testlog are on the
host in the test directory. 'Benchmark.Reboot' finishes with SUCCESS; however,
there is no plot because of missing modules ('ImportError: No module named
matplotlib', 'ImportError: No module named simplejson'). This is a bit
strange, since I've checked that these modules are installed for python2
inside the fuego docker container, and this script seems to be executed by
python2. I'll take a look into that. This test is not crucial for me, but I
wanted to get the default one working before experimenting with custom tests.

I came up with a really simple test just to test this feature:

```
function test_run {
     COMMAND="echo 123"
     report "$COMMAND"
     report_append "echo fail"
     target_reboot 20
     report_append "$COMMAND"
}

function test_processing {
     log_compare "$TESTDIR" "1" "^success" "p"
}

```
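For reference, the log_compare call above asks for exactly one line matching ^success, while this test_run only writes lines containing 123 and fail, so a count of 0 is the expected result here. The sketch below reproduces that kind of count check with grep; check_expected_count is a hypothetical name, and this is not Fuego's log_compare implementation:

```sh
# Hypothetical count check: does LOG contain exactly EXPECTED lines matching REGEX?
# $1 = log file, $2 = expected count, $3 = extended regular expression
check_expected_count() {
    local actual
    # grep -c prints 0 and exits 1 when nothing matches, so ignore its status.
    actual=$(grep -cE -- "$3" "$1") || true
    if [ "${actual:-0}" -ne "$2" ]; then
        echo "Mismatch in expected ($2) and actual (${actual:-0}) results" >&2
        return 1
    fi
}
```

On a log containing the lines 123, fail and 123, `check_expected_count LOG 1 '^success'` returns 1, while `check_expected_count LOG 2 '^123$'` returns 0.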

The testlog is on the host and contains the proper output from all 'report'
and 'report_append' calls. I can even see the 'log_compare' result:


```
Fuego error reason: Mismatch in expected (1) and actual (0) pos/neg (p) results.
+++ check_create_functional_logrun failed
+++ '[' '' ']'
+++ false
```

However, it always succeeds:


```

POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Finished: SUCCESS
```

So it looks like renaming `/tmp` brought some improvements. Was my
workaround for `target_reboot` appropriate? Any idea what may be causing
the test to always succeed, even in a case like the one above?

The full log from Jenkins can be found here [4].


[3] https://bitbucket.org/tbird20d/fuego-core/src/c2ddbabc98ffa437b479dcbef33be901b3e229b5/engine/scripts/functions.sh?at=master&fileviewer=file-view-default#functions.sh-401
[4] http://pastebin.com/raw/UsbNZX2W
>
> If it works, I'll work up a proper solution that involves using
> a default of /tmp and allowing the board file to override
> the log file directory.
>
> Thanks.
>   -- Tim
>
> P.S. While looking through the code, I found what looks like a bug in
> target_cleanup anyway.  It does a "rm -rf /tmp/*", which
> seems really aggressive and potentially dangerous.  It should
> only remove the fuego-related files from /tmp, and not everything.
> Thanks for reporting this issue and giving us a chance to review this
> part of Fuego.
>

-- 
Maciej Pijanowski
Embedded Systems Engineer
http://3mdeb.com | @3mdeb_com




* Re: [Fuego] target_reboot during test_run
  2017-01-13 18:29   ` Maciej Pijanowski
@ 2017-01-19  5:30     ` Bird, Timothy
  2017-01-20  0:18       ` Bird, Timothy
  0 siblings, 1 reply; 6+ messages in thread
From: Bird, Timothy @ 2017-01-19  5:30 UTC (permalink / raw)
  To: Maciej Pijanowski, fuego
  Cc: Jakub Ruczyński, Piotr Król, kamil.wcislo



> -----Original Message-----
> From: Maciej Pijanowski on Friday, January 13, 2017 10:30 AM
...
> I have changed all occurrences of '/tmp' in the above files to persistent
> storage. At this moment 'target_reboot' still causes the test to fail. It
> seems to me that this is caused by:
> 
> ```
> 
> Write failed: Broken pipe
> Build step 'Execute shell' marked build as failure
> ```
> 
> I was able to overcome this by moving 'set +e' in 'target_reboot' to the
> top of this function [3]. Now both the systemlogs and the testlog are on the
> host in the test directory. 'Benchmark.Reboot' finishes with SUCCESS; however,
> there is no plot because of missing modules ('ImportError: No module named
> matplotlib', 'ImportError: No module named simplejson'). This is a bit
> strange, since I've checked that these modules are installed for python2
> inside the fuego docker container, and this script seems to be executed by
> python2. I'll take a look into that. This test is not crucial for me, but I
> wanted to get the default one working before experimenting with custom tests.
> 
> I came up with a really simple test just to test this feature:
> 
> ```
> function test_run {
>     COMMAND="echo 123"
>     report "$COMMAND"
>     report_append "echo fail"
>     target_reboot 20
>     report_append "$COMMAND"
> }
> 
> function test_processing {
>     log_compare "$TESTDIR" "1" "^success" "p"
> }
> 
> ```
> 
> The testlog is on the host and contains the proper output from all 'report'
> and 'report_append' calls. I can even see the 'log_compare' result:
> 
> 
> ```
> Fuego error reason: Mismatch in expected (1) and actual (0) pos/neg (p) results.
> +++ check_create_functional_logrun failed
> +++ '[' '' ']'
> +++ false
> ```
> However, it always succeeds:
> 
> 
> ```
> 
> POST BUILD TASK : SUCCESS
> END OF POST BUILD TASK : 0
> Finished: SUCCESS
> ```
> 
> So it looks like renaming `/tmp` brought some improvements. Was my
> workaround for `target_reboot` appropriate? Any idea what may be causing
> the test to always succeed, even in a case like the one above?

Thanks very much for the thorough testing and feedback.  I've got a fix
on my local machine that I'm pretty pleased with.  I had planned to check
it into the 'next' branch today, but stumbled onto another error that
sidetracked me.  I'll do some more testing on the fix and hopefully get it
out tomorrow.

Regards,
 -- Tim


* Re: [Fuego] target_reboot during test_run
  2017-01-19  5:30     ` Bird, Timothy
@ 2017-01-20  0:18       ` Bird, Timothy
  2017-01-29 10:22         ` Maciej Pijanowski
  0 siblings, 1 reply; 6+ messages in thread
From: Bird, Timothy @ 2017-01-20  0:18 UTC (permalink / raw)
  To: Bird, Timothy, Maciej Pijanowski, fuego
  Cc: Jakub Ruczyński, Piotr Król, kamil.wcislo

> -----Original Message-----
> From: Bird, Timothy on Wednesday, January 18, 2017 9:31 PM
...
> Thanks very much for the thorough testing and feedback.  I've got a fix
> on my local machine that I'm pretty pleased with.  I had planned to check
> it into the 'next' branch today, but stumbled onto another error that
> sidetracked me.  I'll do some more testing on the fix and hopefully get it
> out tomorrow.

OK - I've got a fix in my next branch for this issue.  I found a couple
of other miscellaneous bugs while working on this.  This should handle
the case where a reboot during the test causes the /tmp directory to be cleared.

See https://bitbucket.org/tbird20d/fuego-core/commits/0adfb8f56688ab3af5e83bf885674b22b2f62b28

Note that you can specify a different tmp directory on the target in your board file,
with the environment variable FUEGO_TARGET_TMP.  However, even if you don't,
and the distribution on your board clears the tmp directory over a reboot, the code
won't fall over anymore.  Instead, it issues a warning in the console log for the test
and uses an empty 'before' system log.  The test itself no longer fails because of a
missing 'before' log, as it used to.
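For example, a board file could carry a line like the one below. Only the variable name FUEGO_TARGET_TMP comes from this message; the directory is a placeholder and should be one that survives a reboot on your board (creating it on the target beforehand is a safe assumption unless Fuego is known to create it). The second assignment is a hypothetical illustration of the /tmp default mentioned earlier in the thread, not Fuego's actual code.

```sh
# Board file fragment (example path): keep Fuego's target-side files off /tmp.
FUEGO_TARGET_TMP="/home/fuego/tmp"

# Hypothetical resolution in a script: fall back to /tmp when the board is silent.
target_tmp="${FUEGO_TARGET_TMP:-/tmp}"
```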

I ended up refactoring dump_syslog and ov_rootfs_logread, and fixing a bug
in the nosyslogd.dist file during the change.  The wiki pages for the functions
have been updated.

If you want to test this out, please use the 'next' branch.  That is, use the instructions
on this page, http://bird.org/fuego/Fuego_Quickstart_Guide
but use branch 'next' for the git clone operation:
git clone -b next https://bitbucket.org/tbird20d/fuego.git

Before I make a versioned release of this branch, I plan to finish my changes to ovgen.py
to simplify the testplan and test spec requirements for tests.

Let me know if you have any problems with this.
 -- Tim



* Re: [Fuego] target_reboot during test_run
  2017-01-20  0:18       ` Bird, Timothy
@ 2017-01-29 10:22         ` Maciej Pijanowski
  0 siblings, 0 replies; 6+ messages in thread
From: Maciej Pijanowski @ 2017-01-29 10:22 UTC (permalink / raw)
  To: Bird, Timothy, fuego; +Cc: Jakub Ruczyński, Piotr Król, kamil.wcislo



On 20.01.2017 01:18, Bird, Timothy wrote:
>> -----Original Message-----
>> From: Bird, Timothy on Wednesday, January 18, 2017 9:31 PM
> ...
>> Thanks very much for the thorough testing and feedback.  I've got a fix
>> on my local machine that I'm pretty pleased with.  I had planned to check
>> it into the 'next' branch today, but stumbled onto another error that
>> sidetracked me.  I'll do some more testing on the fix and hopefully get it
>> out tomorrow.
> OK - I've got a fix in my next branch for this issue.  I found a couple
> of other miscellaneous bugs while working on this.  This should handle
> the case where a reboot during the test causes the /tmp directory to be cleared.
>
> See https://bitbucket.org/tbird20d/fuego-core/commits/0adfb8f56688ab3af5e83bf885674b22b2f62b28
>
> Note that you can specify a different tmp directory on the target in your board file,
> with the environment variable FUEGO_TARGET_TMP.  However, even if you don't,
> and the distribution on your board clears the tmp directory over a reboot, the code
> won't fall over anymore.  Instead, it issues a warning in the console log for the test
> and uses an empty 'before' system log.  The test itself no longer fails because of a
> missing 'before' log, as it used to.
>
> I ended up refactoring dump_syslog and ov_rootfs_logread, and fixing a bug
> in the nosyslogd.dist file during the change.  The wiki pages for the functions
> have been updated.
>
> If you want to test this out, please use the 'next' branch.  That is, use the instructions
> on this page, http://bird.org/fuego/Fuego_Quickstart_Guide
> but use branch 'next' for the git clone operation:
> git clone -b next https://bitbucket.org/tbird20d/fuego.git
>
> Before I make a versioned release of this branch, I plan to finish my changes to ovgen.py
> to simplify the testplan and test spec requirements for tests.
>
> Let me know if you have any problems with this.
Thanks. We had to postpone the Fuego implementation for a while. I will come
back with the results as soon as I can return to it.
>   -- Tim
>
>

-- 
Maciej Pijanowski
Embedded Systems Engineer
http://3mdeb.com | @3mdeb_com




Thread overview: 6+ messages
2017-01-12 22:44 [Fuego] target_reboot during test_run Maciej Pijanowski
2017-01-13  0:18 ` Bird, Timothy
2017-01-13 18:29   ` Maciej Pijanowski
2017-01-19  5:30     ` Bird, Timothy
2017-01-20  0:18       ` Bird, Timothy
2017-01-29 10:22         ` Maciej Pijanowski
