openembedded-core.lists.openembedded.org archive mirror
 help / color / mirror / Atom feed
* [OE-Core][RFC][PATCH 0/3] add failed test artifacts retriever
@ 2023-06-02  9:50 Alexis Lothoré
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 1/3] oeqa/target/ssh: update options for SCP Alexis Lothoré
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Alexis Lothoré @ 2023-06-02  9:50 UTC (permalink / raw)
  To: Openembedded-core; +Cc: Thomas Petazzoni, Alexandre Belloni

This series is a proposal to bring in an "artifact retriever" to ease
debugging when some runtime tests fails. This is a follow-up to
corresponding RFC ([1]), which in turn is a proposal to address general
debugging issues like [2]

In the proposed form the retriever is pretty simple/dumb: it waits for all
tests listed for a testimage run to be done, and if any of those tests has
failed, it tries to read a list of "artifacts of interest to retrieve", and
pulls those files onto the host system (next to testresults.json) for
further analysis. This is true for ALL runtime tests. So for example, a
failing test in a very basic test session running ping, ssh and ptests will
trigger artifacts retrieval, the failing test being either a ping test, a
ssh test or a ptest.
The artifacts list is provided in the form of a configuration file which
must be a valid json file, with an 'artifacts' entry at its root, pointing
to an array of aths to retrieve. Such paths must be absolute and can point
either to files or directories. For example:

{
        "artifacts": [
                "/usr/lib/lttng-tools/ptest",
                "/var/log",
		"/etc/version",
        ]
}

There is one single artifacts file to be provided for a whole test run (i.e
a run done with bitbake -c testimage). The artifacts configuration file is
not submitted with this series: current design assumes that this
configuration is linked to CI runs, so artifacts configuration file(s) may
be stored in yocto-autobuilder-helper, which then can pass the
configuration to the testimage class (where retriever is implemented)
through a new bitbake variable: ARTIFACTS_LIST_PATH

Retrieved files are pulled through scp to allow compatibility for both
Qemu and SSH targets, and are currently stored "as is"
(unarchived/uncompressed) in tmp/log/oeqa/<image>/artifacts.

The series has been tested with the following process:
- create an 'artifacts.json' file based on sample above
- add dummy failing ptest in lttng-modules through a custom patch in
  meta/recipes-kernel/lttng
- build core-image-minimal image with:
  - DISTRO_FEATURES:append = " ptest"
  - CORE_IMAGE_EXTRA_INSTALL += "dropbear lttng-tools-ptest"
  - TEST_SUITES = "ping ssh ptest"
  - IMAGE_CLASSES += "testimage"
  - ARTIFACTS_LIST_PATH="<yocto_dir>/build/artifacts.json"
- run tests: bitbake core-image-minimal -c testimage
- ensure artifacts are properly retrieved and stored

Depending on this series feedback, here is what could follow:
- documentation update for new ARTIFACTS_LIST_PATH variable and artifacts
  storage path
- adding some artifacts.json file(s) to yocto-autobuilder-helper to
  automatically pull out some artifacts for intermittent failures
- adding compression and publication of artifacts on Autobuilder web
  server

The main questioning I have about current implementation is the
granularity level to offer to select which files/directories to retrieve.
Based on Alexander's feedback on initial RFC, it may be too tedious to
specify for each ptest what file to retrieve, so I have gone for a very
simple version which list files for the whole run. But feel free to
challenge it and provide some insights if some finer controls would be more
useful.

[1] https://lore.kernel.org/openembedded-core/20230523161619a8c871d9@mail.local/T/#t
[2] https://bugzilla.yoctoproject.org/show_bug.cgi?id=14901

Alexis Lothoré (3):
  oeqa/target/ssh: update options for SCP
  testimage: shut down DUT later
  testimage: implement test artifacts retriever for failing tests

 meta/classes-recipe/testimage.bbclass | 55 ++++++++++++++++++++++++++-
 meta/lib/oeqa/core/target/ssh.py      |  6 ++-
 2 files changed, 59 insertions(+), 2 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [OE-Core][RFC][PATCH 1/3] oeqa/target/ssh: update options for SCP
  2023-06-02  9:50 [OE-Core][RFC][PATCH 0/3] add failed test artifacts retriever Alexis Lothoré
@ 2023-06-02  9:50 ` Alexis Lothoré
  2023-06-02 12:02   ` Richard Purdie
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 2/3] testimage: shut down DUT later Alexis Lothoré
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests Alexis Lothoré
  2 siblings, 1 reply; 16+ messages in thread
From: Alexis Lothoré @ 2023-06-02  9:50 UTC (permalink / raw)
  To: Openembedded-core; +Cc: Thomas Petazzoni, Alexandre Belloni

Add two options for SCP:
- by default scp expects sftp server to be available on target, which is
  not always true (e.g: core-image-minimal image). Pass -O to force SCP to
  fallback to legacy SCP protocol
- by default scp expects files. Passing -r option allows to copy
  directories too

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 meta/lib/oeqa/core/target/ssh.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/meta/lib/oeqa/core/target/ssh.py b/meta/lib/oeqa/core/target/ssh.py
index 51079075b5bd..dc17ceecd74d 100644
--- a/meta/lib/oeqa/core/target/ssh.py
+++ b/meta/lib/oeqa/core/target/ssh.py
@@ -40,8 +40,12 @@ class OESSHTarget(OETarget):
                 '-o', 'StrictHostKeyChecking=no',
                 '-o', 'LogLevel=ERROR'
                 ]
+        scp_options = [
+                '-O', # Allow legacy option for images not containing SFTP server
+                '-r'
+        ]
         self.ssh = ['ssh', '-l', self.user ] + ssh_options
-        self.scp = ['scp'] + ssh_options
+        self.scp = ['scp'] + ssh_options + scp_options
         if port:
             self.ssh = self.ssh + [ '-p', port ]
             self.scp = self.scp + [ '-P', port ]
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [OE-Core][RFC][PATCH 2/3] testimage: shut down DUT later
  2023-06-02  9:50 [OE-Core][RFC][PATCH 0/3] add failed test artifacts retriever Alexis Lothoré
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 1/3] oeqa/target/ssh: update options for SCP Alexis Lothoré
@ 2023-06-02  9:50 ` Alexis Lothoré
  2023-06-02 11:55   ` Richard Purdie
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests Alexis Lothoré
  2 siblings, 1 reply; 16+ messages in thread
From: Alexis Lothoré @ 2023-06-02  9:50 UTC (permalink / raw)
  To: Openembedded-core; +Cc: Thomas Petazzoni, Alexandre Belloni

Move target stop (for Qemu targets: qemu image stop) later in testimage
main function, especially _after_ having access to general test results, to
allow executing some other actions (like retrieving some files) on DUT
depending on test results (e.g. : retrieve some log files)

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 meta/classes-recipe/testimage.bbclass | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass
index 1bf0cb450ce4..6b10c1db09f9 100644
--- a/meta/classes-recipe/testimage.bbclass
+++ b/meta/classes-recipe/testimage.bbclass
@@ -393,7 +393,6 @@ def testimage_main(d):
         results = tc.results
     finally:
         signal.signal(signal.SIGTERM, orig_sigterm_handler)
-        tc.target.stop()
 
     # Show results (if we have them)
     if results:
@@ -404,6 +403,8 @@ def testimage_main(d):
                         dump_streams=d.getVar('TESTREPORT_FULLLOGS'))
         results.logSummary(pn)
 
+    tc.target.stop()
+
     # Copy additional logs to tmp/log/oeqa so it's easier to find them
     targetdir = os.path.join(get_testimage_json_result_dir(d), d.getVar("PN"))
     os.makedirs(targetdir, exist_ok=True)
@@ -415,6 +416,7 @@ def testimage_main(d):
     if not results.wasSuccessful():
         bb.fatal('%s - FAILED - also check the logs in %s' % (pn, d.getVar("LOG_DIR")), forcelog=True)
 
+
 def get_runtime_paths(d):
     """
     Returns a list of paths where runtime test must reside.
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests
  2023-06-02  9:50 [OE-Core][RFC][PATCH 0/3] add failed test artifacts retriever Alexis Lothoré
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 1/3] oeqa/target/ssh: update options for SCP Alexis Lothoré
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 2/3] testimage: shut down DUT later Alexis Lothoré
@ 2023-06-02  9:50 ` Alexis Lothoré
  2023-06-02 11:46   ` Alexander Kanavin
  2 siblings, 1 reply; 16+ messages in thread
From: Alexis Lothoré @ 2023-06-02  9:50 UTC (permalink / raw)
  To: Openembedded-core; +Cc: Thomas Petazzoni, Alexandre Belloni

Add a basic artifacts retrievers in testimage class which:
- triggers when at least one runtime test fails
- reads a list of files to retrieve from ARTIFACTS_LIST_PATH
- retrieve those files over scp thanks to existing ssh class
- store those files in an "artifacts" directory in "tmp/log/oeqa/<image>"

Bring partial solution to [YOCTO #14901]

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 meta/classes-recipe/testimage.bbclass | 51 +++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass
index 6b10c1db09f9..3fd3dd5b0264 100644
--- a/meta/classes-recipe/testimage.bbclass
+++ b/meta/classes-recipe/testimage.bbclass
@@ -18,6 +18,12 @@ inherit image-artifact-names
 
 TESTIMAGE_AUTO ??= "0"
 
+# When any test fails, if path to an artifacts configuration (listing
+# files/directories to retrieve on target) has been provided, all listed
+# elements will be downloaded from target before shutting it down
+
+ARTIFACTS_LIST_PATH ??= ""
+
 # You can set (or append to) TEST_SUITES in local.conf to select the tests
 # which you want to run for your target.
 # The test names are the module names in meta/lib/oeqa/runtime/cases.
@@ -192,6 +198,45 @@ def get_testimage_boot_patterns(d):
                 boot_patterns[flag] = flagval.encode().decode('unicode-escape')
     return boot_patterns
 
+def load_artifacts_list(artifacts_conf_path):
+    import json
+
+    if not artifacts_conf_path:
+        return None
+
+    if not os.path.isfile(artifacts_conf_path):
+        return None
+
+    try:
+        with open(artifacts_conf_path) as f:
+            artifacts_list = json.load(f)
+    except json.decoder.JSONDecodeError as e:
+        bb.warn(f"Invalid tests artifact list format: {e.lineno}:{e.colno} : {e.msg}")
+        return None
+
+    if 'artifacts' in artifacts_list:
+        return artifacts_list['artifacts']
+
+    return None
+
+def retrieve_test_artifacts(target, artifacts_list, target_dir):
+    import shutil
+
+    local_artifacts_dir = os.path.join(target_dir, "artifacts")
+    if os.path.isdir(local_artifacts_dir):
+        shutil.rmtree(local_artifacts_dir)
+
+    os.makedirs(local_artifacts_dir)
+    for artifact_path in artifacts_list:
+        if not os.path.isabs(artifact_path):
+            bb.warn(f"{artifact_path} is not an absolute path")
+            continue
+        try:
+            dest_dir = os.path.join(local_artifacts_dir, os.path.dirname(artifact_path[1:]))
+            os.makedirs(dest_dir, exist_ok=True)
+            target.copyFrom(artifact_path, dest_dir)
+        except:
+            bb.warn(f"Can not retrieve {artifact_path} from test target")
 
 def testimage_main(d):
     import os
@@ -402,6 +447,12 @@ def testimage_main(d):
                         get_testimage_result_id(configuration),
                         dump_streams=d.getVar('TESTREPORT_FULLLOGS'))
         results.logSummary(pn)
+        if not results.wasSuccessful():
+            artifacts_list = load_artifacts_list(d.getVar("ARTIFACTS_LIST_PATH"))
+            if not artifacts_list:
+                bb.warn("Could not load artifacts list, skip artifacts retrieval")
+            else:
+                retrieve_test_artifacts(tc.target, artifacts_list, get_testimage_json_result_dir(d))
 
     tc.target.stop()
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests Alexis Lothoré
@ 2023-06-02 11:46   ` Alexander Kanavin
  2023-06-02 11:52     ` Mikko Rapeli
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Kanavin @ 2023-06-02 11:46 UTC (permalink / raw)
  To: alexis.lothore; +Cc: Openembedded-core, Thomas Petazzoni, Alexandre Belloni

On Fri, 2 Jun 2023 at 11:50, Alexis Lothoré via lists.openembedded.org
<alexis.lothore=bootlin.com@lists.openembedded.org> wrote:
> +# When any test fails, if path to an artifacts configuration (listing
> +# files/directories to retrieve on target) has been provided, all listed
> +# elements will be downloaded from target before shutting it down
> +
> +ARTIFACTS_LIST_PATH ??= ""

I think this should be simplified further by simply specifying the
default list directly instead:

TESTIMAGE_FAILED_QA_ARTIFACTS ?= "${libdir}/*/ptest/ /var/log/ /etc/version"

Then, if desired, the list can be overridden globally, or per image as
needed  -  from the image recipes and/or regular bitbake configuration
mechanisms used by CI (local.conf etc). If the feature needs to be
switched off altogether, that can be done by setting an empty list.

Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests
  2023-06-02 11:46   ` Alexander Kanavin
@ 2023-06-02 11:52     ` Mikko Rapeli
  2023-06-02 12:15       ` Alexis Lothoré
  0 siblings, 1 reply; 16+ messages in thread
From: Mikko Rapeli @ 2023-06-02 11:52 UTC (permalink / raw)
  To: Alexander Kanavin
  Cc: alexis.lothore, Openembedded-core, Thomas Petazzoni, Alexandre Belloni

Hi,

On Fri, Jun 02, 2023 at 01:46:03PM +0200, Alexander Kanavin wrote:
> On Fri, 2 Jun 2023 at 11:50, Alexis Lothor� via lists.openembedded.org
> <alexis.lothore=bootlin.com@lists.openembedded.org> wrote:
> > +# When any test fails, if path to an artifacts configuration (listing
> > +# files/directories to retrieve on target) has been provided, all listed
> > +# elements will be downloaded from target before shutting it down
> > +
> > +ARTIFACTS_LIST_PATH ??= ""
> 
> I think this should be simplified further by simply specifying the
> default list directly instead:
> 
> TESTIMAGE_FAILED_QA_ARTIFACTS ?= "${libdir}/*/ptest/ /var/log/ /etc/version"

Try fetching /etc/os-release too, please.

I like this patch, thanks!

Cheers,

-Mikko


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 2/3] testimage: shut down DUT later
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 2/3] testimage: shut down DUT later Alexis Lothoré
@ 2023-06-02 11:55   ` Richard Purdie
  2023-06-02 12:39     ` Alexis Lothoré
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Purdie @ 2023-06-02 11:55 UTC (permalink / raw)
  To: alexis.lothore, Openembedded-core; +Cc: Thomas Petazzoni, Alexandre Belloni

On Fri, 2023-06-02 at 11:50 +0200, Alexis Lothoré via
lists.openembedded.org wrote:
> Move target stop (for Qemu targets: qemu image stop) later in testimage
> main function, especially _after_ having access to general test results, to
> allow executing some other actions (like retrieving some files) on DUT
> depending on test results (e.g. : retrieve some log files)
> 
> Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
> ---
>  meta/classes-recipe/testimage.bbclass | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass
> index 1bf0cb450ce4..6b10c1db09f9 100644
> --- a/meta/classes-recipe/testimage.bbclass
> +++ b/meta/classes-recipe/testimage.bbclass
> @@ -393,7 +393,6 @@ def testimage_main(d):
>          results = tc.results
>      finally:
>          signal.signal(signal.SIGTERM, orig_sigterm_handler)
> -        tc.target.stop()
>  
>      # Show results (if we have them)
>      if results:
> @@ -404,6 +403,8 @@ def testimage_main(d):
>                          dump_streams=d.getVar('TESTREPORT_FULLLOGS'))
>          results.logSummary(pn)
>  
> +    tc.target.stop()
> +

I've not really had a chance to think about this properly, sorry but
there is an issue that jumps out at me in this patch.

The stop is delibverately in a finally: clause so it always runs even
if there is an exception in the code. You've moved it outside of this
so the target would no longer be shut down any more in such an
exception case. This would potentially leave images running in the case
of exceptions and we want to avoid that.

Cheers,

Richard


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 1/3] oeqa/target/ssh: update options for SCP
  2023-06-02  9:50 ` [OE-Core][RFC][PATCH 1/3] oeqa/target/ssh: update options for SCP Alexis Lothoré
@ 2023-06-02 12:02   ` Richard Purdie
  2023-06-02 13:41     ` Alexis Lothoré
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Purdie @ 2023-06-02 12:02 UTC (permalink / raw)
  To: alexis.lothore, Openembedded-core; +Cc: Thomas Petazzoni, Alexandre Belloni

On Fri, 2023-06-02 at 11:50 +0200, Alexis Lothoré via
lists.openembedded.org wrote:
> Add two options for SCP:
> - by default scp expects sftp server to be available on target, which is
>   not always true (e.g: core-image-minimal image). Pass -O to force SCP to
>   fallback to legacy SCP protocol

We deliberately switched away from SCP since it is deprecated and will
be removed in a later openssh release. This is therefore a step
backwards, we really need to do something different.

> - by default scp expects files. Passing -r option allows to copy
>   directories too

Cheers,

Richard


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests
  2023-06-02 11:52     ` Mikko Rapeli
@ 2023-06-02 12:15       ` Alexis Lothoré
  2023-06-02 12:21         ` Alexander Kanavin
  2023-06-02 13:07         ` Mikko Rapeli
  0 siblings, 2 replies; 16+ messages in thread
From: Alexis Lothoré @ 2023-06-02 12:15 UTC (permalink / raw)
  To: Mikko Rapeli, Alexander Kanavin
  Cc: Openembedded-core, Thomas Petazzoni, Alexandre Belloni

Hello Alexander, Mikko,

On 6/2/23 13:52, Mikko Rapeli wrote:
> Hi,
> 
> On Fri, Jun 02, 2023 at 01:46:03PM +0200, Alexander Kanavin wrote:
>> On Fri, 2 Jun 2023 at 11:50, Alexis Lothoré via lists.openembedded.org
>> <alexis.lothore=bootlin.com@lists.openembedded.org> wrote:
>>> +# When any test fails, if path to an artifacts configuration (listing
>>> +# files/directories to retrieve on target) has been provided, all listed
>>> +# elements will be downloaded from target before shutting it down
>>> +
>>> +ARTIFACTS_LIST_PATH ??= ""
>>
>> I think this should be simplified further by simply specifying the
>> default list directly instead:
>>
>> TESTIMAGE_FAILED_QA_ARTIFACTS ?= "${libdir}/*/ptest/ /var/log/ /etc/version"

I wondered about this possibility while adding this variable, and at this time I
thought it would encode to much "CI logic" in testimage class. But since you
raise the suggestion, I can update to pass directly the file list, which could
indeed be overridden by CI tests runners. We can start with those three paths in
the default value.

Another point raised by your example is the lack of wildcards management in
artifacts path, which would clearly ease the listing, especially for ptests.
I'll see if I can come up with something better for this
> 
> Try fetching /etc/os-release too, please.

ACK, will add it to default list
> 
> I like this patch, thanks!
> 
> Cheers,
> 
> -Mikko
> 
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#182307): https://lists.openembedded.org/g/openembedded-core/message/182307
> Mute This Topic: https://lists.openembedded.org/mt/99283034/7394887
> Group Owner: openembedded-core+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [alexis.lothore@bootlin.com]
> -=-=-=-=-=-=-=-=-=-=-=-
> 

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests
  2023-06-02 12:15       ` Alexis Lothoré
@ 2023-06-02 12:21         ` Alexander Kanavin
  2023-06-02 13:07         ` Mikko Rapeli
  1 sibling, 0 replies; 16+ messages in thread
From: Alexander Kanavin @ 2023-06-02 12:21 UTC (permalink / raw)
  To: Alexis Lothoré
  Cc: Mikko Rapeli, Openembedded-core, Thomas Petazzoni, Alexandre Belloni

On Fri, 2 Jun 2023 at 14:14, Alexis Lothoré <alexis.lothore@bootlin.com> wrote:
> I wondered about this possibility while adding this variable, and at this time I
> thought it would encode to much "CI logic" in testimage class. But since you
> raise the suggestion, I can update to pass directly the file list, which could
> indeed be overridden by CI tests runners. We can start with those three paths in
> the default value.

But this isn't CI logic. It's bitbake's QA logic, which can and should
be available in local builds. Running testimage locally is a supported
and regularly used operation, so it should work exactly same as in CI.

Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 2/3] testimage: shut down DUT later
  2023-06-02 11:55   ` Richard Purdie
@ 2023-06-02 12:39     ` Alexis Lothoré
  0 siblings, 0 replies; 16+ messages in thread
From: Alexis Lothoré @ 2023-06-02 12:39 UTC (permalink / raw)
  To: Richard Purdie, Openembedded-core; +Cc: Thomas Petazzoni, Alexandre Belloni

Hello Richard,
On 6/2/23 13:55, Richard Purdie wrote:
> On Fri, 2023-06-02 at 11:50 +0200, Alexis Lothoré via
> lists.openembedded.org wrote:
>> Move target stop (for Qemu targets: qemu image stop) later in testimage
>> main function, especially _after_ having access to general test results, to
>> allow executing some other actions (like retrieving some files) on DUT
>> depending on test results (e.g. : retrieve some log files)
>>
>> Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
>> ---
>>  meta/classes-recipe/testimage.bbclass | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass
>> index 1bf0cb450ce4..6b10c1db09f9 100644
>> --- a/meta/classes-recipe/testimage.bbclass
>> +++ b/meta/classes-recipe/testimage.bbclass
>> @@ -393,7 +393,6 @@ def testimage_main(d):
>>          results = tc.results
>>      finally:
>>          signal.signal(signal.SIGTERM, orig_sigterm_handler)
>> -        tc.target.stop()
>>  
>>      # Show results (if we have them)
>>      if results:
>> @@ -404,6 +403,8 @@ def testimage_main(d):
>>                          dump_streams=d.getVar('TESTREPORT_FULLLOGS'))
>>          results.logSummary(pn)
>>  
>> +    tc.target.stop()
>> +
> 
> I've not really had a chance to think about this properly, sorry but
> there is an issue that jumps out at me in this patch.
> 
> The stop is delibverately in a finally: clause so it always runs even
> if there is an exception in the code. You've moved it outside of this
> so the target would no longer be shut down any more in such an
> exception case. This would potentially leave images running in the case
> of exceptions and we want to avoid that.

I felt the danger while touching this part (and tested at least with a Ctrl-C
that everything was behaving correctly), but indeed I did not think enough about
the "any exception other than KeyboardInterrupt or BlockingIOError". I'll then
drop the patch moving this stop, and if it's ok, move the artifacts retrieval in
the finally clause, just before the stop.

> Cheers,
> 
> Richard

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests
  2023-06-02 12:15       ` Alexis Lothoré
  2023-06-02 12:21         ` Alexander Kanavin
@ 2023-06-02 13:07         ` Mikko Rapeli
  2023-06-07  7:36           ` Alexis Lothoré
  1 sibling, 1 reply; 16+ messages in thread
From: Mikko Rapeli @ 2023-06-02 13:07 UTC (permalink / raw)
  To: Alexis Lothoré
  Cc: Alexander Kanavin, Openembedded-core, Thomas Petazzoni,
	Alexandre Belloni

Hi,

These changes are an improvement, but based on my experience in product test automation,
instead of collecting logs after testing is completed, it is better to capture logs
from the target device while tests are being executed. Serial console, systemd journal
etc logs can be captured as streams from the device under test (DUT), and time stamps added
from the test controller so that aligning events like test command execution and log messages
is possible. It is also a good idea to capture core dumps via this kind of stream(s).
Problems with HW and BSP SW may not be visible after testing completes (device has
rebooted, reset) or capturing data is not possible at all (due to system, kernel, userspace hang,
oom or too much load).

This capturing of logs could be implemented by adding some configurable variables to execute
commands on the test controller and inside oeqa environment at certain stages of testing, for example
after serial console login prompt has been detected. Command to execute could be a simple
"ssh -c 'journalctl -b -a'" to capture boot logs and "ssh 'journalctl -a -f'" and log the output data with
additional time stamps to bitbake task output or a separate file.

Just thinking out loud here, these changes are an improvement over current situation already.
Thanks for sending these patches!

Cheers,

-Mikko


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 1/3] oeqa/target/ssh: update options for SCP
  2023-06-02 12:02   ` Richard Purdie
@ 2023-06-02 13:41     ` Alexis Lothoré
  2023-06-02 13:43       ` Richard Purdie
  0 siblings, 1 reply; 16+ messages in thread
From: Alexis Lothoré @ 2023-06-02 13:41 UTC (permalink / raw)
  To: Richard Purdie, Openembedded-core; +Cc: Thomas Petazzoni, Alexandre Belloni

On 6/2/23 14:02, Richard Purdie wrote:
> On Fri, 2023-06-02 at 11:50 +0200, Alexis Lothoré via
> lists.openembedded.org wrote:
>> Add two options for SCP:
>> - by default scp expects sftp server to be available on target, which is
>>   not always true (e.g: core-image-minimal image). Pass -O to force SCP to
>>   fallback to legacy SCP protocol
> 
> We deliberately switched away from SCP since it is deprecated and will
> be removed in a later openssh release. This is therefore a step
> backwards, we really need to do something different.

Understood. Digging a bit about the deprecation issue and why it did not work on
my machine without this option, and based on fixes seen in mailing list around
this issue, I found out that my image was not configured properly (adding
dropbear package instead of enabling it as an IMAGE_FEATURE, which brings in
sftp support)
I guess keeping scp command but with sftp support instead of legacy scp support
is ok ? Or you are in fact talking about completely stopping using the scp
binary, even with sftp support ?

Kind regards,
> 
>> - by default scp expects files. Passing -r option allows to copy
>>   directories too
> 
> Cheers,
> 
> Richard

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 1/3] oeqa/target/ssh: update options for SCP
  2023-06-02 13:41     ` Alexis Lothoré
@ 2023-06-02 13:43       ` Richard Purdie
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Purdie @ 2023-06-02 13:43 UTC (permalink / raw)
  To: Alexis Lothoré, Openembedded-core
  Cc: Thomas Petazzoni, Alexandre Belloni

On Fri, 2023-06-02 at 15:41 +0200, Alexis Lothoré wrote:
> On 6/2/23 14:02, Richard Purdie wrote:
> > On Fri, 2023-06-02 at 11:50 +0200, Alexis Lothoré via
> > lists.openembedded.org wrote:
> > > Add two options for SCP:
> > > - by default scp expects sftp server to be available on target, which is
> > >   not always true (e.g: core-image-minimal image). Pass -O to force SCP to
> > >   fallback to legacy SCP protocol
> > 
> > We deliberately switched away from SCP since it is deprecated and will
> > be removed in a later openssh release. This is therefore a step
> > backwards, we really need to do something different.
> 
> Understood. Digging a bit about the deprecation issue and why it did not work on
> my machine without this option, and based on fixes seen in mailing list around
> this issue, I found out that my image was not configured properly (adding
> dropbear package instead of enabling it as an IMAGE_FEATURE, which brings in
> sftp support)
> I guess keeping scp command but with sftp support instead of legacy scp support
> is ok ? Or you are in fact talking about completely stopping using the scp
> binary, even with sftp support ?

Using scp is fine but I don't think we can rely on the SCP protocol
staying around.

Cheers,

Richard


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests
  2023-06-02 13:07         ` Mikko Rapeli
@ 2023-06-07  7:36           ` Alexis Lothoré
  2023-06-07  7:46             ` Mikko Rapeli
  0 siblings, 1 reply; 16+ messages in thread
From: Alexis Lothoré @ 2023-06-07  7:36 UTC (permalink / raw)
  To: Mikko Rapeli
  Cc: Alexander Kanavin, Openembedded-core, Thomas Petazzoni,
	Alexandre Belloni

Hi Mikko, sorry for late reply, and thanks for the additional feedback

On 6/2/23 15:07, Mikko Rapeli wrote:
> Hi,
> 
> These changes are an improvement, but based on my experience in product test automation,
> instead of collecting logs after testing is completed, it is better to capture logs
> from the target device while tests are being executed. Serial console, systemd journal
> etc logs can be captured as streams from the device under test (DUT), and time stamps added
> from the test controller so that aligning events like test command execution and log messages
> is possible. It is also a good idea to capture core dumps via this kind of stream(s).
> Problems with HW and BSP SW may not be visible after testing completes (device has
> rebooted, reset) or capturing data is not possible at all (due to system, kernel, userspace hang,
> oom or too much load).
> 
> This capturing of logs could be implemented by adding some configurable variables to execute
> commands on the test controller and inside oeqa environment at certain stages of testing, for example
> after serial console login prompt has been detected. Command to execute could be a simple
> "ssh -c 'journalctl -b -a'" to capture boot logs and "ssh 'journalctl -a -f'" and log the output data with
> additional time stamps to bitbake task output or a separate file.
> 
> Just thinking out loud here, these changes are an improvement over current situation already.
> Thanks for sending these patches!

While I tend to agree with what you are suggesting, I feel like there are some
new constraints, because of targets variety: is the system a real target or a
qemu image ? Does it run systemd/journald ? Are coredumps enabled by default ?
Of course those constraints may be quite easy to circumvent, moreover we can
think of a "best effort" solution which gathers "what it can" from running
target. Also, there may be some corner cases that would need some specific
handling: some tests can be quite long (hours), what about broken pipes during
those ? This would definitely need some re-connection strategies for example.

Since current proposed solution is not very invasive/has not much requirements
except a valid ssh access, I plan to update the series and let some real testing
show if it is relevant. If not (or not enough), we may give a try to a "runtime"
version like the one you are suggesting ?

Thanks,
Alexis
> 
> Cheers,
> 
> -Mikko
> 
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#182316): https://lists.openembedded.org/g/openembedded-core/message/182316
> Mute This Topic: https://lists.openembedded.org/mt/99283034/7394887
> Group Owner: openembedded-core+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [alexis.lothore@bootlin.com]
> -=-=-=-=-=-=-=-=-=-=-=-
> 

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests
  2023-06-07  7:36           ` Alexis Lothoré
@ 2023-06-07  7:46             ` Mikko Rapeli
  0 siblings, 0 replies; 16+ messages in thread
From: Mikko Rapeli @ 2023-06-07  7:46 UTC (permalink / raw)
  To: Alexis Lothoré
  Cc: Alexander Kanavin, Openembedded-core, Thomas Petazzoni,
	Alexandre Belloni

Hi,

On Wed, Jun 07, 2023 at 09:36:07AM +0200, Alexis Lothor� wrote:
> Hi Mikko, sorry for late reply, and thanks for the additional feedback
> 
> On 6/2/23 15:07, Mikko Rapeli wrote:
> > Hi,
> > 
> > These changes are an improvement, but based on my experience in product test automation,
> > instead of collecting logs after testing is completed, it is better to capture logs
> > from the target device while tests are being executed. Serial console, systemd journal
> > etc logs can be captured as streams from the device under test (DUT), and time stamps added
> > from the test controller so that aligning events like test command execution and log messages
> > is possible. It is also a good idea to capture core dumps via this kind of stream(s).
> > Problems with HW and BSP SW may not be visible after testing completes (device has
> > rebooted, reset) or capturing data is not possible at all (due to system, kernel, userspace hang,
> > oom or too much load).
> > 
> > This capturing of logs could be implemented by adding some configurable variables to execute
> > commands on the test controller and inside oeqa environment at certain stages of testing, for example
> > after serial console login prompt has been detected. Command to execute could be a simple
> > "ssh -c 'journalctl -b -a'" to capture boot logs and "ssh 'journalctl -a -f'" and log the output data with
> > additional time stamps to bitbake task output or a separate file.
> > 
> > Just thinking out loud here, these changes are an improvement over current situation already.
> > Thanks for sending these patches!
> 
> While I tend to agree with what you are suggesting, I feel like there are some
> new constraints, because of targets variety: is the system a real target or a
> qemu image ? Does it run systemd/journald ? Are coredumps enabled by default ?
> Of course those constraints may be quite easy to circumvent, moreover we can
> think of a "best effort" solution which gathers "what it can" from running
> target. Also, there may be some corner cases that would need some specific
> handling: some tests can be quite long (hours), what about broken pipes during
> those ? This would definitely need some re-connection strategies for example.

Agreed. Currently our qemu reboot tests are failing for due some kind
of reconnection issue, for example. I'll try to find time to fix this...

> Since current proposed solution is not very invasive/has not much requirements
> except a valid ssh access, I plan to update the series and let some real testing
> show if it is relevant. If not (or not enough), we may give a try to a "runtime"
> version like the one you are suggesting ?

Yes, the series is a clear improvement over the current state.

Cheers,

-Mikko


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-06-07  7:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02  9:50 [OE-Core][RFC][PATCH 0/3] add failed test artifacts retriever Alexis Lothoré
2023-06-02  9:50 ` [OE-Core][RFC][PATCH 1/3] oeqa/target/ssh: update options for SCP Alexis Lothoré
2023-06-02 12:02   ` Richard Purdie
2023-06-02 13:41     ` Alexis Lothoré
2023-06-02 13:43       ` Richard Purdie
2023-06-02  9:50 ` [OE-Core][RFC][PATCH 2/3] testimage: shut down DUT later Alexis Lothoré
2023-06-02 11:55   ` Richard Purdie
2023-06-02 12:39     ` Alexis Lothoré
2023-06-02  9:50 ` [OE-Core][RFC][PATCH 3/3] testimage: implement test artifacts retriever for failing tests Alexis Lothoré
2023-06-02 11:46   ` Alexander Kanavin
2023-06-02 11:52     ` Mikko Rapeli
2023-06-02 12:15       ` Alexis Lothoré
2023-06-02 12:21         ` Alexander Kanavin
2023-06-02 13:07         ` Mikko Rapeli
2023-06-07  7:36           ` Alexis Lothoré
2023-06-07  7:46             ` Mikko Rapeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).