* [PATCH v2 0/4] scripts/resulttool/regression: add metadata filtering
From: alexis.lothore @ 2023-02-14 16:53 UTC (permalink / raw)
  To: openembedded-core; +Cc: alexandre.belloni, thomas.petazzoni

From: Alexis Lothoré <alexis.lothore@bootlin.com>

This v2 does not contain any changes to the patches themselves; it only sets
the From: field correctly. Sorry for the noise.

This patch series is a proposal linked to the discussion initiated here:
https://lists.yoctoproject.org/g/automated-testing/topic/96652823#1219

After the integration of some improvements to regression reporting, it has
been observed that the regression report for version 4.2_M2 is far too big.
When checking it, it appears that a large part of the report is composed of
"missing tests" (regressions detected because a test status changed from
"PASS" to "None"). This is mostly due to oeselftest results: oeselftest is
run multiple times for a single build, but not with the same parameters (so
not the same test "sets"), so those test sets are not comparable.

The proposed series introduces an OESELFTEST_METADATA section appended to
test results when the TEST_TYPE is "oeselftest". An oeselftest result with
this metadata looks like this:
	[...]
	"configuration": {
		"HOST_DISTRO": "fedora-36",
		"HOST_NAME": "fedora36-ty-3",
		"LAYERS": {
			[...]
		},
		"MACHINE": "qemux86",
		"STARTTIME": "20230126235248",
		"TESTSERIES": "qemux86",
		"TEST_TYPE": "oeselftest",
		"OESELFTEST_METADATA": {
		    "run_all_tests": true,
		    "run_tests": null,
		    "skips": null,
		    "machine": null,
		    "select_tags": ["toolchain-user", "toolchain-system"],
		    "exclude_tags": null
		} 
 	}
	[...]

Additionally, the series now makes resulttool look at a METADATA_MATCH_TABLE,
which states that when compared test results have a specific TEST_TYPE,
resulttool should check some specific metadata to know whether the tests can
be compared. This removes all the false positives in regression reports caused
by tests that are present in base results but not found in target results
because of skipped tests/excluded tags.
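
The table itself is just a mapping from TEST_TYPE to the configuration key
that must match for two results to be comparable; the single entry added by
patch 3 is:

    METADATA_MATCH_TABLE = {
        "oeselftest": "OESELFTEST_METADATA"
    }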

* this series prioritizes backward compatibility: if the base test is older
(i.e. it does not have the needed metadata), tests are considered "comparable"
* in addition to the tests added to the oeqa test cases, some "best effort"
manual testing has been done, with the following cases:
  - run a basic test (e.g. `oe-selftest -r tinfoil`), collect the test result,
    break the test, collect the result, ensure the tests are compared. Change
    the oe-selftest parameters, ensure the tests are not compared
  - collect base and target test results from the 4.2_M2 regression report,
    manually add the new metadata to some tests, replay the regression report
    (see the example below), ensure that regressions are kept or discarded
    depending on the metadata
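
For reference, the manual replay mentioned above can be reproduced locally
with the resulttool wrapper (the result file names here are only
illustrative):

    resulttool regression base_results.json target_results.json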

Alexis Lothoré (4):
  scripts/oe-selftest: append metadata to tests results
  oeqa/selftest/resulttooltests: fix minor typo
  scripts/resulttool/regression: add metadata filtering for oeselftest
  oeqa/selftest/resulttool: add test for metadata filtering on
    regression

 .../oeqa/selftest/cases/resulttooltests.py    | 123 +++++++++++++++++-
 meta/lib/oeqa/selftest/context.py             |  15 ++-
 scripts/lib/resulttool/regression.py          |  34 +++++
 3 files changed, 170 insertions(+), 2 deletions(-)

-- 
2.39.1




* [PATCH v2 1/4] scripts/oe-selftest: append metadata to tests results
From: alexis.lothore @ 2023-02-14 16:53 UTC (permalink / raw)
  To: openembedded-core; +Cc: alexandre.belloni, thomas.petazzoni

From: Alexis Lothoré <alexis.lothore@bootlin.com>

Many stored results have their TEST_TYPE set to "oeselftest"; however, those
tests are not all run with the same sets of parameters, so their results may
not be comparable.

Attach the relevant parameters as test metadata to allow identifying the test
configuration, so that tests are compared only when they have been run with
the same parameters.
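
As an illustration (hypothetical invocation, field values following the
cover letter's example), a run like `oe-selftest --run-tests tinfoil` would
record a configuration containing:

    "OESELFTEST_METADATA": {
        "run_all_tests": false,
        "run_tests": ["tinfoil"],
        "skips": null,
        "machine": null,
        "select_tags": null,
        "exclude_tags": null
    }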

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 meta/lib/oeqa/selftest/context.py | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/meta/lib/oeqa/selftest/context.py b/meta/lib/oeqa/selftest/context.py
index c7dd03ce37..8cc46283ed 100644
--- a/meta/lib/oeqa/selftest/context.py
+++ b/meta/lib/oeqa/selftest/context.py
@@ -22,6 +22,17 @@ from oeqa.core.exception import OEQAPreRun, OEQATestNotFound
 
 from oeqa.utils.commands import runCmd, get_bb_vars, get_test_layer
 
+OESELFTEST_METADATA = ["run_all_tests", "run_tests", "skips", "machine", "select_tags", "exclude_tags"]
+
+def get_oeselftest_metadata(args):
+    result = {}
+    raw_args = vars(args)
+    for metadata in OESELFTEST_METADATA:
+        if metadata in raw_args:
+            result[metadata] = raw_args[metadata]
+
+    return result
+
 class NonConcurrentTestSuite(unittest.TestSuite):
     def __init__(self, suite, processes, setupfunc, removefunc):
         super().__init__([suite])
@@ -334,12 +345,14 @@ class OESelftestTestContextExecutor(OETestContextExecutor):
         import platform
         from oeqa.utils.metadata import metadata_from_bb
         metadata = metadata_from_bb()
+        oeselftest_metadata = get_oeselftest_metadata(args)
         configuration = {'TEST_TYPE': 'oeselftest',
                         'STARTTIME': args.test_start_time,
                         'MACHINE': self.tc.td["MACHINE"],
                         'HOST_DISTRO': oe.lsb.distro_identifier().replace(' ', '-'),
                         'HOST_NAME': metadata['hostname'],
-                        'LAYERS': metadata['layers']}
+                        'LAYERS': metadata['layers'],
+                        'OESELFTEST_METADATA': oeselftest_metadata}
         return configuration
 
     def get_result_id(self, configuration):
-- 
2.39.1




* [PATCH v2 2/4] oeqa/selftest/resulttooltests: fix minor typo
From: alexis.lothore @ 2023-02-14 16:53 UTC (permalink / raw)
  To: openembedded-core; +Cc: alexandre.belloni, thomas.petazzoni

From: Alexis Lothoré <alexis.lothore@bootlin.com>

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 meta/lib/oeqa/selftest/cases/resulttooltests.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/lib/oeqa/selftest/cases/resulttooltests.py b/meta/lib/oeqa/selftest/cases/resulttooltests.py
index c2e76f1a44..efdfd98af3 100644
--- a/meta/lib/oeqa/selftest/cases/resulttooltests.py
+++ b/meta/lib/oeqa/selftest/cases/resulttooltests.py
@@ -71,7 +71,7 @@ class ResultToolTests(OESelftestTestCase):
         self.assertTrue('target_result1' in results['runtime/mydistro/qemux86/image'], msg="Pair not correct:%s" % results)
         self.assertTrue('target_result3' in results['runtime/mydistro/qemux86-64/image'], msg="Pair not correct:%s" % results)
 
-    def test_regrresion_can_get_regression_result(self):
+    def test_regression_can_get_regression_result(self):
         base_result_data = {'result': {'test1': {'status': 'PASSED'},
                                        'test2': {'status': 'PASSED'},
                                        'test3': {'status': 'FAILED'},
-- 
2.39.1




* [PATCH v2 3/4] scripts/resulttool/regression: add metadata filtering for oeselftest
From: alexis.lothore @ 2023-02-14 16:53 UTC (permalink / raw)
  To: openembedded-core; +Cc: alexandre.belloni, thomas.petazzoni

From: Alexis Lothoré <alexis.lothore@bootlin.com>

When generating regression reports, many false positives can be observed,
since some test results are compared while the corresponding test sets are
not the same. This can be seen for example with oeselftest tests (oeselftest
is run multiple times but with different parameters, resulting in different
test sets).

Add a filtering mechanism to the resulttool regression module to enable
better matching between tests. The METADATA_MATCH_TABLE defines that when the
TEST_TYPE is "oeselftest", resulttool should filter pairs based on the
OESELFTEST_METADATA appended to the test configuration. If the metadata is
absent from the "base" test results, the tests are marked "comparable" to
preserve compatibility with older test results which do not yet have this
metadata.
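
A minimal usage sketch (illustrative configurations, trimmed to the relevant
keys; assumes scripts/lib is on sys.path, as in the selftests):

    from resulttool import regression

    base = {"TEST_TYPE": "oeselftest",
            "OESELFTEST_METADATA": {"select_tags": ["toolchain-user"]}}
    target = {"TEST_TYPE": "oeselftest",
              "OESELFTEST_METADATA": {"select_tags": ["machine"]}}
    # Metadata differs, so these results must not be paired
    assert not regression.can_be_compared(base, target)

    # A base result predating the metadata remains comparable
    old_base = {"TEST_TYPE": "oeselftest"}
    assert regression.can_be_compared(old_base, target)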

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 scripts/lib/resulttool/regression.py | 34 ++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/scripts/lib/resulttool/regression.py b/scripts/lib/resulttool/regression.py
index 9f952951b3..64d1eeee37 100644
--- a/scripts/lib/resulttool/regression.py
+++ b/scripts/lib/resulttool/regression.py
@@ -12,6 +12,36 @@ import json
 from oeqa.utils.git import GitRepo
 import oeqa.utils.gitarchive as gitarchive
 
+METADATA_MATCH_TABLE = {
+    "oeselftest": "OESELFTEST_METADATA"
+}
+
+
+def metadata_matches(base_configuration, target_configuration):
+    """
+    For the passed base and target, check the test type. If the test type
+    matches one of the properties described in METADATA_MATCH_TABLE, compare
+    the metadata if it is present in the base. Return true if the metadata
+    matches, or if the base lacks some data (TEST_TYPE or the metadata).
+    """
+    test_type = base_configuration.get('TEST_TYPE')
+    metadata_key = METADATA_MATCH_TABLE.get(test_type)
+    if metadata_key not in base_configuration:
+        return True
+
+    if target_configuration.get(metadata_key) != base_configuration[metadata_key]:
+        return False
+
+    return True
+
+def can_be_compared(base_configuration, target_configuration):
+    """
+    Some test results are not relevant to compare, for example oeselftest
+    runs with different test sets or parameters. Return true if the tests
+    can be compared.
+    """
+    return metadata_matches(base_configuration, target_configuration)
+
 def compare_result(logger, base_name, target_name, base_result, target_result):
     base_result = base_result.get('result')
     target_result = target_result.get('result')
@@ -62,6 +92,8 @@ def regression_common(args, logger, base_results, target_results):
             # removing any pairs which match
             for c in base.copy():
                 for b in target.copy():
+                    if not can_be_compared(base_results[a][c]['configuration'], target_results[a][b]['configuration']):
+                        continue
                     res, resstr = compare_result(logger, c, b, base_results[a][c], target_results[a][b])
                     if not res:
                         matches.append(resstr)
@@ -71,6 +103,8 @@ def regression_common(args, logger, base_results, target_results):
             # Should only now see regressions, we may not be able to match multiple pairs directly
             for c in base:
                 for b in target:
+                    if not can_be_compared(base_results[a][c]['configuration'], target_results[a][b]['configuration']):
+                        continue
                     res, resstr = compare_result(logger, c, b, base_results[a][c], target_results[a][b])
                     if res:
                         regressions.append(resstr)
-- 
2.39.1




* [PATCH v2 4/4] oeqa/selftest/resulttool: add test for metadata filtering on regression
From: alexis.lothore @ 2023-02-14 16:53 UTC (permalink / raw)
  To: openembedded-core; +Cc: alexandre.belloni, thomas.petazzoni

From: Alexis Lothoré <alexis.lothore@bootlin.com>

Introduce new tests for the metadata-based filtering added for oeselftest
results.
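
The new cases can be run on their own with, for example (test id taken from
the diff below):

    oe-selftest -r resulttooltests.ResultToolTests.test_results_with_matching_metadata_can_be_compared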

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 .../oeqa/selftest/cases/resulttooltests.py    | 121 ++++++++++++++++++
 1 file changed, 121 insertions(+)

diff --git a/meta/lib/oeqa/selftest/cases/resulttooltests.py b/meta/lib/oeqa/selftest/cases/resulttooltests.py
index efdfd98af3..e93796e145 100644
--- a/meta/lib/oeqa/selftest/cases/resulttooltests.py
+++ b/meta/lib/oeqa/selftest/cases/resulttooltests.py
@@ -98,3 +98,124 @@ class ResultToolTests(OESelftestTestCase):
         resultutils.append_resultsdata(results, ResultToolTests.target_results_data, configmap=resultutils.flatten_map)
         self.assertEqual(len(results[''].keys()), 5, msg="Flattened results not correct %s" % str(results))
 
+    def test_results_without_metadata_can_be_compared(self):
+        base_configuration = {"TEST_TYPE": "oeselftest",
+                              "TESTSERIES": "series1",
+                              "IMAGE_BASENAME": "image",
+                              "IMAGE_PKGTYPE": "ipk",
+                              "DISTRO": "mydistro",
+                              "MACHINE": "qemux86"}
+        target_configuration = {"TEST_TYPE": "oeselftest",
+                                "TESTSERIES": "series1",
+                                "IMAGE_BASENAME": "image",
+                                "IMAGE_PKGTYPE": "ipk",
+                                "DISTRO": "mydistro",
+                                "MACHINE": "qemux86"}
+        self.assertTrue(regression.can_be_compared(base_configuration, target_configuration),
+                        msg="incorrect metadata filtering, tests without metadata should be compared")
+
+    def test_target_result_with_missing_metadata_can_not_be_compared(self):
+        base_configuration = {"TEST_TYPE": "oeselftest",
+                              "TESTSERIES": "series1",
+                              "IMAGE_BASENAME": "image",
+                              "IMAGE_PKGTYPE": "ipk",
+                              "DISTRO": "mydistro",
+                              "MACHINE": "qemux86",
+                              "OESELFTEST_METADATA": {"run_all_tests": True,
+                                                      "run_tests": None,
+                                                      "skips": None,
+                                                      "machine": None,
+                                                      "select_tags": ["toolchain-user", "toolchain-system"],
+                                                      "exclude_tags": None}}
+        target_configuration = {"TEST_TYPE": "oeselftest",
+                                "TESTSERIES": "series1",
+                                "IMAGE_BASENAME": "image",
+                                "IMAGE_PKGTYPE": "ipk",
+                                "DISTRO": "mydistro",
+                                "MACHINE": "qemux86"}
+        self.assertFalse(regression.can_be_compared(base_configuration, target_configuration),
+                         msg="incorrect metadata filtering, tests should not be compared")
+
+    def test_results_with_matching_metadata_can_be_compared(self):
+        base_configuration = {"TEST_TYPE": "oeselftest",
+                              "TESTSERIES": "series1",
+                              "IMAGE_BASENAME": "image",
+                              "IMAGE_PKGTYPE": "ipk",
+                              "DISTRO": "mydistro",
+                              "MACHINE": "qemux86",
+                              "OESELFTEST_METADATA": {"run_all_tests": True,
+                                                      "run_tests": None,
+                                                      "skips": None,
+                                                      "machine": None,
+                                                      "select_tags": ["toolchain-user", "toolchain-system"],
+                                                      "exclude_tags": None}}
+        target_configuration = {"TEST_TYPE": "oeselftest",
+                                "TESTSERIES": "series1",
+                                "IMAGE_BASENAME": "image",
+                                "IMAGE_PKGTYPE": "ipk",
+                                "DISTRO": "mydistro",
+                                "MACHINE": "qemux86",
+                                "OESELFTEST_METADATA": {"run_all_tests": True,
+                                                        "run_tests": None,
+                                                        "skips": None,
+                                                        "machine": None,
+                                                        "select_tags": ["toolchain-user", "toolchain-system"],
+                                                        "exclude_tags": None}}
+        self.assertTrue(regression.can_be_compared(base_configuration, target_configuration),
+                        msg="incorrect metadata filtering, tests with matching metadata should be compared")
+
+    def test_results_with_mismatching_metadata_can_not_be_compared(self):
+        base_configuration = {"TEST_TYPE": "oeselftest",
+                              "TESTSERIES": "series1",
+                              "IMAGE_BASENAME": "image",
+                              "IMAGE_PKGTYPE": "ipk",
+                              "DISTRO": "mydistro",
+                              "MACHINE": "qemux86",
+                              "OESELFTEST_METADATA": {"run_all_tests": True,
+                                                      "run_tests": None,
+                                                      "skips": None,
+                                                      "machine": None,
+                                                      "select_tags": ["toolchain-user", "toolchain-system"],
+                                                      "exclude_tags": None}}
+        target_configuration = {"TEST_TYPE": "oeselftest",
+                                "TESTSERIES": "series1",
+                                "IMAGE_BASENAME": "image",
+                                "IMAGE_PKGTYPE": "ipk",
+                                "DISTRO": "mydistro",
+                                "MACHINE": "qemux86",
+                                "OESELFTEST_METADATA": {"run_all_tests": True,
+                                                        "run_tests": None,
+                                                        "skips": None,
+                                                        "machine": None,
+                                                        "select_tags": ["machine"],
+                                                        "exclude_tags": None}}
+        self.assertFalse(regression.can_be_compared(base_configuration, target_configuration),
+                         msg="incorrect metadata filtering, tests with mismatching metadata should not be compared")
+
+    def test_metadata_matching_is_only_checked_for_relevant_test_type(self):
+        base_configuration = {"TEST_TYPE": "runtime",
+                              "TESTSERIES": "series1",
+                              "IMAGE_BASENAME": "image",
+                              "IMAGE_PKGTYPE": "ipk",
+                              "DISTRO": "mydistro",
+                              "MACHINE": "qemux86",
+                              "OESELFTEST_METADATA": {"run_all_tests": True,
+                                                      "run_tests": None,
+                                                      "skips": None,
+                                                      "machine": None,
+                                                      "select_tags": ["toolchain-user", "toolchain-system"],
+                                                      "exclude_tags": None}}
+        target_configuration = {"TEST_TYPE": "runtime",
+                                "TESTSERIES": "series1",
+                                "IMAGE_BASENAME": "image",
+                                "IMAGE_PKGTYPE": "ipk",
+                                "DISTRO": "mydistro",
+                                "MACHINE": "qemux86",
+                                "OESELFTEST_METADATA": {"run_all_tests": True,
+                                                        "run_tests": None,
+                                                        "skips": None,
+                                                        "machine": None,
+                                                        "select_tags": ["machine"],
+                                                        "exclude_tags": None}}
+        self.assertTrue(regression.can_be_compared(base_configuration, target_configuration),
+                         msg="incorrect metadata filtering, %s tests should be compared" % base_configuration['TEST_TYPE'])
-- 
2.39.1




* Re: [OE-core] [PATCH v2 0/4] scripts/resulttool/regression: add metadata filtering
From: Richard Purdie @ 2023-02-16  0:02 UTC (permalink / raw)
  To: alexis.lothore, openembedded-core; +Cc: alexandre.belloni, thomas.petazzoni

On Tue, 2023-02-14 at 17:53 +0100, Alexis Lothoré via
lists.openembedded.org wrote:
> From: Alexis Lothoré <alexis.lothore@bootlin.com>
> 
> This v2 does not contain any changes to the patches themselves; it only sets
> the From: field correctly. Sorry for the noise.
> 
> This patch series is a proposal linked to the discussion initiated here:
> https://lists.yoctoproject.org/g/automated-testing/topic/96652823#1219
> 
> After the integration of some improvements to regression reporting, it has
> been observed that the regression report for version 4.2_M2 is far too big.
> When checking it, it appears that a large part of the report is composed of
> "missing tests" (regressions detected because a test status changed from
> "PASS" to "None"). This is mostly due to oeselftest results: oeselftest is
> run multiple times for a single build, but not with the same parameters (so
> not the same test "sets"), so those test sets are not comparable.
> 
> The proposed series introduces an OESELFTEST_METADATA section appended to
> test results when the TEST_TYPE is "oeselftest". An oeselftest result with
> this metadata looks like this:
> 	[...]
> 	"configuration": {
> 		"HOST_DISTRO": "fedora-36",
> 		"HOST_NAME": "fedora36-ty-3",
> 		"LAYERS": {
> 			[...]
> 		},
> 		"MACHINE": "qemux86",
> 		"STARTTIME": "20230126235248",
> 		"TESTSERIES": "qemux86",
> 		"TEST_TYPE": "oeselftest",
> 		"OESELFTEST_METADATA": {
> 		    "run_all_tests": true,
> 		    "run_tests": null,
> 		    "skips": null,
> 		    "machine": null,
> 		    "select_tags": ["toolchain-user", "toolchain-system"],
> 		    "exclude_tags": null
> 		} 
>  	}
> 	[...]
> 
> Additionally, the series now makes resulttool look at a METADATA_MATCH_TABLE,
> which states that when compared test results have a specific TEST_TYPE,
> resulttool should check some specific metadata to know whether the tests can
> be compared. This removes all the false positives in regression reports caused
> by tests that are present in base results but not found in target results
> because of skipped tests/excluded tags.
> 
> * this series prioritizes backward compatibility: if the base test is older
> (i.e. it does not have the needed metadata), tests are considered "comparable"
> * in addition to the tests added to the oeqa test cases, some "best effort"
> manual testing has been done, with the following cases:
>   - run a basic test (e.g. `oe-selftest -r tinfoil`), collect the test result,
>     break the test, collect the result, ensure the tests are compared. Change
>     the oe-selftest parameters, ensure the tests are not compared
>   - collect base and target test results from the 4.2_M2 regression report,
>     manually add the new metadata to some tests, replay the regression report
>     (see the example below), ensure that regressions are kept or discarded
>     depending on the metadata

I think this is heading in the right direction. Firstly, can we put
some kind of test script into OE-Core for making debugging/testing this
easier?

I'm wondering if we can take some of the code from qa_send_email and
move it into OE-Core such that I could do something like:

show-regression-report 4.2_M1 4.2_M2

which would then resolve those two tags to commits, find the
testresults repo, fetch the data depth1 then call resulttool regression
on them.

I did that manually to experiment. I realised that if we do something
like:

    if "MACHINE" in base_configuration and "MACHINE" in target_configuration:
        if base_configuration["MACHINE"] != target_configuration["MACHINE"]:
            print("Skipping")
            return False

in metadata_matches() we can skip a lot of mismatched combinations even
with the older test results. I think we also should be able to use some
pattern matching to generate a dummy OESELFTEST_METADATA section for
older data which doesn't have it. For example, the presence of meta_ide
tests indicates one particular type of test. Combined with the MACHINE
match, this should let us compare old and new data? That would mean
metadata_matches() would need to see into the actual results too.
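
A rough sketch of what I mean (hypothetical helper, untested):

    def guess_oeselftest_metadata(result):
        # Build a dummy OESELFTEST_METADATA for results which predate the
        # metadata, based on patterns in the test names themselves.
        tests = result.get('result', {})
        if any(t.startswith('meta_ide.') for t in tests):
            # meta_ide tests only show up in one selftest configuration
            return {'run_all_tests': False, 'run_tests': ['meta_ide'],
                    'skips': None, 'machine': None,
                    'select_tags': None, 'exclude_tags': None}
        return None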

Does that make sense?

Cheers,

Richard




* Re: [OE-core] [PATCH v2 0/4] scripts/resulttool/regression: add metadata filtering
From: Alexis Lothoré @ 2023-02-16  8:56 UTC (permalink / raw)
  To: Richard Purdie, openembedded-core; +Cc: alexandre.belloni, thomas.petazzoni

On 2/16/23 01:02, Richard Purdie wrote:
> On Tue, 2023-02-14 at 17:53 +0100, Alexis Lothoré via
> lists.openembedded.org wrote:
>> From: Alexis Lothoré <alexis.lothore@bootlin.com>
>> * this series prioritizes backward compatibility: if the base test is older
>> (i.e. it does not have the needed metadata), tests are considered "comparable"
>> * in addition to the tests added to the oeqa test cases, some "best effort"
>> manual testing has been done, with the following cases:
>>   - run a basic test (e.g. `oe-selftest -r tinfoil`), collect the test result,
>>     break the test, collect the result, ensure the tests are compared. Change
>>     the oe-selftest parameters, ensure the tests are not compared
>>   - collect base and target test results from the 4.2_M2 regression report,
>>     manually add the new metadata to some tests, replay the regression report
>>     (see the example below), ensure that regressions are kept or discarded
>>     depending on the metadata
> 
> I think this is heading in the right direction. Firstly, can we put
> some kind of test script into OE-Core for making debugging/testing this
> easier?
> 
> I'm wondering if we can take some of the code from qa_send_email and
> move it into OE-Core such that I could do something like:
> 
> show-regression-report 4.2_M1 4.2_M2
> 
> which would then resolve those two tags to commits, find the
> testresults repo, fetch the data depth1 then call resulttool regression
> on them.

That totally makes sense; it will make future changes easier to test, thanks
for the suggestion. I will think about how I can rework/transfer some code to
do that. One thing could still be troublesome, though: since there will be
linked modifications in yocto-autobuilder-helper and OE-Core, do you know if
yocto-autobuilder-helper is fetched on the same branch family as the other
repositories needed for the build? From a quick reading, I see that it
defaults to "master" in config.py in the yocto-autobuilder2 repository, so I
think it is always on master, which would be an issue to solve.

> 
> I did that manually to experiment. I realised that if we do something
> like:
> 
>     if "MACHINE" in base_configuration and "MACHINE" in target_configuration:
>         if base_configuration["MACHINE"] != target_configuration["MACHINE"]:
>             print("Skipping")
>             return False
> 
> in metadata_matches() we can skip a lot of mismatched combinations even
> with the older test results.

Indeed, comparing MACHINE when it is present in both results can help too. I
may have skipped this because of confusion about our previous discussion on
some tests being "cross-compared" while others are "strictly" compared! I
will ensure that most scenarios allow comparing MACHINE, and if so, I will
add it to metadata_matches as the first check (because it is cheap to test),
and only check OESELFTEST_METADATA (more expensive) when it matches.

> I think we also should be able to use some
> pattern matching to generate a dummy OESELFTEST_METADATA section for
> older data which doesn't have it. For example, the presence of meta_ide
> tests indicates one particular type of test. Combined with the MACHINE
> match, this should let us compare old and new data? That would mean
> metadata_matches() would need to see into the actual results too.

I thought about this possibility while starting to work on this (checking
existing metadata to generate/guess the test configuration), but I felt that
it would encode too much "business logic" in the resulttool scripts, which
could easily break if we change some test configuration. I mean, if someone
changes some parameters passed to oe-selftest in the builders' configuration,
they will very probably not remember to update the "guess" filters in
resulttool.

I will start working on a new revision of this series based on your comments.

Kind regards,

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com




* Re: [OE-core] [PATCH v2 0/4] scripts/resulttool/regression: add metadata filtering
From: Richard Purdie @ 2023-02-16 10:33 UTC (permalink / raw)
  To: Alexis Lothoré, openembedded-core
  Cc: alexandre.belloni, thomas.petazzoni

On Thu, 2023-02-16 at 09:56 +0100, Alexis Lothoré wrote:
> On 2/16/23 01:02, Richard Purdie wrote:
> > On Tue, 2023-02-14 at 17:53 +0100, Alexis Lothoré via
> > lists.openembedded.org wrote:
> > > From: Alexis Lothoré <alexis.lothore@bootlin.com>
> > > * this series prioritizes backward compatibility: if the base test is older
> > > (i.e. it does not have the needed metadata), tests are considered "comparable"
> > > * in addition to the tests added to the oeqa test cases, some "best effort"
> > > manual testing has been done, with the following cases:
> > >   - run a basic test (e.g. `oe-selftest -r tinfoil`), collect the test result,
> > >     break the test, collect the result, ensure the tests are compared. Change
> > >     the oe-selftest parameters, ensure the tests are not compared
> > >   - collect base and target test results from the 4.2_M2 regression report,
> > >     manually add the new metadata to some tests, replay the regression report
> > >     (see the example below), ensure that regressions are kept or discarded
> > >     depending on the metadata
> > 
> > I think this is heading in the right direction. Firstly, can we put
> > some kind of test script into OE-Core for making debugging/testing this
> > easier?
> > 
> > I'm wondering if we can take some of the code from qa_send_email and
> > move it into OE-Core such that I could do something like:
> > 
> > show-regression-report 4.2_M1 4.2_M2

I was thinking "yocto-testresult-query regression-report" might be a
better name/command; it could then default to using the yocto-testresults
repo and resolving yocto tags. You'll also likely want to specify a
workdir for the operation, but those are implementation details :)
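
For illustration, the invocation I have in mind would look something like
this (nothing is implemented yet, the name is tentative):

    yocto-testresult-query regression-report 4.2_M1 4.2_M2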

> > 
> > which would then resolve those two tags to commits, find the
> > testresults repo, fetch the data depth1 then call resulttool regression
> > on them.
> 
> That totally makes sense; it will make future changes easier to test, thanks
> for the suggestion. I will think about how I can rework/transfer some code to
> do that. One thing could still be troublesome, though: since there will be
> linked modifications in yocto-autobuilder-helper and OE-Core, do you know if
> yocto-autobuilder-helper is fetched on the same branch family as the other
> repositories needed for the build? From a quick reading, I see that it
> defaults to "master" in config.py in the yocto-autobuilder2 repository, so I
> think it is always on master, which would be an issue to solve.

Unlike yocto-autobuilder2, there is a branch of yocto-autobuilder-
helper per release. We can therefore move code from there to OE-Core on
the master branch and we'll be ok. It will be a small headache for SWAT
whilst we test it but Alexandre is on cc and will cope :).

> > 
> > I did that manually to experiment. I realised that if we do something
> > like:
> > 
> >     if "MACHINE" in base_configuration and "MACHINE" in target_configuration:
> >         if base_configuration["MACHINE"] != target_configuration["MACHINE"]:
> >             print("Skipping")
> >             return False
> > 
> > in metadata_matches() we can skip a lot of mismatched combinations even
> > with the older test results.
> 
> Indeed, comparing MACHINE when it is present in both results can help too. I
> may have skipped this because of confusion about our previous discussion on
> some tests being "cross-compared" while others are "strictly" compared! I
> will ensure that most scenarios allow comparing MACHINE, and if so, I will
> add it to metadata_matches as the first check (because it is cheap to test),
> and only check OESELFTEST_METADATA (more expensive) when it matches.

MACHINE should always match, that is fine. What doesn't need to match
are the distros and so on.

> > I think we also should be able to use some
> > pattern matching to generate a dummy OESELFTEST_METADATA section for
> > older data which doesn't have it. For example, the presence of meta_ide
> > tests indicates one particular type of test. Combined with the MACHINE
> > match, this should let us compare old and new data? That would mean
> > metadata_matches() would need to see into the actual results too.
> 
> I thought about this possibility while starting to work on this (checking
> existing metadata to generate/guess the test configuration), but I felt that
> it would encode too much "business logic" in the resulttool scripts, which
> could easily break if we change some test configuration. I mean, if someone
> changes some parameters passed to oe-selftest in the builders' configuration,
> they will very probably not remember to update the "guess" filters in
> resulttool.

I think it will be important/useful to have good regression reports
against existing test results (e.g. ultimately on our LTS release
branches too). Whilst I don't normally like hardcoding things like
this, I think in this case the benefit outweighs the ugliness as long
as it "fades" over time, which it will if we add proper metadata going
forward.

> I will start working on a new revision of this series based on your comments.

Sounds good, thanks!

Cheers,

Richard


