Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt


* Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
@ 2021-07-27  0:39 Cleber Rosa
  2021-07-27  5:57 ` Pavel Dovgalyuk
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Cleber Rosa @ 2021-07-27  0:39 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell, Pavel Dovgalyuk, QEMU devel


Hi everyone,

tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
is currently failing consistently (first found that in [1]).

I've bisected it down to the following commit:

---

78ff82bb1b67c0d79113688e4b3427fc99cab9d4 is the first bad commit
commit 78ff82bb1b67c0d79113688e4b3427fc99cab9d4
Author: Richard Henderson <richard.henderson@linaro.org>

    accel/tcg: Reduce CF_COUNT_MASK to match TCG_MAX_INSNS
    
    The space reserved for CF_COUNT_MASK was overly large.
    Reduce to free up cflags bits and eliminate an extra test.
    
    Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
    Message-Id: <20210717221851.2124573-2-richard.henderson@linaro.org>

 accel/tcg/translate-all.c | 5 ++---
 include/exec/exec-all.h   | 4 +++-
 2 files changed, 5 insertions(+), 4 deletions(-)

---
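Some background on the commit. The low bits of a TB's cflags hold an optional instruction-count limit, selected by CF_COUNT_MASK. Judging from the hunk Pavel quotes later in this thread, a stored count of 0 means "use TCG_MAX_INSNS". The C sketch below models that decoding after the commit. It is not QEMU code, and it rests on a few assumptions:

- The value 512 for TCG_MAX_INSNS is assumed.
- C11 static_assert stands in for QEMU_BUILD_BUG_ON.
- CF_HYPOTHETICAL_FLAG is an invented name. It only shows that bits above the narrowed mask become free for other flags.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of TCG_MAX_INSNS; after the commit the count mask is
 * exactly one less, i.e. 0x1ff, a 9-bit field at the bottom of cflags. */
#define TCG_MAX_INSNS 512
#define CF_COUNT_MASK (TCG_MAX_INSNS - 1)

/* C11 stand-ins for the QEMU_BUILD_BUG_ON() check in tb_gen_code(). */
static_assert(CF_COUNT_MASK + 1 == TCG_MAX_INSNS,
              "count field must cover 1..TCG_MAX_INSNS-1");
static_assert((CF_COUNT_MASK & (CF_COUNT_MASK + 1)) == 0,
              "CF_COUNT_MASK must be a contiguous low-bit mask");

/* Hypothetical flag: bits above the narrowed mask are free for other uses. */
#define CF_HYPOTHETICAL_FLAG (1u << 9)
static_assert((CF_HYPOTHETICAL_FLAG & CF_COUNT_MASK) == 0,
              "flag must not overlap the count field");

/* Decode the per-TB instruction limit: a stored count of 0 means
 * "no explicit limit", which selects TCG_MAX_INSNS. */
static unsigned max_insns_from_cflags(uint32_t cflags)
{
    unsigned max_insns = cflags & CF_COUNT_MASK;

    if (max_insns == 0) {
        max_insns = TCG_MAX_INSNS;
    }
    return max_insns;
}
```

TCG_MAX_INSNS itself cannot be stored in the field, because 512 & 0x1ff is 0. It still decodes to the same default, so every limit from 1 to TCG_MAX_INSNS stays reachable. Presumably the build-time check exists to guarantee this.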

To reproduce it:

1. configure --target-list=aarch64-softmmu
2. meson compile
3. make check-venv
4. ./tests/venv/bin/avocado --show=test run tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt

PS: I haven't had the time yet to scan the mailing list for possible
discussions about it.

[1] https://gitlab.com/qemu-project/qemu/-/jobs/1445513133#L268

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]
[  7ABB 96EB 8B46 B94D 5E0F  E9BB 657E 8D33 A5F2 09F3  ]




* Re: Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
  2021-07-27  0:39 Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt Cleber Rosa
@ 2021-07-27  5:57 ` Pavel Dovgalyuk
  2021-07-27  7:36 ` Peter Maydell
  2021-07-27  9:16 ` Peter Maydell
  2 siblings, 0 replies; 9+ messages in thread
From: Pavel Dovgalyuk @ 2021-07-27  5:57 UTC (permalink / raw)
  To: Cleber Rosa, Richard Henderson, Peter Maydell, QEMU devel

On 27.07.2021 03:39, Cleber Rosa wrote:
> 
> Hi everyone,
> 
> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
> is currently failing consistently (first found that in [1]).
> 
> I've bisected it down to the following commit:

Thanks for bisecting.
I didn't try to understand why the bug happens, but it can be solved 
with the following patch:

--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1428,7 +1428,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

     max_insns = cflags & CF_COUNT_MASK;
     if (max_insns == 0) {
-        max_insns = TCG_MAX_INSNS;
+        max_insns = CF_COUNT_MASK;
     }
     QEMU_BUILD_BUG_ON(CF_COUNT_MASK + 1 != TCG_MAX_INSNS);
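Pavel's hunk changes only the fallback used when the count field is 0. The default limit becomes CF_COUNT_MASK (511) instead of TCG_MAX_INSNS (512).

The C sketch below contrasts the two decodings. It assumes TCG_MAX_INSNS is 512 and leaves out the rest of tb_gen_code(). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TCG_MAX_INSNS 512                 /* assumed value */
#define CF_COUNT_MASK (TCG_MAX_INSNS - 1) /* as after commit 78ff82bb1b67 */

static_assert(CF_COUNT_MASK + 1 == TCG_MAX_INSNS, "mask/limit mismatch");

/* Fallback as committed: a zero count selects TCG_MAX_INSNS (512). */
static unsigned max_insns_committed(uint32_t cflags)
{
    unsigned n = cflags & CF_COUNT_MASK;
    return n != 0 ? n : TCG_MAX_INSNS;
}

/* Fallback with the hunk above applied: a zero count selects CF_COUNT_MASK (511). */
static unsigned max_insns_with_patch(uint32_t cflags)
{
    unsigned n = cflags & CF_COUNT_MASK;
    return n != 0 ? n : CF_COUNT_MASK;
}
```

Explicit counts decode identically either way; only the default differs. With the patch, the default limit is a value the count field can hold. The committed default, 512, masks to 0.

As Pavel says, this does not establish the root cause. The fix that actually landed is df3a2de51a07089a4a729fe1f792f658df9dade4.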






* Re: Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
  2021-07-27  0:39 Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt Cleber Rosa
  2021-07-27  5:57 ` Pavel Dovgalyuk
@ 2021-07-27  7:36 ` Peter Maydell
  2021-07-27 13:18   ` Cleber Rosa
  2021-07-27  9:16 ` Peter Maydell
  2 siblings, 1 reply; 9+ messages in thread
From: Peter Maydell @ 2021-07-27  7:36 UTC (permalink / raw)
  To: Cleber Rosa; +Cc: Pavel Dovgalyuk, Richard Henderson, QEMU devel

On Tue, 27 Jul 2021 at 01:39, Cleber Rosa <crosa@redhat.com> wrote:
>
>
> Hi everyone,
>
> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
> is currently failing consistently (first found that in [1]).
>
> I've bisected it down to the following commit:
>
> ---
>
> 78ff82bb1b67c0d79113688e4b3427fc99cab9d4 is the first bad commit
> commit 78ff82bb1b67c0d79113688e4b3427fc99cab9d4
> Author: Richard Henderson <richard.henderson@linaro.org>
>
>     accel/tcg: Reduce CF_COUNT_MASK to match TCG_MAX_INSNS
>
>     The space reserved for CF_COUNT_MASK was overly large.
>     Reduce to free up cflags bits and eliminate an extra test.
>
>     Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>     Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>     Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>     Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>     Message-Id: <20210717221851.2124573-2-richard.henderson@linaro.org>
>
>  accel/tcg/translate-all.c | 5 ++---
>  include/exec/exec-all.h   | 4 +++-
>  2 files changed, 5 insertions(+), 4 deletions(-)

This is probably fixed by
https://patchew.org/QEMU/20210725174405.24568-1-peter.maydell@linaro.org/
(which is in RTH's pullreq currently on list).

-- PMM



* Re: Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
  2021-07-27  0:39 Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt Cleber Rosa
  2021-07-27  5:57 ` Pavel Dovgalyuk
  2021-07-27  7:36 ` Peter Maydell
@ 2021-07-27  9:16 ` Peter Maydell
  2021-07-27 13:23   ` Cleber Rosa
  2 siblings, 1 reply; 9+ messages in thread
From: Peter Maydell @ 2021-07-27  9:16 UTC (permalink / raw)
  To: Cleber Rosa; +Cc: Pavel Dovgalyuk, Richard Henderson, QEMU devel

On Tue, 27 Jul 2021 at 01:39, Cleber Rosa <crosa@redhat.com> wrote:
> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
> is currently failing consistently (first found that in [1]).

FWIW I find that on my local machine this test is consistently flaky
and always has been, so I just ignore any failure I see in it when
running 'make check-acceptance' locally.

-- PMM



* Re: Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
  2021-07-27  7:36 ` Peter Maydell
@ 2021-07-27 13:18   ` Cleber Rosa
  2021-07-27 13:46     ` Peter Maydell
  0 siblings, 1 reply; 9+ messages in thread
From: Cleber Rosa @ 2021-07-27 13:18 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Pavel Dovgalyuk, Richard Henderson, QEMU devel

On Tue, Jul 27, 2021 at 3:37 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Tue, 27 Jul 2021 at 01:39, Cleber Rosa <crosa@redhat.com> wrote:
> >
> >
> > Hi everyone,
> >
> > tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
> > is currently failing consistently (first found that in [1]).
> >
> > I've bisected it down to the following commit:
> >
> > ---
> >
> > 78ff82bb1b67c0d79113688e4b3427fc99cab9d4 is the first bad commit
> > commit 78ff82bb1b67c0d79113688e4b3427fc99cab9d4
> > Author: Richard Henderson <richard.henderson@linaro.org>
> >
> >     accel/tcg: Reduce CF_COUNT_MASK to match TCG_MAX_INSNS
> >
> >     The space reserved for CF_COUNT_MASK was overly large.
> >     Reduce to free up cflags bits and eliminate an extra test.
> >
> >     Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> >     Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >     Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> >     Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> >     Message-Id: <20210717221851.2124573-2-richard.henderson@linaro.org>
> >
> >  accel/tcg/translate-all.c | 5 ++---
> >  include/exec/exec-all.h   | 4 +++-
> >  2 files changed, 5 insertions(+), 4 deletions(-)
>
> This is probably fixed by
> https://patchew.org/QEMU/20210725174405.24568-1-peter.maydell@linaro.org/
> (which is in RTH's pullreq currently on list).
>
> -- PMM
>

Actually, it is already fixed by df3a2de51a07089a4a729fe1f792f658df9dade4.

BTW, TCG does look like the right place for the bug, because it
affected other targets and machines too.  This is the list of tests
on which I was seeing the same issue (all now fixed):

(1/4) tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt: PASS (8.86 s)
(2/4) tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_arm_virt: PASS (13.42 s)
(3/4) tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_m68k_mcf5208evb: PASS (3.20 s)
(4/4) tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_xtensa_lx60: PASS (12.29 s)

Cheers,
- Cleber.




* Re: Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
  2021-07-27  9:16 ` Peter Maydell
@ 2021-07-27 13:23   ` Cleber Rosa
  2021-07-27 13:47     ` Peter Maydell
  0 siblings, 1 reply; 9+ messages in thread
From: Cleber Rosa @ 2021-07-27 13:23 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Pavel Dovgalyuk, Richard Henderson, QEMU devel

On Tue, Jul 27, 2021 at 5:17 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Tue, 27 Jul 2021 at 01:39, Cleber Rosa <crosa@redhat.com> wrote:
> > tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
> > is currently failing consistently (first found that in [1]).
>
> FWIW I find that on my local machine this test is consistently flaky
> and always has been, so I just ignore any failure I see in it when
> running 'make check-acceptance' locally.
>
> -- PMM
>

Hi Peter,

Yes, I've also spent quite some time dealing with flaky behavior while
running the replay tests. But in the end, the test remained unchanged
because we found the issues in the actual code under test (on one
occasion, the recorded replay file would sometimes be corrupted when
using more than one CPU, but 100% of the time when using a single CPU).

This time, it was failing 100% of the time in my experience, and now,
after the fix in df3a2de51a07089a4a729fe1f792f658df9dade4, it's
passing 100% of the time.  So I guess even tests with some observed
flakiness can have their value.

Cheers,
- Cleber.


* Re: Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
  2021-07-27 13:18   ` Cleber Rosa
@ 2021-07-27 13:46     ` Peter Maydell
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Maydell @ 2021-07-27 13:46 UTC (permalink / raw)
  To: Cleber Rosa; +Cc: Pavel Dovgalyuk, Richard Henderson, QEMU devel

On Tue, 27 Jul 2021 at 14:18, Cleber Rosa <crosa@redhat.com> wrote:
>
> On Tue, Jul 27, 2021 at 3:37 AM Peter Maydell <peter.maydell@linaro.org> wrote:
> > This is probably fixed by
> > https://patchew.org/QEMU/20210725174405.24568-1-peter.maydell@linaro.org/
> > (which is in RTH's pullreq currently on list).

> Actually, it is already fixed by df3a2de51a07089a4a729fe1f792f658df9dade4.

That is the patchset I linked to above, which has now reached master :-)

-- PMM



* Re: Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
  2021-07-27 13:23   ` Cleber Rosa
@ 2021-07-27 13:47     ` Peter Maydell
  2021-07-27 14:15       ` Cleber Rosa
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Maydell @ 2021-07-27 13:47 UTC (permalink / raw)
  To: Cleber Rosa; +Cc: Pavel Dovgalyuk, Richard Henderson, QEMU devel

On Tue, 27 Jul 2021 at 14:24, Cleber Rosa <crosa@redhat.com> wrote:
> Yes, I've also spent quite some time dealing with flaky behavior while
> running the replay tests. But in the end, the test remained unchanged
> because we found the issues in the actual code under test (on one
> occasion, the recorded replay file would sometimes be corrupted when
> using more than one CPU, but 100% of the time when using a single CPU).
>
> This time, it was failing 100% of the time in my experience, and now,
> after the fix in df3a2de51a07089a4a729fe1f792f658df9dade4, it's
> passing 100% of the time.  So I guess even tests with some observed
> flakiness can have their value.

To me they have very little value, because once I notice a test
is flaky I simply start to ignore whether it is passing or failing,
and then it might as well not be there at all.
(This is happening currently with the gitlab CI tests, which have
been failing for a week.)

-- PMM



* Re: Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
  2021-07-27 13:47     ` Peter Maydell
@ 2021-07-27 14:15       ` Cleber Rosa
  0 siblings, 0 replies; 9+ messages in thread
From: Cleber Rosa @ 2021-07-27 14:15 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Pavel Dovgalyuk, Richard Henderson, QEMU devel

On Tue, Jul 27, 2021 at 9:48 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Tue, 27 Jul 2021 at 14:24, Cleber Rosa <crosa@redhat.com> wrote:
> > Yes, I've also spent quite some time dealing with flaky behavior while
> > running the replay tests. But in the end, the test remained unchanged
> > because we found the issues in the actual code under test (on one
> > occasion, the recorded replay file would sometimes be corrupted when
> > using more than one CPU, but 100% of the time when using a single CPU).
> >
> > This time, it was failing 100% of the time in my experience, and now,
> > after the fix in df3a2de51a07089a4a729fe1f792f658df9dade4, it's
> > passing 100% of the time.  So I guess even tests with some observed
> > flakiness can have their value.
>
> To me they have very little value, because once I notice a test
> is flaky I simply start to ignore whether it is passing or failing,
> and then it might as well not be there at all.
> (This is happening currently with the gitlab CI tests, which have
> been failing for a week.)
>
> -- PMM
>

I hear you... and I acknowledge that we currently don't have a good
solution for keeping track of test result data, which would let us go
beyond each person's perceived value of a test.

It's not something for the short term, but I do plan to work on a
"confidence" tracker for tests.  There is some early groundwork in the
CKI data warehouse project [1], but it's still very preliminary.
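Below is a minimal C sketch of what such a "confidence" tracker could compute. It assumes only a per-test history of PASS/FAIL results. It is not based on the CKI data warehouse code; the window size, the names and the three-way classification are all hypothetical.

What it captures is the distinction drawn in this thread. A test that fails on every recent run, as test_aarch64_virt did after the bisected commit, is a different signal from one that fails intermittently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical window: only the most recent runs are considered. */
#define HISTORY_LEN 10

enum test_verdict {
    VERDICT_UNKNOWN,     /* not enough runs recorded yet */
    VERDICT_STABLE_PASS, /* every recorded run passed */
    VERDICT_STABLE_FAIL, /* every recorded run failed: likely a real regression */
    VERDICT_FLAKY,       /* a mix of passes and failures */
};

struct test_history {
    bool passed[HISTORY_LEN]; /* ring buffer of results, true = PASS */
    size_t count;             /* valid entries, at most HISTORY_LEN */
    size_t next;              /* index of the slot written next */
};

/* Record one result, overwriting the oldest once the window is full. */
static void history_record(struct test_history *h, bool passed)
{
    assert(h->count <= HISTORY_LEN && h->next < HISTORY_LEN);
    h->passed[h->next] = passed;
    h->next = (h->next + 1) % HISTORY_LEN;
    if (h->count < HISTORY_LEN) {
        h->count++;
    }
}

/* Classify the test from its recorded window of results. */
static enum test_verdict history_verdict(const struct test_history *h,
                                         size_t min_runs)
{
    size_t passes = 0;

    if (h->count == 0 || h->count < min_runs) {
        return VERDICT_UNKNOWN;
    }
    /* Slots 0..count-1 are valid: before the buffer wraps they are filled
     * in order from index 0, and after it wraps count == HISTORY_LEN. */
    for (size_t i = 0; i < h->count; i++) {
        passes += h->passed[i] ? 1 : 0;
    }
    if (passes == h->count) {
        return VERDICT_STABLE_PASS;
    }
    if (passes == 0) {
        return VERDICT_STABLE_FAIL;
    }
    return VERDICT_FLAKY;
}
```

A sliding window means a test recovers to STABLE_PASS once a fix lands and it has passed for HISTORY_LEN consecutive runs. A single stray failure afterwards shows up as FLAKY rather than as a regression.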

- Cleber.

[1] - https://gitlab.com/cki-project/datawarehouse/-/blob/main/datawarehouse/views.py#L158




Thread overview: 9+ messages
2021-07-27  0:39 Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt Cleber Rosa
2021-07-27  5:57 ` Pavel Dovgalyuk
2021-07-27  7:36 ` Peter Maydell
2021-07-27 13:18   ` Cleber Rosa
2021-07-27 13:46     ` Peter Maydell
2021-07-27  9:16 ` Peter Maydell
2021-07-27 13:23   ` Cleber Rosa
2021-07-27 13:47     ` Peter Maydell
2021-07-27 14:15       ` Cleber Rosa
