qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* getting the console output for s390 cdrom-test?
@ 2021-01-22 20:32 Peter Maydell
  2021-02-04 16:08 ` Thomas Huth
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2021-01-22 20:32 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Thomas Huth, Cornelia Huck, Eric Farman

Hi; I've been looking at why the s390 cdrom test has an intermittent
failure on my aarch64 box. Looking at some TCG debug log output
I think what is happening is that sometimes execution diverges from
a successful run inside virtio_scsi_setup() and we end up failing
a vs_assert(), which triggers a "Guest crashed on cpu 0: disabled-wait"
which then makes the qtest hang until its timeout.

I think that vs_assert() ought to be printing some information
to the console about which assert fails when it happens, but
how do I need to tweak the qtest to get it to capture this
console log somewhere?

Specifically, the test in question is this one:
QTEST_QEMU_BINARY=qemu-system-s390x
./build/s390/tests/qtest/cdrom-test -p
/s390x/cdrom/boot/without-bootindex

PS: it would be nice if "guest BIOS asserts and puts the
system into a detected-guest-crash state" resulted in the
test failing rather than hanging :-)

(Annoyingly, most of my attempts to get more information about
where things go wrong seem to cause the bug to stop manifesting
itself: eg building the s390-ccw.img without -O2; enabling
TCG 'exec' logging; enabling 'trace:virtio*' tracepoints.
The failure itself started with commit 7a3d37a3f233 updating
the s390 bios blobs, but the changes that went into the new
blobs don't really look like they would be responsible.
I am starting to have gloomy thoughts about potential missing
memory barrier insns between the CPU thread and the iothread
doing the virtio device end of things...)

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-01-22 20:32 getting the console output for s390 cdrom-test? Peter Maydell
@ 2021-02-04 16:08 ` Thomas Huth
  2021-02-08 10:27   ` Peter Maydell
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Huth @ 2021-02-04 16:08 UTC (permalink / raw)
  To: Peter Maydell, QEMU Developers; +Cc: Eric Farman, Cornelia Huck, qemu-s390x

On 22/01/2021 21.32, Peter Maydell wrote:
> Hi; I've been looking at why the s390 cdrom test has an intermittent
> failure on my aarch64 box. Looking at some TCG debug log output
> I think what is happening is that sometimes execution diverges from
> a successful run inside virtio_scsi_setup() and we end up failing
> a vs_assert(), which triggers a "Guest crashed on cpu 0: disabled-wait"
> which then makes the qtest hang until its timeout.
> 
> I think that vs_assert() ought to be printing some information
> to the console about which assert fails when it happens, but
> how do I need to tweak the qtest to get it to capture this
> console log somewhere?

  Hi!

Sorry for the late reply ... did you get any further with this already? 
Anyway, in case I need to get the console of a test, I normally modify the 
test case like this:

diff --git a/tests/qtest/cdrom-test.c b/tests/qtest/cdrom-test.c
index 5af944a5fb..52f7ed050d 100644
--- a/tests/qtest/cdrom-test.c
+++ b/tests/qtest/cdrom-test.c
@@ -120,7 +120,7 @@ static void test_cdboot(gconstpointer data)
  {
      QTestState *qts;

-    qts = qtest_initf("-accel kvm -accel tcg -no-shutdown %s%s", (const 
char *)data,
+    qts = qtest_initf("-serial file:/tmp/stdio.txt -accel kvm -accel tcg 
-no-shutdown %s%s", (const char *)data,
                        isoimage);
      boot_sector_test(qts);
      qtest_quit(qts);

> PS: it would be nice if "guest BIOS asserts and puts the
> system into a detected-guest-crash state" resulted in the
> test failing rather than hanging :-)

We could maybe check for panic events in boot_sector_test() ... I can have a 
try when time permits...

> (Annoyingly, most of my attempts to get more information about
> where things go wrong seem to cause the bug to stop manifesting
> itself: eg building the s390-ccw.img without -O2; enabling
> TCG 'exec' logging; enabling 'trace:virtio*' tracepoints.
> The failure itself started with commit 7a3d37a3f233 updating
> the s390 bios blobs, but the changes that went into the new
> blobs don't really look like they would be responsible.
> I am starting to have gloomy thoughts about potential missing
> memory barrier insns between the CPU thread and the iothread
> doing the virtio device end of things...)

A Heisen-bug ... ugh ... but I wonder whether it's maybe rather an issue in 
the aarch64 TCG backend, since we've never seen this on other architectures 
before?

  Thomas



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-04 16:08 ` Thomas Huth
@ 2021-02-08 10:27   ` Peter Maydell
  2021-02-08 11:34     ` Thomas Huth
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2021-02-08 10:27 UTC (permalink / raw)
  To: Thomas Huth; +Cc: qemu-s390x, Eric Farman, Cornelia Huck, QEMU Developers

On Thu, 4 Feb 2021 at 16:08, Thomas Huth <thuth@redhat.com> wrote:
>
> On 22/01/2021 21.32, Peter Maydell wrote:
> > Hi; I've been looking at why the s390 cdrom test has an intermittent
> > failure on my aarch64 box. Looking at some TCG debug log output
> > I think what is happening is that sometimes execution diverges from
> > a successful run inside virtio_scsi_setup() and we end up failing
> > a vs_assert(), which triggers a "Guest crashed on cpu 0: disabled-wait"
> > which then makes the qtest hang until its timeout.
> >
> > I think that vs_assert() ought to be printing some information
> > to the console about which assert fails when it happens, but
> > how do I need to tweak the qtest to get it to capture this
> > console log somewhere?
>
>   Hi!
>
> Sorry for the late reply ... did you get any further with this already?

No, I've been mostly working on other stuff. Thanks for the instructions
on how to capture stdio. This is what a success looks like:

LOADPARM=[        ]
Using virtio-scsi.
Warning: Could not locate a usable virtio-scsi device
Using virtio-blk.
Using guessed DASD geometry.
Using ECKD scheme (block size  4096), CDL
No zIPL section in IPL2 record.
zIPL load failed.
Using virtio-blk.
ISO boot image size verified

And this is a failure:

LOADPARM=[        ]
Using virtio-scsi.
target: 0x0000000000000094

! SCSI cannot report LUNs: response VS RESP=09 !

> A Heisen-bug ... ugh ... but I wonder whether it's maybe rather an issue in
> the aarch64 TCG backend, since we've never seen this on other architectures
> before?

Certainly possible; but I would tend to expect TCG backend issues
to be consistently reproducible, not intermittent like this one.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-08 10:27   ` Peter Maydell
@ 2021-02-08 11:34     ` Thomas Huth
  2021-02-08 12:08       ` Peter Maydell
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Huth @ 2021-02-08 11:34 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-s390x, Eric Farman, Cornelia Huck, QEMU Developers

On 08/02/2021 11.27, Peter Maydell wrote:
> On Thu, 4 Feb 2021 at 16:08, Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 22/01/2021 21.32, Peter Maydell wrote:
>>> Hi; I've been looking at why the s390 cdrom test has an intermittent
>>> failure on my aarch64 box. Looking at some TCG debug log output
>>> I think what is happening is that sometimes execution diverges from
>>> a successful run inside virtio_scsi_setup() and we end up failing
>>> a vs_assert(), which triggers a "Guest crashed on cpu 0: disabled-wait"
>>> which then makes the qtest hang until its timeout.
>>>
>>> I think that vs_assert() ought to be printing some information
>>> to the console about which assert fails when it happens, but
>>> how do I need to tweak the qtest to get it to capture this
>>> console log somewhere?
>>
>>    Hi!
>>
>> Sorry for the late reply ... did you get any further with this already?
> 
> No, I've been mostly working on other stuff. Thanks for the instructions
> on how to capture stdio. This is what a success looks like:
> 
> LOADPARM=[        ]
> Using virtio-scsi.
> Warning: Could not locate a usable virtio-scsi device
> Using virtio-blk.
> Using guessed DASD geometry.
> Using ECKD scheme (block size  4096), CDL
> No zIPL section in IPL2 record.
> zIPL load failed.
> Using virtio-blk.
> ISO boot image size verified
> 
> And this is a failure:
> 
> LOADPARM=[        ]
> Using virtio-scsi.
> target: 0x0000000000000094
> 
> ! SCSI cannot report LUNs: response VS RESP=09 !

Looks like the SCSI controller returned VIRTIO_SCSI_S_FAILURE instead of the 
expected VIRTIO_SCSI_S_BAD_TARGET here (see virtio_scsi_locate_device() in 
pc-bios/s390-ccw/virtio-scsi.c).

The question is: How could that happen? If I get hw/scsi/virtio-scsi.c 
right, this is only set by virtio_scsi_fail_cmd_req(), i.e. it only happens 
if virtio_scsi_parse_req() returned -ENOTSUP ... which indicates that there 
was something wrong with the VirtIOSCSIReq request?

Could you maybe try to add some debug printfs to virtio_scsi_parse_req() in 
hw/scsi/virtio-scsi.c and/or to scsi_report_luns() / vs_run() in 
pc-bios/s390-ccw/virtio-scsi.c to see whether there is something obviously 
wrong in the request?

  Thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-08 11:34     ` Thomas Huth
@ 2021-02-08 12:08       ` Peter Maydell
  2021-02-09 14:58         ` Peter Maydell
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2021-02-08 12:08 UTC (permalink / raw)
  To: Thomas Huth; +Cc: qemu-s390x, Eric Farman, Cornelia Huck, QEMU Developers

On Mon, 8 Feb 2021 at 11:34, Thomas Huth <thuth@redhat.com> wrote:
> Looks like the SCSI controller returned VIRTIO_SCSI_S_FAILURE instead of the
> expected VIRTIO_SCSI_S_BAD_TARGET here (see virtio_scsi_locate_device() in
> pc-bios/s390-ccw/virtio-scsi.c).
>
> The question is: How could that happen? If I get hw/scsi/virtio-scsi.c
> right, this is only set by virtio_scsi_fail_cmd_req(), i.e. it only happens
> if virtio_scsi_parse_req() returned -ENOTSUP ... which indicates that there
> was something wrong with the VirtIOSCSIReq request?

Yes, virtio_scsi_parse_req() returns ENOTSUP because it
fails the "if (out_size && in_size)" test.

I am becoming somewhat suspicious that the s390-ccw BIOS's
implementation of virtio is not putting in sufficient barriers,
and so if you get unlucky then the QEMU thread sees an inconsistent
view of the in-memory virtio data structures. For instance,
the virtio spec says you must have a memory barrier after
writing the available ring entries and before incrementing the
available index, but pc-bios/s390-ccw/virtio.c:vring_send_buf()
has no kind of enforcement of ordering between these two steps.

Linux's arch/s390/include/asm/barrier.h suggests s390 needs real
CPU barrier insns here; even if it didn't you would at least want
enough of a compiler-barrier to tell the compiler not to try to
reorder anything past it.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-08 12:08       ` Peter Maydell
@ 2021-02-09 14:58         ` Peter Maydell
  2021-02-09 17:10           ` Cornelia Huck
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2021-02-09 14:58 UTC (permalink / raw)
  To: Thomas Huth; +Cc: qemu-s390x, Eric Farman, Cornelia Huck, QEMU Developers

On Mon, 8 Feb 2021 at 12:08, Peter Maydell <peter.maydell@linaro.org> wrote:
> Yes, virtio_scsi_parse_req() returns ENOTSUP because it
> fails the "if (out_size && in_size)" test.
>
> I am becoming somewhat suspicious that the s390-ccw BIOS's
> implementation of virtio is not putting in sufficient barriers,
> and so if you get unlucky then the QEMU thread sees an inconsistent
> view of the in-memory virtio data structures.

As a test of this theory, I tried adding some barrier insns
(with the definition of the barrier insn borrowed from s390x Linux):

diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
index ab49840db85..0af901264b6 100644
--- a/pc-bios/s390-ccw/virtio.c
+++ b/pc-bios/s390-ccw/virtio.c
@@ -17,6 +17,12 @@
 #include "helper.h"
 #include "s390-time.h"

+#define membarrier() do { asm volatile("bcr 15,0\n" :: : "memory"); } while (0)
+
+#define __ASM_BARRIER "bcr 15,0\n"
+
+
+
 #define VRING_WAIT_REPLY_TIMEOUT 30

 static VRing block[VIRTIO_MAX_VQS];
@@ -154,12 +160,15 @@ void vring_send_buf(VRing *vr, void *p, int len,
int flags)

     /* Chains only have a single ID */
     if (!(flags & VRING_DESC_F_NEXT)) {
+        membarrier();
         vr->avail->idx++;
+        membarrier();
     }
 }

 int vr_poll(VRing *vr)
 {
+    membarrier();
     if (vr->used->idx == vr->used_idx) {
         vring_notify(vr);
         yield();
@@ -169,7 +178,9 @@ int vr_poll(VRing *vr)
     vr->used_idx = vr->used->idx;
     vr->next_idx = 0;
     vr->desc[0].len = 0;
+    membarrier();
     vr->desc[0].flags = 0;
+    membarrier();
     return 1; /* vr has been updated */
 }

This change significantly reduces the frequency with which I see
the hang; but it doesn't get rid of it altogether. Also I couldn't
really figure out from the virtio spec exactly where barriers
were required: I think I would need to read the whole thing in
more detail rather than trying to fish out the information by
just reading small pieces of it. But some of the ordering of
operations the spec describes doesn't match how the s390-ccw
BIOS code is doing it at all (eg the spec says that when feeding
a batch of descriptors to the device the driver isn't supposed to
update the flags on the first descriptor until it's finished
writing all of the descriptors, but the code doesn't seem to
try to do that). So I think the code could use an overhaul from
somebody with a better understanding of virtio than me...

thanks
-- PMM


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-09 14:58         ` Peter Maydell
@ 2021-02-09 17:10           ` Cornelia Huck
  2021-02-09 17:17             ` Peter Maydell
  0 siblings, 1 reply; 13+ messages in thread
From: Cornelia Huck @ 2021-02-09 17:10 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Eric Farman, Thomas Huth, QEMU Developers, qemu-s390x

On Tue, 9 Feb 2021 14:58:53 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Mon, 8 Feb 2021 at 12:08, Peter Maydell <peter.maydell@linaro.org> wrote:
> > Yes, virtio_scsi_parse_req() returns ENOTSUP because it
> > fails the "if (out_size && in_size)" test.
> >
> > I am becoming somewhat suspicious that the s390-ccw BIOS's
> > implementation of virtio is not putting in sufficient barriers,
> > and so if you get unlucky then the QEMU thread sees an inconsistent
> > view of the in-memory virtio data structures.  
> 
> As a test of this theory, I tried adding some barrier insns
> (with the definition of the barrier insn borrowed from s390x Linux):
> 
> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> index ab49840db85..0af901264b6 100644
> --- a/pc-bios/s390-ccw/virtio.c
> +++ b/pc-bios/s390-ccw/virtio.c
> @@ -17,6 +17,12 @@
>  #include "helper.h"
>  #include "s390-time.h"
> 
> +#define membarrier() do { asm volatile("bcr 15,0\n" :: : "memory"); } while (0)
> +
> +#define __ASM_BARRIER "bcr 15,0\n"
> +
> +
> +
>  #define VRING_WAIT_REPLY_TIMEOUT 30
> 
>  static VRing block[VIRTIO_MAX_VQS];
> @@ -154,12 +160,15 @@ void vring_send_buf(VRing *vr, void *p, int len,
> int flags)
> 
>      /* Chains only have a single ID */
>      if (!(flags & VRING_DESC_F_NEXT)) {
> +        membarrier();
>          vr->avail->idx++;
> +        membarrier();
>      }
>  }
> 
>  int vr_poll(VRing *vr)
>  {
> +    membarrier();
>      if (vr->used->idx == vr->used_idx) {
>          vring_notify(vr);
>          yield();
> @@ -169,7 +178,9 @@ int vr_poll(VRing *vr)
>      vr->used_idx = vr->used->idx;
>      vr->next_idx = 0;
>      vr->desc[0].len = 0;
> +    membarrier();
>      vr->desc[0].flags = 0;
> +    membarrier();
>      return 1; /* vr has been updated */
>  }
> 
> This change significantly reduces the frequency with which I see
> the hang; but it doesn't get rid of it altogether. Also I couldn't
> really figure out from the virtio spec exactly where barriers
> were required: I think I would need to read the whole thing in
> more detail rather than trying to fish out the information by
> just reading small pieces of it. 

The Linux virtio-ccw code uses 'weak barriers', i.e. the heavy bcr15
should not be needed. We might well miss other (lightweight) barriers
in other parts of the code part, though.

> But some of the ordering of
> operations the spec describes doesn't match how the s390-ccw
> BIOS code is doing it at all (eg the spec says that when feeding
> a batch of descriptors to the device the driver isn't supposed to
> update the flags on the first descriptor until it's finished
> writing all of the descriptors, but the code doesn't seem to
> try to do that). So I think the code could use an overhaul from
> somebody with a better understanding of virtio than me...

Yeah, the bios virtio code could probably use some love.

I'm wondering how much memory ordering on the host platform influences
things. I doubt many people try to run an s390x guest on an aarch64
host...



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-09 17:10           ` Cornelia Huck
@ 2021-02-09 17:17             ` Peter Maydell
  2021-02-09 17:24               ` Cornelia Huck
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2021-02-09 17:17 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Eric Farman, Thomas Huth, QEMU Developers, qemu-s390x

On Tue, 9 Feb 2021 at 17:10, Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Tue, 9 Feb 2021 14:58:53 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > This change significantly reduces the frequency with which I see
> > the hang; but it doesn't get rid of it altogether. Also I couldn't
> > really figure out from the virtio spec exactly where barriers
> > were required: I think I would need to read the whole thing in
> > more detail rather than trying to fish out the information by
> > just reading small pieces of it.
>
> The Linux virtio-ccw code uses 'weak barriers', i.e. the heavy bcr15
> should not be needed. We might well miss other (lightweight) barriers
> in other parts of the code part, though.

Is that the version the Linux kernel has as
  /* Fast-BCR without checkpoint synchronization */
  #define __ASM_BARRIER "bcr 14,0\n"

?

> > But some of the ordering of
> > operations the spec describes doesn't match how the s390-ccw
> > BIOS code is doing it at all (eg the spec says that when feeding
> > a batch of descriptors to the device the driver isn't supposed to
> > update the flags on the first descriptor until it's finished
> > writing all of the descriptors, but the code doesn't seem to
> > try to do that). So I think the code could use an overhaul from
> > somebody with a better understanding of virtio than me...
>
> Yeah, the bios virtio code could probably use some love.
>
> I'm wondering how much memory ordering on the host platform influences
> things. I doubt many people try to run an s390x guest on an aarch64
> host...

Yes, you won't see this bug unless you're running QEMU on a
host that's pretty enthusiastic about reordering memory
transactions (and you'd never have seen it at all back when
we ran the iothread actions synchronously with the emulated
CPU, which we probably did back in 2013 when the s390-ccw
virtio code was written...) I haven't tested other aarch64
hosts but I would be unsurprised to find that whether you
could repro it and how frequently depended on the particular
h/w implementation.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-09 17:17             ` Peter Maydell
@ 2021-02-09 17:24               ` Cornelia Huck
  2021-02-09 18:25                 ` Peter Maydell
  0 siblings, 1 reply; 13+ messages in thread
From: Cornelia Huck @ 2021-02-09 17:24 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Eric Farman, Thomas Huth, QEMU Developers, qemu-s390x

On Tue, 9 Feb 2021 17:17:19 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Tue, 9 Feb 2021 at 17:10, Cornelia Huck <cohuck@redhat.com> wrote:
> >
> > On Tue, 9 Feb 2021 14:58:53 +0000
> > Peter Maydell <peter.maydell@linaro.org> wrote:  
> > > This change significantly reduces the frequency with which I see
> > > the hang; but it doesn't get rid of it altogether. Also I couldn't
> > > really figure out from the virtio spec exactly where barriers
> > > were required: I think I would need to read the whole thing in
> > > more detail rather than trying to fish out the information by
> > > just reading small pieces of it.  
> >
> > The Linux virtio-ccw code uses 'weak barriers', i.e. the heavy bcr15
> > should not be needed. We might well miss other (lightweight) barriers
> > in other parts of the code part, though.  
> 
> Is that the version the Linux kernel has as
>   /* Fast-BCR without checkpoint synchronization */
>   #define __ASM_BARRIER "bcr 14,0\n"
> 
> ?

No, just a simple memory barrier in most places (bcr15 and bcr14 do
serialization).

> 
> > > But some of the ordering of
> > > operations the spec describes doesn't match how the s390-ccw
> > > BIOS code is doing it at all (eg the spec says that when feeding
> > > a batch of descriptors to the device the driver isn't supposed to
> > > update the flags on the first descriptor until it's finished
> > > writing all of the descriptors, but the code doesn't seem to
> > > try to do that). So I think the code could use an overhaul from
> > > somebody with a better understanding of virtio than me...  
> >
> > Yeah, the bios virtio code could probably use some love.
> >
> > I'm wondering how much memory ordering on the host platform influences
> > things. I doubt many people try to run an s390x guest on an aarch64
> > host...  
> 
> Yes, you won't see this bug unless you're running QEMU on a
> host that's pretty enthusiastic about reordering memory
> transactions (and you'd never have seen it at all back when
> we ran the iothread actions synchronously with the emulated
> CPU, which we probably did back in 2013 when the s390-ccw
> virtio code was written...) I haven't tested other aarch64
> hosts but I would be unsurprised to find that whether you
> could repro it and how frequently depended on the particular
> h/w implementation.

I guess that hardly any s390x guests run on hw that's not either s390x
or x86...



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-09 17:24               ` Cornelia Huck
@ 2021-02-09 18:25                 ` Peter Maydell
  2021-02-12 11:44                   ` Thomas Huth
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2021-02-09 18:25 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Eric Farman, Thomas Huth, QEMU Developers, qemu-s390x

On Tue, 9 Feb 2021 at 17:24, Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Tue, 9 Feb 2021 17:17:19 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
>
> > On Tue, 9 Feb 2021 at 17:10, Cornelia Huck <cohuck@redhat.com> wrote:
> > >
> > > On Tue, 9 Feb 2021 14:58:53 +0000
> > > Peter Maydell <peter.maydell@linaro.org> wrote:
> > > > This change significantly reduces the frequency with which I see
> > > > the hang; but it doesn't get rid of it altogether. Also I couldn't
> > > > really figure out from the virtio spec exactly where barriers
> > > > were required: I think I would need to read the whole thing in
> > > > more detail rather than trying to fish out the information by
> > > > just reading small pieces of it.
> > >
> > > The Linux virtio-ccw code uses 'weak barriers', i.e. the heavy bcr15
> > > should not be needed. We might well miss other (lightweight) barriers
> > > in other parts of the code part, though.
> >
> > Is that the version the Linux kernel has as
> >   /* Fast-BCR without checkpoint synchronization */
> >   #define __ASM_BARRIER "bcr 14,0\n"
> >
> > ?
>
> No, just a simple memory barrier in most places (bcr15 and bcr14 do
> serialization).

OK; if you let me know how that's written for s390 I can give
it a test...

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-09 18:25                 ` Peter Maydell
@ 2021-02-12 11:44                   ` Thomas Huth
  2021-02-12 12:05                     ` Peter Maydell
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Huth @ 2021-02-12 11:44 UTC (permalink / raw)
  To: Peter Maydell, Cornelia Huck; +Cc: qemu-s390x, Eric Farman, QEMU Developers

On 09/02/2021 19.25, Peter Maydell wrote:
> On Tue, 9 Feb 2021 at 17:24, Cornelia Huck <cohuck@redhat.com> wrote:
>>
>> On Tue, 9 Feb 2021 17:17:19 +0000
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>>> On Tue, 9 Feb 2021 at 17:10, Cornelia Huck <cohuck@redhat.com> wrote:
>>>>
>>>> On Tue, 9 Feb 2021 14:58:53 +0000
>>>> Peter Maydell <peter.maydell@linaro.org> wrote:
>>>>> This change significantly reduces the frequency with which I see
>>>>> the hang; but it doesn't get rid of it altogether. Also I couldn't
>>>>> really figure out from the virtio spec exactly where barriers
>>>>> were required: I think I would need to read the whole thing in
>>>>> more detail rather than trying to fish out the information by
>>>>> just reading small pieces of it.
>>>>
>>>> The Linux virtio-ccw code uses 'weak barriers', i.e. the heavy bcr15
>>>> should not be needed. We might well miss other (lightweight) barriers
>>>> in other parts of the code part, though.
>>>
>>> Is that the version the Linux kernel has as
>>>    /* Fast-BCR without checkpoint synchronization */
>>>    #define __ASM_BARRIER "bcr 14,0\n"
>>>
>>> ?
>>
>> No, just a simple memory barrier in most places (bcr15 and bcr14 do
>> serialization).
> 
> OK; if you let me know how that's written for s390 I can give
> it a test...

I guess Cornelia simply meant a:

  asm volatile("nop":::"memory");

Anyway, I've now succeeded in getting my hands on an aarch64 host and tried 
to reproduce the issue, but even after running it in a loop like this:

  for ((x=0;x<200;x++)) ; do \
   QTEST_QEMU_BINARY=./qemu-system-s390x tests/qtest/cdrom-test \
   || break ; \
  done

... I was not able to reproduce the issue. What kind of host distro are you 
using there? Could the exact host CPU type matter here, too? (I was running 
my code on an HPE Apollo system, with Fedora 33, gcc 10.2)

  Thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-12 11:44                   ` Thomas Huth
@ 2021-02-12 12:05                     ` Peter Maydell
  2021-02-12 14:10                       ` Thomas Huth
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2021-02-12 12:05 UTC (permalink / raw)
  To: Thomas Huth; +Cc: qemu-s390x, Eric Farman, Cornelia Huck, QEMU Developers

On Fri, 12 Feb 2021 at 11:45, Thomas Huth <thuth@redhat.com> wrote:
> I guess Cornelia simply meant a:
>
>  asm volatile("nop":::"memory");

That will force the compiler not to reorder, but it doesn't
seem to me that it would really force the memory accesses to
appear to the other host CPU thread that's running the device
emulation. So either it's insufficient in the s390 memory model,
or our s390 emulation isn't being sufficiently rigorous about
implementing the memory model...

> Anyway, I've now succeeded in getting my hands on an aarch64 host and tried
> to reproduce the issue, but even after running it in a loop like this:
>
>   for ((x=0;x<200;x++)) ; do \
>    QTEST_QEMU_BINARY=./qemu-system-s390x tests/qtest/cdrom-test \
>    || break ; \
>   done
>
> ... I was not able to reproduce the issue. What kind of host distro are you
> using there? Could the exact host CPU type matter here, too? (I was running
> my code on an HPE Apollo system, with Fedora 33, gcc 10.2)

I expect the host CPU type matters a lot, yes. dmesg reports it
as "cavium,thunder-88xx", with 96 cores.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: getting the console output for s390 cdrom-test?
  2021-02-12 12:05                     ` Peter Maydell
@ 2021-02-12 14:10                       ` Thomas Huth
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Huth @ 2021-02-12 14:10 UTC (permalink / raw)
  To: Peter Maydell, Christian Borntraeger, Richard Henderson
  Cc: qemu-s390x, Eric Farman, Cornelia Huck, QEMU Developers

On 12/02/2021 13.05, Peter Maydell wrote:
> On Fri, 12 Feb 2021 at 11:45, Thomas Huth <thuth@redhat.com> wrote:
>> I guess Cornelia simply meant a:
>>
>>   asm volatile("nop":::"memory");
> 
> That will force the compiler not to reorder, but it doesn't
> seem to me that it would really force the memory accesses to
> appear to the other host CPU thread that's running the device
> emulation. So either it's insufficient in the s390 memory model,
> or our s390 emulation isn't being sufficiently rigorous about
> implementing the memory model...

AFAIK s390x has a very strict memory ordering model... so maybe this is a 
problem with emulating a strict CPU on a less strict host CPU, indeed? ... 
CC:-ing Christian and Richard, maybe they have some additional ideas here...

  Thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-02-12 14:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-22 20:32 getting the console output for s390 cdrom-test? Peter Maydell
2021-02-04 16:08 ` Thomas Huth
2021-02-08 10:27   ` Peter Maydell
2021-02-08 11:34     ` Thomas Huth
2021-02-08 12:08       ` Peter Maydell
2021-02-09 14:58         ` Peter Maydell
2021-02-09 17:10           ` Cornelia Huck
2021-02-09 17:17             ` Peter Maydell
2021-02-09 17:24               ` Cornelia Huck
2021-02-09 18:25                 ` Peter Maydell
2021-02-12 11:44                   ` Thomas Huth
2021-02-12 12:05                     ` Peter Maydell
2021-02-12 14:10                       ` Thomas Huth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).