All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
@ 2021-02-16 11:00 Thomas Huth
  2021-02-16 11:43 ` Peter Maydell
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Thomas Huth @ 2021-02-16 11:00 UTC (permalink / raw)
  To: qemu-devel, Cornelia Huck
  Cc: Christian Borntraeger, qemu-s390x, Peter Maydell

According to the virtio specification, a memory barrier should be
used before incrementing the idx field in the "available" ring.
So far, we did not do this in the s390-ccw bios yet, but recently
Peter Maydell saw problems with the s390-ccw bios when running
the qtests on an aarch64 host (the bios panic'ed with the message:
"SCSI cannot report LUNs: response VS RESP=09"), which could
maybe be related to the missing memory barriers. Thus let's add
those barriers now. Since we've only seen the problem on TCG so far,
a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
in the TCG translate code.

(Note: The virtio spec also talks about using a memory barrier
*after* incrementing the idx field, but if I understood correctly
this is only required when using notification suppression - which
we don't use in the s390-ccw bios here)

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 pc-bios/s390-ccw/virtio-net.c | 1 +
 pc-bios/s390-ccw/virtio.c     | 1 +
 pc-bios/s390-ccw/virtio.h     | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/pc-bios/s390-ccw/virtio-net.c b/pc-bios/s390-ccw/virtio-net.c
index 2fcb0a58c5..25598a7a97 100644
--- a/pc-bios/s390-ccw/virtio-net.c
+++ b/pc-bios/s390-ccw/virtio-net.c
@@ -127,6 +127,7 @@ int recv(int fd, void *buf, int maxlen, int flags)
 
     /* Mark buffer as available to the host again */
     rxvq->avail->ring[rxvq->avail->idx % rxvq->num] = id;
+    virtio_mb();
     rxvq->avail->idx = rxvq->avail->idx + 1;
     vring_notify(rxvq);
 
diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
index ab49840db8..fb9687f9b3 100644
--- a/pc-bios/s390-ccw/virtio.c
+++ b/pc-bios/s390-ccw/virtio.c
@@ -154,6 +154,7 @@ void vring_send_buf(VRing *vr, void *p, int len, int flags)
 
     /* Chains only have a single ID */
     if (!(flags & VRING_DESC_F_NEXT)) {
+        virtio_mb();
         vr->avail->idx++;
     }
 }
diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h
index 19fceb6495..6ac65482a9 100644
--- a/pc-bios/s390-ccw/virtio.h
+++ b/pc-bios/s390-ccw/virtio.h
@@ -271,6 +271,8 @@ struct VirtioCmd {
 };
 typedef struct VirtioCmd VirtioCmd;
 
+#define virtio_mb()  asm volatile("bcr 14,0" : : : "memory")
+
 bool vring_notify(VRing *vr);
 int drain_irqs(SubChannelId schid);
 void vring_send_buf(VRing *vr, void *p, int len, int flags);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 11:00 [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code Thomas Huth
@ 2021-02-16 11:43 ` Peter Maydell
  2021-02-16 11:47   ` Thomas Huth
  2021-02-16 11:47 ` Cornelia Huck
  2021-02-16 14:40 ` Halil Pasic
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2021-02-16 11:43 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Christian Borntraeger, qemu-s390x, Cornelia Huck,
	Richard Henderson, QEMU Developers

On Tue, 16 Feb 2021 at 11:01, Thomas Huth <thuth@redhat.com> wrote:
> According to the virtio specification, a memory barrier should be
> used before incrementing the idx field in the "available" ring.
> So far, we did not do this in the s390-ccw bios yet, but recently
> Peter Maydell saw problems with the s390-ccw bios when running
> the qtests on an aarch64 host (the bios panic'ed with the message:
> "SCSI cannot report LUNs: response VS RESP=09"), which could
> maybe be related to the missing memory barriers. Thus let's add
> those barriers now. Since we've only seen the problem on TCG so far,
> a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
> in the TCG translate code.
>
> (Note: The virtio spec also talks about using a memory barrier
> *after* incrementing the idx field, but if I understood correctly
> this is only required when using notification suppression - which
> we don't use in the s390-ccw bios here)
>
> Signed-off-by: Thomas Huth <thuth@redhat.com>

This noticeably increases the reliability of the test for me:
it goes from failing one time in 2 or 3 to failing about one time
in 5 or 6. However it does still hit the virtio_scsi_parse_req()
check on "out_size && in_size" eventually. So my guess is that this
is not a sufficient supply of barriers. I'm firmly not a virtio expert
but my reading of the spec suggested that some of the way the
s390-ccw code is doing things might be in the wrong order and
need restructuring beyond merely "add barriers to existing code".

thanks
-- PMM


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 11:43 ` Peter Maydell
@ 2021-02-16 11:47   ` Thomas Huth
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Huth @ 2021-02-16 11:47 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Christian Borntraeger, qemu-s390x, Cornelia Huck,
	Richard Henderson, QEMU Developers

On 16/02/2021 12.43, Peter Maydell wrote:
> On Tue, 16 Feb 2021 at 11:01, Thomas Huth <thuth@redhat.com> wrote:
>> According to the virtio specification, a memory barrier should be
>> used before incrementing the idx field in the "available" ring.
>> So far, we did not do this in the s390-ccw bios yet, but recently
>> Peter Maydell saw problems with the s390-ccw bios when running
>> the qtests on an aarch64 host (the bios panic'ed with the message:
>> "SCSI cannot report LUNs: response VS RESP=09"), which could
>> maybe be related to the missing memory barriers. Thus let's add
>> those barriers now. Since we've only seen the problem on TCG so far,
>> a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
>> in the TCG translate code.
>>
>> (Note: The virtio spec also talks about using a memory barrier
>> *after* incrementing the idx field, but if I understood correctly
>> this is only required when using notification suppression - which
>> we don't use in the s390-ccw bios here)
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
> 
> This noticeably increases the reliability of the test for me:
> it goes from failing one time in 2 or 3 to failing about one time
> in 5 or 6. However it does still hit the virtio_scsi_parse_req()
> check on "out_size && in_size" eventually. So my guess is that this
> is not a sufficient supply of barriers. I'm firmly not a virtio expert
> but my reading of the spec suggested that some of the way the
> s390-ccw code is doing things might be in the wrong order and
> need restructuring beyond merely "add barriers to existing code".

I've had a look at the virtio 1.0 spec and looked for "barrier" there, and 
these two spots were the only ones that I identified that were definitely 
missing. The other spots in the spec that talk about memory barriers rather 
seem to be related to notification suppression, which we are not doing here, 
if I've got that right. So no clue what else could be going wrong here on 
your aarch64 host...

  Thomas



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 11:00 [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code Thomas Huth
  2021-02-16 11:43 ` Peter Maydell
@ 2021-02-16 11:47 ` Cornelia Huck
  2021-02-16 14:21   ` Thomas Huth
  2021-02-16 14:40 ` Halil Pasic
  2 siblings, 1 reply; 18+ messages in thread
From: Cornelia Huck @ 2021-02-16 11:47 UTC (permalink / raw)
  To: Thomas Huth; +Cc: Christian Borntraeger, qemu-s390x, qemu-devel, Peter Maydell

On Tue, 16 Feb 2021 12:00:56 +0100
Thomas Huth <thuth@redhat.com> wrote:

> According to the virtio specification, a memory barrier should be
> used before incrementing the idx field in the "available" ring.
> So far, we did not do this in the s390-ccw bios yet, but recently
> Peter Maydell saw problems with the s390-ccw bios when running
> the qtests on an aarch64 host (the bios panic'ed with the message:
> "SCSI cannot report LUNs: response VS RESP=09"), which could
> maybe be related to the missing memory barriers. Thus let's add
> those barriers now. Since we've only seen the problem on TCG so far,
> a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
> in the TCG translate code.
> 
> (Note: The virtio spec also talks about using a memory barrier
> *after* incrementing the idx field, but if I understood correctly
> this is only required when using notification suppression - which
> we don't use in the s390-ccw bios here)
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  pc-bios/s390-ccw/virtio-net.c | 1 +
>  pc-bios/s390-ccw/virtio.c     | 1 +
>  pc-bios/s390-ccw/virtio.h     | 2 ++
>  3 files changed, 4 insertions(+)
> 
> diff --git a/pc-bios/s390-ccw/virtio-net.c b/pc-bios/s390-ccw/virtio-net.c
> index 2fcb0a58c5..25598a7a97 100644
> --- a/pc-bios/s390-ccw/virtio-net.c
> +++ b/pc-bios/s390-ccw/virtio-net.c
> @@ -127,6 +127,7 @@ int recv(int fd, void *buf, int maxlen, int flags)
>  
>      /* Mark buffer as available to the host again */
>      rxvq->avail->ring[rxvq->avail->idx % rxvq->num] = id;
> +    virtio_mb();
>      rxvq->avail->idx = rxvq->avail->idx + 1;
>      vring_notify(rxvq);
>  
> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> index ab49840db8..fb9687f9b3 100644
> --- a/pc-bios/s390-ccw/virtio.c
> +++ b/pc-bios/s390-ccw/virtio.c
> @@ -154,6 +154,7 @@ void vring_send_buf(VRing *vr, void *p, int len, int flags)
>  
>      /* Chains only have a single ID */
>      if (!(flags & VRING_DESC_F_NEXT)) {
> +        virtio_mb();

I think you need to also need barriers for changes to the buffers, as
the spec talks about "manipulating the descriptor table".

>          vr->avail->idx++;
>      }
>  }
> diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h
> index 19fceb6495..6ac65482a9 100644
> --- a/pc-bios/s390-ccw/virtio.h
> +++ b/pc-bios/s390-ccw/virtio.h
> @@ -271,6 +271,8 @@ struct VirtioCmd {
>  };
>  typedef struct VirtioCmd VirtioCmd;
>  
> +#define virtio_mb()  asm volatile("bcr 14,0" : : : "memory")

The bios is built for z900, so you probably need a bcr15 here?

> +
>  bool vring_notify(VRing *vr);
>  int drain_irqs(SubChannelId schid);
>  void vring_send_buf(VRing *vr, void *p, int len, int flags);



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 11:47 ` Cornelia Huck
@ 2021-02-16 14:21   ` Thomas Huth
  2021-02-16 14:30     ` Cornelia Huck
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Huth @ 2021-02-16 14:21 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Christian Borntraeger, qemu-s390x, qemu-devel, Peter Maydell

On 16/02/2021 12.47, Cornelia Huck wrote:
> On Tue, 16 Feb 2021 12:00:56 +0100
> Thomas Huth <thuth@redhat.com> wrote:
> 
>> According to the virtio specification, a memory barrier should be
>> used before incrementing the idx field in the "available" ring.
>> So far, we did not do this in the s390-ccw bios yet, but recently
>> Peter Maydell saw problems with the s390-ccw bios when running
>> the qtests on an aarch64 host (the bios panic'ed with the message:
>> "SCSI cannot report LUNs: response VS RESP=09"), which could
>> maybe be related to the missing memory barriers. Thus let's add
>> those barriers now. Since we've only seen the problem on TCG so far,
>> a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
>> in the TCG translate code.
>>
>> (Note: The virtio spec also talks about using a memory barrier
>> *after* incrementing the idx field, but if I understood correctly
>> this is only required when using notification suppression - which
>> we don't use in the s390-ccw bios here)
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>   pc-bios/s390-ccw/virtio-net.c | 1 +
>>   pc-bios/s390-ccw/virtio.c     | 1 +
>>   pc-bios/s390-ccw/virtio.h     | 2 ++
>>   3 files changed, 4 insertions(+)
>>
>> diff --git a/pc-bios/s390-ccw/virtio-net.c b/pc-bios/s390-ccw/virtio-net.c
>> index 2fcb0a58c5..25598a7a97 100644
>> --- a/pc-bios/s390-ccw/virtio-net.c
>> +++ b/pc-bios/s390-ccw/virtio-net.c
>> @@ -127,6 +127,7 @@ int recv(int fd, void *buf, int maxlen, int flags)
>>   
>>       /* Mark buffer as available to the host again */
>>       rxvq->avail->ring[rxvq->avail->idx % rxvq->num] = id;
>> +    virtio_mb();
>>       rxvq->avail->idx = rxvq->avail->idx + 1;
>>       vring_notify(rxvq);
>>   
>> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
>> index ab49840db8..fb9687f9b3 100644
>> --- a/pc-bios/s390-ccw/virtio.c
>> +++ b/pc-bios/s390-ccw/virtio.c
>> @@ -154,6 +154,7 @@ void vring_send_buf(VRing *vr, void *p, int len, int flags)
>>   
>>       /* Chains only have a single ID */
>>       if (!(flags & VRING_DESC_F_NEXT)) {
>> +        virtio_mb();
> 
> I think you need to also need barriers for changes to the buffers, as
> the spec talks about "manipulating the descriptor table".

Which paragraph in the virtio spec are you refering to here? I can't find 
that part right now...

>>           vr->avail->idx++;
>>       }
>>   }
>> diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h
>> index 19fceb6495..6ac65482a9 100644
>> --- a/pc-bios/s390-ccw/virtio.h
>> +++ b/pc-bios/s390-ccw/virtio.h
>> @@ -271,6 +271,8 @@ struct VirtioCmd {
>>   };
>>   typedef struct VirtioCmd VirtioCmd;
>>   
>> +#define virtio_mb()  asm volatile("bcr 14,0" : : : "memory")
> 
> The bios is built for z900, so you probably need a bcr15 here?

I thought about that, too, but for TCG, it currently should not matter since 
both, 14 and 15, end up with the same code in op_bc() in 
target/s390x/translate.c. And on a real host, we've never seen this problem 
to occur, so it should not matter there, too. But if you prefer (e.g. in 
case somebody tweaks the TCG implementation one day), I can also switch to 
bcr15 instead.

  Thomas



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 14:21   ` Thomas Huth
@ 2021-02-16 14:30     ` Cornelia Huck
  2021-02-16 14:32       ` Peter Maydell
  0 siblings, 1 reply; 18+ messages in thread
From: Cornelia Huck @ 2021-02-16 14:30 UTC (permalink / raw)
  To: Thomas Huth; +Cc: Christian Borntraeger, qemu-s390x, qemu-devel, Peter Maydell

On Tue, 16 Feb 2021 15:21:45 +0100
Thomas Huth <thuth@redhat.com> wrote:

> On 16/02/2021 12.47, Cornelia Huck wrote:
> > On Tue, 16 Feb 2021 12:00:56 +0100
> > Thomas Huth <thuth@redhat.com> wrote:
> >   
> >> According to the virtio specification, a memory barrier should be
> >> used before incrementing the idx field in the "available" ring.
> >> So far, we did not do this in the s390-ccw bios yet, but recently
> >> Peter Maydell saw problems with the s390-ccw bios when running
> >> the qtests on an aarch64 host (the bios panic'ed with the message:
> >> "SCSI cannot report LUNs: response VS RESP=09"), which could
> >> maybe be related to the missing memory barriers. Thus let's add
> >> those barriers now. Since we've only seen the problem on TCG so far,
> >> a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
> >> in the TCG translate code.
> >>
> >> (Note: The virtio spec also talks about using a memory barrier
> >> *after* incrementing the idx field, but if I understood correctly
> >> this is only required when using notification suppression - which
> >> we don't use in the s390-ccw bios here)
> >>
> >> Signed-off-by: Thomas Huth <thuth@redhat.com>
> >> ---
> >>   pc-bios/s390-ccw/virtio-net.c | 1 +
> >>   pc-bios/s390-ccw/virtio.c     | 1 +
> >>   pc-bios/s390-ccw/virtio.h     | 2 ++
> >>   3 files changed, 4 insertions(+)
> >>
> >> diff --git a/pc-bios/s390-ccw/virtio-net.c b/pc-bios/s390-ccw/virtio-net.c
> >> index 2fcb0a58c5..25598a7a97 100644
> >> --- a/pc-bios/s390-ccw/virtio-net.c
> >> +++ b/pc-bios/s390-ccw/virtio-net.c
> >> @@ -127,6 +127,7 @@ int recv(int fd, void *buf, int maxlen, int flags)
> >>   
> >>       /* Mark buffer as available to the host again */
> >>       rxvq->avail->ring[rxvq->avail->idx % rxvq->num] = id;
> >> +    virtio_mb();
> >>       rxvq->avail->idx = rxvq->avail->idx + 1;
> >>       vring_notify(rxvq);
> >>   
> >> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> >> index ab49840db8..fb9687f9b3 100644
> >> --- a/pc-bios/s390-ccw/virtio.c
> >> +++ b/pc-bios/s390-ccw/virtio.c
> >> @@ -154,6 +154,7 @@ void vring_send_buf(VRing *vr, void *p, int len, int flags)
> >>   
> >>       /* Chains only have a single ID */
> >>       if (!(flags & VRING_DESC_F_NEXT)) {
> >> +        virtio_mb();  
> > 
> > I think you need to also need barriers for changes to the buffers, as
> > the spec talks about "manipulating the descriptor table".  
> 
> Which paragraph in the virtio spec are you refering to here? I can't find 
> that part right now...

Step 4 in "2.7.13 Supplying Buffers to The Device":

"The driver performs a suitable memory barrier to ensure the device
sees the updated descriptor table and available ring before the next
step."

> 
> >>           vr->avail->idx++;
> >>       }
> >>   }
> >> diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h
> >> index 19fceb6495..6ac65482a9 100644
> >> --- a/pc-bios/s390-ccw/virtio.h
> >> +++ b/pc-bios/s390-ccw/virtio.h
> >> @@ -271,6 +271,8 @@ struct VirtioCmd {
> >>   };
> >>   typedef struct VirtioCmd VirtioCmd;
> >>   
> >> +#define virtio_mb()  asm volatile("bcr 14,0" : : : "memory")  
> > 
> > The bios is built for z900, so you probably need a bcr15 here?  
> 
> I thought about that, too, but for TCG, it currently should not matter since 
> both, 14 and 15, end up with the same code in op_bc() in 
> target/s390x/translate.c. And on a real host, we've never seen this problem 
> to occur, so it should not matter there, too. But if you prefer (e.g. in 
> case somebody tweaks the TCG implementation one day), I can also switch to 
> bcr15 instead.

OK, if they are both implemented with the same code, it should not
really matter. We don't run on any hardware that doesn't support bcr14
anyway.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 14:30     ` Cornelia Huck
@ 2021-02-16 14:32       ` Peter Maydell
  2021-02-16 14:35         ` Thomas Huth
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2021-02-16 14:32 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Christian Borntraeger, Thomas Huth, qemu-s390x, QEMU Developers

On Tue, 16 Feb 2021 at 14:30, Cornelia Huck <cohuck@redhat.com> wrote:
> Step 4 in "2.7.13 Supplying Buffers to The Device":
>
> "The driver performs a suitable memory barrier to ensure the device
> sees the updated descriptor table and available ring before the next
> step."

I thought that my first time through the spec as well, but
I think the whole of section 2.7 is dealing with "packed virtqueues",
which have to be explicitly negotiated and which I don't think
the s390-ccw code is doing.

thnaks
-- PMM


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 14:32       ` Peter Maydell
@ 2021-02-16 14:35         ` Thomas Huth
  2021-02-16 14:37           ` Peter Maydell
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Huth @ 2021-02-16 14:35 UTC (permalink / raw)
  To: Peter Maydell, Cornelia Huck
  Cc: Christian Borntraeger, qemu-s390x, QEMU Developers

On 16/02/2021 15.32, Peter Maydell wrote:
> On Tue, 16 Feb 2021 at 14:30, Cornelia Huck <cohuck@redhat.com> wrote:
>> Step 4 in "2.7.13 Supplying Buffers to The Device":
>>
>> "The driver performs a suitable memory barrier to ensure the device
>> sees the updated descriptor table and available ring before the next
>> step."
> 
> I thought that my first time through the spec as well, but
> I think the whole of section 2.7 is dealing with "packed virtqueues",
> which have to be explicitly negotiated and which I don't think
> the s390-ccw code is doing.

Right. I think the s390-ccw code is still based on virtio v1.0, that's why I 
also only looked at that version of the spec.

  Thomas



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 14:35         ` Thomas Huth
@ 2021-02-16 14:37           ` Peter Maydell
  2021-02-16 14:49             ` Cornelia Huck
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2021-02-16 14:37 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Christian Borntraeger, qemu-s390x, Cornelia Huck, QEMU Developers

On Tue, 16 Feb 2021 at 14:35, Thomas Huth <thuth@redhat.com> wrote:
>
> On 16/02/2021 15.32, Peter Maydell wrote:
> > On Tue, 16 Feb 2021 at 14:30, Cornelia Huck <cohuck@redhat.com> wrote:
> >> Step 4 in "2.7.13 Supplying Buffers to The Device":
> >>
> >> "The driver performs a suitable memory barrier to ensure the device
> >> sees the updated descriptor table and available ring before the next
> >> step."
> >
> > I thought that my first time through the spec as well, but
> > I think the whole of section 2.7 is dealing with "packed virtqueues",
> > which have to be explicitly negotiated and which I don't think
> > the s390-ccw code is doing.
>
> Right. I think the s390-ccw code is still based on virtio v1.0, that's why I
> also only looked at that version of the spec.

I think the ideal would be to find somebody who's really well
acquainted with the virtio spec (MST?) and ask them to have
a quick look-over the s390-ccw code to say where it needs
changes... The fact that this patch doesn't completely fix
the bug leaves me suspicious that we're missing something in
our readings of the spec.

-- PMM


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 11:00 [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code Thomas Huth
  2021-02-16 11:43 ` Peter Maydell
  2021-02-16 11:47 ` Cornelia Huck
@ 2021-02-16 14:40 ` Halil Pasic
  2021-02-16 15:17   ` Cornelia Huck
  2021-02-16 16:15   ` Thomas Huth
  2 siblings, 2 replies; 18+ messages in thread
From: Halil Pasic @ 2021-02-16 14:40 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Christian Borntraeger, qemu-s390x, Cornelia Huck, qemu-devel,
	Peter Maydell

On Tue, 16 Feb 2021 12:00:56 +0100
Thomas Huth <thuth@redhat.com> wrote:

> According to the virtio specification, a memory barrier should be
> used before incrementing the idx field in the "available" ring.
> So far, we did not do this in the s390-ccw bios yet, but recently
> Peter Maydell saw problems with the s390-ccw bios when running
> the qtests on an aarch64 host (the bios panic'ed with the message:
> "SCSI cannot report LUNs: response VS RESP=09"), which could
> maybe be related to the missing memory barriers. Thus let's add
> those barriers now. Since we've only seen the problem on TCG so far,
> a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
> in the TCG translate code.
> 
> (Note: The virtio spec also talks about using a memory barrier
> *after* incrementing the idx field, but if I understood correctly
> this is only required when using notification suppression - which
> we don't use in the s390-ccw bios here)

I suggest to the barrier after incrementing the idx field for two
reasons. First: If the device were to see the notification, but
not see the incremented idx field, it would effectively loose
initiative. That is pretty straight forward, because the
notification just says 'check out that queue', and if we don't
see the incremented index, miss the buffer that was made available
by incrementing idx.

Second: We are in the bios, and I hope even an unnecessary barrier
would not hurt us significantly.

Conny, what do you think?

Regards,
Halil


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 14:37           ` Peter Maydell
@ 2021-02-16 14:49             ` Cornelia Huck
  0 siblings, 0 replies; 18+ messages in thread
From: Cornelia Huck @ 2021-02-16 14:49 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Christian Borntraeger, Thomas Huth, qemu-s390x, QEMU Developers

On Tue, 16 Feb 2021 14:37:22 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Tue, 16 Feb 2021 at 14:35, Thomas Huth <thuth@redhat.com> wrote:
> >
> > On 16/02/2021 15.32, Peter Maydell wrote:  
> > > On Tue, 16 Feb 2021 at 14:30, Cornelia Huck <cohuck@redhat.com> wrote:  
> > >> Step 4 in "2.7.13 Supplying Buffers to The Device":
> > >>
> > >> "The driver performs a suitable memory barrier to ensure the device
> > >> sees the updated descriptor table and available ring before the next
> > >> step."  
> > >
> > > I thought that my first time through the spec as well, but
> > > I think the whole of section 2.7 is dealing with "packed virtqueues",
> > > which have to be explicitly negotiated and which I don't think
> > > the s390-ccw code is doing.  

2.7 is split, 2.8 packed (at least in my built-from-git version; maybe
I should have mentioned that :)

> >
> > Right. I think the s390-ccw code is still based on virtio v1.0, that's why I
> > also only looked at that version of the spec.  

Yes, the bios code is using split.

> I think the ideal would be to find somebody who's really well
> acquainted with the virtio spec (MST?) and ask them to have
> a quick look-over the s390-ccw code to say where it needs
> changes... The fact that this patch doesn't completely fix
> the bug leaves me suspicious that we're missing something in
> our readings of the spec.

Unfortunately, the bios virtio code is not that easy to follow :/



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 14:40 ` Halil Pasic
@ 2021-02-16 15:17   ` Cornelia Huck
  2021-02-16 16:15   ` Thomas Huth
  1 sibling, 0 replies; 18+ messages in thread
From: Cornelia Huck @ 2021-02-16 15:17 UTC (permalink / raw)
  To: Halil Pasic
  Cc: qemu-s390x, Christian Borntraeger, Thomas Huth, qemu-devel,
	Peter Maydell

On Tue, 16 Feb 2021 15:40:10 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 16 Feb 2021 12:00:56 +0100
> Thomas Huth <thuth@redhat.com> wrote:
> 
> > According to the virtio specification, a memory barrier should be
> > used before incrementing the idx field in the "available" ring.
> > So far, we did not do this in the s390-ccw bios yet, but recently
> > Peter Maydell saw problems with the s390-ccw bios when running
> > the qtests on an aarch64 host (the bios panic'ed with the message:
> > "SCSI cannot report LUNs: response VS RESP=09"), which could
> > maybe be related to the missing memory barriers. Thus let's add
> > those barriers now. Since we've only seen the problem on TCG so far,
> > a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
> > in the TCG translate code.
> > 
> > (Note: The virtio spec also talks about using a memory barrier
> > *after* incrementing the idx field, but if I understood correctly
> > this is only required when using notification suppression - which
> > we don't use in the s390-ccw bios here)  
> 
> I suggest to the barrier after incrementing the idx field for two
> reasons. First: If the device were to see the notification, but
> not see the incremented idx field, it would effectively loose
> initiative. That is pretty straight forward, because the
> notification just says 'check out that queue', and if we don't
> see the incremented index, miss the buffer that was made available
> by incrementing idx.
> 
> Second: We are in the bios, and I hope even an unnecessary barrier
> would not hurt us significantly.

Yes to both, I think that is worth a try.

> 
> Conny, what do you think?
> 
> Regards,
> Halil
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 14:40 ` Halil Pasic
  2021-02-16 15:17   ` Cornelia Huck
@ 2021-02-16 16:15   ` Thomas Huth
  2021-02-16 16:44     ` Peter Maydell
                       ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Thomas Huth @ 2021-02-16 16:15 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Christian Borntraeger, qemu-s390x, Cornelia Huck, qemu-devel,
	Peter Maydell

On 16/02/2021 15.40, Halil Pasic wrote:
> On Tue, 16 Feb 2021 12:00:56 +0100
> Thomas Huth <thuth@redhat.com> wrote:
> 
>> According to the virtio specification, a memory barrier should be
>> used before incrementing the idx field in the "available" ring.
>> So far, we did not do this in the s390-ccw bios yet, but recently
>> Peter Maydell saw problems with the s390-ccw bios when running
>> the qtests on an aarch64 host (the bios panic'ed with the message:
>> "SCSI cannot report LUNs: response VS RESP=09"), which could
>> maybe be related to the missing memory barriers. Thus let's add
>> those barriers now. Since we've only seen the problem on TCG so far,
>> a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
>> in the TCG translate code.
>>
>> (Note: The virtio spec also talks about using a memory barrier
>> *after* incrementing the idx field, but if I understood correctly
>> this is only required when using notification suppression - which
>> we don't use in the s390-ccw bios here)
> 
> I suggest to the barrier after incrementing the idx field for two
> reasons. First: If the device were to see the notification, but
> not see the incremented idx field, it would effectively loose
> initiative. That is pretty straight forward, because the
> notification just says 'check out that queue', and if we don't
> see the incremented index, miss the buffer that was made available
> by incrementing idx.

I was just about to reply that this is certainly not necessary, since
the DIAGNOSE instruction that we use for the notification hypercall
should be serializing anyway ... but after looking at the PoP, it
actually is not marked as a serializing instruction! (while e.g.
SVC - supervisor call - is explicitly marked as serializing)

So maybe that's worth a try: Peter, could you please apply this patch
on top an see whether it makes a difference?

diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
--- a/pc-bios/s390-ccw/virtio.c
+++ b/pc-bios/s390-ccw/virtio.c
@@ -54,6 +54,7 @@ static long kvm_hypercall(unsigned long nr, unsigned long param1,
      register ulong r_param3 asm("4") = param3;
      register long retval asm("2");
  
+    virtio_mb();
      asm volatile ("diag 2,4,0x500"
                    : "=d" (retval)
                    : "d" (r_nr), "0" (r_param1), "r"(r_param2), "d"(r_param3)

  Thanks,
   Thomas



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 16:15   ` Thomas Huth
@ 2021-02-16 16:44     ` Peter Maydell
  2021-02-16 17:11     ` Halil Pasic
  2021-02-17  4:31     ` Richard Henderson
  2 siblings, 0 replies; 18+ messages in thread
From: Peter Maydell @ 2021-02-16 16:44 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Halil Pasic, Christian Borntraeger, qemu-s390x, Cornelia Huck,
	QEMU Developers

On Tue, 16 Feb 2021 at 16:15, Thomas Huth <thuth@redhat.com> wrote:
> I was just about to reply that this is certainly not necessary, since
> the DIAGNOSE instruction that we use for the notification hypercall
> should be serializing anyway ... but after looking at the PoP, it
> actually is not marked as a serializing instruction! (while e.g.
> SVC - supervisor call - is explicitly marked as serializing)
>
> So maybe that's worth a try: Peter, could you please apply this patch
> on top an see whether it makes a difference?
>
> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> --- a/pc-bios/s390-ccw/virtio.c
> +++ b/pc-bios/s390-ccw/virtio.c
> @@ -54,6 +54,7 @@ static long kvm_hypercall(unsigned long nr, unsigned long param1,
>       register ulong r_param3 asm("4") = param3;
>       register long retval asm("2");
>
> +    virtio_mb();
>       asm volatile ("diag 2,4,0x500"
>                     : "=d" (retval)
>                     : "d" (r_nr), "0" (r_param1), "r"(r_param2), "d"(r_param3)

Doesn't really help (maybe brings the occurrence rate down a bit
more, towards about 1 in 9?)

-- PMM


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 16:15   ` Thomas Huth
  2021-02-16 16:44     ` Peter Maydell
@ 2021-02-16 17:11     ` Halil Pasic
  2021-02-17  4:31     ` Richard Henderson
  2 siblings, 0 replies; 18+ messages in thread
From: Halil Pasic @ 2021-02-16 17:11 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Christian Borntraeger, qemu-s390x, Cornelia Huck, qemu-devel,
	Peter Maydell

On Tue, 16 Feb 2021 17:15:19 +0100
Thomas Huth <thuth@redhat.com> wrote:

> >> (Note: The virtio spec also talks about using a memory barrier
> >> *after* incrementing the idx field, but if I understood correctly
> >> this is only required when using notification suppression - which
> >> we don't use in the s390-ccw bios here)  
> > 
> > I suggest to the barrier after incrementing the idx field for two
> > reasons. First: If the device were to see the notification, but
> > not see the incremented idx field, it would effectively loose
> > initiative. That is pretty straight forward, because the
> > notification just says 'check out that queue', and if we don't
> > see the incremented index, miss the buffer that was made available
> > by incrementing idx.  
> 
> I was just about to reply that this is certainly not necessary, since
> the DIAGNOSE instruction that we use for the notification hypercall
> should be serializing anyway ... but after looking at the PoP, it
> actually is not marked as a serializing instruction! (while e.g.
> SVC - supervisor call - is explicitly marked as serializing)

I was talking about whether a suitable barrier is needed, from
virtio perspective. How the suitable barrier is put in place is a
different issue.

Regards,
Halil


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-16 16:15   ` Thomas Huth
  2021-02-16 16:44     ` Peter Maydell
  2021-02-16 17:11     ` Halil Pasic
@ 2021-02-17  4:31     ` Richard Henderson
  2021-02-17 11:15       ` Peter Maydell
  2 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2021-02-17  4:31 UTC (permalink / raw)
  To: Thomas Huth, Halil Pasic
  Cc: Christian Borntraeger, qemu-s390x, Cornelia Huck, qemu-devel,
	Peter Maydell

On 2/16/21 8:15 AM, Thomas Huth wrote:
> On 16/02/2021 15.40, Halil Pasic wrote:
>> On Tue, 16 Feb 2021 12:00:56 +0100
>> Thomas Huth <thuth@redhat.com> wrote:
>>
>>> According to the virtio specification, a memory barrier should be
>>> used before incrementing the idx field in the "available" ring.
>>> So far, we did not do this in the s390-ccw bios yet, but recently
>>> Peter Maydell saw problems with the s390-ccw bios when running
>>> the qtests on an aarch64 host (the bios panic'ed with the message:
>>> "SCSI cannot report LUNs: response VS RESP=09"), which could
>>> maybe be related to the missing memory barriers. Thus let's add
>>> those barriers now. Since we've only seen the problem on TCG so far,
>>> a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
>>> in the TCG translate code.
>>>
>>> (Note: The virtio spec also talks about using a memory barrier
>>> *after* incrementing the idx field, but if I understood correctly
>>> this is only required when using notification suppression - which
>>> we don't use in the s390-ccw bios here)
>>
>> I suggest to the barrier after incrementing the idx field for two
>> reasons. First: If the device were to see the notification, but
>> not see the incremented idx field, it would effectively loose
>> initiative. That is pretty straight forward, because the
>> notification just says 'check out that queue', and if we don't
>> see the incremented index, miss the buffer that was made available
>> by incrementing idx.
> 
> I was just about to reply that this is certainly not necessary, since
> the DIAGNOSE instruction that we use for the notification hypercall
> should be serializing anyway ... but after looking at the PoP, it
> actually is not marked as a serializing instruction! (while e.g.
> SVC - supervisor call - is explicitly marked as serializing)
> 
> So maybe that's worth a try: Peter, could you please apply this patch
> on top an see whether it makes a difference?
> 
> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> --- a/pc-bios/s390-ccw/virtio.c
> +++ b/pc-bios/s390-ccw/virtio.c
> @@ -54,6 +54,7 @@ static long kvm_hypercall(unsigned long nr, unsigned long
> param1,
>      register ulong r_param3 asm("4") = param3;
>      register long retval asm("2");
>  
> +    virtio_mb();
>      asm volatile ("diag 2,4,0x500"
>                    : "=d" (retval)
>                    : "d" (r_nr), "0" (r_param1), "r"(r_param2), "d"(r_param3)

This is patching the wrong thing, IMO.  You're adding barriers to the guest
that I think ought not be architecturally required -- memory accesses on s390x
are strongly ordered, with or without diag/svc/whatever.

With

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 1376cdc404..3c5f38be62 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1622,6 +1622,8 @@ static void tcg_out_tlb_read
     TCGType mask_type;
     uint64_t compare_mask;

+    tcg_out_mb(s, TCG_MO_ALL);
+
     mask_type = (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32
                  ? TCG_TYPE_I64 : TCG_TYPE_I32);

which is a gigantic hammer, adding a host barrier before every qemu guest
access, I can no longer provoke a failure (previously visible 1 in 4, now no
failures in 100).

With that as a data point for success, I'm going to try to use host
load-acquire / store-release instructions, and then apply TCG_GUEST_DEFAULT_MO
and see if I can find something that works reasonably.


r~


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-17  4:31     ` Richard Henderson
@ 2021-02-17 11:15       ` Peter Maydell
  2021-02-17 15:11         ` Richard Henderson
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2021-02-17 11:15 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Thomas Huth, Cornelia Huck, QEMU Developers, Halil Pasic,
	Christian Borntraeger, qemu-s390x

On Wed, 17 Feb 2021 at 04:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 2/16/21 8:15 AM, Thomas Huth wrote:
> With
>
> diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
> index 1376cdc404..3c5f38be62 100644
> --- a/tcg/aarch64/tcg-target.c.inc
> +++ b/tcg/aarch64/tcg-target.c.inc
> @@ -1622,6 +1622,8 @@ static void tcg_out_tlb_read
>      TCGType mask_type;
>      uint64_t compare_mask;
>
> +    tcg_out_mb(s, TCG_MO_ALL);
> +
>      mask_type = (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32
>                   ? TCG_TYPE_I64 : TCG_TYPE_I32);
>
> which is a gigantic hammer, adding a host barrier before every qemu guest
> access, I can no longer provoke a failure (previously visible 1 in 4, now no
> failures in 100).
>
> With that as a data point for success, I'm going to try to use host
> load-acquire / store-release instructions, and then apply TCG_GUEST_DEFAULT_MO
> and see if I can find something that works reasonably.

This isn't aarch64-host-specific, though, is it? It's going to be
the situation for any host with a relaxed memory model. Do we really
want to make all loads and stores lower-performance by adding in
the ldacq/strel (or worse, barriers everywhere on host archs without
ldacq/strel)? I feel like there ought to be an alternate approach
involving using some kind of exclusion to ensure that we don't run
the iothreads in parallel with the vCPU thread if we're using the
non-MTTCG setup where all the vCPUs are on a single thread, and that
that's probably less of a perf hit.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
  2021-02-17 11:15       ` Peter Maydell
@ 2021-02-17 15:11         ` Richard Henderson
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2021-02-17 15:11 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Thomas Huth, Cornelia Huck, QEMU Developers, Halil Pasic,
	Christian Borntraeger, qemu-s390x

On 2/17/21 3:15 AM, Peter Maydell wrote:
> This isn't aarch64-host-specific, though, is it? It's going to be
> the situation for any host with a relaxed memory model.

Yes.  I intend to make the code-generation changes generic.

> Do we really
> want to make all loads and stores lower-performance by adding in
> the ldacq/strel (or worse, barriers everywhere on host archs without
> ldacq/strel)?

Well, yes.  But then we get to enable mttcg too.

> I feel like there ought to be an alternate approach
> involving using some kind of exclusion to ensure that we don't run
> the iothreads in parallel with the vCPU thread if we're using the
> non-MTTCG setup where all the vCPUs are on a single thread, and that
> that's probably less of a perf hit.

I don't know where to put such a block, do you?

The memory barriers are a perf hit with -smp 1, but I would think that all that
and more are recoverable by not having to run -smp 2 serially.


r~


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-02-17 15:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-16 11:00 [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code Thomas Huth
2021-02-16 11:43 ` Peter Maydell
2021-02-16 11:47   ` Thomas Huth
2021-02-16 11:47 ` Cornelia Huck
2021-02-16 14:21   ` Thomas Huth
2021-02-16 14:30     ` Cornelia Huck
2021-02-16 14:32       ` Peter Maydell
2021-02-16 14:35         ` Thomas Huth
2021-02-16 14:37           ` Peter Maydell
2021-02-16 14:49             ` Cornelia Huck
2021-02-16 14:40 ` Halil Pasic
2021-02-16 15:17   ` Cornelia Huck
2021-02-16 16:15   ` Thomas Huth
2021-02-16 16:44     ` Peter Maydell
2021-02-16 17:11     ` Halil Pasic
2021-02-17  4:31     ` Richard Henderson
2021-02-17 11:15       ` Peter Maydell
2021-02-17 15:11         ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.