From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gavin Shan <gshan@redhat.com>
Cc: Will Deacon <will@kernel.org>,
	virtualization@lists.linux.dev, linux-kernel@vger.kernel.org,
	jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	yihyu@redhat.com, shan.gavin@gmail.com,
	linux-arm-kernel@lists.infradead.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	mochs@nvidia.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring
Date: Tue, 19 Mar 2024 20:49:40 -0400
Message-ID: <20240319203540-mutt-send-email-mst@kernel.org>
In-Reply-To: <9500adaf-0075-4ae9-92db-7e310b6598b0@redhat.com>

On Wed, Mar 20, 2024 at 09:56:58AM +1000, Gavin Shan wrote:
> On 3/20/24 04:22, Will Deacon wrote:
> > On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote:
> > > On 3/19/24 02:59, Will Deacon wrote:
> > > > >    drivers/virtio/virtio_ring.c | 12 +++++++++---
> > > > >    1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 49299b1f9ec7..7d852811c912 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -687,9 +687,15 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > >    	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> > > > >    	vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
> > > > > -	/* Descriptors and available array need to be set before we expose the
> > > > > -	 * new available array entries. */
> > > > > -	virtio_wmb(vq->weak_barriers);
> > > > > +	/*
> > > > > +	 * Descriptors and available array need to be set before we expose
> > > > > +	 * the new available array entries. virtio_wmb() should be enough
> > > > > +	 * to ensure the ordering in theory. However, a stronger barrier
> > > > > +	 * is needed on ARM64. Otherwise, stale data can be observed by
> > > > > +	 * the host (vhost). The stronger barrier should also work on
> > > > > +	 * other architectures, but a performance loss is expected.
> > > > > +	 */
> > > > > +	virtio_mb(false);
> > > > >    	vq->split.avail_idx_shadow++;
> > > > >    	vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> > > > >    						vq->split.avail_idx_shadow);
> > > > 
> > > > Replacing a DMB with a DSB is _very_ unlikely to be the correct solution
> > > > here, especially when ordering accesses to coherent memory.
> > > > 
> > > > In practice, either the larger timing difference from the DSB or the fact
> > > > that you're going from a Store->Store barrier to a full barrier is what
> > > > makes things "work" for you. Have you tried, for example, a DMB SY
> > > > (e.g. via __smp_mb())?
> > > > 
> > > > We definitely shouldn't take changes like this without a proper
> > > > explanation of what is going on.
> > > > 
> > > 
> > > Thanks for your comments, Will.
> > > 
> > > Yes, DMB should work for us. However, it seems this instruction has issues
> > > on NVIDIA's Grace Hopper. It's hard for me to understand how DMB and DSB
> > > work at the hardware level. I agree that replacing DMB with DSB isn't the
> > > right solution before we fully understand the root cause.
> > > 
> > > I tried the possible replacements below. __smp_mb() avoids the issue, as
> > > __mb() does. __ndelay(10) also avoids the issue, but __ndelay(9) doesn't.
> > > 
> > > static inline int virtqueue_add_split(struct virtqueue *_vq, ...)
> > > {
> > >      :
> > >          /* Put entry in available array (but don't update avail->idx until they
> > >           * do sync). */
> > >          avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> > >          vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
> > > 
> > >          /* Descriptors and available array need to be set before we expose the
> > >           * new available array entries. */
> > >          // Broken: virtio_wmb(vq->weak_barriers);
> > >          // Broken: __dma_mb();
> > >          // Work:   __mb();
> > >          // Work:   __smp_mb();
> > 
> > It's pretty weird that __dma_mb() is "broken" but __smp_mb() "works". How
> > confident are you in that result?
> > 
> 
> Yes, __dma_mb() is even stronger than __smp_mb(). I retried the test, and
> both __dma_mb() and __smp_mb() work for us. I ran too many tests yesterday
> and something may have been mixed up.
> 
> Barrier macro       Times the issue was hit in 10 runs
> ------------------------------------------------------
> __smp_wmb()         8
> __smp_mb()          0
> __dma_wmb()         7
> __dma_mb()          0
> __mb()              0
> __wmb()             0
> 
> It's strange that __smp_mb() works, but __smp_wmb() fails. It seems we need a
> read barrier here. I will try WRITE_ONCE() + __smp_wmb() as suggested by Michael
> in another reply. Will update the result soon.
> 
> Thanks,
> Gavin
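
For reference, on arm64 these macros expand roughly as follows (per
arch/arm64/include/asm/barrier.h around v6.8; approximate, for orientation
only):

	__smp_wmb()  ->  dmb(ishst)  /* stores only, inner shareable */
	__smp_mb()   ->  dmb(ish)    /* full,        inner shareable */
	__dma_wmb()  ->  dmb(oshst)  /* stores only, outer shareable */
	__dma_mb()   ->  dmb(osh)    /* full,        outer shareable */
	__wmb()      ->  dsb(st)     /* stores only, DSB             */
	__mb()       ->  dsb(sy)     /* full,        DSB             */

Notably, every macro that hit the issue above is a store-only DMB, while
every full barrier, and the store-only DSB, passed.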


I think you are wasting time with these tests. Even if something helps,
what does that tell us? Try setting a flag as I suggested elsewhere,
then check it in vhost.
Or here's another idea, possibly easier: copy the high bits from the
index into the ring entries themselves. Then vhost can check that each
head is synchronized with the index (see the worked example below).
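
Concretely, with hypothetical numbers (num must be a power of two, so
num - 1 is a low-bit mask):

	/* num = 256: (num - 1) = 0x00ff masks the head and
	 * ~(num - 1) = 0xff00 keeps the index high bits.
	 *
	 * avail_idx_shadow = 0x1234, head = 0x002a:
	 *	entry = 0x002a | (0x1234 & 0xff00) = 0x122a
	 *
	 * vhost fetches the entry for idx = 0x1234 and checks
	 *	(0x122a & 0xff00) == (0x1234 & 0xff00) == 0x1200,
	 * then recovers head = 0x122a & 0x00ff = 0x002a. */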

Warning: completely untested, not even compiled, but it should give you
the idea. By the way, if this works, we should consider making it
official in the spec.


diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6f7e5010a673..79456706d0bd 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -685,7 +685,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
 	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
-	vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
+	u16 headwithflag = head | (vq->split.avail_idx_shadow & ~(vq->split.vring.num - 1));
+	vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, headwithflag);
 
 	/* Descriptors and available array need to be set before we expose the
 	 * new available array entries. */
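
(This encoding relies on the split ring size being a power of two, as the
spec requires, and on head always being a descriptor index below num, so
the bits above log2(num) in each ring entry are otherwise unused.)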

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..bd8f7c763caa 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1299,8 +1299,18 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
 static inline int vhost_get_avail_head(struct vhost_virtqueue *vq,
 				       __virtio16 *head, int idx)
 {
-	return vhost_get_avail(vq, *head,
+	unsigned flag = idx & ~(vq->num - 1);
+	int ret = vhost_get_avail(vq, *head,
 			       &vq->avail->ring[idx & (vq->num - 1)]);
+	unsigned val = vhost16_to_cpu(vq, *head);
+
+	/* The entry's high bits must match those of the index used to
+	 * fetch it: a mismatch means we read a stale entry. */
+	WARN_ON(!ret && (val & ~(vq->num - 1)) != flag);
+
+	/* Strip the flag bits so callers still see a plain head. */
+	*head = cpu_to_vhost16(vq, val & (vq->num - 1));
+	return ret;
 }
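
As a quick sanity check of the arithmetic, here is a standalone user-space
sketch of the same encode/check round trip (hypothetical values, not
kernel code):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint16_t num = 256;            /* ring size, power of two */
	uint16_t avail_idx_shadow = 0x1234;  /* guest's free-running index */
	uint16_t head = 0x002a;              /* descriptor head, always < num */

	/* Guest side: fold the index high bits into the ring entry. */
	uint16_t entry = head | (avail_idx_shadow & ~(num - 1));

	/* Host side: the high bits must match those of the index used
	 * to fetch the entry; the low bits are the real head. */
	assert((entry & ~(num - 1)) == (avail_idx_shadow & ~(num - 1)));
	printf("head = 0x%04x\n", entry & (num - 1));
	return 0;
}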
 
-- 
MST


