All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gavin Shan <gshan@redhat.com>
Cc: virtualization@lists.linux.dev, linux-kernel@vger.kernel.org,
	jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	yihyu@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring
Date: Thu, 14 Mar 2024 07:50:20 -0400	[thread overview]
Message-ID: <20240314074216-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <9b148de7-b687-4d10-b177-5608b8dc7046@redhat.com>

On Thu, Mar 14, 2024 at 08:15:22PM +1000, Gavin Shan wrote:
> On 3/14/24 18:05, Michael S. Tsirkin wrote:
> > On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote:
> > > The issue is reported by Yihuang Yu who have 'netperf' test on
> > > NVidia's grace-grace and grace-hopper machines. The 'netperf'
> > > client is started in the VM hosted by grace-hopper machine,
> > > while the 'netperf' server is running on grace-grace machine.
> > > 
> > > The VM is started with virtio-net and vhost has been enabled.
> > > We observe a error message spew from VM and then soft-lockup
> > > report. The error message indicates the data associated with
> > > the descriptor (index: 135) has been released, and the queue
> > > is marked as broken. It eventually leads to the endless effort
> > > to fetch free buffer (skb) in drivers/net/virtio_net.c::start_xmit()
> > > and soft-lockup. The stale index 135 is fetched from the available
> > > ring and published to the used ring by vhost, meaning we have
> > > disordred write to the available ring element and available index.
> > > 
> > >    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64              \
> > >    -accel kvm -machine virt,gic-version=host                            \
> > >       :                                                                 \
> > >    -netdev tap,id=vnet0,vhost=on                                        \
> > >    -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0 \
> > > 
> > >    [   19.993158] virtio_net virtio1: output.0:id 135 is not a head!
> > > 
> > > Fix the issue by replacing virtio_wmb(vq->weak_barriers) with stronger
> > > virtio_mb(false), equivalent to replaced 'dmb' by 'dsb' instruction on
> > > ARM64. It should work for other architectures, but performance loss is
> > > expected.
> > > 
> > > Cc: stable@vger.kernel.org
> > > Reported-by: Yihuang Yu <yihyu@redhat.com>
> > > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > > ---
> > >   drivers/virtio/virtio_ring.c | 12 +++++++++---
> > >   1 file changed, 9 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 49299b1f9ec7..7d852811c912 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -687,9 +687,15 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > >   	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> > >   	vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
> > > -	/* Descriptors and available array need to be set before we expose the
> > > -	 * new available array entries. */
> > > -	virtio_wmb(vq->weak_barriers);
> > > +	/*
> > > +	 * Descriptors and available array need to be set before we expose
> > > +	 * the new available array entries. virtio_wmb() should be enough
> > > +	 * to ensuere the order theoretically. However, a stronger barrier
> > > +	 * is needed by ARM64. Otherwise, the stale data can be observed
> > > +	 * by the host (vhost). A stronger barrier should work for other
> > > +	 * architectures, but performance loss is expected.
> > > +	 */
> > > +	virtio_mb(false);
> > 
> > 
> > I don't get what is going on here. Any explanation why virtio_wmb is not
> > enough besides "it does not work"?
> > 
> 
> The change is replacing instruction "dmb" with "dsb". "dsb" is stronger barrier
> than "dmb" because "dsb" ensures that all memory accesses raised before this
> instruction is completed when the 'dsb' instruction completes. However, "dmb"
> doesn't guarantee the order of completion of the memory accesses.
>
> So 'vq->split.vring.avail->idx = cpu_to_virtio(_vq->vdev, vq->split.avail_idx_shadow)'
> can be completed before 'vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head)'.

Completed as observed by which CPU?
We have 2 writes that we want observed by another CPU in order.
So if CPU observes a new value of idx we want it to see
new value in ring.
This is standard use of smp_wmb()
How are these 2 writes different?

What DMB does, is that is seems to ensure that effects
of 'vq->split.vring.avail->idx = cpu_to_virtio(_vq->vdev, vq->split.avail_idx_shadow)'
are observed after effects of
'vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head)'.




> The stronger barrier 'dsb' ensures the completion order as we expected.
> 
>     virtio_wmb(true)         virt_mb(false)
>       virt_wmb                 mb
>         __smp_wmb               __mb
>           dmb(ishst)              dsb(sy)

First, why would you want a non smp barrier when you are
synchronizing with another CPU? This is what smp_ means ...


> Extraced from ARMv9 specificaton
> ================================
> The DMB instruction is a memory barrier instruction that ensures the relative
> order of memory accesses before the barrier with memory accesses after the
> barrier. The DMB instruction _does not_ ensure the completion of any of the
> memory accesses for which it ensures relative order.

Isn't this exactly what we need?

> A DSB instruction is a memory barrier that ensures that memory accesses that
> occur before the DSB instruction have __completed__ before the completion of
> the DSB instruction.

This seems appropriate for when you want to order things more
strongly. I do not get why it's necessary here.

> In doing this, it acts as a stronger barrier than a DMB
> and all ordering that is created by a DMB with specific options is also generated
> by a DSB with the same options.
> 
> > >   	vq->split.avail_idx_shadow++;
> > >   	vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> > >   						vq->split.avail_idx_shadow);
> > > -- 
> > > 2.44.0
> > 
> 
> Thanks,
> Gavin


  reply	other threads:[~2024-03-14 11:50 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-14  7:49 [PATCH] virtio_ring: Fix the stale index in available ring Gavin Shan
2024-03-14  8:05 ` Michael S. Tsirkin
2024-03-14 10:15   ` Gavin Shan
2024-03-14 11:50     ` Michael S. Tsirkin [this message]
2024-03-14 12:50       ` Gavin Shan
2024-03-14 12:59         ` Michael S. Tsirkin
2024-03-15 10:45           ` Gavin Shan
2024-03-15 10:45             ` Gavin Shan
2024-03-15 11:05             ` Michael S. Tsirkin
2024-03-15 11:05               ` Michael S. Tsirkin
2024-03-15 11:24               ` Gavin Shan
2024-03-15 11:24                 ` Gavin Shan
2024-03-17 16:50                 ` Michael S. Tsirkin
2024-03-17 16:50                   ` Michael S. Tsirkin
2024-03-17 23:41                   ` Gavin Shan
2024-03-17 23:41                     ` Gavin Shan
2024-03-18  7:50                     ` Michael S. Tsirkin
2024-03-18  7:50                       ` Michael S. Tsirkin
2024-03-18 16:59 ` Will Deacon
2024-03-19  4:59   ` Gavin Shan
2024-03-19  4:59     ` Gavin Shan
2024-03-19  6:09     ` Michael S. Tsirkin
2024-03-19  6:09       ` Michael S. Tsirkin
2024-03-19  6:10       ` Michael S. Tsirkin
2024-03-19  6:10         ` Michael S. Tsirkin
2024-03-19  6:54         ` Gavin Shan
2024-03-19  6:54           ` Gavin Shan
2024-03-19  7:04           ` Michael S. Tsirkin
2024-03-19  7:04             ` Michael S. Tsirkin
2024-03-19  7:41             ` Gavin Shan
2024-03-19  7:41               ` Gavin Shan
2024-03-19  8:28           ` Michael S. Tsirkin
2024-03-19  8:28             ` Michael S. Tsirkin
2024-03-19  6:38       ` Gavin Shan
2024-03-19  6:38         ` Gavin Shan
2024-03-19  6:43         ` Michael S. Tsirkin
2024-03-19  6:43           ` Michael S. Tsirkin
2024-03-19  6:49           ` Gavin Shan
2024-03-19  6:49             ` Gavin Shan
2024-03-19  7:09             ` Michael S. Tsirkin
2024-03-19  7:09               ` Michael S. Tsirkin
2024-03-19  8:08               ` Gavin Shan
2024-03-19  8:08                 ` Gavin Shan
2024-03-19  8:49                 ` Michael S. Tsirkin
2024-03-19  8:49                   ` Michael S. Tsirkin
2024-03-19 18:22     ` Will Deacon
2024-03-19 18:22       ` Will Deacon
2024-03-19 23:56       ` Gavin Shan
2024-03-19 23:56         ` Gavin Shan
2024-03-20  0:49         ` Michael S. Tsirkin
2024-03-20  0:49           ` Michael S. Tsirkin
2024-03-20  5:24           ` Gavin Shan
2024-03-20  5:24             ` Gavin Shan
2024-03-20  7:14             ` Michael S. Tsirkin
2024-03-20  7:14               ` Michael S. Tsirkin
2024-03-25  7:34               ` Gavin Shan
2024-03-25  7:34                 ` Gavin Shan
2024-03-26  7:49                 ` Michael S. Tsirkin
2024-03-26  7:49                   ` Michael S. Tsirkin
2024-03-26  9:38                   ` Keir Fraser
2024-03-26  9:38                     ` Keir Fraser
2024-03-26 11:43                     ` Will Deacon
2024-03-26 11:43                       ` Will Deacon
2024-03-26 15:46                       ` Will Deacon
2024-03-26 15:46                         ` Will Deacon
2024-03-26 23:14                         ` Gavin Shan
2024-03-26 23:14                           ` Gavin Shan
2024-03-27  0:01                           ` Gavin Shan
2024-03-27  0:01                             ` Gavin Shan
2024-03-27 11:56                         ` Michael S. Tsirkin
2024-03-27 11:56                           ` Michael S. Tsirkin
2024-03-20 17:15             ` Keir Fraser
2024-03-20 17:15               ` Keir Fraser
2024-03-21 12:06               ` Gavin Shan
2024-03-21 12:06                 ` Gavin Shan
2024-03-19  7:36   ` Michael S. Tsirkin
2024-03-19 18:21     ` Will Deacon
2024-03-19  6:14 ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240314074216-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=gshan@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shan.gavin@gmail.com \
    --cc=virtualization@lists.linux.dev \
    --cc=xuanzhuo@linux.alibaba.com \
    --cc=yihyu@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.