All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Will Deacon <will@kernel.org>
Cc: Keir Fraser <keirf@google.com>,
	gshan@redhat.com, virtualization@lists.linux.dev,
	linux-kernel@vger.kernel.org, jasowang@redhat.com,
	xuanzhuo@linux.alibaba.com, yihyu@redhat.com,
	shan.gavin@gmail.com, linux-arm-kernel@lists.infradead.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	mochs@nvidia.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring
Date: Wed, 27 Mar 2024 07:56:58 -0400	[thread overview]
Message-ID: <20240327075049-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20240326154628.GA9613@willie-the-truck>

On Tue, Mar 26, 2024 at 03:46:29PM +0000, Will Deacon wrote:
> On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote:
> > On Tue, Mar 26, 2024 at 09:38:55AM +0000, Keir Fraser wrote:
> > > On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote:
> > > > > Secondly, the debugging code is enhanced so that the available head for
> > > > > (last_avail_idx - 1) is read for twice and recorded. It means the available
> > > > > head for one specific available index is read for twice. I do see the
> > > > > available heads are different from the consecutive reads. More details
> > > > > are shared as below.
> > > > > 
> > > > > From the guest side
> > > > > ===================
> > > > > 
> > > > > virtio_net virtio0: output.0:id 86 is not a head!
> > > > > head to be released: 047 062 112
> > > > > 
> > > > > avail_idx:
> > > > > 000  49665
> > > > > 001  49666  <--
> > > > >  :
> > > > > 015  49664
> > > > 
> > > > what are these #s 49665 and so on?
> > > > and how large is the ring?
> > > > I am guessing 49664 is the index ring size is 16 and
> > > > 49664 % 16 == 0
> > > 
> > > More than that, 49664 % 256 == 0
> > > 
> > > So again there seems to be an error in the vicinity of roll-over of
> > > the idx low byte, as I observed in the earlier log. Surely this is
> > > more than coincidence?
> > 
> > Yeah, I'd still really like to see the disassembly for both sides of the
> > protocol here. Gavin, is that something you're able to provide? Worst
> > case, the host and guest vmlinux objects would be a starting point.
> > 
> > Personally, I'd be fairly surprised if this was a hardware issue.
> 
> Ok, long shot after eyeballing the vhost code, but does the diff below
> help at all? It looks like vhost_vq_avail_empty() can advance the value
> saved in 'vq->avail_idx' but without the read barrier, possibly confusing
> vhost_get_vq_desc() in polling mode.
> 
> Will
> 
> --->8
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 045f666b4f12..87bff710331a 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2801,6 +2801,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>                 return false;
>         vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>  
> +       smp_rmb();
>         return vq->avail_idx == vq->last_avail_idx;
>  }
>  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

Oh wow you are right.

We have:

        if (vq->avail_idx == vq->last_avail_idx) {
                if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
                        vq_err(vq, "Failed to access avail idx at %p\n",
                                &vq->avail->idx);
                        return -EFAULT;
                }
                vq->avail_idx = vhost16_to_cpu(vq, avail_idx);

                if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
                        vq_err(vq, "Guest moved used index from %u to %u",
                                last_avail_idx, vq->avail_idx);
                        return -EFAULT;
                }

                /* If there's nothing new since last we looked, return
                 * invalid.
                 */
                if (vq->avail_idx == last_avail_idx)
                        return vq->num;

                /* Only get avail ring entries after they have been
                 * exposed by guest.
                 */
                smp_rmb();
        }


and so the rmb only happens if avail_idx is not advanced.

Actually there is a bunch of code duplication where we assign to
avail_idx, too.

Will thanks a lot for looking into this! I kept looking into
the virtio side for some reason, the fact that it did not
trigger with qemu should have been a big hint!


-- 
MST


WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Will Deacon <will@kernel.org>
Cc: Keir Fraser <keirf@google.com>,
	gshan@redhat.com, virtualization@lists.linux.dev,
	linux-kernel@vger.kernel.org, jasowang@redhat.com,
	xuanzhuo@linux.alibaba.com, yihyu@redhat.com,
	shan.gavin@gmail.com, linux-arm-kernel@lists.infradead.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	mochs@nvidia.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring
Date: Wed, 27 Mar 2024 07:56:58 -0400	[thread overview]
Message-ID: <20240327075049-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20240326154628.GA9613@willie-the-truck>

On Tue, Mar 26, 2024 at 03:46:29PM +0000, Will Deacon wrote:
> On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote:
> > On Tue, Mar 26, 2024 at 09:38:55AM +0000, Keir Fraser wrote:
> > > On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote:
> > > > > Secondly, the debugging code is enhanced so that the available head for
> > > > > (last_avail_idx - 1) is read for twice and recorded. It means the available
> > > > > head for one specific available index is read for twice. I do see the
> > > > > available heads are different from the consecutive reads. More details
> > > > > are shared as below.
> > > > > 
> > > > > From the guest side
> > > > > ===================
> > > > > 
> > > > > virtio_net virtio0: output.0:id 86 is not a head!
> > > > > head to be released: 047 062 112
> > > > > 
> > > > > avail_idx:
> > > > > 000  49665
> > > > > 001  49666  <--
> > > > >  :
> > > > > 015  49664
> > > > 
> > > > what are these #s 49665 and so on?
> > > > and how large is the ring?
> > > > I am guessing 49664 is the index ring size is 16 and
> > > > 49664 % 16 == 0
> > > 
> > > More than that, 49664 % 256 == 0
> > > 
> > > So again there seems to be an error in the vicinity of roll-over of
> > > the idx low byte, as I observed in the earlier log. Surely this is
> > > more than coincidence?
> > 
> > Yeah, I'd still really like to see the disassembly for both sides of the
> > protocol here. Gavin, is that something you're able to provide? Worst
> > case, the host and guest vmlinux objects would be a starting point.
> > 
> > Personally, I'd be fairly surprised if this was a hardware issue.
> 
> Ok, long shot after eyeballing the vhost code, but does the diff below
> help at all? It looks like vhost_vq_avail_empty() can advance the value
> saved in 'vq->avail_idx' but without the read barrier, possibly confusing
> vhost_get_vq_desc() in polling mode.
> 
> Will
> 
> --->8
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 045f666b4f12..87bff710331a 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2801,6 +2801,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>                 return false;
>         vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>  
> +       smp_rmb();
>         return vq->avail_idx == vq->last_avail_idx;
>  }
>  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

Oh wow you are right.

We have:

        if (vq->avail_idx == vq->last_avail_idx) {
                if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
                        vq_err(vq, "Failed to access avail idx at %p\n",
                                &vq->avail->idx);
                        return -EFAULT;
                }
                vq->avail_idx = vhost16_to_cpu(vq, avail_idx);

                if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
                        vq_err(vq, "Guest moved used index from %u to %u",
                                last_avail_idx, vq->avail_idx);
                        return -EFAULT;
                }

                /* If there's nothing new since last we looked, return
                 * invalid.
                 */
                if (vq->avail_idx == last_avail_idx)
                        return vq->num;

                /* Only get avail ring entries after they have been
                 * exposed by guest.
                 */
                smp_rmb();
        }


and so the rmb only happens if avail_idx is not advanced.

Actually there is a bunch of code duplication where we assign to
avail_idx, too.

Will thanks a lot for looking into this! I kept looking into
the virtio side for some reason, the fact that it did not
trigger with qemu should have been a big hint!


-- 
MST


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2024-03-27 11:57 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-14  7:49 [PATCH] virtio_ring: Fix the stale index in available ring Gavin Shan
2024-03-14  8:05 ` Michael S. Tsirkin
2024-03-14 10:15   ` Gavin Shan
2024-03-14 11:50     ` Michael S. Tsirkin
2024-03-14 12:50       ` Gavin Shan
2024-03-14 12:59         ` Michael S. Tsirkin
2024-03-15 10:45           ` Gavin Shan
2024-03-15 10:45             ` Gavin Shan
2024-03-15 11:05             ` Michael S. Tsirkin
2024-03-15 11:05               ` Michael S. Tsirkin
2024-03-15 11:24               ` Gavin Shan
2024-03-15 11:24                 ` Gavin Shan
2024-03-17 16:50                 ` Michael S. Tsirkin
2024-03-17 16:50                   ` Michael S. Tsirkin
2024-03-17 23:41                   ` Gavin Shan
2024-03-17 23:41                     ` Gavin Shan
2024-03-18  7:50                     ` Michael S. Tsirkin
2024-03-18  7:50                       ` Michael S. Tsirkin
2024-03-18 16:59 ` Will Deacon
2024-03-19  4:59   ` Gavin Shan
2024-03-19  4:59     ` Gavin Shan
2024-03-19  6:09     ` Michael S. Tsirkin
2024-03-19  6:09       ` Michael S. Tsirkin
2024-03-19  6:10       ` Michael S. Tsirkin
2024-03-19  6:10         ` Michael S. Tsirkin
2024-03-19  6:54         ` Gavin Shan
2024-03-19  6:54           ` Gavin Shan
2024-03-19  7:04           ` Michael S. Tsirkin
2024-03-19  7:04             ` Michael S. Tsirkin
2024-03-19  7:41             ` Gavin Shan
2024-03-19  7:41               ` Gavin Shan
2024-03-19  8:28           ` Michael S. Tsirkin
2024-03-19  8:28             ` Michael S. Tsirkin
2024-03-19  6:38       ` Gavin Shan
2024-03-19  6:38         ` Gavin Shan
2024-03-19  6:43         ` Michael S. Tsirkin
2024-03-19  6:43           ` Michael S. Tsirkin
2024-03-19  6:49           ` Gavin Shan
2024-03-19  6:49             ` Gavin Shan
2024-03-19  7:09             ` Michael S. Tsirkin
2024-03-19  7:09               ` Michael S. Tsirkin
2024-03-19  8:08               ` Gavin Shan
2024-03-19  8:08                 ` Gavin Shan
2024-03-19  8:49                 ` Michael S. Tsirkin
2024-03-19  8:49                   ` Michael S. Tsirkin
2024-03-19 18:22     ` Will Deacon
2024-03-19 18:22       ` Will Deacon
2024-03-19 23:56       ` Gavin Shan
2024-03-19 23:56         ` Gavin Shan
2024-03-20  0:49         ` Michael S. Tsirkin
2024-03-20  0:49           ` Michael S. Tsirkin
2024-03-20  5:24           ` Gavin Shan
2024-03-20  5:24             ` Gavin Shan
2024-03-20  7:14             ` Michael S. Tsirkin
2024-03-20  7:14               ` Michael S. Tsirkin
2024-03-25  7:34               ` Gavin Shan
2024-03-25  7:34                 ` Gavin Shan
2024-03-26  7:49                 ` Michael S. Tsirkin
2024-03-26  7:49                   ` Michael S. Tsirkin
2024-03-26  9:38                   ` Keir Fraser
2024-03-26  9:38                     ` Keir Fraser
2024-03-26 11:43                     ` Will Deacon
2024-03-26 11:43                       ` Will Deacon
2024-03-26 15:46                       ` Will Deacon
2024-03-26 15:46                         ` Will Deacon
2024-03-26 23:14                         ` Gavin Shan
2024-03-26 23:14                           ` Gavin Shan
2024-03-27  0:01                           ` Gavin Shan
2024-03-27  0:01                             ` Gavin Shan
2024-03-27 11:56                         ` Michael S. Tsirkin [this message]
2024-03-27 11:56                           ` Michael S. Tsirkin
2024-03-20 17:15             ` Keir Fraser
2024-03-20 17:15               ` Keir Fraser
2024-03-21 12:06               ` Gavin Shan
2024-03-21 12:06                 ` Gavin Shan
2024-03-19  7:36   ` Michael S. Tsirkin
2024-03-19 18:21     ` Will Deacon
2024-03-19  6:14 ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240327075049-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=gshan@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=keirf@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mochs@nvidia.com \
    --cc=shan.gavin@gmail.com \
    --cc=virtualization@lists.linux.dev \
    --cc=will@kernel.org \
    --cc=xuanzhuo@linux.alibaba.com \
    --cc=yihyu@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.