From: Gavin Shan <gshan@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Will Deacon <will@kernel.org>, virtualization@lists.linux.dev,
	linux-kernel@vger.kernel.org, jasowang@redhat.com,
	xuanzhuo@linux.alibaba.com, yihyu@redhat.com, shan.gavin@gmail.com,
	linux-arm-kernel@lists.infradead.org,
	Catalin Marinas <catalin.marinas@arm.com>, mochs@nvidia.com
Subject: Re: [PATCH] virtio_ring: Fix the stale index in available ring
Date: Tue, 19 Mar 2024 16:38:49 +1000
Message-ID: <9b3030d1-cb2c-4ce0-8b24-1074b616fc84@redhat.com>
In-Reply-To: <20240319020905-mutt-send-email-mst@kernel.org>

On 3/19/24 16:09, Michael S. Tsirkin wrote:
>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>> index 49299b1f9ec7..7d852811c912 100644
>>>> --- a/drivers/virtio/virtio_ring.c
>>>> +++ b/drivers/virtio/virtio_ring.c
>>>> @@ -687,9 +687,15 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>>>>  	avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
>>>>  	vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
>>>> -	/* Descriptors and available array need to be set before we expose the
>>>> -	 * new available array entries. */
>>>> -	virtio_wmb(vq->weak_barriers);
>>>> +	/*
>>>> +	 * Descriptors and available array need to be set before we expose
>>>> +	 * the new available array entries. virtio_wmb() should be enough
>>>> +	 * to ensuere the order theoretically. However, a stronger barrier
>>>> +	 * is needed by ARM64. Otherwise, the stale data can be observed
>>>> +	 * by the host (vhost). A stronger barrier should work for other
>>>> +	 * architectures, but performance loss is expected.
>>>> +	 */
>>>> +	virtio_mb(false);
>>>>  	vq->split.avail_idx_shadow++;
>>>>  	vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>>>>  						vq->split.avail_idx_shadow);
>>>
>>> Replacing a DMB with a DSB is _very_ unlikely to be the correct solution
>>> here, especially when ordering accesses to coherent memory.
>>>
>>> In practice, either the larger timing different from the DSB or the fact
>>> that you're going from a Store->Store barrier to a full barrier is what
>>> makes things "work" for you. Have you tried, for example, a DMB SY
>>> (e.g. via __smb_mb()).
>>>
>>> We definitely shouldn't take changes like this without a proper
>>> explanation of what is going on.
>>>
>>
>> Thanks for your comments, Will.
>>
>> Yes, DMB should work for us. However, it seems this instruction has issues on
>> NVidia's grace-hopper. It's hard for me to understand how DMB and DSB works
>> from hardware level. I agree it's not the solution to replace DMB with DSB
>> before we fully understand the root cause.
>>
>> I tried the possible replacement like below. __smp_mb() can avoid the issue like
>> __mb() does. __ndelay(10) can avoid the issue, but __ndelay(9) doesn't.
>>
>> static inline int virtqueue_add_split(struct virtqueue *_vq, ...)
>> {
>>         :
>>         /* Put entry in available array (but don't update avail->idx until they
>>          * do sync). */
>>         avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
>>         vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
>>
>>         /* Descriptors and available array need to be set before we expose the
>>          * new available array entries. */
>>         // Broken: virtio_wmb(vq->weak_barriers);
>>         // Broken: __dma_mb();
>>         // Work:   __mb();
>>         // Work:   __smp_mb();
>>         // Work:   __ndelay(100);
>>         // Work:   __ndelay(10);
>>         // Broken: __ndelay(9);
>>
>>         vq->split.avail_idx_shadow++;
>>         vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>>                                                 vq->split.avail_idx_shadow);
>
> What if you stick __ndelay here?
>

        /* Put entry in available array (but don't update avail->idx until they
         * do sync). */
        avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
        vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);

        /* Descriptors and available array need to be set before we expose the
         * new available array entries. */
        virtio_wmb(vq->weak_barriers);
        vq->split.avail_idx_shadow++;
        vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
                                                vq->split.avail_idx_shadow);
        /* Try __ndelay(x) here as Michael suggested
         *
         * Work:      __ndelay(200);    possibly makes it hard to reproduce
         * Broken:    __ndelay(100);
         * Broken:    __ndelay(20);
         * Broken:    __ndelay(10);
         */
        __ndelay(200);

>
>>         vq->num_added++;
>>
>>         pr_debug("Added buffer head %i to %p\n", head, vq);
>>         END_USE(vq);
>>         :
>> }
>>
>> I also tried to measure the consumed time for various barrier-relative instructions using
>> ktime_get_ns() which should have consumed most of the time. __smb_mb() is slower than
>> __smp_wmb() but faster than __mb()
>>
>>     Instruction           Range of used time in ns
>>     ----------------------------------------------
>>     __smp_wmb()           [32  1128032]
>>     __smp_mb()            [32  1160096]
>>     __mb()                [32  1162496]
>>

Thanks,
Gavin
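
For readers following the ordering argument, here is a minimal userspace sketch of the publish pattern under discussion, written with C11 atomics rather than the kernel's barrier macros. It is an analogy only: the demo_* names and the ring size are hypothetical, it is not the virtio_ring code, and it neither reproduces nor explains the behaviour reported on Grace-Hopper. It only shows the intended contract: the ring slot is written first, a release fence orders that store before the index update, and the consumer pairs it with an acquire fence after reading the index.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring size for this sketch; must be a power of two. */
#define DEMO_RING_NUM 8u
static_assert((DEMO_RING_NUM & (DEMO_RING_NUM - 1)) == 0,
              "ring size must be a power of two");

/* Simplified stand-in for the split-ring "available" area. */
struct demo_avail_ring {
    uint16_t ring[DEMO_RING_NUM]; /* slots holding descriptor heads */
    _Atomic uint16_t idx;         /* free-running index seen by the consumer */
};

struct demo_producer { uint16_t avail_idx_shadow; };
struct demo_consumer { uint16_t last_avail_idx; };

/* Producer side: fill the slot, then expose it by bumping idx. */
static void demo_publish(struct demo_avail_ring *r,
                         struct demo_producer *p, uint16_t head)
{
    uint16_t slot = p->avail_idx_shadow & (DEMO_RING_NUM - 1);

    r->ring[slot] = head;

    /* The slot store must be visible before the new index is.
     * This fence plays the role the write barrier plays in the thread. */
    atomic_thread_fence(memory_order_release);

    p->avail_idx_shadow++;
    atomic_store_explicit(&r->idx, p->avail_idx_shadow,
                          memory_order_relaxed);
}

/* Consumer side: read idx first, then the slot it covers. */
static bool demo_consume(struct demo_avail_ring *r,
                         struct demo_consumer *c, uint16_t *head)
{
    uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

    if (idx == c->last_avail_idx)
        return false; /* nothing new has been published */

    /* Pairs with the producer's release fence: the slot read below
     * cannot be satisfied with data older than the index read above. */
    atomic_thread_fence(memory_order_acquire);

    *head = r->ring[c->last_avail_idx & (DEMO_RING_NUM - 1)];
    c->last_avail_idx++;
    return true;
}
```

In this sketch the two fences are a pair: if either side's ordering is missing or ineffective, the consumer can see the new index together with a stale slot, which is the symptom named in the subject line. Calling demo_publish() twice and then demo_consume() until it returns false should hand back the heads in the order they were published.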