All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gavin Shan <gshan@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux.dev, linux-kernel@vger.kernel.org,
	mst@redhat.com, davem@davemloft.net, stefanha@redhat.com,
	sgarzare@redhat.com, keirf@google.com, yihyu@redhat.com,
	shan.gavin@gmail.com, Will Deacon <will@kernel.org>
Subject: Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
Date: Thu, 28 Mar 2024 10:27:33 +1000	[thread overview]
Message-ID: <48b1cc13-849f-4f36-b328-175417a1e4d2@redhat.com> (raw)
In-Reply-To: <CACGkMEti3xtqHxhM-DGcquP6UncELUzrNeVor45wfGRCBkkZrg@mail.gmail.com>

On 3/27/24 17:42, Jason Wang wrote:
> On Wed, Mar 27, 2024 at 3:35 PM Gavin Shan <gshan@redhat.com> wrote:
>>
>> On 3/27/24 14:08, Gavin Shan wrote:
>>> On 3/27/24 12:44, Jason Wang wrote:
>>>> On Wed, Mar 27, 2024 at 10:34 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>> On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan <gshan@redhat.com> wrote:
>>>>>>
>>>>>> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
>>>>>> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
>>>>>> available ring entries pushed by guest can be observed by vhost
>>>>>> in time, leading to stale available ring entries fetched by vhost
>>>>>> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
>>>>>> grace-hopper (ARM64) platform.
>>>>>>
>>>>>>     /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>>>>>>     -accel kvm -machine virt,gic-version=host -cpu host          \
>>>>>>     -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>>>>>>     -m 4096M,slots=16,maxmem=64G                                 \
>>>>>>     -object memory-backend-ram,id=mem0,size=4096M                \
>>>>>>      :                                                           \
>>>>>>     -netdev tap,id=vnet0,vhost=true                              \
>>>>>>     -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>>>>>>      :
>>>>>>     guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>>>>>>     virtio_net virtio0: output.0:id 100 is not a head!
>>>>>>
>>>>>> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
>>>>>> should be safe until vq->avail_idx is changed by commit 275bf960ac697
>>>>>> ("vhost: better detection of available buffers").
>>>>>>
>>>>>> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
>>>>>> Cc: <stable@kernel.org> # v4.11+
>>>>>> Reported-by: Yihuang Yu <yihyu@redhat.com>
>>>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>>>> ---
>>>>>>    drivers/vhost/vhost.c | 11 ++++++++++-
>>>>>>    1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>> index 045f666b4f12..00445ab172b3 100644
>>>>>> --- a/drivers/vhost/vhost.c
>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>>>>>           r = vhost_get_avail_idx(vq, &avail_idx);
>>>>>>           if (unlikely(r))
>>>>>>                   return false;
>>>>>> +
>>>>>>           vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>>>>>> +       if (vq->avail_idx != vq->last_avail_idx) {
>>>>>> +               /* Similar to what's done in vhost_get_vq_desc(), we need
>>>>>> +                * to ensure the available ring entries have been exposed
>>>>>> +                * by guest.
>>>>>> +                */
>>>>>
>>>>> We need to be more verbose here. For example, which load needs to be
>>>>> ordered with which load.
>>>>>
>>>>> The rmb in vhost_get_vq_desc() is used to order the load of avail idx
>>>>> and the load of head. It is paired with e.g virtio_wmb() in
>>>>> virtqueue_add_split().
>>>>>
>>>>> vhost_vq_avail_empty() are mostly used as a hint in
>>>>> vhost_net_busy_poll() which is under the protection of the vq mutex.
>>>>>
>>>>> An exception is the tx_can_batch(), but in that case it doesn't even
>>>>> want to read the head.
>>>>
>>>> Ok, if it is needed only in that path, maybe we can move the barriers there.
>>>>
>>>
>>> [cc Will Deacon]
>>>
>>> Jason, appreciate for your review and comments. I think PATCH[1/2] is
>>> the fix for the hypothesis, meaning PATCH[2/2] is the real fix. However,
>>> it would be nice to fix all of them in one shoot. I will try with PATCH[2/2]
>>> only to see if our issue will disappear or not. However, the issue still
>>> exists if PATCH[2/2] is missed.
>>>
>>
>> Jason, PATCH[2/2] is sufficient to fix our current issue. I tried with PATCH[2/2]
>> only and unable to hit the issue. However, PATCH[1/2] may be needed by other scenarios.
>> So it would be nice to fix them in one shoot.
> 
> Yes, see below.
> 
>>
>>
>>> Firstly, We were failing on the transmit queue and {tvq, rvq}->busyloop_timeout
>>> == false if I remember correctly. So the added smp_rmb() in vhost_vq_avail_empty()
>>> is only a concern to tx_can_batch(). A mutex isn't enough to ensure the order
>>> for the available index and available ring entry (head). For example, vhost_vq_avail_empty()
>>> called by tx_can_batch() can see next available index, but its corresponding
>>> available ring entry (head) may not be seen by vhost yet if smp_rmb() is missed.
>>> The next call to get_tx_bufs(), where the available ring entry (head) doesn't
>>> arrived yet, leading to stale available ring entry (head) being fetched.
>>>
>>>     handle_tx_copy
>>>       get_tx_bufs                 // smp_rmb() won't be executed when vq->avail_idx != vq->last_avail_idx
>>>       tx_can_batch
>>>         vhost_vq_avail_empty      // vq->avail_idx is updated from vq->avail->idx
>>>
>>> The reason why I added smp_rmb() to vhost_vq_avail_empty() is because the function
>>> is a exposed API, even it's only used by drivers/vhost/net.c at present. It means
>>> the API has been broken internally. So it seems more appropriate to fix it up in
>>> vhost_vq_avail_empty() so that the API's users needn't worry about the memory access
>>> order.
> 
> When tx_can_batch returns true it means there's still pending tx
> buffers. Since it might read indices so it still can bypass the
> smp_rmb() in the vhost_get_vq_desc().
> 
> I'd suggest adding those above to change log.
> 
> With this,
> 
> Acked-by: Jason Wang <jasowang@redhat.com>
> 

Thanks, Jason. The change log has been improved as you suggested in
v3, which was posted for further review.

Thanks,
Gavin

>>>
>>>>>
>>>>>
>>>>>> +               smp_rmb();
>>>>>> +               return false;
>>>>>> +       }
>>>>>>
>>>>>> -       return vq->avail_idx == vq->last_avail_idx;
>>>>>> +       return true;
>>>>>>    }
>>>>>>    EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);


  reply	other threads:[~2024-03-28  0:27 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-26 23:38 [PATCH v2 0/2] vhost: Fix stale available ring entries Gavin Shan
2024-03-26 23:38 ` [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty() Gavin Shan
2024-03-27  2:34   ` Jason Wang
2024-03-27  2:44     ` Jason Wang
2024-03-27  4:08       ` Gavin Shan
2024-03-27  7:35         ` Gavin Shan
2024-03-27  7:42           ` Jason Wang
2024-03-28  0:27             ` Gavin Shan [this message]
2024-03-27 12:07   ` Michael S. Tsirkin
2024-03-28  0:26     ` Gavin Shan
2024-03-26 23:38 ` [PATCH v2 2/2] vhost: Add smp_rmb() in vhost_enable_notify() Gavin Shan
2024-03-27  2:41   ` Jason Wang
2024-03-27  4:10     ` Gavin Shan
2024-03-26 23:55 ` [PATCH v2 0/2] vhost: Fix stale available ring entries Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48b1cc13-849f-4f36-b328-175417a1e4d2@redhat.com \
    --to=gshan@redhat.com \
    --cc=davem@davemloft.net \
    --cc=jasowang@redhat.com \
    --cc=keirf@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=shan.gavin@gmail.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=will@kernel.org \
    --cc=yihyu@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.