From: Andrey Ryabinin <arbn@yandex-team.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
kvm <kvm@vger.kernel.org>,
virtualization <virtualization@lists.linux-foundation.org>,
netdev <netdev@vger.kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
bpf@vger.kernel.org
Subject: Re: [PATCH 6/6] vhost_net: use RCU callbacks instead of synchronize_rcu()
Date: Mon, 22 Nov 2021 18:12:59 +0300 [thread overview]
Message-ID: <ba4dbc25-f912-fb34-a0e2-c6c85b34b918@yandex-team.com> (raw)
In-Reply-To: <20211122043620-mutt-send-email-mst@kernel.org>
On 11/22/21 12:37 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 19, 2021 at 02:32:05PM +0300, Andrey Ryabinin wrote:
>>
>>
>> On 11/16/21 8:00 AM, Jason Wang wrote:
>>> On Mon, Nov 15, 2021 at 11:32 PM Andrey Ryabinin <arbn@yandex-team.com> wrote:
>>>>
>>>> Currently vhost_net_release() uses synchronize_rcu() to synchronize
>>>> freeing with vhost_zerocopy_callback(). However synchronize_rcu()
>>>> is quite costly operation. It take more than 10 seconds
>>>> to shutdown qemu launched with couple net devices like this:
>>>> -netdev tap,id=tap0,..,vhost=on,queues=80
>>>> because we end up calling synchronize_rcu() netdev_count*queues times.
>>>>
>>>> Free vhost net structures in rcu callback instead of using
>>>> synchronize_rcu() to fix the problem.
>>>
>>> I admit the release code is somehow hard to understand. But I wonder
>>> if the following case can still happen with this:
>>>
>>> CPU 0 (vhost_dev_cleanup) CPU1
>>> (vhost_net_zerocopy_callback()->vhost_work_queue())
>>> if (!dev->worker)
>>> dev->worker = NULL
>>>
>>> wake_up_process(dev->worker)
>>>
>>> If this is true. It seems the fix is to move RCU synchronization stuff
>>> in vhost_net_ubuf_put_and_wait()?
>>>
>>
>> It all depends whether vhost_zerocopy_callback() can be called outside of vhost
>> thread context or not. If it can run after vhost thread stopped, than the race you
>> describe seems possible and the fix in commit b0c057ca7e83 ("vhost: fix a theoretical race in device cleanup")
>> wasn't complete. I would fix it by calling synchronize_rcu() after vhost_net_flush()
>> and before vhost_dev_cleanup().
>>
>> As for the performance problem, it can be solved by replacing synchronize_rcu() with synchronize_rcu_expedited().
>
> expedited causes a stop of IPIs though, so it's problematic to
> do it upon a userspace syscall.
>
How about something like this?
---
drivers/vhost/net.c | 40 ++++++++++++++++++++++++++--------------
1 file changed, 26 insertions(+), 14 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 97a209d6a527..556df26c584d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -144,6 +144,10 @@ struct vhost_net {
struct page_frag page_frag;
/* Refcount bias of page frag */
int refcnt_bias;
+
+ struct socket *tx_sock;
+ struct socket *rx_sock;
+ struct rcu_work rwork;
};
static unsigned vhost_net_zcopy_mask __read_mostly;
@@ -1389,6 +1393,24 @@ static void vhost_net_flush(struct vhost_net *n)
}
}
+static void vhost_net_cleanup(struct work_struct *work)
+{
+ struct vhost_net *n =
+ container_of(to_rcu_work(work), struct vhost_net, rwork);
+ vhost_dev_cleanup(&n->dev);
+ vhost_net_vq_reset(n);
+ if (n->tx_sock)
+ sockfd_put(n->tx_sock);
+ if (n->rx_sock)
+ sockfd_put(n->rx_sock);
+ kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
+ kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
+ kfree(n->dev.vqs);
+ if (n->page_frag.page)
+ __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
+ kvfree(n);
+}
+
static int vhost_net_release(struct inode *inode, struct file *f)
{
struct vhost_net *n = f->private_data;
@@ -1398,21 +1420,11 @@ static int vhost_net_release(struct inode *inode, struct file *f)
vhost_net_stop(n, &tx_sock, &rx_sock);
vhost_net_flush(n);
vhost_dev_stop(&n->dev);
- vhost_dev_cleanup(&n->dev);
- vhost_net_vq_reset(n);
- if (tx_sock)
- sockfd_put(tx_sock);
- if (rx_sock)
- sockfd_put(rx_sock);
- /* Make sure no callbacks are outstanding */
- synchronize_rcu();
+ n->tx_sock = tx_sock;
+ n->rx_sock = rx_sock;
- kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
- kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
- kfree(n->dev.vqs);
- if (n->page_frag.page)
- __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
- kvfree(n);
+ INIT_RCU_WORK(&n->rwork, vhost_net_cleanup);
+ queue_rcu_work(system_wq, &n->rwork);
return 0;
}
--
next prev parent reply other threads:[~2021-11-22 15:11 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-15 15:29 [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper Andrey Ryabinin
2021-11-15 15:29 ` [PATCH 2/6] vhost_net: get rid of vhost_net_flush_vq() and extra flush calls Andrey Ryabinin
2021-11-16 14:33 ` Stefano Garzarella
2021-11-19 10:31 ` Andrey Ryabinin
2021-11-15 15:30 ` [PATCH 3/6] vhost_test: remove vhost_test_flush_vq() Andrey Ryabinin
2021-11-15 15:30 ` [PATCH 4/6] vhost_vsock: simplify vhost_vsock_flush() Andrey Ryabinin
2021-11-16 14:35 ` Stefano Garzarella
2021-11-15 15:30 ` [PATCH 5/6] vhost_net: remove NOP vhost_net_flush() in vhost_net_release() Andrey Ryabinin
2021-11-15 15:30 ` [PATCH 6/6] vhost_net: use RCU callbacks instead of synchronize_rcu() Andrey Ryabinin
2021-11-16 5:00 ` Jason Wang
2021-11-19 11:32 ` Andrey Ryabinin
2021-11-22 2:48 ` Jason Wang
2021-11-22 9:37 ` Michael S. Tsirkin
2021-11-22 15:12 ` Andrey Ryabinin [this message]
2021-11-16 14:33 ` [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper Stefano Garzarella
2021-12-03 17:45 ` Mike Christie
2021-11-16 14:39 ` Denis Kirjanov
2021-11-16 14:41 ` Stefano Garzarella
2021-11-19 10:30 ` Andrey Ryabinin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ba4dbc25-f912-fb34-a0e2-c6c85b34b918@yandex-team.com \
--to=arbn@yandex-team.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=hawk@kernel.org \
--cc=jasowang@redhat.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=sgarzare@redhat.com \
--cc=stefanha@redhat.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).