kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrey Ryabinin <arbn@yandex-team.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	kvm <kvm@vger.kernel.org>,
	virtualization <virtualization@lists.linux-foundation.org>,
	netdev <netdev@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	bpf@vger.kernel.org
Subject: Re: [PATCH 6/6] vhost_net: use RCU callbacks instead of synchronize_rcu()
Date: Mon, 22 Nov 2021 18:12:59 +0300	[thread overview]
Message-ID: <ba4dbc25-f912-fb34-a0e2-c6c85b34b918@yandex-team.com> (raw)
In-Reply-To: <20211122043620-mutt-send-email-mst@kernel.org>



On 11/22/21 12:37 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 19, 2021 at 02:32:05PM +0300, Andrey Ryabinin wrote:
>>
>>
>> On 11/16/21 8:00 AM, Jason Wang wrote:
>>> On Mon, Nov 15, 2021 at 11:32 PM Andrey Ryabinin <arbn@yandex-team.com> wrote:
>>>>
>>>> Currently vhost_net_release() uses synchronize_rcu() to synchronize
>>>> freeing with vhost_zerocopy_callback(). However synchronize_rcu()
>>>> is quite costly operation. It take more than 10 seconds
>>>> to shutdown qemu launched with couple net devices like this:
>>>>         -netdev tap,id=tap0,..,vhost=on,queues=80
>>>> because we end up calling synchronize_rcu() netdev_count*queues times.
>>>>
>>>> Free vhost net structures in rcu callback instead of using
>>>> synchronize_rcu() to fix the problem.
>>>
>>> I admit the release code is somehow hard to understand. But I wonder
>>> if the following case can still happen with this:
>>>
>>> CPU 0 (vhost_dev_cleanup)   CPU1
>>> (vhost_net_zerocopy_callback()->vhost_work_queue())
>>>                                                 if (!dev->worker)
>>> dev->worker = NULL
>>>
>>> wake_up_process(dev->worker)
>>>
>>> If this is true. It seems the fix is to move RCU synchronization stuff
>>> in vhost_net_ubuf_put_and_wait()?
>>>
>>
>> It all depends whether vhost_zerocopy_callback() can be called outside of vhost
>> thread context or not. If it can run after vhost thread stopped, than the race you
>> describe seems possible and the fix in commit b0c057ca7e83 ("vhost: fix a theoretical race in device cleanup")
>> wasn't complete. I would fix it by calling synchronize_rcu() after vhost_net_flush()
>> and before vhost_dev_cleanup().
>>
>> As for the performance problem, it can be solved by replacing synchronize_rcu() with synchronize_rcu_expedited().
> 
> expedited causes a stop of IPIs though, so it's problematic to
> do it upon a userspace syscall.
> 

How about something like this?


---
 drivers/vhost/net.c | 40 ++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 97a209d6a527..556df26c584d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -144,6 +144,10 @@ struct vhost_net {
 	struct page_frag page_frag;
 	/* Refcount bias of page frag */
 	int refcnt_bias;
+
+	struct socket *tx_sock;
+	struct socket *rx_sock;
+	struct rcu_work rwork;
 };
 
 static unsigned vhost_net_zcopy_mask __read_mostly;
@@ -1389,6 +1393,24 @@ static void vhost_net_flush(struct vhost_net *n)
 	}
 }
 
+static void vhost_net_cleanup(struct work_struct *work)
+{
+	struct vhost_net *n =
+		container_of(to_rcu_work(work), struct vhost_net, rwork);
+	vhost_dev_cleanup(&n->dev);
+	vhost_net_vq_reset(n);
+	if (n->tx_sock)
+		sockfd_put(n->tx_sock);
+	if (n->rx_sock)
+		sockfd_put(n->rx_sock);
+	kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
+	kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
+	kfree(n->dev.vqs);
+	if (n->page_frag.page)
+		__page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
+	kvfree(n);
+}
+
 static int vhost_net_release(struct inode *inode, struct file *f)
 {
 	struct vhost_net *n = f->private_data;
@@ -1398,21 +1420,11 @@ static int vhost_net_release(struct inode *inode, struct file *f)
 	vhost_net_stop(n, &tx_sock, &rx_sock);
 	vhost_net_flush(n);
 	vhost_dev_stop(&n->dev);
-	vhost_dev_cleanup(&n->dev);
-	vhost_net_vq_reset(n);
-	if (tx_sock)
-		sockfd_put(tx_sock);
-	if (rx_sock)
-		sockfd_put(rx_sock);
-	/* Make sure no callbacks are outstanding */
-	synchronize_rcu();
+	n->tx_sock = tx_sock;
+	n->rx_sock = rx_sock;
 
-	kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
-	kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
-	kfree(n->dev.vqs);
-	if (n->page_frag.page)
-		__page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
-	kvfree(n);
+	INIT_RCU_WORK(&n->rwork, vhost_net_cleanup);
+	queue_rcu_work(system_wq, &n->rwork);
 	return 0;
 }
 
-- 


  reply	other threads:[~2021-11-22 15:11 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-15 15:29 [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper Andrey Ryabinin
2021-11-15 15:29 ` [PATCH 2/6] vhost_net: get rid of vhost_net_flush_vq() and extra flush calls Andrey Ryabinin
2021-11-16 14:33   ` Stefano Garzarella
2021-11-19 10:31     ` Andrey Ryabinin
2021-11-15 15:30 ` [PATCH 3/6] vhost_test: remove vhost_test_flush_vq() Andrey Ryabinin
2021-11-15 15:30 ` [PATCH 4/6] vhost_vsock: simplify vhost_vsock_flush() Andrey Ryabinin
2021-11-16 14:35   ` Stefano Garzarella
2021-11-15 15:30 ` [PATCH 5/6] vhost_net: remove NOP vhost_net_flush() in vhost_net_release() Andrey Ryabinin
2021-11-15 15:30 ` [PATCH 6/6] vhost_net: use RCU callbacks instead of synchronize_rcu() Andrey Ryabinin
2021-11-16  5:00   ` Jason Wang
2021-11-19 11:32     ` Andrey Ryabinin
2021-11-22  2:48       ` Jason Wang
2021-11-22  9:37       ` Michael S. Tsirkin
2021-11-22 15:12         ` Andrey Ryabinin [this message]
2021-11-16 14:33 ` [PATCH 1/6] vhost: get rid of vhost_poll_flush() wrapper Stefano Garzarella
2021-12-03 17:45   ` Mike Christie
2021-11-16 14:39 ` Denis Kirjanov
2021-11-16 14:41 ` Stefano Garzarella
2021-11-19 10:30   ` Andrey Ryabinin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ba4dbc25-f912-fb34-a0e2-c6c85b34b918@yandex-team.com \
    --to=arbn@yandex-team.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=hawk@kernel.org \
    --cc=jasowang@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).