linux-kernel.vger.kernel.org archive mirror
* [PATCH] vhost: vsock: don't send pkt when vq is not started
@ 2020-04-30  2:13 Jia He
  2020-04-30  8:26 ` Stefano Garzarella
  0 siblings, 1 reply; 6+ messages in thread
From: Jia He @ 2020-04-30  2:13 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin, Jason Wang
  Cc: kvm, virtualization, netdev, linux-kernel, Kaly.Xin, Jia He

Ning Bo reported an abnormal 2-second delay when booting a Kata container [1].
The unconditional timeout is caused by the VSOCK_DEFAULT_CONNECT_TIMEOUT of
connect() on the client side: the vhost vsock client tries to connect to a
virtio vsock server that is still initializing.

The abnormal flow looks like:
host-userspace           vhost vsock                       guest vsock
==============           ===========                       ============
connect()     -------->  vhost_transport_send_pkt_work()   initializing
   |                     vq->private_data==NULL
   |                     will not be queued
   V
schedule_timeout(2s)
                         vhost_vsock_start()  <---------   device ready
                         set vq->private_data

waits for 2s and fails

connect() again          vq->private_data!=NULL          recv connecting pkt

1. Host userspace sends a connect pkt. At that time the guest vsock is still
initializing, so vhost_vsock_start() has not been called yet. Hence
vq->private_data==NULL and the pkt is not queued to be sent to the guest.
2. The client then sleeps for 2s.
3. After the guest vsock finishes initializing, vq->private_data is set.
4. When host userspace wakes up after 2s and sends the connect pkt again,
everything works.

Fix this by checking vq->private_data in vhost_transport_send_pkt() and
returning at once if it is NULL. This makes the user's connect() fail
immediately with ECONNREFUSED.

After this patch, the kata-runtime (with vsock enabled) boot time drops from
3s to 1s on a ThunderX2 arm64 server.

[1] https://github.com/kata-containers/runtime/issues/1917
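
For reference, a minimal host-side reproducer looks roughly like the sketch
below (untested; the guest CID and port are placeholders, not the exact
values kata uses):

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
	/* Placeholder guest CID/port, adjust to the actual guest. */
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = 3,
		.svm_port = 1024,
	};
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return 1;

	/* Before this patch, if the guest vsock device is still
	 * initializing, this connect() blocks for the full 2s
	 * VSOCK_DEFAULT_CONNECT_TIMEOUT before the retry succeeds.
	 */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		perror("vsock connect");

	close(fd);
	return 0;
}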

Reported-by: Ning Bo <n.b@live.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/vhost/vsock.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e36aaf9ba7bd..67474334dd88 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -241,6 +241,7 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 {
 	struct vhost_vsock *vsock;
 	int len = pkt->len;
+	struct vhost_virtqueue *vq;
 
 	rcu_read_lock();
 
@@ -252,6 +253,13 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 		return -ENODEV;
 	}
 
+	vq = &vsock->vqs[VSOCK_VQ_RX];
+	if (!vq->private_data) {
+		rcu_read_unlock();
+		virtio_transport_free_pkt(pkt);
+		return -ECONNREFUSED;
+	}
+
 	if (pkt->reply)
 		atomic_inc(&vsock->queued_replies);
 
-- 
2.17.1



* Re: [PATCH] vhost: vsock: don't send pkt when vq is not started
  2020-04-30  2:13 [PATCH] vhost: vsock: don't send pkt when vq is not started Jia He
@ 2020-04-30  8:26 ` Stefano Garzarella
  2020-04-30 10:06   ` Justin He
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Garzarella @ 2020-04-30  8:26 UTC (permalink / raw)
  To: Jia He
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, kvm,
	virtualization, netdev, linux-kernel, Kaly.Xin

Hi Jia,
thanks for the patch, some comments below:

On Thu, Apr 30, 2020 at 10:13:14AM +0800, Jia He wrote:
> Ning Bo reported an abnormal 2-second delay when booting a Kata container [1].
> The unconditional timeout is caused by the VSOCK_DEFAULT_CONNECT_TIMEOUT of
> connect() on the client side: the vhost vsock client tries to connect to a
> virtio vsock server that is still initializing.
> 
> The abnormal flow looks like:
> host-userspace           vhost vsock                       guest vsock
> ==============           ===========                       ============
> connect()     -------->  vhost_transport_send_pkt_work()   initializing
>    |                     vq->private_data==NULL
>    |                     will not be queued
>    V
> schedule_timeout(2s)
>                          vhost_vsock_start()  <---------   device ready
>                          set vq->private_data
> 
> waits for 2s and fails
> 
> connect() again          vq->private_data!=NULL          recv connecting pkt
> 
> 1. Host userspace sends a connect pkt. At that time the guest vsock is still
> initializing, so vhost_vsock_start() has not been called yet. Hence
> vq->private_data==NULL and the pkt is not queued to be sent to the guest.
> 2. The client then sleeps for 2s.
> 3. After the guest vsock finishes initializing, vq->private_data is set.
> 4. When host userspace wakes up after 2s and sends the connect pkt again,
> everything works.
> 
> Fix this by checking vq->private_data in vhost_transport_send_pkt() and
> returning at once if it is NULL. This makes the user's connect() fail
> immediately with ECONNREFUSED.
> 
> After this patch, the kata-runtime (with vsock enabled) boot time drops from
> 3s to 1s on a ThunderX2 arm64 server.
> 
> [1] https://github.com/kata-containers/runtime/issues/1917
> 
> Reported-by: Ning Bo <n.b@live.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  drivers/vhost/vsock.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index e36aaf9ba7bd..67474334dd88 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -241,6 +241,7 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>  {
>  	struct vhost_vsock *vsock;
>  	int len = pkt->len;
> +	struct vhost_virtqueue *vq;
>  
>  	rcu_read_lock();
>  
> @@ -252,6 +253,13 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>  		return -ENODEV;
>  	}
>  
> +	vq = &vsock->vqs[VSOCK_VQ_RX];
> +	if (!vq->private_data) {

I think it is better to use vhost_vq_get_backend():

	if (!vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
		...

This function should be called with 'vq->mutex' acquired, as explained in
its comment, but here we can avoid that: we are not using the vq itself,
and vhost_transport_do_send_pkt() checks the backend again anyway.

Please add a comment explaining that.
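
Something like this, for example (untested sketch):

	vq = &vsock->vqs[VSOCK_VQ_RX];

	/* The device may not be started yet; look at the backend
	 * pointer without taking vq->mutex. This is safe because we do
	 * not touch the vq itself here, and
	 * vhost_transport_do_send_pkt() re-checks the backend under
	 * the lock.
	 */
	if (!vhost_vq_get_backend(vq)) {
		rcu_read_unlock();
		virtio_transport_free_pkt(pkt);
		return -ECONNREFUSED;
	}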


As an alternative to this patch, should we kick the send worker when the
device is ready?

IIUC we reach the timeout because the send worker (that runs
vhost_transport_do_send_pkt()) exits immediately since 'vq->private_data'
is NULL, and no one will requeue it.

Let's do it when we know the device is ready:

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e36aaf9ba7bd..295b5867944f 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -543,6 +543,11 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
                mutex_unlock(&vq->mutex);
        }
 
+       /* Some packets may have been queued before the device was started,
+        * let's kick the send worker to send them.
+        */
+       vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+
        mutex_unlock(&vsock->dev.mutex);
        return 0;

I didn't test it; can you check whether it fixes the issue?

I'm not sure which is better...

Thanks,
Stefano



* RE: [PATCH] vhost: vsock: don't send pkt when vq is not started
  2020-04-30  8:26 ` Stefano Garzarella
@ 2020-04-30 10:06   ` Justin He
  2020-04-30 16:25     ` Stefano Garzarella
  0 siblings, 1 reply; 6+ messages in thread
From: Justin He @ 2020-04-30 10:06 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, kvm,
	virtualization, netdev, linux-kernel, Kaly Xin

Hi Stefano

> -----Original Message-----
> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Thursday, April 30, 2020 4:26 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>; Michael S. Tsirkin
> <mst@redhat.com>; Jason Wang <jasowang@redhat.com>;
> kvm@vger.kernel.org; virtualization@lists.linux-foundation.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Kaly Xin
> <Kaly.Xin@arm.com>
> Subject: Re: [PATCH] vhost: vsock: don't send pkt when vq is not started
>
> Hi Jia,
> thanks for the patch, some comments below:
>
> On Thu, Apr 30, 2020 at 10:13:14AM +0800, Jia He wrote:
> > [...]
> > @@ -252,6 +253,13 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> >  		return -ENODEV;
> >  	}
> >
> > +	vq = &vsock->vqs[VSOCK_VQ_RX];
> > +	if (!vq->private_data) {
>
> I think it is better to use vhost_vq_get_backend():
>
> if (!vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
> ...
>
> This function should be called with 'vq->mutex' acquired, as explained in
> its comment, but here we can avoid that: we are not using the vq itself,
> and vhost_transport_do_send_pkt() checks the backend again anyway.
>
> Please add a comment explaining that.
>

Thanks, vhost_vq_get_backend() is better. I developed against a 5.3 kernel
and missed this helper.
>
> As an alternative to this patch, should we kick the send worker when the
> device is ready?
>
> IIUC we reach the timeout because the send worker (that runs
> vhost_transport_do_send_pkt()) exits immediately since 'vq->private_data'
> is NULL, and no one will requeue it.
>
> Let's do it when we know the device is ready:
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index e36aaf9ba7bd..295b5867944f 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -543,6 +543,11 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>                 mutex_unlock(&vq->mutex);
>         }
>
> +       /* Some packets may have been queued before the device was started,
> +        * let's kick the send worker to send them.
> +        */
> +       vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
> +
Yes, it works.
But do you think a threshold should be set here to prevent the queue
from growing too long? E.g. if the client sends too many connect pkts
in a short time before the server is completely ready.

>         mutex_unlock(&vsock->dev.mutex);
>         return 0;
>
> I didn't test it; can you check whether it fixes the issue?
>
> I'm not sure which is better...
I don't know either. Let's wait for more comments 😊

--
Cheers,
Justin (Jia He)


* Re: [PATCH] vhost: vsock: don't send pkt when vq is not started
  2020-04-30 10:06   ` Justin He
@ 2020-04-30 16:25     ` Stefano Garzarella
  2020-04-30 19:43       ` Michael S. Tsirkin
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Garzarella @ 2020-04-30 16:25 UTC (permalink / raw)
  To: Justin He
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, kvm,
	virtualization, netdev, linux-kernel, Kaly Xin

On Thu, Apr 30, 2020 at 10:06:26AM +0000, Justin He wrote:
> Hi Stefano
> 
> > [...]
> >
> > I think it is better to use vhost_vq_get_backend():
> >
> > [...]
> 
> Thanks, vhost_vq_get_backend() is better. I developed against a 5.3 kernel
> and missed this helper.

:-)

> >
> > As an alternative to this patch, should we kick the send worker when the
> > device is ready?
> >
> > IIUC we reach the timeout because the send worker (that runs
> > vhost_transport_do_send_pkt()) exits immediately since 'vq->private_data'
> > is NULL, and no one will requeue it.
> >
> > Let's do it when we know the device is ready:
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index e36aaf9ba7bd..295b5867944f 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -543,6 +543,11 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> >                 mutex_unlock(&vq->mutex);
> >         }
> >
> > +       /* Some packets may have been queued before the device was started,
> > +        * let's kick the send worker to send them.
> > +        */
> > +       vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
> > +
> Yes, it works.
> But do you think a threshold should be set here to prevent the queue
> from growing too long? E.g. if the client sends too many connect pkts
> in a short time before the server is completely ready.

When the user calls connect(), the socket status moves to SS_CONNECTING
(see net/vmw_vsock/af_vsock.c), so another connect() on the same socket
will receive an EALREADY error.

If the user uses multiple sockets, the socket layer already checks the
limits, so I don't think we should put a threshold here.
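
For example, roughly (untested sketch, with placeholder CID/port):

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = 3,     /* placeholder */
		.svm_port = 1024, /* placeholder */
	};
	int fd = socket(AF_VSOCK, SOCK_STREAM | SOCK_NONBLOCK, 0);

	/* First attempt: returns -1/EINPROGRESS and the socket moves
	 * to SS_CONNECTING. */
	connect(fd, (struct sockaddr *)&addr, sizeof(addr));

	/* Second attempt on the same socket while the first is still
	 * pending: the socket layer rejects it with EALREADY, so a
	 * single socket cannot flood the queue. */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 &&
	    errno == EALREADY)
		fprintf(stderr, "EALREADY, as expected\n");

	return 0;
}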

> 
> >         mutex_unlock(&vsock->dev.mutex);
> >         return 0;
> >
> > I didn't test it; can you check whether it fixes the issue?
> >
> > I'm not sure which is better...
> I don't know either. Let's wait for more comments 😊

I prefer the second option, because the device is in a transitional
state and a connect can block (for at most two seconds) until the device is
started.

For the first option, I'm also not sure if ECONNREFUSED is the right error
to return; maybe ENETUNREACH is better.

Cheers,
Stefano



* Re: [PATCH] vhost: vsock: don't send pkt when vq is not started
  2020-04-30 16:25     ` Stefano Garzarella
@ 2020-04-30 19:43       ` Michael S. Tsirkin
  2020-05-01 14:37         ` Stefano Garzarella
  0 siblings, 1 reply; 6+ messages in thread
From: Michael S. Tsirkin @ 2020-04-30 19:43 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Justin He, Stefan Hajnoczi, Jason Wang, kvm, virtualization,
	netdev, linux-kernel, Kaly Xin

On Thu, Apr 30, 2020 at 06:25:21PM +0200, Stefano Garzarella wrote:
> [...]
> 
> I prefer the second option, because the device is in a transitional
> state and a connect can block (for at most two seconds) until the device is
> started.
> 
> For the first option, I'm also not sure if ECONNREFUSED is the right error
> to return; maybe ENETUNREACH is better.
> 
> Cheers,
> Stefano

IIRC:

ECONNREFUSED is what one gets when connecting to a remote port which does not
yet have a listening socket, so the remote sends back an RST.
ENETUNREACH is when the local network is down, so you can't even send a
connection request.
EHOSTUNREACH is when the remote network is down.
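
In other words, a client could tell these cases apart like this
(hypothetical helper, just to illustrate):

#include <errno.h>

/* Hypothetical helper mapping the connect() errors above. */
static const char *connect_errstr(int err)
{
	switch (err) {
	case ECONNREFUSED:
		return "request reached the peer, no listener (RST)";
	case ENETUNREACH:
		return "local network down, request never sent";
	case EHOSTUNREACH:
		return "remote network down, host unreachable";
	default:
		return "other error";
	}
}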

-- 
MST



* Re: [PATCH] vhost: vsock: don't send pkt when vq is not started
  2020-04-30 19:43       ` Michael S. Tsirkin
@ 2020-05-01 14:37         ` Stefano Garzarella
  0 siblings, 0 replies; 6+ messages in thread
From: Stefano Garzarella @ 2020-05-01 14:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Justin He, Stefan Hajnoczi, Jason Wang, kvm, virtualization,
	netdev, linux-kernel, Kaly Xin

On Thu, Apr 30, 2020 at 03:43:00PM -0400, Michael S. Tsirkin wrote:
> [...]
> IIRC:
> 
> ECONNREFUSED is what one gets when connecting to a remote port which does not
> yet have a listening socket, so the remote sends back an RST.
> ENETUNREACH is when the local network is down, so you can't even send a
> connection request.
> EHOSTUNREACH is when the remote network is down.

Thanks for the clarification!

I was looking at the connect(2) man page and there is no EHOSTUNREACH in the
ERRORS section :-(

But connect(3p) contains the following, which matches what you said:
       ECONNRESET
              Remote host reset the connection request.
       ENETUNREACH
              No route to the network is present.
       EHOSTUNREACH
              The  destination host cannot be reached (probably because
              the host is down or a remote router cannot reach it).

So in this case I think ENETUNREACH is the best one, since the device is
down and we can't send the connection request, but EHOSTUNREACH would also
fit...

In af_vsock.c we already return ENETUNREACH when the stream is not allowed
or we don't have a transport to use.
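
So if we went with the first option, the check could end up looking like
this (untested sketch combining the suggestions above):

	/* The device is not started yet, so we cannot even send the
	 * connection request: ENETUNREACH fits better than
	 * ECONNREFUSED here.
	 */
	if (!vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
		rcu_read_unlock();
		virtio_transport_free_pkt(pkt);
		return -ENETUNREACH;
	}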

Thanks,
Stefano


