* [PATCH 1/1] nbd/server: push pending frames after sending reply
From: Florian Westphal @ 2023-03-24 10:47 UTC (permalink / raw)
  To: qemu-block; +Cc: eblake, vsementsov, qemu-devel, Florian Westphal

qemu-nbd doesn't set TCP_NODELAY on the tcp socket.

Kernel waits for more data and avoids transmission of small packets.
Without TLS this is barely noticeable, but with TLS this really shows.

Booting a VM via qemu-nbd on localhost (with tls) takes more than
2 minutes on my system.  tcpdump shows frequent wait periods, where no
packets get sent for a 40ms period.

Add explicit (un)corking when processing (and responding to) requests.
"TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.

VM Boot time:
main:    no tls:  23s, with tls: 2m45s
patched: no tls:  14s, with tls: 15s

VM Boot time, qemu-nbd via network (same lan):
main:    no tls:  18s, with tls: 1m50s
patched: no tls:  17s, with tls: 18s

Future optimization: if we could detect if there is another pending
request we could defer the uncork operation because more data would be
appended.
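
As a rough sketch of that idea (purely hypothetical, not implemented
here; nbd_more_requests_pending() is an invented placeholder for
whatever check the server could use), the uncork would simply become
conditional:

    if (!nbd_more_requests_pending(client)) {
        qio_channel_set_cork(client->ioc, false);
    }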

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 nbd/server.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index a4750e41880a..848836d41405 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2667,6 +2667,8 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 
+    qio_channel_set_cork(client->ioc, true);
+
     if (ret < 0) {
         /* It wasn't -EIO, so, according to nbd_co_receive_request()
          * semantics, we should return the error to the client. */
@@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 
+    qio_channel_set_cork(client->ioc, false);
 done:
     nbd_request_put(req);
     nbd_client_put(client);
-- 
2.39.2




* Re: [PATCH 1/1] nbd/server: push pending frames after sending reply
From: Kevin Wolf @ 2023-03-24 17:20 UTC (permalink / raw)
  To: Florian Westphal; +Cc: qemu-block, eblake, vsementsov, qemu-devel

Am 24.03.2023 um 11:47 hat Florian Westphal geschrieben:
> qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
> 
> Kernel waits for more data and avoids transmission of small packets.
> Without TLS this is barely noticeable, but with TLS this really shows.
> 
> Booting a VM via qemu-nbd on localhost (with tls) takes more than
> 2 minutes on my system.  tcpdump shows frequent wait periods, where no
> packets get sent for a 40ms period.
> 
> Add explicit (un)corking when processing (and responding to) requests.
> "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.
> 
> VM Boot time:
> main:    no tls:  23s, with tls: 2m45s
> patched: no tls:  14s, with tls: 15s
> 
> VM Boot time, qemu-nbd via network (same lan):
> main:    no tls:  18s, with tls: 1m50s
> patched: no tls:  17s, with tls: 18s
> 
> Future optimization: if we could detect if there is another pending
> request we could defer the uncork operation because more data would be
> appended.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  nbd/server.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index a4750e41880a..848836d41405 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2667,6 +2667,8 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>  
> +    qio_channel_set_cork(client->ioc, true);
> +
>      if (ret < 0) {
>          /* It wasn't -EIO, so, according to nbd_co_receive_request()
>           * semantics, we should return the error to the client. */
> @@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>  
> +    qio_channel_set_cork(client->ioc, false);
>  done:
>      nbd_request_put(req);
>      nbd_client_put(client);

In the error paths, we never call set_cork(false) again. I suppose the
reason that this is okay is because the next thing is actually that we
close the socket?

Kevin




* Re: [PATCH 1/1] nbd/server: push pending frames after sending reply
From: Florian Westphal @ 2023-03-24 18:20 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Florian Westphal, qemu-block, eblake, vsementsov, qemu-devel

Kevin Wolf <kwolf@redhat.com> wrote:
> Am 24.03.2023 um 11:47 hat Florian Westphal geschrieben:
> > +    qio_channel_set_cork(client->ioc, true);
> > +
> >      if (ret < 0) {
> >          /* It wasn't -EIO, so, according to nbd_co_receive_request()
> >           * semantics, we should return the error to the client. */
> > @@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
> >          goto disconnect;
> >      }
> >  
> > +    qio_channel_set_cork(client->ioc, false);
> >  done:
> >      nbd_request_put(req);
> >      nbd_client_put(client);
> 
> In the error paths, we never call set_cork(false) again. I suppose the
> reason that this is okay is because the next thing is actually that we
> close the socket?

Yes, no need to uncork before close.



* Re: [PATCH 1/1] nbd/server: push pending frames after sending reply
From: Eric Blake @ 2023-03-24 19:41 UTC (permalink / raw)
  To: Florian Westphal; +Cc: qemu-block, vsementsov, qemu-devel, libguestfs

On Fri, Mar 24, 2023 at 11:47:20AM +0100, Florian Westphal wrote:
> qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
> 
> Kernel waits for more data and avoids transmission of small packets.
> Without TLS this is barely noticeable, but with TLS this really shows.
> 
> Booting a VM via qemu-nbd on localhost (with tls) takes more than
> 2 minutes on my system.  tcpdump shows frequent wait periods, where no
> packets get sent for a 40ms period.

Thank you for this analysis.

> 
> Add explicit (un)corking when processing (and responding to) requests.
> "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.
> 
> VM Boot time:
> main:    no tls:  23s, with tls: 2m45s
> patched: no tls:  14s, with tls: 15s
> 
> VM Boot time, qemu-nbd via network (same lan):
> main:    no tls:  18s, with tls: 1m50s
> patched: no tls:  17s, with tls: 18s

And the timings bear out that it matters.

> 
> Future optimization: if we could detect if there is another pending
> request we could defer the uncork operation because more data would be
> appended.

nbdkit and libnbd do this with the MSG_MORE flag (plaintext) and TLS
corking (tls); when building up a message to the other side, a flag is
set any time we know we are likely to send more data very shortly.

nbdkit wraps it under a flag SEND_MORE, which applies to both plaintext:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/connections.c#L415
and to TLS connections:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/crypto.c#L396

while libnbd uses MSG_MORE a bit more directly for the same purpose on
plaintext connections, but isn't (yet) doing TLS corking:
https://gitlab.com/nbdkit/libnbd/-/blob/master/generator/states-issue-command.c#L53
https://gitlab.com/nbdkit/libnbd/-/blob/master/lib/internal.h#L57

I would love to see a future patch to the qio_channel code to support
MSG_MORE in the same way nbdkit uses its SEND_MORE flag: the caller
often has better information on whether it is sending a short prefix
or is done with a conceptual message and ready to uncork, and a
per-send flag can be more efficient than separate passes through
cork/uncork calls.  But even your initial work at properly corking is
a good step in the right direction.
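
A rough sketch of that idea on a plain (non-TLS) Linux socket, assuming
nothing about how it would eventually surface in the qio_channel API
(send_chunk() and its 'more' parameter are made up for illustration):

#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Pass more=true on every chunk that will immediately be followed by
 * another send(); the kernel then coalesces as if TCP_CORK were set
 * for just that call, and the final more=false chunk is pushed out. */
static ssize_t send_chunk(int fd, const void *buf, size_t len, bool more)
{
    return send(fd, buf, len, more ? MSG_MORE : 0);
}

/* usage:
 *   send_chunk(fd, &hdr, sizeof(hdr), true);    reply header, more follows
 *   send_chunk(fd, data, datalen, false);       payload, flush now
 */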

And surprisingly, qemu IS using corking on the client side:
https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L525
just not on the server side, before your patch.

> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  nbd/server.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index a4750e41880a..848836d41405 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2667,6 +2667,8 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>  
> +    qio_channel_set_cork(client->ioc, true);
> +
>      if (ret < 0) {
>          /* It wasn't -EIO, so, according to nbd_co_receive_request()
>           * semantics, we should return the error to the client. */
> @@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>  
> +    qio_channel_set_cork(client->ioc, false);

Reviewed-by: Eric Blake <eblake@redhat.com>

>  done:
>      nbd_request_put(req);
>      nbd_client_put(client);
> -- 
> 2.39.2
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH 1/1] nbd/server: push pending frames after sending reply
From: Eric Blake @ 2023-03-24 19:43 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Kevin Wolf, qemu-block, vsementsov, qemu-devel

On Fri, Mar 24, 2023 at 07:20:17PM +0100, Florian Westphal wrote:
> Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 24.03.2023 um 11:47 hat Florian Westphal geschrieben:
> > > +    qio_channel_set_cork(client->ioc, true);
> > > +
> > >      if (ret < 0) {
> > >          /* It wasn't -EIO, so, according to nbd_co_receive_request()
> > >           * semantics, we should return the error to the client. */
> > > @@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
> > >          goto disconnect;
> > >      }
> > >  
> > > +    qio_channel_set_cork(client->ioc, false);
> > >  done:
> > >      nbd_request_put(req);
> > >      nbd_client_put(client);
> > 
> > In the error paths, we never call set_cork(false) again. I suppose the
> > reason that this is okay is because the next thing is actually that we
> > close the socket?
> 
> Yes, no need to uncork before close.

Uncorking may let the close be cleaner (especially with TLS, it is
nicer to send the handshake messages signalling that the connection is
intentionally going down, rather than leaving it looking as if it had
been abruptly cut off by a man-in-the-middle attacker), but NBD already
has to handle abrupt connection teardown gracefully, so we aren't
losing anything if the cork delays the close or if corked data that
would have given a clean shutdown is lost.

If we teach qio_channel to honor MSG_MORE as a way to do auto-corking
without explicit cork/uncork calls, that may take care of itself.  But
that's a bigger project, while this patch seems safe enough as-is.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Libguestfs] [PATCH 1/1] nbd/server: push pending frames after sending reply
From: Eric Blake @ 2023-03-24 20:03 UTC (permalink / raw)
  To: Florian Westphal; +Cc: libguestfs, qemu-devel, qemu-block, vsementsov

On Fri, Mar 24, 2023 at 02:41:20PM -0500, Eric Blake wrote:
> On Fri, Mar 24, 2023 at 11:47:20AM +0100, Florian Westphal wrote:
> > qemu-nbd doesn't set TCP_NODELAY on the tcp socket.

Replying to myself, WHY aren't we setting TCP_NODELAY on the socket?

> 
> And surprisingly, qemu IS using corking on the client side:
> https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L525
> just not on the server side, before your patch.

Corking matters more when TCP_NODELAY is enabled.  The entire reason
Nagle's algorithm exists (and is on by default unless you enable
TCP_NODELAY) is that merging small piecemeal packets into larger
segments is easier to do in one common place, for code that isn't
particularly sensitive to latency or message boundaries.  But once you
get to the point where corking or MSG_MORE makes a difference, it is
obvious that you know your message boundaries and will benefit from
TCP_NODELAY, at the expense of potentially more network traffic
overhead.  One more code search, and I find that we use TCP_NODELAY in
all of:

qemu client: https://gitlab.com/qemu-project/qemu/-/blob/master/nbd/client-connection.c#L143
nbdkit: https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/sockets.c#L430
libnbd: https://gitlab.com/nbdkit/libnbd/-/blob/master/generator/states-connect.c#L41

so I think we _should_ be calling qio_channel_set_delay(false) for
qemu-nbd as well.  That doesn't negate your patch, but rather argues
that we can go for even better performance with TCP_NODELAY also
turned on.
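
For qemu-nbd that would presumably be a one-liner along these lines
(sketch only; the exact call site is an assumption, and 'sioc' stands
for the accepted QIOChannelSocket, mirroring what the client side
already does in nbd/client-connection.c):

    /* disable Nagle, i.e. set TCP_NODELAY, once per accepted client */
    qio_channel_set_delay(QIO_CHANNEL(sioc), false);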

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Libguestfs] [PATCH 1/1] nbd/server: push pending frames after sending reply
From: Florian Westphal @ 2023-03-24 23:55 UTC (permalink / raw)
  To: Eric Blake
  Cc: Florian Westphal, libguestfs, qemu-devel, qemu-block, vsementsov

Eric Blake <eblake@redhat.com> wrote:
> On Fri, Mar 24, 2023 at 02:41:20PM -0500, Eric Blake wrote:
> > On Fri, Mar 24, 2023 at 11:47:20AM +0100, Florian Westphal wrote:
> > > qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
> 
> Replying to myself, WHY aren't we setting TCP_NODELAY on the socket?

If the application protocol is strictly or mostly request/response,
then I agree that qemu-nbd should turn Nagle off as well.



* Re: [PATCH 1/1] nbd/server: push pending frames after sending reply
From: Kevin Wolf @ 2023-03-27 11:44 UTC (permalink / raw)
  To: Florian Westphal; +Cc: qemu-block, eblake, vsementsov, qemu-devel

Am 24.03.2023 um 11:47 hat Florian Westphal geschrieben:
> qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
> 
> Kernel waits for more data and avoids transmission of small packets.
> Without TLS this is barely noticeable, but with TLS this really shows.
> 
> Booting a VM via qemu-nbd on localhost (with tls) takes more than
> 2 minutes on my system.  tcpdump shows frequent wait periods, where no
> packets get sent for a 40ms period.
> 
> Add explicit (un)corking when processing (and responding to) requests.
> "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.
> 
> VM Boot time:
> main:    no tls:  23s, with tls: 2m45s
> patched: no tls:  14s, with tls: 15s
> 
> VM Boot time, qemu-nbd via network (same lan):
> main:    no tls:  18s, with tls: 1m50s
> patched: no tls:  17s, with tls: 18s
> 
> Future optimization: if we could detect if there is another pending
> request we could defer the uncork operation because more data would be
> appended.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

Thanks, applied to the block branch.

Kevin


