qemu-devel.nongnu.org archive mirror
* [RFC PATCH 0/5] mptcp support
@ 2021-04-08 19:11 Dr. David Alan Gilbert (git)
  2021-04-08 19:11 ` [RFC PATCH 1/5] channel-socket: Only set CLOEXEC if we have space for fds Dr. David Alan Gilbert (git)
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-08 19:11 UTC (permalink / raw)
  To: qemu-devel, berrange, kraxel, eblake, armbru, pabeni; +Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This RFC set adds support for multipath TCP (mptcp),
in particular on the migration path - but should be extensible
to other users.

  Multipath-tcp is a bit like bonding, but at L3; you can use
it to handle failure, but can also use it to split traffic across
multiple interfaces.

  Using a pair of 10Gb interfaces, I've managed to get 19Gbps
(with the only tuning being using huge pages and turning the MTU up).

  It needs a bleeding-edge Linux kernel (in some older ones you get
false accept messages for the subflows), and a C lib that has the
constants defined (as current glibc does).

  To use it, you just need to append ,mptcp to an address:

  -incoming tcp:0:4444,mptcp
  migrate -d tcp:192.168.11.20:4444,mptcp

  I had a quick go at trying NBD as well, but I think it needs
some work with the parsing of NBD addresses.

  All comments welcome.

Dave

Dr. David Alan Gilbert (5):
  channel-socket: Only set CLOEXEC if we have space for fds
  io/net-listener: Call the notifier during finalize
  migration: Add cleanup hook for inwards migration
  migration/socket: Close the listener at the end
  sockets: Support multipath TCP

 io/channel-socket.c   |  8 ++++----
 io/dns-resolver.c     |  2 ++
 io/net-listener.c     |  3 +++
 migration/migration.c |  3 +++
 migration/migration.h |  4 ++++
 migration/multifd.c   |  5 +++++
 migration/socket.c    | 24 ++++++++++++++++++------
 qapi/sockets.json     |  5 ++++-
 util/qemu-sockets.c   | 34 ++++++++++++++++++++++++++++++++++
 9 files changed, 77 insertions(+), 11 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH 1/5] channel-socket: Only set CLOEXEC if we have space for fds
  2021-04-08 19:11 [RFC PATCH 0/5] mptcp support Dr. David Alan Gilbert (git)
@ 2021-04-08 19:11 ` Dr. David Alan Gilbert (git)
  2021-04-09  9:03   ` Daniel P. Berrangé
  2021-04-08 19:11 ` [RFC PATCH 2/5] io/net-listener: Call the notifier during finalize Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-08 19:11 UTC (permalink / raw)
  To: qemu-devel, berrange, kraxel, eblake, armbru, pabeni; +Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

MSG_CMSG_CLOEXEC cleans up received fds; it's really only for Unix
sockets, but currently we enable it for everything; some socket types
(IP_MPTCP) don't like this.

Only enable it when we're giving the recvmsg room to receive fds
anyway.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 io/channel-socket.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index de259f7eed..606ec97cf7 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -487,15 +487,15 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
 
     memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
 
-#ifdef MSG_CMSG_CLOEXEC
-    sflags |= MSG_CMSG_CLOEXEC;
-#endif
-
     msg.msg_iov = (struct iovec *)iov;
     msg.msg_iovlen = niov;
     if (fds && nfds) {
         msg.msg_control = control;
         msg.msg_controllen = sizeof(control);
+#ifdef MSG_CMSG_CLOEXEC
+        sflags |= MSG_CMSG_CLOEXEC;
+#endif
+
     }
 
  retry:
-- 
2.31.1




* [RFC PATCH 2/5] io/net-listener: Call the notifier during finalize
  2021-04-08 19:11 [RFC PATCH 0/5] mptcp support Dr. David Alan Gilbert (git)
  2021-04-08 19:11 ` [RFC PATCH 1/5] channel-socket: Only set CLOEXEC if we have space for fds Dr. David Alan Gilbert (git)
@ 2021-04-08 19:11 ` Dr. David Alan Gilbert (git)
  2021-04-09  9:06   ` Daniel P. Berrangé
  2021-04-08 19:11 ` [RFC PATCH 3/5] migration: Add cleanup hook for inwards migration Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-08 19:11 UTC (permalink / raw)
  To: qemu-devel, berrange, kraxel, eblake, armbru, pabeni; +Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Call the notifier during finalize; it's currently only called
if we change it, which is not the intent.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 io/net-listener.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/io/net-listener.c b/io/net-listener.c
index 46c2643d00..1c984d69c6 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -292,6 +292,9 @@ static void qio_net_listener_finalize(Object *obj)
     QIONetListener *listener = QIO_NET_LISTENER(obj);
     size_t i;
 
+    if (listener->io_notify) {
+        listener->io_notify(listener->io_data);
+    }
     qio_net_listener_disconnect(listener);
 
     for (i = 0; i < listener->nsioc; i++) {
-- 
2.31.1




* [RFC PATCH 3/5] migration: Add cleanup hook for inwards migration
  2021-04-08 19:11 [RFC PATCH 0/5] mptcp support Dr. David Alan Gilbert (git)
  2021-04-08 19:11 ` [RFC PATCH 1/5] channel-socket: Only set CLOEXEC if we have space for fds Dr. David Alan Gilbert (git)
  2021-04-08 19:11 ` [RFC PATCH 2/5] io/net-listener: Call the notifier during finalize Dr. David Alan Gilbert (git)
@ 2021-04-08 19:11 ` Dr. David Alan Gilbert (git)
  2021-04-09  9:10   ` Daniel P. Berrangé
  2021-04-08 19:11 ` [RFC PATCH 4/5] migration/socket: Close the listener at the end Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-08 19:11 UTC (permalink / raw)
  To: qemu-devel, berrange, kraxel, eblake, armbru, pabeni; +Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a cleanup hook for incoming migration that gets called
at the end as a way for a transport to allow cleanup.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 3 +++
 migration/migration.h | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index ca8b97baa5..feaedc382e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -279,6 +279,9 @@ void migration_incoming_state_destroy(void)
         g_array_free(mis->postcopy_remote_fds, TRUE);
         mis->postcopy_remote_fds = NULL;
     }
+    if (mis->transport_cleanup) {
+        mis->transport_cleanup(mis->transport_data);
+    }
 
     qemu_event_reset(&mis->main_thread_load_event);
 
diff --git a/migration/migration.h b/migration/migration.h
index db6708326b..1b4c5da917 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -49,6 +49,10 @@ struct PostcopyBlocktimeContext;
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
 
+    /* A hook to allow cleanup at the end of incoming migration */
+    void *transport_data;
+    void (*transport_cleanup)(void *data);
+
     /*
      * Free at the start of the main state load, set as the main thread finishes
      * loading state.
-- 
2.31.1




* [RFC PATCH 4/5] migration/socket: Close the listener at the end
  2021-04-08 19:11 [RFC PATCH 0/5] mptcp support Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2021-04-08 19:11 ` [RFC PATCH 3/5] migration: Add cleanup hook for inwards migration Dr. David Alan Gilbert (git)
@ 2021-04-08 19:11 ` Dr. David Alan Gilbert (git)
  2021-04-09  9:10   ` Daniel P. Berrangé
  2021-04-08 19:11 ` [RFC PATCH 5/5] sockets: Support multipath TCP Dr. David Alan Gilbert (git)
  2021-04-09  9:34 ` [RFC PATCH 0/5] mptcp support Daniel P. Berrangé
  5 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-08 19:11 UTC (permalink / raw)
  To: qemu-devel, berrange, kraxel, eblake, armbru, pabeni; +Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Delay closing the listener until the cleanup hook at the end; mptcp
needs the listener to stay open while the other paths come in.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/multifd.c |  5 +++++
 migration/socket.c  | 24 ++++++++++++++++++------
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index a6677c45c8..cebd9029b9 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1165,6 +1165,11 @@ bool multifd_recv_all_channels_created(void)
         return true;
     }
 
+    if (!multifd_recv_state) {
+        /* Called before any connections created */
+        return false;
+    }
+
     return thread_count == qatomic_read(&multifd_recv_state->count);
 }
 
diff --git a/migration/socket.c b/migration/socket.c
index 6016642e04..05705a32d8 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -126,22 +126,31 @@ static void socket_accept_incoming_migration(QIONetListener *listener,
 {
     trace_migration_socket_incoming_accepted();
 
-    qio_channel_set_name(QIO_CHANNEL(cioc), "migration-socket-incoming");
-    migration_channel_process_incoming(QIO_CHANNEL(cioc));
-
     if (migration_has_all_channels()) {
-        /* Close listening socket as its no longer needed */
-        qio_net_listener_disconnect(listener);
-        object_unref(OBJECT(listener));
+        error_report("%s: Extra incoming migration connection; ignoring",
+                     __func__);
+        return;
     }
+
+    qio_channel_set_name(QIO_CHANNEL(cioc), "migration-socket-incoming");
+    migration_channel_process_incoming(QIO_CHANNEL(cioc));
 }
 
+static void
+socket_incoming_migration_end(void *opaque)
+{
+    QIONetListener *listener = opaque;
+
+    qio_net_listener_disconnect(listener);
+    object_unref(OBJECT(listener));
+}
 
 static void
 socket_start_incoming_migration_internal(SocketAddress *saddr,
                                          Error **errp)
 {
     QIONetListener *listener = qio_net_listener_new();
+    MigrationIncomingState *mis = migration_incoming_get_current();
     size_t i;
     int num = 1;
 
@@ -156,6 +165,9 @@ socket_start_incoming_migration_internal(SocketAddress *saddr,
         return;
     }
 
+    mis->transport_data = listener;
+    mis->transport_cleanup = socket_incoming_migration_end;
+
     qio_net_listener_set_client_func_full(listener,
                                           socket_accept_incoming_migration,
                                           NULL, NULL,
-- 
2.31.1




* [RFC PATCH 5/5] sockets: Support multipath TCP
  2021-04-08 19:11 [RFC PATCH 0/5] mptcp support Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2021-04-08 19:11 ` [RFC PATCH 4/5] migration/socket: Close the listener at the end Dr. David Alan Gilbert (git)
@ 2021-04-08 19:11 ` Dr. David Alan Gilbert (git)
  2021-04-09  9:22   ` Daniel P. Berrangé
  2021-04-09  9:34 ` [RFC PATCH 0/5] mptcp support Daniel P. Berrangé
  5 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2021-04-08 19:11 UTC (permalink / raw)
  To: qemu-devel, berrange, kraxel, eblake, armbru, pabeni; +Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Multipath TCP allows combining multiple interfaces/routes into a single
socket, with very little work for the user/admin.

It's enabled by 'mptcp' on most socket addresses:

   ./qemu-system-x86_64 -nographic -incoming tcp:0:4444,mptcp

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 io/dns-resolver.c   |  2 ++
 qapi/sockets.json   |  5 ++++-
 util/qemu-sockets.c | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/io/dns-resolver.c b/io/dns-resolver.c
index 743a0efc87..b081e098bb 100644
--- a/io/dns-resolver.c
+++ b/io/dns-resolver.c
@@ -122,6 +122,8 @@ static int qio_dns_resolver_lookup_sync_inet(QIODNSResolver *resolver,
             .ipv4 = iaddr->ipv4,
             .has_ipv6 = iaddr->has_ipv6,
             .ipv6 = iaddr->ipv6,
+            .has_mptcp = iaddr->has_mptcp,
+            .mptcp = iaddr->mptcp,
         };
 
         (*addrs)[i] = newaddr;
diff --git a/qapi/sockets.json b/qapi/sockets.json
index 2e83452797..43122a38bf 100644
--- a/qapi/sockets.json
+++ b/qapi/sockets.json
@@ -57,6 +57,8 @@
 # @keep-alive: enable keep-alive when connecting to this socket. Not supported
 #              for passive sockets. (Since 4.2)
 #
+# @mptcp: enable multi-path TCP. (Since 6.0)
+#
 # Since: 1.3
 ##
 { 'struct': 'InetSocketAddress',
@@ -66,7 +68,8 @@
     '*to': 'uint16',
     '*ipv4': 'bool',
     '*ipv6': 'bool',
-    '*keep-alive': 'bool' } }
+    '*keep-alive': 'bool',
+    '*mptcp': 'bool' } }
 
 ##
 # @UnixSocketAddress:
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 8af0278f15..72527972d5 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -206,6 +206,21 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
 #endif
 }
 
+static int check_mptcp(const InetSocketAddress *saddr, struct addrinfo *ai,
+                       Error **errp)
+{
+    if (saddr->has_mptcp && saddr->mptcp) {
+#ifdef IPPROTO_MPTCP
+        ai->ai_protocol = IPPROTO_MPTCP;
+#else
+        error_setg(errp, "MPTCP unavailable in this build");
+        return -1;
+#endif
+    }
+
+    return 0;
+}
+
 static int inet_listen_saddr(InetSocketAddress *saddr,
                              int port_offset,
                              int num,
@@ -278,6 +293,11 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
 
     /* create socket + bind/listen */
     for (e = res; e != NULL; e = e->ai_next) {
+        if (check_mptcp(saddr, e, &err)) {
+            error_propagate(errp, err);
+            return -1;
+        }
+
         getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
                         uaddr,INET6_ADDRSTRLEN,uport,32,
                         NI_NUMERICHOST | NI_NUMERICSERV);
@@ -456,6 +476,11 @@ int inet_connect_saddr(InetSocketAddress *saddr, Error **errp)
     for (e = res; e != NULL; e = e->ai_next) {
         error_free(local_err);
         local_err = NULL;
+
+        if (check_mptcp(saddr, e, &local_err)) {
+            break;
+        }
+
         sock = inet_connect_addr(saddr, e, &local_err);
         if (sock >= 0) {
             break;
@@ -687,6 +712,15 @@ int inet_parse(InetSocketAddress *addr, const char *str, Error **errp)
         }
         addr->has_keep_alive = true;
     }
+    begin = strstr(optstr, ",mptcp");
+    if (begin) {
+        if (inet_parse_flag("mptcp", begin + strlen(",mptcp"),
+                            &addr->mptcp, errp) < 0)
+        {
+            return -1;
+        }
+        addr->has_mptcp = true;
+    }
     return 0;
 }
 
-- 
2.31.1




* Re: [RFC PATCH 1/5] channel-socket: Only set CLOEXEC if we have space for fds
  2021-04-08 19:11 ` [RFC PATCH 1/5] channel-socket: Only set CLOEXEC if we have space for fds Dr. David Alan Gilbert (git)
@ 2021-04-09  9:03   ` Daniel P. Berrangé
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2021-04-09  9:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

On Thu, Apr 08, 2021 at 08:11:55PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> MSG_CMSG_CLOEXEC cleans up received fds; it's really only for Unix
> sockets, but currently we enable it for everything; some socket types
> (IP_MPTCP) don't like this.
> 
> Only enable it when we're giving the recvmsg room to receive fds
> anyway.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  io/channel-socket.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 2/5] io/net-listener: Call the notifier during finalize
  2021-04-08 19:11 ` [RFC PATCH 2/5] io/net-listener: Call the notifier during finalize Dr. David Alan Gilbert (git)
@ 2021-04-09  9:06   ` Daniel P. Berrangé
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2021-04-09  9:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

On Thu, Apr 08, 2021 at 08:11:56PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Call the notifier during finalize; it's currently only called
> if we change it, which is not the intent.

Harmless so far, since no user has passed a non-NULL notify func

> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  io/net-listener.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 4/5] migration/socket: Close the listener at the end
  2021-04-08 19:11 ` [RFC PATCH 4/5] migration/socket: Close the listener at the end Dr. David Alan Gilbert (git)
@ 2021-04-09  9:10   ` Daniel P. Berrangé
  2021-04-09  9:20     ` Paolo Abeni
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2021-04-09  9:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

On Thu, Apr 08, 2021 at 08:11:58PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Delay closing the listener until the cleanup hook at the end; mptcp
> needs the listener to stay open while the other paths come in.

So you're saying that when the 'accept(2)' call returns, we are only
guaranteed to have 1 single path accepted, and the other paths
will be accepted by the kernel asynchronously ? Hence we need to
keep listening, even though we're not going to call accept(2) again
ourselves ?

> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/multifd.c |  5 +++++
>  migration/socket.c  | 24 ++++++++++++++++++------
>  2 files changed, 23 insertions(+), 6 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 3/5] migration: Add cleanup hook for inwards migration
  2021-04-08 19:11 ` [RFC PATCH 3/5] migration: Add cleanup hook for inwards migration Dr. David Alan Gilbert (git)
@ 2021-04-09  9:10   ` Daniel P. Berrangé
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2021-04-09  9:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

On Thu, Apr 08, 2021 at 08:11:57PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add a cleanup hook for incoming migration that gets called
> at the end as a way for a transport to allow cleanup.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/migration.c | 3 +++
>  migration/migration.h | 4 ++++
>  2 files changed, 7 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 4/5] migration/socket: Close the listener at the end
  2021-04-09  9:10   ` Daniel P. Berrangé
@ 2021-04-09  9:20     ` Paolo Abeni
  0 siblings, 0 replies; 22+ messages in thread
From: Paolo Abeni @ 2021-04-09  9:20 UTC (permalink / raw)
  To: Daniel P. Berrangé, Dr. David Alan Gilbert (git)
  Cc: armbru, quintela, qemu-devel, kraxel

Hello,

On Fri, 2021-04-09 at 10:10 +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 08, 2021 at 08:11:58PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Delay closing the listener until the cleanup hook at the end; mptcp
> > needs the listener to stay open while the other paths come in.
> 
> So you're saying that when the 'accept(2)' call returns, we are only
> guaranteed to have 1 single path accepted, 

when accept() returns it's guaranteed that the first path (actually
subflow) has been created. Other subflows can be already available, or
can be created later.

> and the other paths
> will be accepted by the kernel asynchronously ? Hence we need to
> keep listening, even though we're not going to call accept(2) again
> ourselves ?

Exactly, the other subflows will be created by the kernel as needed
(according to the configuration and the following MPTCP handshakes) and
will _not_ be exposed directly to the user-space as additional fds. The
fd returned by accept() refers to the main MPTCP socket (that is, the
"aggregated" entity), and not to some specific subflow.

Please let me know if the above clarifies in some way.

Thanks!

Paolo




* Re: [RFC PATCH 5/5] sockets: Support multipath TCP
  2021-04-08 19:11 ` [RFC PATCH 5/5] sockets: Support multipath TCP Dr. David Alan Gilbert (git)
@ 2021-04-09  9:22   ` Daniel P. Berrangé
  2021-04-10  9:03     ` Markus Armbruster
  2021-04-12 15:42     ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2021-04-09  9:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

On Thu, Apr 08, 2021 at 08:11:59PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Multipath TCP allows combining multiple interfaces/routes into a single
> socket, with very little work for the user/admin.
> 
> It's enabled by 'mptcp' on most socket addresses:
> 
>    ./qemu-system-x86_64 -nographic -incoming tcp:0:4444,mptcp
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  io/dns-resolver.c   |  2 ++
>  qapi/sockets.json   |  5 ++++-
>  util/qemu-sockets.c | 34 ++++++++++++++++++++++++++++++++++
>  3 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/io/dns-resolver.c b/io/dns-resolver.c
> index 743a0efc87..b081e098bb 100644
> --- a/io/dns-resolver.c
> +++ b/io/dns-resolver.c
> @@ -122,6 +122,8 @@ static int qio_dns_resolver_lookup_sync_inet(QIODNSResolver *resolver,
>              .ipv4 = iaddr->ipv4,
>              .has_ipv6 = iaddr->has_ipv6,
>              .ipv6 = iaddr->ipv6,
> +            .has_mptcp = iaddr->has_mptcp,
> +            .mptcp = iaddr->mptcp,
>          };
>  
>          (*addrs)[i] = newaddr;
> diff --git a/qapi/sockets.json b/qapi/sockets.json
> index 2e83452797..43122a38bf 100644
> --- a/qapi/sockets.json
> +++ b/qapi/sockets.json
> @@ -57,6 +57,8 @@
>  # @keep-alive: enable keep-alive when connecting to this socket. Not supported
>  #              for passive sockets. (Since 4.2)
>  #
> +# @mptcp: enable multi-path TCP. (Since 6.0)
> +#
>  # Since: 1.3
>  ##
>  { 'struct': 'InetSocketAddress',
> @@ -66,7 +68,8 @@
>      '*to': 'uint16',
>      '*ipv4': 'bool',
>      '*ipv6': 'bool',
> -    '*keep-alive': 'bool' } }
> +    '*keep-alive': 'bool',
> +    '*mptcp': 'bool' } }

I think this would need to be

   '*mptcp': { 'type': 'bool', 'if': 'IPPROTO_MPTCP' }

so that mgmt apps can probe when it truly is supported or not for
this build

>  
>  ##
>  # @UnixSocketAddress:
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 8af0278f15..72527972d5 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -206,6 +206,21 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
>  #endif
>  }
>  
> +static int check_mptcp(const InetSocketAddress *saddr, struct addrinfo *ai,
> +                       Error **errp)
> +{
> +    if (saddr->has_mptcp && saddr->mptcp) {
> +#ifdef IPPROTO_MPTCP
> +        ai->ai_protocol = IPPROTO_MPTCP;
> +#else
> +        error_setg(errp, "MPTCP unavailable in this build");
> +        return -1;
> +#endif
> +    }
> +
> +    return 0;
> +}
> +
>  static int inet_listen_saddr(InetSocketAddress *saddr,
>                               int port_offset,
>                               int num,
> @@ -278,6 +293,11 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
>  
>      /* create socket + bind/listen */
>      for (e = res; e != NULL; e = e->ai_next) {
> +        if (check_mptcp(saddr, e, &err)) {
> +            error_propagate(errp, err);
> +            return -1;
> +        }

So this is doing two different things: first it checks whether mptcp
was requested and, if not compiled in, reports an error; second it
sets the mptcp flag. The second thing is surprising given the name of
the function, but it also delays error reporting until after we've
gone through the DNS lookup, which I think is undesirable.

If we make the 'mptcp' field in QAPI schema use the conditional that
I show above, then we make it literally impossible to have the mptcp
field set when IPPROTO_MPTCP is unset, avoiding the need to do error
reporting at all.

IOW, the above 4 lines could be simplified to just

 #ifdef IPPROTO_MPTCP
    if (saddr->has_mptcp && saddr->mptcp) {
        ai->ai_protocol = IPPROTO_MPTCP;
    }
 #endif


> @@ -687,6 +712,15 @@ int inet_parse(InetSocketAddress *addr, const char *str, Error **errp)
>          }
>          addr->has_keep_alive = true;
>      }
> +    begin = strstr(optstr, ",mptcp");
> +    if (begin) {
> +        if (inet_parse_flag("mptcp", begin + strlen(",mptcp"),
> +                            &addr->mptcp, errp) < 0)
> +        {
> +            return -1;
> +        }
> +        addr->has_mptcp = true;
> +    }

This reminds me that inet_parse_flag is a bit of a crude design right
now, because it only does half the job, leaving half the repeated code
pattern in the caller still, with us having the string ",mptcp"/"mptcp"
repeated three times!

If you fancy refactoring it, i think it'd make more sense if we could
just have a caller pattern of

   if (inet_parse_flag(optstr,
                       "mptcp",
                       &addr->has_mptcp,
                       &addr->mptcp, errp) < 0)

Not a blocker to do this though.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 0/5] mptcp support
  2021-04-08 19:11 [RFC PATCH 0/5] mptcp support Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2021-04-08 19:11 ` [RFC PATCH 5/5] sockets: Support multipath TCP Dr. David Alan Gilbert (git)
@ 2021-04-09  9:34 ` Daniel P. Berrangé
  2021-04-09  9:42   ` Daniel P. Berrangé
                     ` (2 more replies)
  5 siblings, 3 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2021-04-09  9:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This RFC set adds support for multipath TCP (mptcp),
> in particular on the migration path - but should be extensible
> to other users.
> 
>   Multipath-tcp is a bit like bonding, but at L3; you can use
> it to handle failure, but can also use it to split traffic across
> multiple interfaces.
> 
>   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> (with the only tuning being using huge pages and turning the MTU up).
> 
>   It needs a bleeding-edge Linux kernel (in some older ones you get
> false accept messages for the subflows), and a C lib that has the
> constants defined (as current glibc does).
> 
>   To use it you just need to append ,mptcp to an address;
> 
>   -incoming tcp:0:4444,mptcp
>   migrate -d tcp:192.168.11.20:4444,mptcp

What happens if you only enable the mptcp flag on one side of the
stream (whether client or server)? Does it degrade to boring
old single-path TCP, or does it result in an error?

>   I had a quick go at trying NBD as well, but I think it needs
> some work with the parsing of NBD addresses.

In theory this is applicable to anywhere that we use sockets.
Anywhere that is configured with the QAPI  SocketAddress /
SocketAddressLegacy type will get it for free AFAICT.

Anywhere that is configured via QemuOpts will need an enhancement.

IOW, I would think NBD already works if you configure NBD via
QMP with nbd-server-start, or block-export-add.  qemu-nbd will
need cli options added.

The block layer clients for NBD, Gluster, Sheepdog and SSH also
all get it for free when configured via QMP, or -blockdev AFAICT

Legacy block layer filename syntax would need extra parsing, or
we can just not bother and say if you want new features, use
blockdev.


Overall this is impressively simple.

It feels like it obsoletes the multifd migration code, at least
if you assume a Linux platform and a new enough kernel?

Except TLS... We already bottleneck on TLS encryption with
a single FD, since userspace encryption is limited to a
single thread.

There is the KTLS feature which offloads TLS encryption/decryption
to the kernel. This benefits even regular single FD performance,
because the encryption work can be done by the kernel in a separate
thread from the userspace IO syscalls.

Any idea if KTLS is fully compatible with MPTCP? If so, then that
would look like it makes it a full replacement for multifd on Linux.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 0/5] mptcp support
  2021-04-09  9:34 ` [RFC PATCH 0/5] mptcp support Daniel P. Berrangé
@ 2021-04-09  9:42   ` Daniel P. Berrangé
  2021-04-09  9:55     ` Paolo Abeni
  2021-04-12 14:46     ` Dr. David Alan Gilbert
  2021-04-09  9:47   ` Paolo Abeni
  2021-04-12 14:51   ` Dr. David Alan Gilbert
  2 siblings, 2 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2021-04-09  9:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	quintela, armbru, qemu-devel, kraxel, pabeni

On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >   I had a quick go at trying NBD as well, but I think it needs
> > some work with the parsing of NBD addresses.
> 
> In theory this is applicable to anywhere that we use sockets.
> Anywhere that is configured with the QAPI  SocketAddress /
> SocketAddressLegacy type will get it for free AFAICT.

The caveat is any servers which share the problem of prematurely
closing the listener socket that you fixed here for migration.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 0/5] mptcp support
  2021-04-09  9:34 ` [RFC PATCH 0/5] mptcp support Daniel P. Berrangé
  2021-04-09  9:42   ` Daniel P. Berrangé
@ 2021-04-09  9:47   ` Paolo Abeni
  2021-04-12 14:51   ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 22+ messages in thread
From: Paolo Abeni @ 2021-04-09  9:47 UTC (permalink / raw)
  To: Daniel P. Berrangé, Dr. David Alan Gilbert (git)
  Cc: armbru, quintela, qemu-devel, kraxel

On Fri, 2021-04-09 at 10:34 +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This RFC set adds support for multipath TCP (mptcp),
> > in particular on the migration path - but should be extensible
> > to other users.
> > 
> >   Multipath-tcp is a bit like bonding, but at L3; you can use
> > it to handle failure, but can also use it to split traffic across
> > multiple interfaces.
> > 
> >   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> > (with the only tuning being using huge pages and turning the MTU up).
> > 
> >   It needs a bleeding-edge Linux kernel (in some older ones you get
> > false accept messages for the subflows), and a C lib that has the
> > constants defined (as current glibc does).
> > 
> >   To use it you just need to append ,mptcp to an address;
> > 
> >   -incoming tcp:0:4444,mptcp
> >   migrate -d tcp:192.168.11.20:4444,mptcp
> 
> What happens if you only enable mptcp flag on one side of the
> stream (whether client or server), does it degrade to boring
> old single path TCP, or does it result in an error ?

If the MPTCP handshake fails for any reason - e.g. one side does not ask
for MPTCP - the connection falls back to plain TCP transparently.

> >   I had a quick go at trying NBD as well, but I think it needs
> > some work with the parsing of NBD addresses.
> 
> In theory this is applicable to anywhere that we use sockets.
> Anywhere that is configured with the QAPI  SocketAddress /
> SocketAddressLegacy type will get it for free AFAICT.
> 
> Anywhere that is configured via QemuOpts will need an enhancement.
> 
> IOW, I would think NBD already works if you configure NBD via
> QMP with nbd-server-start, or block-export-add.  qemu-nbd will
> need cli options added.
> 
> The block layer clients for NBD, Gluster, Sheepdog and SSH also
> > all get it for free when configured via QMP or -blockdev, AFAICT.
> 
> > Legacy block-layer filename syntax would need extra parsing, or
> we can just not bother and say if you want new features, use
> blockdev.
> 
> 
> Overall this is impressively simple.
> 
> It feels like it obsoletes the multifd migration code, at least
> if you assume Linux platform and new enough kernel ?
> 
> Except TLS... We already bottleneck on TLS encryption with
> a single FD, since userspace encryption is limited to a
> single thread.
> 
> There is the KTLS feature which offloads TLS encryption/decryption
> to the kernel. This benefits even regular single FD performance,
> because the encryption work can be done by the kernel in a separate
> thread from the userspace IO syscalls.
> 
> Any idea if KTLS is fully compatible with MPTCP ?  

Ouch!

So far it is not supported. Both KTLS and MPTCP use/need a ULP (Upper
Layer Protocol, a kernel mechanism for hooking into core TCP features),
and we can have only a single ULP per socket, so possibly there is some
technical show-stopper there.

At the very least it is not on our short-term roadmap, but I guess we
can update that based on user needs.

Thanks!

Paolo




* Re: [RFC PATCH 0/5] mptcp support
  2021-04-09  9:42   ` Daniel P. Berrangé
@ 2021-04-09  9:55     ` Paolo Abeni
  2021-04-12 14:46     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 22+ messages in thread
From: Paolo Abeni @ 2021-04-09  9:55 UTC (permalink / raw)
  To: Daniel P. Berrangé, Dr. David Alan Gilbert (git),
	quintela, armbru, qemu-devel, kraxel

On Fri, 2021-04-09 at 10:42 +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >   I had a quick go at trying NBD as well, but I think it needs
> > > some work with the parsing of NBD addresses.
> > 
> > In theory this is applicable to anywhere that we use sockets.
> > Anywhere that is configured with the QAPI  SocketAddress /
> > SocketAddressLegacy type will get it for free AFAICT.
> 
> The caveat is any servers which share the problem of prematurely
> closing the listener socket that you fixed here for migration.

For the record, there is an alternative to that, based on a more
advanced and complex MPTCP configuration available only on even more
recent kernels. MPTCP can be configured to accept additional subflows
on a different listener, which will be managed (created and disposed of)
by the kernel with no additional user-space changes (beyond the MPTCP
configuration).

That will also require a suitable firewalld configuration (if enabled),
keeping the additional port open/accessible from the client.

Finally, such a configuration can be even more complex: e.g. the
additional listener could alternatively be configured on the client side
(!!!) and the server could be configured to create additional subflows
connecting to that port (again, no user-space changes needed, "only"
more complex MPTCP configuration).
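
[Editor's note: the kernel-side configuration Paolo refers to is done with iproute2's `ip mptcp` tool, roughly as below. The address and device name are placeholders, and the exact syntax may vary with the iproute2 version; this is a server-side sketch, not a tested recipe.]

```shell
# Allow each MPTCP connection to carry extra subflows and to accept
# addresses advertised by the peer.
ip mptcp limits set subflow 2 add_addr_accepted 2

# Advertise a second local address to the peer; the kernel itself
# accepts the additional subflows arriving on it, with no userspace
# (QEMU) changes needed.
ip mptcp endpoint add 192.0.2.2 dev eth1 signal
```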

Cheers,

Paolo






* Re: [RFC PATCH 5/5] sockets: Support multipath TCP
  2021-04-09  9:22   ` Daniel P. Berrangé
@ 2021-04-10  9:03     ` Markus Armbruster
  2021-04-12 15:42     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 22+ messages in thread
From: Markus Armbruster @ 2021-04-10  9:03 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, pabeni, kraxel, Dr. David Alan Gilbert (git), quintela

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Apr 08, 2021 at 08:11:59PM +0100, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> 
>> Multipath TCP allows combining multiple interfaces/routes into a single
>> socket, with very little work for the user/admin.
>> 
>> It's enabled by 'mptcp' on most socket addresses:
>> 
>>    ./qemu-system-x86_64 -nographic -incoming tcp:0:4444,mptcp
>> 
>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>>  io/dns-resolver.c   |  2 ++
>>  qapi/sockets.json   |  5 ++++-
>>  util/qemu-sockets.c | 34 ++++++++++++++++++++++++++++++++++
>>  3 files changed, 40 insertions(+), 1 deletion(-)
>> 
>> diff --git a/io/dns-resolver.c b/io/dns-resolver.c
>> index 743a0efc87..b081e098bb 100644
>> --- a/io/dns-resolver.c
>> +++ b/io/dns-resolver.c
>> @@ -122,6 +122,8 @@ static int qio_dns_resolver_lookup_sync_inet(QIODNSResolver *resolver,
>>              .ipv4 = iaddr->ipv4,
>>              .has_ipv6 = iaddr->has_ipv6,
>>              .ipv6 = iaddr->ipv6,
>> +            .has_mptcp = iaddr->has_mptcp,
>> +            .mptcp = iaddr->mptcp,
>>          };
>>  
>>          (*addrs)[i] = newaddr;
>> diff --git a/qapi/sockets.json b/qapi/sockets.json
>> index 2e83452797..43122a38bf 100644
>> --- a/qapi/sockets.json
>> +++ b/qapi/sockets.json
>> @@ -57,6 +57,8 @@
>>  # @keep-alive: enable keep-alive when connecting to this socket. Not supported
>>  #              for passive sockets. (Since 4.2)
>>  #
>> +# @mptcp: enable multi-path TCP. (Since 6.0)
>> +#
>>  # Since: 1.3
>>  ##
>>  { 'struct': 'InetSocketAddress',
>> @@ -66,7 +68,8 @@
>>      '*to': 'uint16',
>>      '*ipv4': 'bool',
>>      '*ipv6': 'bool',
>> -    '*keep-alive': 'bool' } }
>> +    '*keep-alive': 'bool',
>> +    '*mptcp': 'bool' } }
>
> I think this would need to be
>
>    '*mptcp': { 'type': 'bool', 'if': 'IPPROTO_MPTCP' }
>
> so that mgmt apps can probe whether it truly is supported for
> this build

Yes.  This is an instance of a somewhat common anti-pattern: "declare
unconditionally (this hunk), write unconditionally (previous hunk), read
conditionally (next hunk)".  Besides defeating introspection, it also
exposes configuration knobs that don't do anything.

>>  
>>  ##
>>  # @UnixSocketAddress:
>> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
>> index 8af0278f15..72527972d5 100644
>> --- a/util/qemu-sockets.c
>> +++ b/util/qemu-sockets.c
>> @@ -206,6 +206,21 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
>>  #endif
>>  }
>>  
>> +static int check_mptcp(const InetSocketAddress *saddr, struct addrinfo *ai,
>> +                       Error **errp)
>> +{
>> +    if (saddr->has_mptcp && saddr->mptcp) {
>> +#ifdef IPPROTO_MPTCP
>> +        ai->ai_protocol = IPPROTO_MPTCP;
>> +#else
>> +        error_setg(errp, "MPTCP unavailable in this build");
>> +        return -1;
>> +#endif
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  static int inet_listen_saddr(InetSocketAddress *saddr,
>>                               int port_offset,
>>                               int num,
[...]




* Re: [RFC PATCH 0/5] mptcp support
  2021-04-09  9:42   ` Daniel P. Berrangé
  2021-04-09  9:55     ` Paolo Abeni
@ 2021-04-12 14:46     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-12 14:46 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, pabeni, kraxel, armbru, quintela

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Apr 09, 2021 at 10:34:30AM +0100, Daniel P. Berrangé wrote:
> > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >   I had a quick go at trying NBD as well, but I think it needs
> > > some work with the parsing of NBD addresses.
> > 
> > In theory this is applicable to anywhere that we use sockets.
> > Anywhere that is configured with the QAPI  SocketAddress /
> > SocketAddressLegacy type will get it for free AFAICT.
> 
> The caveat is any servers which share the problem of prematurely
> closing the listener socket that you fixed here for migration.

Right, this varies depending on the server semantics; migration only
expects a single connection, so it shut the listener immediately; NBD
is already wired to expect multiple connections.
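
[Editor's note: the listener-lifetime point can be made concrete with a toy sketch - Python for illustration only, not QEMU code. With MPTCP, additional subflows arrive on the same listening socket and are joined to the existing connection by the kernel, so the listener must outlive the first accept() rather than being closed immediately, as the migration code used to do.]

```python
import socket
import threading

# Assume 262 (Linux IPPROTO_MPTCP) when the constant is missing.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

def make_listener():
    # Prefer MPTCP; fall back to plain TCP on kernels without it.
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(4)
    return s

listener = make_listener()
addr = listener.getsockname()

def client():
    c = socket.create_connection(addr)
    c.sendall(b"done")
    c.close()

t = threading.Thread(target=client)
t.start()
conn, _ = listener.accept()
data = conn.recv(4)
# Crucially, the listener is still open here: on an MPTCP-capable
# kernel, extra subflows for this connection could still be accepted
# and joined by the kernel. Only close it once the transfer is over.
conn.close()
t.join()
listener.close()
print(data.decode())
```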

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [RFC PATCH 0/5] mptcp support
  2021-04-09  9:34 ` [RFC PATCH 0/5] mptcp support Daniel P. Berrangé
  2021-04-09  9:42   ` Daniel P. Berrangé
  2021-04-09  9:47   ` Paolo Abeni
@ 2021-04-12 14:51   ` Dr. David Alan Gilbert
  2021-04-12 14:56     ` Daniel P. Berrangé
  2 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-12 14:51 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This RFC set adds support for multipath TCP (mptcp),
> > in particular on the migration path - but should be extensible
> > to other users.
> > 
> >   Multipath-tcp is a bit like bonding, but at L3; you can use
> > it to handle failure, but can also use it to split traffic across
> > multiple interfaces.
> > 
> >   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> > (with the only tuning being using huge pages and turning the MTU up).
> > 
> >   It needs a bleeding-edge Linux kernel (in some older ones you get
> > false accept messages for the subflows), and a C lib that has the
> > constants defined (as current glibc does).
> > 
> >   To use it you just need to append ,mptcp to an address;
> > 
> >   -incoming tcp:0:4444,mptcp
> >   migrate -d tcp:192.168.11.20:4444,mptcp
> 
> What happens if you only enable mptcp flag on one side of the
> stream (whether client or server), does it degrade to boring
> old single path TCP, or does it result in an error ?

I've just tested this and it matches what pabeni said; it seems to just
fall back.

> >   I had a quick go at trying NBD as well, but I think it needs
> > some work with the parsing of NBD addresses.
> 
> In theory this is applicable to anywhere that we use sockets.
> Anywhere that is configured with the QAPI  SocketAddress /
> SocketAddressLegacy type will get it for free AFAICT.

That was my hope.

> Anywhere that is configured via QemuOpts will need an enhancement.
> 
> IOW, I would think NBD already works if you configure NBD via
> QMP with nbd-server-start, or block-export-add.  qemu-nbd will
> need cli options added.
> 
> The block layer clients for NBD, Gluster, Sheepdog and SSH also
> all get it for free when configured via QMP or -blockdev, AFAICT.

Have you got some examples via QMP?
I'd failed trying -drive if=virtio,file=nbd://192.168.11.20:3333,mptcp=on/zero

> Legacy block-layer filename syntax would need extra parsing, or
> we can just not bother and say if you want new features, use
> blockdev.
> 
> 
> Overall this is impressively simple.

Yeh; lots of small unexpected tidyups that took a while to fix.

> It feels like it obsoletes the multifd migration code, at least
> if you assume Linux platform and new enough kernel ?
>
> Except TLS... We already bottleneck on TLS encryption with
> a single FD, since userspace encryption is limited to a
> single thread.

Even without TLS we already run out of CPU, probably on the receiving
thread, at around 20Gbps; which is a bit meh compared to multifd, which
I have seen hit 80Gbps on a particularly well-greased 100Gbps
connection.
Curiously, my attempts with multifd+mptcp so far have it being slower
than with just mptcp on its own, not hitting the 20Gbps - not sure why
yet.

> There is the KTLS feature which offloads TLS encryption/decryption
> to the kernel. This benefits even regular single FD performance,
> because the encryption work can be done by the kernel in a separate
> thread from the userspace IO syscalls.
> 
> Any idea if KTLS is fully compatible with MPTCP ?  If so, then that
> would look like it makes it a full replacement for multifd on Linux.

I've not tried kTLS at all yet; as pabeni says, it's not currently
compatible.
The other ones I'd like to try are zero-copy receive/transmit offload
(again, I'm not sure those are compatible).

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [RFC PATCH 0/5] mptcp support
  2021-04-12 14:51   ` Dr. David Alan Gilbert
@ 2021-04-12 14:56     ` Daniel P. Berrangé
  2021-04-14 18:49       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrangé @ 2021-04-12 14:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

On Mon, Apr 12, 2021 at 03:51:10PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Hi,
> > >   This RFC set adds support for multipath TCP (mptcp),
> > > in particular on the migration path - but should be extensible
> > > to other users.
> > > 
> > >   Multipath-tcp is a bit like bonding, but at L3; you can use
> > > it to handle failure, but can also use it to split traffic across
> > > multiple interfaces.
> > > 
> > >   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> > > (with the only tuning being using huge pages and turning the MTU up).
> > > 
> > >   It needs a bleeding-edge Linux kernel (in some older ones you get
> > > false accept messages for the subflows), and a C lib that has the
> > > constants defined (as current glibc does).
> > > 
> > >   To use it you just need to append ,mptcp to an address;
> > > 
> > >   -incoming tcp:0:4444,mptcp
> > >   migrate -d tcp:192.168.11.20:4444,mptcp
> > 
> > What happens if you only enable mptcp flag on one side of the
> > stream (whether client or server), does it degrade to boring
> > old single path TCP, or does it result in an error ?
> 
> I've just tested this and it matches what pabeni said; it seems to just
> fall back.
> 
> > >   I had a quick go at trying NBD as well, but I think it needs
> > > some work with the parsing of NBD addresses.
> > 
> > In theory this is applicable to anywhere that we use sockets.
> > Anywhere that is configured with the QAPI  SocketAddress /
> > SocketAddressLegacy type will get it for free AFAICT.
> 
> That was my hope.
> 
> > Anywhere that is configured via QemuOpts will need an enhancement.
> > 
> > IOW, I would think NBD already works if you configure NBD via
> > QMP with nbd-server-start, or block-export-add.  qemu-nbd will
> > need cli options added.
> > 
> > The block layer clients for NBD, Gluster, Sheepdog and SSH also
> > all get it for free when configured via QMP or -blockdev, AFAICT.
> 
> Have you got some examples via QMP?
> I'd failed trying -drive if=virtio,file=nbd://192.168.11.20:3333,mptcp=on/zero

I never remember the mapping to blockdev QAPI schema, especially
when using legacy filename syntax with the URI.

Try instead

 -blockdev driver=nbd,host=192.168.11.20,port=3333,mptcp=on,id=disk0backend
 -device virtio-blk,drive=disk0backend,id=disk0



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 5/5] sockets: Support multipath TCP
  2021-04-09  9:22   ` Daniel P. Berrangé
  2021-04-10  9:03     ` Markus Armbruster
@ 2021-04-12 15:42     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-12 15:42 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Apr 08, 2021 at 08:11:59PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Multipath TCP allows combining multiple interfaces/routes into a single
> > socket, with very little work for the user/admin.
> > 
> > It's enabled by 'mptcp' on most socket addresses:
> > 
> >    ./qemu-system-x86_64 -nographic -incoming tcp:0:4444,mptcp
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  io/dns-resolver.c   |  2 ++
> >  qapi/sockets.json   |  5 ++++-
> >  util/qemu-sockets.c | 34 ++++++++++++++++++++++++++++++++++
> >  3 files changed, 40 insertions(+), 1 deletion(-)
> > 
> > diff --git a/io/dns-resolver.c b/io/dns-resolver.c
> > index 743a0efc87..b081e098bb 100644
> > --- a/io/dns-resolver.c
> > +++ b/io/dns-resolver.c
> > @@ -122,6 +122,8 @@ static int qio_dns_resolver_lookup_sync_inet(QIODNSResolver *resolver,
> >              .ipv4 = iaddr->ipv4,
> >              .has_ipv6 = iaddr->has_ipv6,
> >              .ipv6 = iaddr->ipv6,
> > +            .has_mptcp = iaddr->has_mptcp,
> > +            .mptcp = iaddr->mptcp,
> >          };
> >  
> >          (*addrs)[i] = newaddr;
> > diff --git a/qapi/sockets.json b/qapi/sockets.json
> > index 2e83452797..43122a38bf 100644
> > --- a/qapi/sockets.json
> > +++ b/qapi/sockets.json
> > @@ -57,6 +57,8 @@
> >  # @keep-alive: enable keep-alive when connecting to this socket. Not supported
> >  #              for passive sockets. (Since 4.2)
> >  #
> > +# @mptcp: enable multi-path TCP. (Since 6.0)
> > +#
> >  # Since: 1.3
> >  ##
> >  { 'struct': 'InetSocketAddress',
> > @@ -66,7 +68,8 @@
> >      '*to': 'uint16',
> >      '*ipv4': 'bool',
> >      '*ipv6': 'bool',
> > -    '*keep-alive': 'bool' } }
> > +    '*keep-alive': 'bool',
> > +    '*mptcp': 'bool' } }
> 
> I think this would need to be
> 
>    '*mptcp': { 'type': 'bool', 'if': 'IPPROTO_MPTCP' }
> 
> so that mgmt apps can probe whether it truly is supported for
> this build

Done; now remember that just tells you if your C library knows about it,
not whether your kernel, firewall, or avian carriers know about it.

> >  
> >  ##
> >  # @UnixSocketAddress:
> > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > index 8af0278f15..72527972d5 100644
> > --- a/util/qemu-sockets.c
> > +++ b/util/qemu-sockets.c
> > @@ -206,6 +206,21 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
> >  #endif
> >  }
> >  
> > +static int check_mptcp(const InetSocketAddress *saddr, struct addrinfo *ai,
> > +                       Error **errp)
> > +{
> > +    if (saddr->has_mptcp && saddr->mptcp) {
> > +#ifdef IPPROTO_MPTCP
> > +        ai->ai_protocol = IPPROTO_MPTCP;
> > +#else
> > +        error_setg(errp, "MPTCP unavailable in this build");
> > +        return -1;
> > +#endif
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static int inet_listen_saddr(InetSocketAddress *saddr,
> >                               int port_offset,
> >                               int num,
> > @@ -278,6 +293,11 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> >  
> >      /* create socket + bind/listen */
> >      for (e = res; e != NULL; e = e->ai_next) {
> > +        if (check_mptcp(saddr, e, &err)) {
> > +            error_propagate(errp, err);
> > +            return -1;
> > +        }
> 
> So this is doing two different things - it checks whether mptcp was
> requested and if not compiled in, reports an error. Second it sets
> the mptcp flag. The second thing is suprising given the name of
> the function but also it delays error reporting until after we've
> gone through the DNS lookup which I think is undesirable.
> 
> If we make the 'mptcp' field in QAPI schema use the conditional that
> I show above, then we make it literally impossible to have the mptcp
> field set when IPPROTO_MPTCP is unset, avoiding the need to do error
> reporting at all.
> 
> IOW, the above 4 lines could be simplified to just
> 
>  #ifdef IPPROTO_MPTCP
>     if (saddr->has_mptcp && saddr->mptcp) {
>         ai->ai_protocol = IPPROTO_MPTCP;
>     }
>  #else

OK, done - (with a #endif)

> 
> 
> > @@ -687,6 +712,15 @@ int inet_parse(InetSocketAddress *addr, const char *str, Error **errp)
> >          }
> >          addr->has_keep_alive = true;
> >      }
> > +    begin = strstr(optstr, ",mptcp");
> > +    if (begin) {
> > +        if (inet_parse_flag("mptcp", begin + strlen(",mptcp"),
> > +                            &addr->mptcp, errp) < 0)
> > +        {
> > +            return -1;
> > +        }
> > +        addr->has_mptcp = true;
> > +    }
> 
> This reminds me that inet_parse_flag is a bit of a crude design right
> now, because it only does half the job, leaving half of the repeated
> code pattern in the caller, with us having the string ",mptcp" /
> "mptcp" repeated three times!

Yeh I noticed that.

> If you fancy refactoring it, i think it'd make more sense if we could
> just have a caller pattern of
> 
>    if (inet_parse_flag(optstr,
>                        "mptcp",
>                        &addr->has_mptcp,
>                        &addr->mptcp, errp) < 0)
> 
> Not a blocker todo this though.

A job for another day.

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [RFC PATCH 0/5] mptcp support
  2021-04-12 14:56     ` Daniel P. Berrangé
@ 2021-04-14 18:49       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-14 18:49 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: quintela, armbru, qemu-devel, kraxel, pabeni

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Apr 12, 2021 at 03:51:10PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Apr 08, 2021 at 08:11:54PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Hi,
> > > >   This RFC set adds support for multipath TCP (mptcp),
> > > > in particular on the migration path - but should be extensible
> > > > to other users.
> > > > 
> > > >   Multipath-tcp is a bit like bonding, but at L3; you can use
> > > > it to handle failure, but can also use it to split traffic across
> > > > multiple interfaces.
> > > > 
> > > >   Using a pair of 10Gb interfaces, I've managed to get 19Gbps
> > > > (with the only tuning being using huge pages and turning the MTU up).
> > > > 
> > > >   It needs a bleeding-edge Linux kernel (in some older ones you get
> > > > false accept messages for the subflows), and a C lib that has the
> > > > constants defined (as current glibc does).
> > > > 
> > > >   To use it you just need to append ,mptcp to an address;
> > > > 
> > > >   -incoming tcp:0:4444,mptcp
> > > >   migrate -d tcp:192.168.11.20:4444,mptcp
> > > 
> > > What happens if you only enable mptcp flag on one side of the
> > > stream (whether client or server), does it degrade to boring
> > > old single path TCP, or does it result in an error ?
> > 
> > I've just tested this and it matches what pabeni said; it seems to just
> > fall back.
> > 
> > > >   I had a quick go at trying NBD as well, but I think it needs
> > > > some work with the parsing of NBD addresses.
> > > 
> > > In theory this is applicable to anywhere that we use sockets.
> > > Anywhere that is configured with the QAPI  SocketAddress /
> > > SocketAddressLegacy type will get it for free AFAICT.
> > 
> > That was my hope.
> > 
> > > Anywhere that is configured via QemuOpts will need an enhancement.
> > > 
> > > IOW, I would think NBD already works if you configure NBD via
> > > QMP with nbd-server-start, or block-export-add.  qemu-nbd will
> > > need cli options added.
> > > 
> > > The block layer clients for NBD, Gluster, Sheepdog and SSH also
> > > all get it for free when configured via QMP or -blockdev, AFAICT.
> > 
> > Have you got some examples via QMP?
> > I'd failed trying -drive if=virtio,file=nbd://192.168.11.20:3333,mptcp=on/zero
> 
> I never remember the mapping to blockdev QAPI schema, especially
> when using legacy filename syntax with the URI.
> 
> Try instead
> 
>  -blockdev driver=nbd,host=192.168.11.20,port=3333,mptcp=on,id=disk0backend
>  -device virtio-blk,drive=disk0backend,id=disk0

That doesn't look like the right syntax, but it got me closer; and it's
working with no more code changes:

On the source:

qemu... -nographic -M none -drive if=none,file=my.qcow2,id=mydisk
(qemu) nbd_server_start 0.0.0.0:3333,mptcp=on
(qemu) nbd_server_add -w mydisk

On the destination:
-blockdev driver=nbd,server.type=inet,server.host=192.168.11.20,server.port=3333,server.mptcp=on,node-name=nbddisk,export=mydisk -device virtio-blk,drive=nbddisk,id=disk0

and it successfully booted off it, and it looks like it has two flows.
(It didn't get that great a bandwidth, but I'm not sure what that's due
to.)
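
[Editor's note: for anyone reproducing this, the subflows can be observed from the shell with recent iproute2 - the exact output varies by version and kernel, so treat these as a sketch, not a verified recipe.]

```shell
# List MPTCP sockets with per-connection information.
ss -Mni

# Stream MPTCP path-manager events (connection created, subflow
# established, address announced, ...) as they happen.
ip mptcp monitor
```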

Dave
> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




end of thread, other threads:[~2021-04-14 18:51 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-08 19:11 [RFC PATCH 0/5] mptcp support Dr. David Alan Gilbert (git)
2021-04-08 19:11 ` [RFC PATCH 1/5] channel-socket: Only set CLOEXEC if we have space for fds Dr. David Alan Gilbert (git)
2021-04-09  9:03   ` Daniel P. Berrangé
2021-04-08 19:11 ` [RFC PATCH 2/5] io/net-listener: Call the notifier during finalize Dr. David Alan Gilbert (git)
2021-04-09  9:06   ` Daniel P. Berrangé
2021-04-08 19:11 ` [RFC PATCH 3/5] migration: Add cleanup hook for inwards migration Dr. David Alan Gilbert (git)
2021-04-09  9:10   ` Daniel P. Berrangé
2021-04-08 19:11 ` [RFC PATCH 4/5] migration/socket: Close the listener at the end Dr. David Alan Gilbert (git)
2021-04-09  9:10   ` Daniel P. Berrangé
2021-04-09  9:20     ` Paolo Abeni
2021-04-08 19:11 ` [RFC PATCH 5/5] sockets: Support multipath TCP Dr. David Alan Gilbert (git)
2021-04-09  9:22   ` Daniel P. Berrangé
2021-04-10  9:03     ` Markus Armbruster
2021-04-12 15:42     ` Dr. David Alan Gilbert
2021-04-09  9:34 ` [RFC PATCH 0/5] mptcp support Daniel P. Berrangé
2021-04-09  9:42   ` Daniel P. Berrangé
2021-04-09  9:55     ` Paolo Abeni
2021-04-12 14:46     ` Dr. David Alan Gilbert
2021-04-09  9:47   ` Paolo Abeni
2021-04-12 14:51   ` Dr. David Alan Gilbert
2021-04-12 14:56     ` Daniel P. Berrangé
2021-04-14 18:49       ` Dr. David Alan Gilbert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).