* [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work
@ 2024-01-26 22:19 Fabiano Rosas
  2024-01-26 22:19 ` [PATCH 1/5] migration/multifd: Separate compression ops from non-compression Fabiano Rosas
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-26 22:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Hao Xiang, Yuan Liu, Bryan Zhang

Hi,

Here are two cleanups that are prerequisites for the fixed-ram work, but
they also affect the other series on the list at the moment, so I want
to make sure they work for everyone:

1) Separate multifd_ops from compression. The multifd_ops are
   currently coupled with the multifd_compression parameter.

We're adding new multifd_ops in the fixed-ram work and adding new
compression ops in the compression work.

2) Add a new send hook. The multifd_send_thread code currently does
   some twists to support zero copy, which is a socket-only feature.

This might affect the zero page and DSA work which add code to
multifd_send_thread.
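
For reference, the hook added in patch 3 has this shape in
MultiFDMethods:

  int (*send)(MultiFDSendParams *p, Error **errp);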

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1154332360

(I also tested zero copy locally. We cannot add a test for it because
it needs root due to memory locking limits)

Fabiano Rosas (5):
  migration/multifd: Separate compression ops from non-compression
  migration/multifd: Move multifd_socket_ops to socket.c
  migration/multifd: Add multifd_ops->send
  migration/multifd: Simplify zero copy send
  migration/multifd: Move zero copy flag into multifd_socket_setup

 migration/multifd-zlib.c |   9 ++-
 migration/multifd-zstd.c |   9 ++-
 migration/multifd.c      | 164 +++++----------------------------------
 migration/multifd.h      |   6 +-
 migration/socket.c       |  90 ++++++++++++++++++++-
 5 files changed, 128 insertions(+), 150 deletions(-)

-- 
2.35.3




* [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
  2024-01-26 22:19 [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Fabiano Rosas
@ 2024-01-26 22:19 ` Fabiano Rosas
  2024-01-29  6:29   ` Peter Xu
  2024-01-26 22:19 ` [PATCH 2/5] migration/multifd: Move multifd_socket_ops to socket.c Fabiano Rosas
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-26 22:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Hao Xiang, Yuan Liu, Bryan Zhang

For multifd we currently choose exclusively between migration using
compression and migration without compression. The compression method
is chosen via the multifd_compression parameter (none, zlib,
zstd). We've been using the 'none' value to mean the regular socket
migration.

Rename the 'multifd_ops' array to 'multifd_compression_ops' and move
the 'nocomp_multifd_ops' out of it. We don't need to have the
non-compression methods in an array because they are not registered
dynamically and cannot be compiled out like the compression code.

Rename the 'nocomp' functions to 'multifd_socket' and remove the
comments, which are useless IMHO. The next patch moves the functions
into a socket-specific file.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd-zlib.c |   2 +-
 migration/multifd-zstd.c |   2 +-
 migration/multifd.c      | 109 +++++++++++----------------------------
 migration/multifd.h      |   3 +-
 4 files changed, 34 insertions(+), 82 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 37ce48621e..d89163e975 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -319,7 +319,7 @@ static MultiFDMethods multifd_zlib_ops = {
 
 static void multifd_zlib_register(void)
 {
-    multifd_register_ops(MULTIFD_COMPRESSION_ZLIB, &multifd_zlib_ops);
+    multifd_register_compression(MULTIFD_COMPRESSION_ZLIB, &multifd_zlib_ops);
 }
 
 migration_init(multifd_zlib_register);
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index b471daadcd..a90788540e 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -310,7 +310,7 @@ static MultiFDMethods multifd_zstd_ops = {
 
 static void multifd_zstd_register(void)
 {
-    multifd_register_ops(MULTIFD_COMPRESSION_ZSTD, &multifd_zstd_ops);
+    multifd_register_compression(MULTIFD_COMPRESSION_ZSTD, &multifd_zstd_ops);
 }
 
 migration_init(multifd_zstd_register);
diff --git a/migration/multifd.c b/migration/multifd.c
index 25cbc6dc6b..2968649500 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -45,48 +45,17 @@ typedef struct {
     uint64_t unused2[4];    /* Reserved for future use */
 } __attribute__((packed)) MultiFDInit_t;
 
-/* Multifd without compression */
-
-/**
- * nocomp_send_setup: setup send side
- *
- * For no compression this function does nothing.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static int nocomp_send_setup(MultiFDSendParams *p, Error **errp)
+static int multifd_socket_send_setup(MultiFDSendParams *p, Error **errp)
 {
     return 0;
 }
 
-/**
- * nocomp_send_cleanup: cleanup send side
- *
- * For no compression this function does nothing.
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
+static void multifd_socket_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
     return;
 }
 
-/**
- * nocomp_send_prepare: prepare date to be able to send
- *
- * For no compression we just have to calculate the size of the
- * packet.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
+static int multifd_socket_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     MultiFDPages_t *pages = p->pages;
 
@@ -101,43 +70,16 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
     return 0;
 }
 
-/**
- * nocomp_recv_setup: setup receive side
- *
- * For no compression this function does nothing.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static int nocomp_recv_setup(MultiFDRecvParams *p, Error **errp)
+static int multifd_socket_recv_setup(MultiFDRecvParams *p, Error **errp)
 {
     return 0;
 }
 
-/**
- * nocomp_recv_cleanup: setup receive side
- *
- * For no compression this function does nothing.
- *
- * @p: Params for the channel that we are using
- */
-static void nocomp_recv_cleanup(MultiFDRecvParams *p)
+static void multifd_socket_recv_cleanup(MultiFDRecvParams *p)
 {
 }
 
-/**
- * nocomp_recv_pages: read the data from the channel into actual pages
- *
- * For no compression we just need to read things into the correct place.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
+static int multifd_socket_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 
@@ -153,23 +95,34 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
     return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
 }
 
-static MultiFDMethods multifd_nocomp_ops = {
-    .send_setup = nocomp_send_setup,
-    .send_cleanup = nocomp_send_cleanup,
-    .send_prepare = nocomp_send_prepare,
-    .recv_setup = nocomp_recv_setup,
-    .recv_cleanup = nocomp_recv_cleanup,
-    .recv_pages = nocomp_recv_pages
+static MultiFDMethods multifd_socket_ops = {
+    .send_setup = multifd_socket_send_setup,
+    .send_cleanup = multifd_socket_send_cleanup,
+    .send_prepare = multifd_socket_send_prepare,
+    .recv_setup = multifd_socket_recv_setup,
+    .recv_cleanup = multifd_socket_recv_cleanup,
+    .recv_pages = multifd_socket_recv_pages
 };
 
-static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = {
-    [MULTIFD_COMPRESSION_NONE] = &multifd_nocomp_ops,
-};
+static MultiFDMethods *multifd_compression_ops[MULTIFD_COMPRESSION__MAX] = {0};
+
+static MultiFDMethods *multifd_get_ops(void)
+{
+    MultiFDCompression comp = migrate_multifd_compression();
+
+    assert(comp < MULTIFD_COMPRESSION__MAX);
+
+    if (comp != MULTIFD_COMPRESSION_NONE) {
+        return multifd_compression_ops[comp];
+    }
+
+    return &multifd_socket_ops;
+}
 
-void multifd_register_ops(int method, MultiFDMethods *ops)
+void multifd_register_compression(int method, MultiFDMethods *ops)
 {
     assert(0 < method && method < MULTIFD_COMPRESSION__MAX);
-    multifd_ops[method] = ops;
+    multifd_compression_ops[method] = ops;
 }
 
 static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp)
@@ -915,7 +868,7 @@ int multifd_save_setup(Error **errp)
     multifd_send_state->pages = multifd_pages_init(page_count);
     qemu_sem_init(&multifd_send_state->channels_ready, 0);
     qatomic_set(&multifd_send_state->exiting, 0);
-    multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];
+    multifd_send_state->ops = multifd_get_ops();
 
     for (i = 0; i < thread_count; i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -1171,7 +1124,7 @@ int multifd_load_setup(Error **errp)
     multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
     qatomic_set(&multifd_recv_state->count, 0);
     qemu_sem_init(&multifd_recv_state->sem_sync, 0);
-    multifd_recv_state->ops = multifd_ops[migrate_multifd_compression()];
+    multifd_recv_state->ops = multifd_get_ops();
 
     for (i = 0; i < thread_count; i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
diff --git a/migration/multifd.h b/migration/multifd.h
index 35d11f103c..4630baccd4 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -204,7 +204,6 @@ typedef struct {
     int (*recv_pages)(MultiFDRecvParams *p, Error **errp);
 } MultiFDMethods;
 
-void multifd_register_ops(int method, MultiFDMethods *ops);
+void multifd_register_compression(int method, MultiFDMethods *ops);
 
 #endif
-
-- 
2.35.3




* [PATCH 2/5] migration/multifd: Move multifd_socket_ops to socket.c
  2024-01-26 22:19 [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Fabiano Rosas
  2024-01-26 22:19 ` [PATCH 1/5] migration/multifd: Separate compression ops from non-compression Fabiano Rosas
@ 2024-01-26 22:19 ` Fabiano Rosas
  2024-01-26 22:19 ` [PATCH 3/5] migration/multifd: Add multifd_ops->send Fabiano Rosas
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-26 22:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Hao Xiang, Yuan Liu, Bryan Zhang

Code movement only.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 59 -------------------------------------------
 migration/multifd.h |  2 ++
 migration/socket.c  | 61 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 60 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 2968649500..d82775ade9 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -45,65 +45,6 @@ typedef struct {
     uint64_t unused2[4];    /* Reserved for future use */
 } __attribute__((packed)) MultiFDInit_t;
 
-static int multifd_socket_send_setup(MultiFDSendParams *p, Error **errp)
-{
-    return 0;
-}
-
-static void multifd_socket_send_cleanup(MultiFDSendParams *p, Error **errp)
-{
-    return;
-}
-
-static int multifd_socket_send_prepare(MultiFDSendParams *p, Error **errp)
-{
-    MultiFDPages_t *pages = p->pages;
-
-    for (int i = 0; i < p->normal_num; i++) {
-        p->iov[p->iovs_num].iov_base = pages->block->host + p->normal[i];
-        p->iov[p->iovs_num].iov_len = p->page_size;
-        p->iovs_num++;
-    }
-
-    p->next_packet_size = p->normal_num * p->page_size;
-    p->flags |= MULTIFD_FLAG_NOCOMP;
-    return 0;
-}
-
-static int multifd_socket_recv_setup(MultiFDRecvParams *p, Error **errp)
-{
-    return 0;
-}
-
-static void multifd_socket_recv_cleanup(MultiFDRecvParams *p)
-{
-}
-
-static int multifd_socket_recv_pages(MultiFDRecvParams *p, Error **errp)
-{
-    uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
-
-    if (flags != MULTIFD_FLAG_NOCOMP) {
-        error_setg(errp, "multifd %u: flags received %x flags expected %x",
-                   p->id, flags, MULTIFD_FLAG_NOCOMP);
-        return -1;
-    }
-    for (int i = 0; i < p->normal_num; i++) {
-        p->iov[i].iov_base = p->host + p->normal[i];
-        p->iov[i].iov_len = p->page_size;
-    }
-    return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
-}
-
-static MultiFDMethods multifd_socket_ops = {
-    .send_setup = multifd_socket_send_setup,
-    .send_cleanup = multifd_socket_send_cleanup,
-    .send_prepare = multifd_socket_send_prepare,
-    .recv_setup = multifd_socket_recv_setup,
-    .recv_cleanup = multifd_socket_recv_cleanup,
-    .recv_pages = multifd_socket_recv_pages
-};
-
 static MultiFDMethods *multifd_compression_ops[MULTIFD_COMPRESSION__MAX] = {0};
 
 static MultiFDMethods *multifd_get_ops(void)
diff --git a/migration/multifd.h b/migration/multifd.h
index 4630baccd4..6261002524 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -204,6 +204,8 @@ typedef struct {
     int (*recv_pages)(MultiFDRecvParams *p, Error **errp);
 } MultiFDMethods;
 
+extern MultiFDMethods multifd_socket_ops;
+
 void multifd_register_compression(int method, MultiFDMethods *ops);
 
 #endif
diff --git a/migration/socket.c b/migration/socket.c
index 98e3ea1514..7e1371e598 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -16,12 +16,13 @@
 
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
-
+#include "exec/ramblock.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "channel.h"
 #include "socket.h"
 #include "migration.h"
+#include "multifd.h"
 #include "qemu-file.h"
 #include "io/channel-socket.h"
 #include "io/net-listener.h"
@@ -202,3 +203,61 @@ void socket_start_incoming_migration(SocketAddress *saddr,
     }
 }
 
+static int multifd_socket_send_setup(MultiFDSendParams *p, Error **errp)
+{
+    return 0;
+}
+
+static void multifd_socket_send_cleanup(MultiFDSendParams *p, Error **errp)
+{
+    return;
+}
+
+static int multifd_socket_send_prepare(MultiFDSendParams *p, Error **errp)
+{
+    MultiFDPages_t *pages = p->pages;
+
+    for (int i = 0; i < p->normal_num; i++) {
+        p->iov[p->iovs_num].iov_base = pages->block->host + p->normal[i];
+        p->iov[p->iovs_num].iov_len = p->page_size;
+        p->iovs_num++;
+    }
+
+    p->next_packet_size = p->normal_num * p->page_size;
+    p->flags |= MULTIFD_FLAG_NOCOMP;
+    return 0;
+}
+
+static int multifd_socket_recv_setup(MultiFDRecvParams *p, Error **errp)
+{
+    return 0;
+}
+
+static void multifd_socket_recv_cleanup(MultiFDRecvParams *p)
+{
+}
+
+static int multifd_socket_recv_pages(MultiFDRecvParams *p, Error **errp)
+{
+    uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
+
+    if (flags != MULTIFD_FLAG_NOCOMP) {
+        error_setg(errp, "multifd %u: flags received %x flags expected %x",
+                   p->id, flags, MULTIFD_FLAG_NOCOMP);
+        return -1;
+    }
+    for (int i = 0; i < p->normal_num; i++) {
+        p->iov[i].iov_base = p->host + p->normal[i];
+        p->iov[i].iov_len = p->page_size;
+    }
+    return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
+}
+
+MultiFDMethods multifd_socket_ops = {
+    .send_setup = multifd_socket_send_setup,
+    .send_cleanup = multifd_socket_send_cleanup,
+    .send_prepare = multifd_socket_send_prepare,
+    .recv_setup = multifd_socket_recv_setup,
+    .recv_cleanup = multifd_socket_recv_cleanup,
+    .recv_pages = multifd_socket_recv_pages
+};
-- 
2.35.3




* [PATCH 3/5] migration/multifd: Add multifd_ops->send
  2024-01-26 22:19 [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Fabiano Rosas
  2024-01-26 22:19 ` [PATCH 1/5] migration/multifd: Separate compression ops from non-compression Fabiano Rosas
  2024-01-26 22:19 ` [PATCH 2/5] migration/multifd: Move multifd_socket_ops to socket.c Fabiano Rosas
@ 2024-01-26 22:19 ` Fabiano Rosas
  2024-01-26 22:19 ` [PATCH 4/5] migration/multifd: Simplify zero copy send Fabiano Rosas
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-26 22:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Hao Xiang, Yuan Liu, Bryan Zhang

The zero copy feature is not supported by the compression methods. It
is exclusive to the socket migration. Add a 'send' hook so we can move
that complexity into a multifd_socket_send() function.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd-zlib.c | 10 ++++++++++
 migration/multifd-zstd.c | 10 ++++++++++
 migration/multifd.c      | 18 ++----------------
 migration/multifd.h      |  1 +
 migration/socket.c       | 22 ++++++++++++++++++++++
 5 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index d89163e975..0859efa60f 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -174,6 +174,15 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
     return 0;
 }
 
+static int zlib_send(MultiFDSendParams *p, Error **errp)
+{
+    p->iov[0].iov_len = p->packet_len;
+    p->iov[0].iov_base = p->packet;
+
+    return qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
+                                       0, p->write_flags, errp);
+}
+
 /**
  * zlib_recv_setup: setup receive side
  *
@@ -312,6 +321,7 @@ static MultiFDMethods multifd_zlib_ops = {
     .send_setup = zlib_send_setup,
     .send_cleanup = zlib_send_cleanup,
     .send_prepare = zlib_send_prepare,
+    .send = zlib_send,
     .recv_setup = zlib_recv_setup,
     .recv_cleanup = zlib_recv_cleanup,
     .recv_pages = zlib_recv_pages
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index a90788540e..ca0fc79fdd 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -163,6 +163,15 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
     return 0;
 }
 
+static int zstd_send(MultiFDSendParams *p, Error **errp)
+{
+    p->iov[0].iov_len = p->packet_len;
+    p->iov[0].iov_base = p->packet;
+
+    return qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
+                                       0, p->write_flags, errp);
+}
+
 /**
  * zstd_recv_setup: setup receive side
  *
@@ -303,6 +312,7 @@ static MultiFDMethods multifd_zstd_ops = {
     .send_setup = zstd_send_setup,
     .send_cleanup = zstd_send_cleanup,
     .send_prepare = zstd_send_prepare,
+    .send = zstd_send,
     .recv_setup = zstd_recv_setup,
     .recv_cleanup = zstd_recv_cleanup,
     .recv_pages = zstd_recv_pages
diff --git a/migration/multifd.c b/migration/multifd.c
index d82775ade9..9f509699c2 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -604,22 +604,8 @@ static void *multifd_send_thread(void *opaque)
             trace_multifd_send(p->id, packet_num, p->normal_num, flags,
                                p->next_packet_size);
 
-            if (use_zero_copy_send) {
-                /* Send header first, without zerocopy */
-                ret = qio_channel_write_all(p->c, (void *)p->packet,
-                                            p->packet_len, &local_err);
-                if (ret != 0) {
-                    break;
-                }
-            } else {
-                /* Send header using the same writev call */
-                p->iov[0].iov_len = p->packet_len;
-                p->iov[0].iov_base = p->packet;
-            }
-
-            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
-                                              0, p->write_flags, &local_err);
-            if (ret != 0) {
+            ret = multifd_send_state->ops->send(p, &local_err);
+            if (ret) {
                 break;
             }
 
diff --git a/migration/multifd.h b/migration/multifd.h
index 6261002524..0c4cf2d315 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -196,6 +196,7 @@ typedef struct {
     void (*send_cleanup)(MultiFDSendParams *p, Error **errp);
     /* Prepare the send packet */
     int (*send_prepare)(MultiFDSendParams *p, Error **errp);
+    int (*send)(MultiFDSendParams *p, Error **errp);
     /* Setup for receiving side */
     int (*recv_setup)(MultiFDRecvParams *p, Error **errp);
     /* Cleanup for receiving side */
diff --git a/migration/socket.c b/migration/socket.c
index 7e1371e598..608f30975e 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -228,6 +228,27 @@ static int multifd_socket_send_prepare(MultiFDSendParams *p, Error **errp)
     return 0;
 }
 
+static int multifd_socket_send(MultiFDSendParams *p, Error **errp)
+{
+    int ret;
+
+    if (migrate_zero_copy_send()) {
+        /* Send header first, without zerocopy */
+        ret = qio_channel_write_all(p->c, (void *)p->packet, p->packet_len,
+                                    errp);
+        if (ret) {
+            return ret;
+        }
+    } else {
+        /* Send header using the same writev call */
+        p->iov[0].iov_len = p->packet_len;
+        p->iov[0].iov_base = p->packet;
+    }
+
+    return qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
+                                       0, p->write_flags, errp);
+}
+
 static int multifd_socket_recv_setup(MultiFDRecvParams *p, Error **errp)
 {
     return 0;
@@ -255,6 +276,7 @@ static int multifd_socket_recv_pages(MultiFDRecvParams *p, Error **errp)
 
 MultiFDMethods multifd_socket_ops = {
     .send_setup = multifd_socket_send_setup,
+    .send = multifd_socket_send,
     .send_cleanup = multifd_socket_send_cleanup,
     .send_prepare = multifd_socket_send_prepare,
     .recv_setup = multifd_socket_recv_setup,
-- 
2.35.3




* [PATCH 4/5] migration/multifd: Simplify zero copy send
  2024-01-26 22:19 [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Fabiano Rosas
                   ` (2 preceding siblings ...)
  2024-01-26 22:19 ` [PATCH 3/5] migration/multifd: Add multifd_ops->send Fabiano Rosas
@ 2024-01-26 22:19 ` Fabiano Rosas
  2024-01-26 22:19 ` [PATCH 5/5] migration/multifd: Move zero copy flag into multifd_socket_setup Fabiano Rosas
  2024-01-29  1:41 ` [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Liu, Yuan1
  5 siblings, 0 replies; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-26 22:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Hao Xiang, Yuan Liu, Bryan Zhang

During the multifd send phase, the multifd packet header is included
as the first element of the iovec, except in the special case of a
socket migration with zero copy enabled. In that case the packet
header is sent separately. To avoid the first position of the iovec
being empty, we play with the p->iovs_num value to make sure
send_prepare() fills the iovec from the correct position depending on
whether the header is at the first position or not.

This is confusing at first sight and could be made simpler if we
always put the header at the first position in the iovec and advance
the iovec pointer when sending with zero copy. That way we can keep
that logic restricted to the socket code.
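
The resulting pattern, illustrated with plain writev(2) in a
self-contained toy (not QEMU APIs; QEMU's iov_discard_front() also
handles discarding partial elements):

  #include <sys/uio.h>
  #include <unistd.h>

  int main(void)
  {
      char hdr[] = "HDR", pay[] = "payload";
      /* the header is now always at iov[0] */
      struct iovec iov[2] = {
          { .iov_base = hdr, .iov_len = sizeof(hdr) },
          { .iov_base = pay, .iov_len = sizeof(pay) },
      };
      struct iovec *v = iov;
      int nr = 2;
      int zero_copy = 1;    /* stand-in for migrate_zero_copy_send() */

      if (zero_copy) {
          /* send the header on its own, without zero copy... */
          writev(STDOUT_FILENO, v, 1);
          /* ...then drop it from the vector, like iov_discard_front() */
          v++;
          nr--;
      }
      /* send what remains (everything, in the non-zero-copy case) */
      writev(STDOUT_FILENO, v, nr);
      return 0;
  }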

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd-zlib.c |  3 ---
 migration/multifd-zstd.c |  3 ---
 migration/multifd.c      | 11 +++++------
 migration/socket.c       | 19 +++++++++++--------
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 0859efa60f..af6ef96670 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -176,9 +176,6 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 
 static int zlib_send(MultiFDSendParams *p, Error **errp)
 {
-    p->iov[0].iov_len = p->packet_len;
-    p->iov[0].iov_base = p->packet;
-
     return qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
                                        0, p->write_flags, errp);
 }
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index ca0fc79fdd..6d533eaa54 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -165,9 +165,6 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 
 static int zstd_send(MultiFDSendParams *p, Error **errp)
 {
-    p->iov[0].iov_len = p->packet_len;
-    p->iov[0].iov_base = p->packet;
-
     return qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
                                        0, p->write_flags, errp);
 }
diff --git a/migration/multifd.c b/migration/multifd.c
index 9f509699c2..358a4dbf8f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -171,6 +171,9 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 
         packet->offset[i] = cpu_to_be64(temp);
     }
+
+    p->iov[0].iov_len = p->packet_len;
+    p->iov[0].iov_base = p->packet;
 }
 
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
@@ -546,7 +549,6 @@ static void *multifd_send_thread(void *opaque)
     MigrationThread *thread = NULL;
     Error *local_err = NULL;
     int ret = 0;
-    bool use_zero_copy_send = migrate_zero_copy_send();
 
     thread = migration_threads_add(p->name, qemu_get_thread_id());
 
@@ -574,11 +576,8 @@ static void *multifd_send_thread(void *opaque)
             uint32_t flags;
             p->normal_num = 0;
 
-            if (use_zero_copy_send) {
-                p->iovs_num = 0;
-            } else {
-                p->iovs_num = 1;
-            }
+            /* The header is always added to the vector */
+            p->iovs_num = 1;
 
             for (int i = 0; i < p->pages->num; i++) {
                 p->normal[p->normal_num] = p->pages->offset[i];
diff --git a/migration/socket.c b/migration/socket.c
index 608f30975e..af22ff7cc4 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -16,6 +16,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/iov.h"
 #include "exec/ramblock.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
@@ -230,22 +231,24 @@ static int multifd_socket_send_prepare(MultiFDSendParams *p, Error **errp)
 
 static int multifd_socket_send(MultiFDSendParams *p, Error **errp)
 {
+    struct iovec *iov = p->iov;
     int ret;
 
     if (migrate_zero_copy_send()) {
-        /* Send header first, without zerocopy */
-        ret = qio_channel_write_all(p->c, (void *)p->packet, p->packet_len,
-                                    errp);
+        /*
+         * The first iovec element is always the header. Send it first
+         * without zerocopy.
+         */
+        ret = qio_channel_writev_all(p->c, iov, 1, errp);
         if (ret) {
             return ret;
         }
-    } else {
-        /* Send header using the same writev call */
-        p->iov[0].iov_len = p->packet_len;
-        p->iov[0].iov_base = p->packet;
+
+        /* header sent, discard it */
+        iov_discard_front(&iov, &p->iovs_num, iov[0].iov_len);
     }
 
-    return qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
+    return qio_channel_writev_full_all(p->c, iov, p->iovs_num, NULL,
                                        0, p->write_flags, errp);
 }
 
-- 
2.35.3




* [PATCH 5/5] migration/multifd: Move zero copy flag into multifd_socket_setup
  2024-01-26 22:19 [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Fabiano Rosas
                   ` (3 preceding siblings ...)
  2024-01-26 22:19 ` [PATCH 4/5] migration/multifd: Simplify zero copy send Fabiano Rosas
@ 2024-01-26 22:19 ` Fabiano Rosas
  2024-01-29  1:41 ` [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Liu, Yuan1
  5 siblings, 0 replies; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-26 22:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Hao Xiang, Yuan Liu, Bryan Zhang

The generic multifd save setup code should not be responsible for
deciding how the client code is going to send the data. Since the zero
copy feature is supported only by the socket migration, move the
setting of the flag into the socket-specific function.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 7 +------
 migration/socket.c  | 4 ++++
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 358a4dbf8f..16d02a4aac 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -817,12 +817,7 @@ int multifd_save_setup(Error **errp)
         p->normal = g_new0(ram_addr_t, page_count);
         p->page_size = qemu_target_page_size();
         p->page_count = page_count;
-
-        if (migrate_zero_copy_send()) {
-            p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
-        } else {
-            p->write_flags = 0;
-        }
+        p->write_flags = 0;
 
         multifd_new_send_channel_create(p);
     }
diff --git a/migration/socket.c b/migration/socket.c
index af22ff7cc4..e9d0a5235c 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -206,6 +206,10 @@ void socket_start_incoming_migration(SocketAddress *saddr,
 
 static int multifd_socket_send_setup(MultiFDSendParams *p, Error **errp)
 {
+    if (migrate_zero_copy_send()) {
+        p->write_flags |= QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
+    }
+
     return 0;
 }
 
-- 
2.35.3




* RE: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work
  2024-01-26 22:19 [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Fabiano Rosas
                   ` (4 preceding siblings ...)
  2024-01-26 22:19 ` [PATCH 5/5] migration/multifd: Move zero copy flag into multifd_socket_setup Fabiano Rosas
@ 2024-01-29  1:41 ` Liu, Yuan1
  2024-01-29  7:36   ` Peter Xu
  5 siblings, 1 reply; 20+ messages in thread
From: Liu, Yuan1 @ 2024-01-29  1:41 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel; +Cc: Peter Xu, Hao Xiang, Bryan Zhang

> -----Original Message-----
> From: Fabiano Rosas <farosas@suse.de>
> Sent: Saturday, January 27, 2024 6:20 AM
> To: qemu-devel@nongnu.org
> Cc: Peter Xu <peterx@redhat.com>; Hao Xiang <hao.xiang@bytedance.com>;
> Liu, Yuan1 <yuan1.liu@intel.com>; Bryan Zhang <bryan.zhang@bytedance.com>
> Subject: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing
> work
> 
> Hi,
> 
> Here are two cleanups that are prerequisites for the fixed-ram work, but
> they also affect the other series on the list at the moment, so I want to
> make sure they work for everyone:
> 
> 1) Separate multifd_ops from compression. The multifd_ops are
>    currently coupled with the multifd_compression parameter.
> 
> We're adding new multifd_ops in the fixed-ram work and adding new
> compression ops in the compression work.
> 2) Add a new send hook. The multifd_send_thread code currently does
>    some twists to support zero copy, which is a socket-only feature.
> 
> This might affect the zero page and DSA work which add code to
> multifd_send_thread.

Thank you for the reminder. I reviewed the patch set and have a
question.

Because this change has an impact on the previously posted live
migration with IAA patch series, does the next version need to be
submitted based on this change?

> 
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1154332360
> 
> (I also tested zero copy locally. We cannot add a test for it because it
> needs root due to memory locking limits)
> 
> Fabiano Rosas (5):
>   migration/multifd: Separate compression ops from non-compression
>   migration/multifd: Move multifd_socket_ops to socket.c
>   migration/multifd: Add multifd_ops->send
>   migration/multifd: Simplify zero copy send
>   migration/multifd: Move zero copy flag into multifd_socket_setup
> 
>  migration/multifd-zlib.c |   9 ++-
>  migration/multifd-zstd.c |   9 ++-
>  migration/multifd.c      | 164 +++++----------------------------------
>  migration/multifd.h      |   6 +-
>  migration/socket.c       |  90 ++++++++++++++++++++-
>  5 files changed, 128 insertions(+), 150 deletions(-)
> 
> --
> 2.35.3



* Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
  2024-01-26 22:19 ` [PATCH 1/5] migration/multifd: Separate compression ops from non-compression Fabiano Rosas
@ 2024-01-29  6:29   ` Peter Xu
  2024-01-29 12:42     ` Fabiano Rosas
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Xu @ 2024-01-29  6:29 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Hao Xiang, Yuan Liu, Bryan Zhang, Avihai Horon

On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
> +static MultiFDMethods multifd_socket_ops = {
> +    .send_setup = multifd_socket_send_setup,
> +    .send_cleanup = multifd_socket_send_cleanup,
> +    .send_prepare = multifd_socket_send_prepare,

Here it's named with "socket", however not all socket-based multifd
migrations will go into this route, e.g., when zstd compression is enabled
it will not go via this route, even if zstd also uses sockets as transport.
From that pov, this may be slightly confusing.  Maybe it suits better to be
called "socket_plain" / "socket_no_comp"?

One step back, I had a feeling that the current proposal tried to provide a
single ->ops to cover a model where we may need more than one layer of
abstraction.

Since it might be helpful to allow multifd to send arbitrary data (e.g. for
VFIO?  Avihai might have an answer there..), I'll try to factor that into
the picture as well.

Let's consider the ultimate goal of multifd, where the simplest model could
look like this in my mind (I'm only discussing sender side, but it'll be
similar on recv side):

               prepare()           send()
  Input   ----------------> IOVs ------------> iochannels

[I used prepare/send, but please think of them as generic terms, not 100%
 aligned with what we have with existing multifd_ops, or what you proposed
 later]

Here's what is sure, IMHO:

  - We always can have some input data to dump; I didn't use "guest pages"
    just to say we may allow arbitrary data.  For any multifd user that
    would like to dump arbitrary data, they can already provide IOVs, so
    here input can be either "MultiFDPages_t" or "IOVs".

  - We may always want to have IOVs to represent the buffers at some point,
    no matter what the input is

  - We always flush the IOVs to iochannels; basically I want to say we can
    always assume the last layer is connecting to QIOChannel APIs, while I
    don't think there's outliers here so far, even if the send() may differ.

Then _maybe_ it's clearer that we can have two layers of OPs?

  - prepare(): it tells how the "input" will be converted into a
    scatter-gather list of buffers.  All compression methods fall into
    this afaiu.  This has _nothing_ to do with how the buffers will be
    sent.  For arbitrary-typed input, this can already be a no-op since
    the IOVs provided can already be passed over to send().

  - send(): how to dump the IOVs to the iochannels.  AFAIU this is mostly
    only useful for fixed-ram migrations.

Would this be clearer, rather than keep using a single multifd_ops?
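
A minimal sketch of the split, with illustrative names only (nothing
below is from the series):

  #include <stdio.h>

  typedef struct { const char *data; } Input;   /* pages or arbitrary data */

  /* layer 1: turn the input into a scatter-gather list */
  typedef struct {
      int (*prepare)(Input *in);
  } PrepareOps;

  /* layer 2: flush the prepared buffers to the iochannel */
  typedef struct {
      int (*send)(Input *in);
  } SendOps;

  static int prepare_plain(Input *in) { printf("prepare %s\n", in->data); return 0; }
  static int send_socket(Input *in) { printf("send %s\n", in->data); return 0; }

  int main(void)
  {
      /* zlib/zstd would plug in their own prepare(); send() stays shared */
      PrepareOps prep = { .prepare = prepare_plain };
      SendOps out = { .send = send_socket };
      Input in = { .data = "guest page" };

      prep.prepare(&in);
      out.send(&in);
      return 0;
  }

That way e.g. a fixed-ram migration could provide only a different
send() while reusing the plain prepare().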

-- 
Peter Xu




* Re: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work
  2024-01-29  1:41 ` [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Liu, Yuan1
@ 2024-01-29  7:36   ` Peter Xu
  2024-01-29 12:51     ` Fabiano Rosas
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Xu @ 2024-01-29  7:36 UTC (permalink / raw)
  To: Liu, Yuan1; +Cc: Fabiano Rosas, qemu-devel, Hao Xiang, Bryan Zhang

On Mon, Jan 29, 2024 at 01:41:01AM +0000, Liu, Yuan1 wrote:
> Because this change has an impact on the previously posted live
> migration with IAA patch series, does the next version need to be
> submitted based on this change?

I'd say hold off a little while until we're more certain on the planned
interface changes, to avoid you rebasing your code back and forth; unless
you're pretty confident that this will be the right approach.

I apologize for not having looked at any of the QAT/IAA compression / zero
detection series posted on the list; I do plan to read them very soon too,
after Fabiano's.  So I may not have a complete picture here yet; please
bear with me.

If this series is trying to provide a common base for all the efforts,
it'll be great if we can thoroughly discuss it here and settle on an
approach soon that will satisfy everyone.

Thanks,

-- 
Peter Xu




* Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
  2024-01-29  6:29   ` Peter Xu
@ 2024-01-29 12:42     ` Fabiano Rosas
  2024-01-30  8:42       ` Peter Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-29 12:42 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Hao Xiang, Yuan Liu, Bryan Zhang, Avihai Horon

Peter Xu <peterx@redhat.com> writes:

> On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
>> +static MultiFDMethods multifd_socket_ops = {
>> +    .send_setup = multifd_socket_send_setup,
>> +    .send_cleanup = multifd_socket_send_cleanup,
>> +    .send_prepare = multifd_socket_send_prepare,
>
> Here it's named with "socket", however not all socket-based multifd
> migrations will go into this route, e.g., when zstd compression is enabled
> it will not go via this route, even if zstd also uses sockets as transport.
> From that pov, this may be slightly confusing.  Maybe it suits better to be
> called "socket_plain" / "socket_no_comp"?
>
> One step back, I had a feeling that the current proposal tried to provide a
> single ->ops to cover a model where we may need more than one layer of
> abstraction.
>
> Since it might be helpful to allow multifd to send arbitrary data (e.g. for
> VFIO?  Avihai might have an answer there..), I'll try to factor that into
> the picture as well.
>
> Let's consider the ultimate goal of multifd, where the simplest model could
> look like this in my mind (I'm only discussing sender side, but it'll be
> similar on recv side):
>
>                prepare()           send()
>   Input   ----------------> IOVs ------------> iochannels
>
> [I used prepare/send, but please think of them as generic terms, not 100%
>  aligned with what we have with existing multifd_ops, or what you proposed
>  later]
>
> Here's what is sure, IMHO:
>
>   - We always can have some input data to dump; I didn't use "guest pages"
>     just to say we may allow arbitrary data.  For any multifd user that
>     would like to dump arbitrary data, they can already provide IOVs, so
>     here input can be either "MultiFDPages_t" or "IOVs".

Or anything else, since the client code also has control over send(),
no? So it could give multifd a pointer to some memory and then use
send() to do whatever it wants with it. Multifd is just providing worker
threads and "scheduling".

Also note that multifd clients currently _do not_ provide IOVs. They
merely provide data to multifd (p->pages) and then convert that data
into IOVs at prepare(). This is different, because multifd currently
holds that p->pages (and turns that into p->normal), which means the
client code does not need to store the data across iterations (in the
case of RAM which is iterative).
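
A toy of that flow (field names mirror MultiFDSendParams; the rest is
illustrative):

  #include <stdio.h>
  #include <sys/uio.h>

  enum { PAGE_SIZE = 4096, NUM = 2 };

  int main(void)
  {
      static char block[3 * PAGE_SIZE];
      /* 1) the client hands multifd page offsets (p->pages->offset) */
      long offset[NUM] = { 0, 2 * PAGE_SIZE };
      /* 2) the send thread copies them into p->normal */
      long normal[NUM] = { offset[0], offset[1] };
      /* 3) only prepare() turns p->normal into p->iov */
      struct iovec iov[NUM];

      for (int i = 0; i < NUM; i++) {
          iov[i].iov_base = block + normal[i];
          iov[i].iov_len = PAGE_SIZE;
      }
      printf("%d iovs ready for writev\n", NUM);
      return 0;
  }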

>
>   - We may always want to have IOVs to represent the buffers at some point,
>     no matter what the input is
>
>   - We always flush the IOVs to iochannels; basically I want to say we can
>     always assume the last layer is connecting to QIOChannel APIs, while I
>     don't think there's outliers here so far, even if the send() may differ.
>
> Then _maybe_ it's clearer that we can have two layers of OPs?
>
>   - prepare(): it tells how the "input" will be converted into a
>     scatter-gather list of buffers.  All compression methods fall into
>     this afaiu.  This has _nothing_ to do with how the buffers will be
>     sent.  For arbitrary-typed input, this can already be a no-op since
>     the IOVs provided can already be passed over to send().
>
>   - send(): how to dump the IOVs to the iochannels.  AFAIU this is mostly
>     only useful for fixed-ram migrations.
>
> Would this be clearer, rather than keep using a single multifd_ops?

Sorry, I don't see how what you describe is any different than what we
have. And I don't see how any of this would mean more than one
multifd_ops. We already have multifd_ops->prepare() and
multifd_ops->send(). What am I missing?



* Re: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work
  2024-01-29  7:36   ` Peter Xu
@ 2024-01-29 12:51     ` Fabiano Rosas
  2024-01-31  9:29       ` Peter Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-29 12:51 UTC (permalink / raw)
  To: Peter Xu, Liu, Yuan1; +Cc: qemu-devel, Hao Xiang, Bryan Zhang

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jan 29, 2024 at 01:41:01AM +0000, Liu, Yuan1 wrote:
>> Because this change has an impact on the previously posted live
>> migration with IAA patch series, does the next version need to be
>> submitted based on this change?
>
> I'd say hold off a little while until we're more certain on the planned
> interface changes, to avoid you rebasing your code back and forth; unless
> you're pretty confident that this will be the right approach.
>
> I apologize for not having looked at any of the QAT/IAA compression / zero
> detection series posted on the list; I do plan to read them very soon too,
> after Fabiano's.  So I may not have a complete picture here yet; please
> bear with me.
>
> If this series is trying to provide a common base for all the efforts,
> it'll be great if we can thoroughly discuss it here and settle on an
> approach soon that will satisfy everyone.

Just a summary if it helps:

For compression work (IAA/QPL, QAT) the discussion is around having a
new "compression acceleration" option that enables the accelerators and
is complementary to the existing zlib compression method. We'd choose
those automatically based on availability and we'd make HW accelerated
compression produce a stream that is compatible with QEMU's zlib stream
so we could migrate between solutions.
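
The compatibility requirement in code form, as a toy round-trip with
stock zlib (whatever stream the accelerator produces on the source must
inflate like this on the destination; build with -lz):

  #include <assert.h>
  #include <string.h>
  #include <zlib.h>

  int main(void)
  {
      const unsigned char page[4096] = "guest page contents";
      unsigned char comp[8192], out[4096];
      uLongf clen = sizeof(comp), olen = sizeof(out);

      /* stand-in for the accelerated compressor: emits a zlib stream */
      assert(compress2(comp, &clen, page, sizeof(page), Z_BEST_SPEED) == Z_OK);

      /* the destination can always fall back to plain zlib */
      assert(uncompress(out, &olen, comp, clen) == Z_OK);
      assert(olen == sizeof(page) && memcmp(out, page, olen) == 0);
      return 0;
  }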

For zero page work and zero page acceleration (DSA), the question is how
to fit zero page detection into multifd and whether we need a new hook
multifd_ops->zero_page_detect() (or similar) to allow client code to
provide its own zero page detection methods. My worry here is that
teaching multifd to recognize zero pages is one more coupling to the
"pages" data type. Ideally we'd find a way to include that operation as
a prepare() responsibility and the client code would deal with it.
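
Whatever the hook ends up being called, its contract reduces to a
predicate like the one below (naive software sketch; QEMU already has
buffer_is_zero() for this, and the DSA work would provide an offloaded
equivalent):

  #include <assert.h>
  #include <stdbool.h>
  #include <string.h>

  /* true iff the page contains only zero bytes */
  static bool page_is_zero(const unsigned char *buf, size_t len)
  {
      return len == 0 || (buf[0] == 0 && !memcmp(buf, buf + 1, len - 1));
  }

  int main(void)
  {
      unsigned char zero[4096] = {0}, dirty[4096] = {0};

      dirty[100] = 1;
      assert(page_is_zero(zero, sizeof(zero)));
      assert(!page_is_zero(dirty, sizeof(dirty)));
      return 0;
  }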




* Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
  2024-01-29 12:42     ` Fabiano Rosas
@ 2024-01-30  8:42       ` Peter Xu
  2024-01-30 15:11         ` Fabiano Rosas
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Xu @ 2024-01-30  8:42 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Hao Xiang, Yuan Liu, Bryan Zhang, Avihai Horon

On Mon, Jan 29, 2024 at 09:42:24AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
> >> +static MultiFDMethods multifd_socket_ops = {
> >> +    .send_setup = multifd_socket_send_setup,
> >> +    .send_cleanup = multifd_socket_send_cleanup,
> >> +    .send_prepare = multifd_socket_send_prepare,
> >
> > Here it's named with "socket", however not all socket-based multifd
> > migrations will go into this route, e.g., when zstd compression is enabled
> > it will not go via this route, even if zstd also uses sockets as transport.
> > From that pov, this may be slightly confusing.  Maybe it suits better to be
> > called "socket_plain" / "socket_no_comp"?
> >
> > One step back, I had a feeling that the current proposal tried to provide a
> > single ->ops to cover a model where we may need more than one layer of
> > abstraction.
> >
> > Since it might be helpful to allow multifd to send arbitrary data (e.g. for
> > VFIO?  Avihai might have an answer there..), I'll try to factor that into
> > the picture as well.
> >
> > Let's consider the ultimate goal of multifd, where the simplest model could
> > look like this in my mind (I'm only discussing sender side, but it'll be
> > similar on recv side):
> >
> >                prepare()           send()
> >   Input   ----------------> IOVs ------------> iochannels
> >
> > [I used prepare/send, but please think of them as generic terms, not 100%
> >  aligned with what we have with existing multifd_ops, or what you proposed
> >  later]
> >
> > Here's what is sure, IMHO:
> >
> >   - We always can have some input data to dump; I didn't use "guest pages"
> >     just to say we may allow arbitrary data.  For any multifd user that
> >     would like to dump arbitrary data, they can already provide IOVs, so
> >     here input can be either "MultiFDPages_t" or "IOVs".
> 
> Or anything else, since the client code also has control over send(),
> no? So it could give multifd a pointer to some memory and then use
> send() to do whatever it wants with it. Multifd is just providing worker
> threads and "scheduling".

IOVs contain the case of one single buffer, where n_iovs==1.  Here I
mentioned IOVs explicitly because I want to make it part of the protocol so
that the interface might be clearer, on what is not changing, and what can
change for a multifd client.

> 
> Also note that multifd clients currently _do not_ provide IOVs. They
> merely provide data to multifd (p->pages) and then convert that data
> into IOVs at prepare(). This is different, because multifd currently
> holds that p->pages (and turns that into p->normal), which means the
> client code does not need to store the data across iterations (in the
> case of RAM which is iterative).

They provide?  AFAIU that's exactly MultiFDSendParams.iov as of now, while
iovs_num is the length.

> 
> >
> >   - We may always want to have IOVs to represent the buffers at some point,
> >     no matter what the input is
> >
> >   - We always flush the IOVs to iochannels; basically I want to say we can
> >     always assume the last layer is connecting to QIOChannel APIs, while I
> >     don't think there's outliers here so far, even if the send() may differ.
> >
> > Then _maybe_ it's clearer that we can have two layers of OPs?
> >
> >   - prepare(): it tells how the "input" will be converted into a
> >     scatter-gather list of buffers.  All compression methods fall into
> >     this afaiu.  This has _nothing_ to do with how the buffers will be
> >     sent.  For arbitrary-typed input, this can already be a no-op since
> >     the IOVs provided can already be passed over to send().
> >
> >   - send(): how to dump the IOVs to the iochannels.  AFAIU this is mostly
> >     only useful for fixed-ram migrations.
> >
> > Would this be clearer, rather than keep using a single multifd_ops?
> 
> Sorry, I don't see how what you describe is any different than what we
> have. And I don't see how any of this would mean more than one
> multifd_ops. We already have multifd_ops->prepare() and
> multifd_ops->send(). What am I missing?

I meant instead of having a single MultiFDMethods, we can have
MultifdPrepareOps and MultifdSendOps separately.

Now with a single MultiFDMethods, it must always provide e.g. both prepare()
and send() in one set of OPs for one use case.  What I wanted to say is
maybe it is cleaner we split it into two OPs, then all the socket-based
scenarios can already stick with the same send() method, even though they
can prepare() differently.

IOW, for this base patchset to pave the way for compression accelerators, IIUC
we don't need a send() yet so far?  Should they still work pretty well with
qio_channel_writev_full_all() with proper touchups on p->write_flags just
for zero copy purposes?

I'll have another read of your previous multifd-packet-cleanups branch, I
guess, but this series definitely doesn't apply there anymore.

-- 
Peter Xu




* Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
  2024-01-30  8:42       ` Peter Xu
@ 2024-01-30 15:11         ` Fabiano Rosas
  2024-01-31  7:24           ` Peter Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-30 15:11 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Hao Xiang, Yuan Liu, Bryan Zhang, Avihai Horon

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jan 29, 2024 at 09:42:24AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
>> >> +static MultiFDMethods multifd_socket_ops = {
>> >> +    .send_setup = multifd_socket_send_setup,
>> >> +    .send_cleanup = multifd_socket_send_cleanup,
>> >> +    .send_prepare = multifd_socket_send_prepare,
>> >
>> > Here it's named with "socket", however not all socket-based multifd
>> > migrations will go into this route, e.g., when zstd compression is enabled
>> > it will not go via this route, even if zstd also uses sockets as transport.
>> > From that pov, this may be slightly confusing.  Maybe it suits better to be
>> > called "socket_plain" / "socket_no_comp"?
>> >
>> > One step back, I had a feeling that the current proposal tried to provide a
>> > single ->ops to cover a model where we may need more than one layer of
>> > abstraction.
>> >
>> > Since it might be helpful to allow multifd to send arbitrary data (e.g. for
>> > VFIO?  Avihai might have an answer there..), I'll try to factor that into
>> > the picture as well.
>> >
>> > Let's consider the ultimate goal of multifd, where the simplest model could
>> > look like this in my mind (I'm only discussing sender side, but it'll be
>> > similar on recv side):
>> >
>> >                prepare()           send()
>> >   Input   ----------------> IOVs ------------> iochannels
>> >
>> > [I used prepare/send, but please think of them as generic terms, not 100%
>> >  aligned with what we have with existing multifd_ops, or what you proposed
>> >  later]
>> >
>> > Here's what is sure, IMHO:
>> >
>> >   - We always can have some input data to dump; I didn't use "guest pages"
>> >     just to say we may allow arbitrary data.  For any multifd user that
>> >     would like to dump arbitrary data, they can already provide IOVs, so
>> >     here input can be either "MultiFDPages_t" or "IOVs".
>> 
>> Or anything else, since the client code also has control over send(),
>> no? So it could give multifd a pointer to some memory and then use
>> send() to do whatever it wants with it. Multifd is just providing worker
>> threads and "scheduling".
>
> IOVs contain the case of one single buffer, where n_iovs==1.  Here I
> mentioned IOVs explicitly because I want to make it part of the protocol so
> that the interface might be clearer, on what is not changing, and what can
> change for a multifd client.

Got it. I agree.

>> 
>> Also note that multifd clients currently _do not_ provide IOVs. They
>> merely provide data to multifd (p->pages) and then convert that data
>> into IOVs at prepare(). This is different, because multifd currently
>> holds that p->pages (and turns that into p->normal), which means the
>> client code does not need to store the data across iterations (in the
>> case of RAM which is iterative).
>
> They provide?  AFAIU that's exactly MultiFDSendParams.iov as of now, while
> iovs_num is the length.

Before that, the ram code needs to pass in the p->pages->offset array
first. Then, that gets put into p->normal. Then, that gets put into
p->iov at prepare(). So it's not a simple "fill p->iov and pass it to
multifd".

Hmm, could we just replace multifd_send_state->pages with a
multifd_send_state->iov? I don't really understand why we need to
carry that pages->offset around.

>> 
>> >
>> >   - We may always want to have IOVs to represent the buffers at some point,
>> >     no matter what the input is
>> >
>> >   - We always flush the IOVs to iochannels; basically I want to say we can
>> >     always assume the last layer is connecting to QIOChannel APIs, while I
>> >     don't think there's outliers here so far, even if the send() may differ.
>> >
>> > Then _maybe_ it's clearer that we can have two layers of OPs?
>> >
>> >   - prepare(): it tells how the "input" will be converted into a
>> >     scatter-gather list of buffers.  All compression methods fall into
>> >     this afaiu.  This has _nothing_ to do with how the buffers will be
>> >     sent.  For arbitrary-typed input, this can already be a no-op since
>> >     the IOVs provided can already be passed over to send().
>> >
>> >   - send(): how to dump the IOVs to the iochannels.  AFAIU this is mostly
>> >     only useful for fixed-ram migrations.
>> >
>> > Would this be clearer, rather than keep using a single multifd_ops?
>> 
>> Sorry, I don't see how what you describe is any different than what we
>> have. And I don't see how any of this would mean more than one
>> multifd_ops. We already have multifd_ops->prepare() and
>> multifd_ops->send(). What am I missing?
>
> I meant instead of having a single MultiFDMethods, we can have
> MultifdPrepareOps and MultifdSendOps separately.
>
> Now with a single MultiFDMethods, it must always provide e.g. both prepare()
> and send() in one set of OPs for one use case.  What I wanted to say is
> maybe it is cleaner we split it into two OPs, then all the socket-based
> scenarios can already stick with the same send() method, even though they
> can prepare() differently.

Hmm, so zlib/zstd implement all ops except for the send one. And
socket_plain and file implement all prepare hooks plus the send. So we'd
have sort of a data handling layer and a transport layer. I'll see how
it looks.

>
> IOW, for this base patchset to pave the way for compression accelerators, IIUC
> we don't need a send() yet so far?  Should they still work pretty well with
> qio_channel_writev_full_all() with proper touchups on p->write_flags just
> for zero copy purposes?

Yes. The point here is to just give everyone a heads-up so we avoid
changing the code in incompatible ways.

>
> I'll have another read of your previous multifd-packet-cleanups branch, I
> guess, but this series definitely doesn't apply there anymore.

multifd-packet-cleanups attempts to replace MultiFDPages_t with a
generic data structure. That's a separate issue.



* Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
  2024-01-30 15:11         ` Fabiano Rosas
@ 2024-01-31  7:24           ` Peter Xu
  2024-01-31 13:14             ` Fabiano Rosas
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Xu @ 2024-01-31  7:24 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Hao Xiang, Yuan Liu, Bryan Zhang, Avihai Horon

On Tue, Jan 30, 2024 at 12:11:47PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Jan 29, 2024 at 09:42:24AM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
> >> >> +static MultiFDMethods multifd_socket_ops = {
> >> >> +    .send_setup = multifd_socket_send_setup,
> >> >> +    .send_cleanup = multifd_socket_send_cleanup,
> >> >> +    .send_prepare = multifd_socket_send_prepare,
> >> >
> >> > Here it's named with "socket", however not all socket-based multifd
> >> > migrations will go into this route, e.g., when zstd compression is enabled
> >> > it will not go via this route, even if zstd also uses sockets as transport.
> >> > From that pov, this may be slightly confusing.  Maybe it suits better to be
> >> > called "socket_plain" / "socket_no_comp"?
> >> >
> >> > One step back, I had a feeling that the current proposal tried to provide a
> >> > single ->ops to cover a model where we may need more than one layer of
> >> > abstraction.
> >> >
> >> > Since it might be helpful to allow multifd to send arbitrary data (e.g. for
> >> > VFIO?  Avihai might have an answer there..), I'll try to factor that into
> >> > the picture as well.
> >> >
> >> > Let's consider the ultimate goal of multifd, where the simplest model could
> >> > look like this in my mind (I'm only discussing sender side, but it'll be
> >> > similar on recv side):
> >> >
> >> >                prepare()           send()
> >> >   Input   ----------------> IOVs ------------> iochannels
> >> >
> >> > [I used prepare/send, but please think of them as generic terms, not 100%
> >> >  aligned with what we have with existing multifd_ops, or what you proposed
> >> >  later]
> >> >
> >> > Here is what's certain, IMHO:
> >> >
> >> >   - We can always have some input data to dump; I didn't use "guest pages"
> >> >     just to say we may allow arbitrary data.  For any multifd user that
> >> >     would like to dump arbitrary data, they can already provide IOVs, so
> >> >     here input can be either "MultiFDPages_t" or "IOVs".
> >> 
> >> Or anything else, since the client code also has control over send(),
> >> no? So it could give multifd a pointer to some memory and then use
> >> send() to do whatever it wants with it. Multifd is just providing worker
> >> threads and "scheduling".
> >
> > IOVs cover the case of a single buffer, where n_iovs==1.  Here I
> > mentioned IOVs explicitly because I want to make it part of the protocol so
> > that the interface might be clearer, on what is not changing, and what can
> > change for a multifd client.
> 
> Got it. I agree.
> 
> >> 
> >> Also note that multifd clients currently _do not_ provide IOVs. They
> >> merely provide data to multifd (p->pages) and then convert that data
> >> into IOVs at prepare(). This is different, because multifd currently
> >> holds that p->pages (and turns that into p->normal), which means the
> >> client code does not need to store the data across iterations (in the
> >> case of RAM which is iterative).
> >
> > They provide?  AFAIU that's exactly MultiFDSendParams.iov as of now, while
> > iovs_num is the length.
> 
> Before that, the ram code needs to pass in the p->pages->offset array
> first. Then, that gets put into p->normal. Then, that gets put into
> p->iov at prepare(). So it's not a simple "fill p->iov and pass it to
> multifd".
> 
> Hmm, could we just replace multifd_send_state->pages with a
> multifd_send_state->iov? I don't really understand why we need to
> carry that pages->offset around.

I am thinking p->normal is mostly redundant.. at least on the sender
side, which I just read.  Since I'll be preparing a new spin of the multifd
cleanup series I posted, maybe I can append one more to try dropping
p->normal[] completely.
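
Roughly (untested sketch), the plain socket prepare() could build the
IOVs straight from the offsets, at which point p->normal[] has no user
left on the sender side:

    /* untested: fill p->iov directly from pages->offset */
    MultiFDPages_t *pages = p->pages;

    for (int i = 0; i < pages->num; i++) {
        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
        p->iov[p->iovs_num].iov_len = p->page_size;
        p->iovs_num++;
    }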

> 
> >> 
> >> >
> >> >   - We may always want to have IOVs to represent the buffers at some point,
> >> >     no matter what the input is
> >> >
> >> >   - We always flush the IOVs to iochannels; basically I want to say we can
> >> >     always assume the last layer is connecting to QIOChannel APIs, while I
> >> >     don't think there are outliers here so far, even if the send() may differ.
> >> >
> >> > Then _maybe_ it's clearer that we can have two layers of OPs?
> >> >
> >> >   - prepare(): it tells how the "input" will be converted into a
> >> >     scatter-gather list of buffers.  All compression methods fall into this afaiu.
> >> >     This has _nothing_ to do with how the buffers will be sent.  For
> >> >     arbitrary-typed input, this can already be a no-op since the IOVs
> >> >     provided can already be passed over to send().
> >> >
> >> >   - send(): how to dump the IOVs to the iochannels.  AFAIU this is mostly
> >> >     only useful for fixed-ram migrations.
> >> >
> >> > Would this be clearer, rather than keep using a single multifd_ops?
> >> 
> >> Sorry, I don't see how what you describe is any different than what we
> >> have. And I don't see how any of this would mean more than one
> >> multifd_ops. We already have multifd_ops->prepare() and
> >> multifd_ops->send(). What am I missing?
> >
> > I meant instead of having a single MultiFDMethods, we can have
> > MultifdPrepareOps and MultifdSendOps separately.
> >
> > Now with a single MultiFDMethods, it must always provide e.g. both prepare()
> > and send() in one set of OPs for one use case.  What I wanted to say is
> > maybe it is cleaner if we split it into two OPs; then all the socket-based
> > scenarios can already stick with the same send() method, even though they
> > can prepare() differently.
> 
> Hmm, so zlib/zstd implement all ops except for the send one. And
> socket_plain and file implement all prepare hooks plus the send. So we'd
> have sort of a data handling layer and a transport layer. I'll see how
> it looks.

Yeah, something like that if you agree; I'd think socket_plain can also use
the same socket send() as all the rest?  Again, I don't see anything special
about it except the zero copy possibility, and that should be covered by
proper setup of p->write_flags.
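
Sketch only, but I mean something like:

    /* zero copy becomes a flag on the shared socket send path */
    p->write_flags = migrate_zero_copy_send() ?
                     QIO_CHANNEL_WRITE_FLAG_ZERO_COPY : 0;

    /* ...so one send() can serve all socket users */
    ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
                                      NULL, 0, p->write_flags, errp);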

> 
> >
> > IOW, for this base patchset to pave the way for compression accelerators, IIUC
> > we don't need a send() yet so far?  Should they still work pretty well with
> > qio_channel_writev_full_all() with proper touchups on p->write_flags just
> > for zero copy purposes?
> 
> Yes. The point here is to just give everyone a heads-up so we avoid
> changing the code in incompatible ways.
> 
> >
> > I'll have another read of your previous multifd-packet-cleanups branch I
> > guess.  But this series definitely doesn't apply there anymore.
> 
> multifd-packet-cleanups attempts to replace MultiFDPages_t with a
> generic data structure. That's a separate issue.
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work
  2024-01-29 12:51     ` Fabiano Rosas
@ 2024-01-31  9:29       ` Peter Xu
  2024-01-31 13:19         ` Fabiano Rosas
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Xu @ 2024-01-31  9:29 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Liu, Yuan1, qemu-devel, Hao Xiang, Bryan Zhang

On Mon, Jan 29, 2024 at 09:51:06AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Jan 29, 2024 at 01:41:01AM +0000, Liu, Yuan1 wrote:
> >> Because this change has an impact on the previous live migration
> >> with the IAA patch, does the next version need
> >> to be submitted based on this change?
> >
> > I'd say hold off a little while until we're more certain on the planned
> interface changes, to avoid you rebasing your code back and forth; unless
> > you're pretty confident that this will be the right approach.
> >
> I apologize for not having looked at any of the QAT/IAA compression / zero
> detection series posted on the list; I do plan to read them very soon,
> right after Fabiano's.  So I may not have a full picture here yet, please
> bear with me.
> >
> If this series is trying to provide a common base for all the efforts,
> > it'll be great if we can thoroughly discuss here and settle an approach
> > soon that will satisfy everyone.
> 
> Just a summary if it helps:
> 
> For compression work (IAA/QPL, QAT) the discussion is around having a
> new "compression acceleration" option that enables the accelerators and
> is complementary to the existing zlib compression method. We'd choose
> those automatically based on availability and we'd make HW accelerated
> compression produce a stream that is compatible with QEMU's zlib stream
> so we could migrate between solutions.
> 
> For zero page work and zero page acceleration (DSA), the question is how
> to fit zero page detection into multifd and whether we need a new hook
> multifd_ops->zero_page_detect() (or similar) to allow client code to
> provide its own zero page detection methods. My worry here is that
> teaching multifd to recognize zero pages is one more coupling to the
> "pages" data type. Ideallly we'd find a way to include that operation as
> a prepare() responsibility and the client code would deal with it.

Thanks Fabiano.

Since I'm preparing the old series to post for some fundamental cleanups
around multifd, while looking around the code I noticed that _maybe_ it'll
also be easier to apply such a series if we clean up more things first and
then move towards a clean base to add more accelerators.

I agree with many ideas in this series, but I may address them slightly
differently (e.g., I want to avoid send(), but you can consider that in the
fixed-ram series instead); it'll also come after some other cleanups I plan
to take a stab at which are not yet covered in this series.  I hope I can
add your "Co-developed-by" in some of the patches there.  If you haven't
spent more time on a new version of this series, please wait 1-2 days so I
can post my thoughts.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
  2024-01-31  7:24           ` Peter Xu
@ 2024-01-31 13:14             ` Fabiano Rosas
  2024-02-01  3:25               ` Peter Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-31 13:14 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Hao Xiang, Yuan Liu, Bryan Zhang, Avihai Horon

Peter Xu <peterx@redhat.com> writes:

> On Tue, Jan 30, 2024 at 12:11:47PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Mon, Jan 29, 2024 at 09:42:24AM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >> 
>> >> > On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
>> >> >> +static MultiFDMethods multifd_socket_ops = {
>> >> >> +    .send_setup = multifd_socket_send_setup,
>> >> >> +    .send_cleanup = multifd_socket_send_cleanup,
>> >> >> +    .send_prepare = multifd_socket_send_prepare,
>> >> >
>> >> > Here it's named with "socket", however not all socket-based multifd
>> >> > migrations will go into this route, e.g., when zstd compression is enabled
>> >> > it will not go via this route, even if zstd also uses sockets as transport.
>> >> > From that pov, this may be slightly confusing.  Maybe it suits better to be
>> >> > called "socket_plain" / "socket_no_comp"?
>> >> >
>> >> > One step back, I had a feeling that the current proposal tried to provide a
>> >> > single ->ops to cover a model where we may need more than one layer of
>> >> > abstraction.
>> >> >
>> >> > Since it might be helpful to allow multifd to send arbitrary data (e.g. for
>> >> > VFIO?  Avihai might have an answer there..), I'll try to factor that into
>> >> > the picture as well.
>> >> >
>> >> > Let's consider the ultimate goal of multifd, where the simplest model could
>> >> > look like this in my mind (I'm only discussing sender side, but it'll be
>> >> > similar on recv side):
>> >> >
>> >> >                prepare()           send()
>> >> >   Input   ----------------> IOVs ------------> iochannels
>> >> >
>> >> > [I used prepare/send, but please think of them as generic terms, not 100%
>> >> >  aligned with what we have with existing multifd_ops, or what you proposed
>> >> >  later]
>> >> >
>> >> > Here is what's certain, IMHO:
>> >> >
>> >> >   - We can always have some input data to dump; I didn't use "guest pages"
>> >> >     just to say we may allow arbitrary data.  For any multifd user that
>> >> >     would like to dump arbitrary data, they can already provide IOVs, so
>> >> >     here input can be either "MultiFDPages_t" or "IOVs".
>> >> 
>> >> Or anything else, since the client code also has control over send(),
>> >> no? So it could give multifd a pointer to some memory and then use
>> >> send() to do whatever it wants with it. Multifd is just providing worker
>> >> threads and "scheduling".
>> >
>> > IOVs cover the case of a single buffer, where n_iovs==1.  Here I
>> > mentioned IOVs explicitly because I want to make it part of the protocol so
>> > that the interface might be clearer, on what is not changing, and what can
>> > change for a multifd client.
>> 
>> Got it. I agree.
>> 
>> >> 
>> >> Also note that multifd clients currently _do not_ provide IOVs. They
>> >> merely provide data to multifd (p->pages) and then convert that data
>> >> into IOVs at prepare(). This is different, because multifd currently
>> >> holds that p->pages (and turns that into p->normal), which means the
>> >> client code does not need to store the data across iterations (in the
>> >> case of RAM which is iterative).
>> >
>> > They provide?  AFAIU that's exactly MultiFDSendParams.iov as of now, while
>> > iovs_num is the length.
>> 
>> Before that, the ram code needs to pass in the p->pages->offset array
>> first. Then, that gets put into p->normal. Then, that gets put into
>> p->iov at prepare(). So it's not a simple "fill p->iov and pass it to
>> multifd".
>> 
>> Hmm, could we just replace multifd_send_state->pages with a
>> multifd_send_state->iov? I don't really understand why we need to
>> carry that pages->offset around.
>
> I am thinking p->normal is mostly redundant.. at least on the sender
> side, which I just read.  Since I'll be preparing a new spin of the multifd
> cleanup series I posted, maybe I can append one more to try dropping
> p->normal[] completely.

Just for reference, you don't have to use it, but I have this patch:

https://gitlab.com/farosas/qemu/-/commit/4316e145ae7e7bf378ef7fde64c2b02260362847

>> 
>> >> 
>> >> >
>> >> >   - We may always want to have IOVs to represent the buffers at some point,
>> >> >     no matter what the input is
>> >> >
>> >> >   - We always flush the IOVs to iochannels; basically I want to say we can
>> >> >     always assume the last layer is connecting to QIOChannel APIs, while I
>> >> >     don't think there are outliers here so far, even if the send() may differ.
>> >> >
>> >> > Then _maybe_ it's clearer that we can have two layers of OPs?
>> >> >
>> >> >   - prepare(): it tells how the "input" will be converted into a
>> >> >     scatter-gather list of buffers.  All compression methods fall into this afaiu.
>> >> >     This has _nothing_ to do with how the buffers will be sent.  For
>> >> >     arbitrary-typed input, this can already be a no-op since the IOVs
>> >> >     provided can already be passed over to send().
>> >> >
>> >> >   - send(): how to dump the IOVs to the iochannels.  AFAIU this is mostly
>> >> >     only useful for fixed-ram migrations.
>> >> >
>> >> > Would this be clearer, rather than keep using a single multifd_ops?
>> >> 
>> >> Sorry, I don't see how what you describe is any different than what we
>> >> have. And I don't see how any of this would mean more than one
>> >> multifd_ops. We already have multifd_ops->prepare() and
>> >> multifd_ops->send(). What am I missing?
>> >
>> > I meant instead of having a single MultiFDMethods, we can have
>> > MultifdPrepareOps and MultifdSendOps separately.
>> >
>> > Now with a single MultiFDMethods, it must always provide e.g. both prepare()
>> > and send() in one set of OPs for one use case.  What I wanted to say is
>> > maybe it is cleaner if we split it into two OPs; then all the socket-based
>> > scenarios can already stick with the same send() method, even though they
>> > can prepare() differently.
>> 
>> Hmm, so zlib/zstd implement all ops except for the send one. And
>> socket_plain and file implement all prepare hooks plus the send. So we'd
>> have sort of a data handling layer and a transport layer. I'll see how
>> it looks.
>
> Yeah, something like that if you agree; I'd think socket_plain can also use
> the same socket send() as all the rest?  Again, I don't see anything special
> about it except the zero copy possibility, and that should be covered by
> proper setup of p->write_flags.
>

I see. Makes sense.

>> 
>> >
>> > IOW, for this base patchset to pave the way for compression accelerators, IIUC
>> > we don't need a send() yet so far?  Should they still work pretty well with
>> > qio_channel_writev_full_all() with proper touchups on p->write_flags just
>> > for zero copy purposes?
>> 
>> Yes. The point here is to just give everyone a heads-up so we avoid
>> changing the code in incompatible ways.
>> 
>> >
>> > I'll have another read of your previous multifd-packet-cleanups branch I
>> > guess.  But this series definitely doesn't apply there anymore.
>> 
>> multifd-packet-cleanups attempts to replace MultiFDPages_t with a
>> generic data structure. That's a separate issue.
>> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work
  2024-01-31  9:29       ` Peter Xu
@ 2024-01-31 13:19         ` Fabiano Rosas
  2024-02-01  1:11           ` [External] " Hao Xiang
  0 siblings, 1 reply; 20+ messages in thread
From: Fabiano Rosas @ 2024-01-31 13:19 UTC (permalink / raw)
  To: Peter Xu; +Cc: Liu, Yuan1, qemu-devel, Hao Xiang, Bryan Zhang

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jan 29, 2024 at 09:51:06AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Mon, Jan 29, 2024 at 01:41:01AM +0000, Liu, Yuan1 wrote:
>> >> Because this change has an impact on the previous live migration
>> >> with the IAA patch, does the next version need
>> >> to be submitted based on this change?
>> >
>> > I'd say hold off a little while until we're more certain on the planned
>> > interface changes, to avoid you rebasing your code back and forth; unless
>> > you're pretty confident that this will be the right approach.
>> >
>> > I apologize for not having looked at any of the QAT/IAA compression / zero
>> > detection series posted on the list; I do plan to read them very soon,
>> > right after Fabiano's.  So I may not have a full picture here yet, please
>> > bear with me.
>> >
>> > If this series is trying to provide a common base for all the efforts,
>> > it'll be great if we can thoroughly discuss here and settle an approach
>> > soon that will satisfy everyone.
>> 
>> Just a summary if it helps:
>> 
>> For compression work (IAA/QPL, QAT) the discussion is around having a
>> new "compression acceleration" option that enables the accelerators and
>> is complementary to the existing zlib compression method. We'd choose
>> those automatically based on availability and we'd make HW accelerated
>> compression produce a stream that is compatible with QEMU's zlib stream
>> so we could migrate between solutions.
>> 
>> For zero page work and zero page acceleration (DSA), the question is how
>> to fit zero page detection into multifd and whether we need a new hook
>> multifd_ops->zero_page_detect() (or similar) to allow client code to
>> provide its own zero page detection methods. My worry here is that
>> teaching multifd to recognize zero pages is one more coupling to the
>> "pages" data type. Ideallly we'd find a way to include that operation as
>> a prepare() responsibility and the client code would deal with it.
>
> Thanks Fabiano.
>
> Since I'm preparing the old series to post for some fundamental cleanups
> around multifd, while looking around the code I noticed that _maybe_ it'll
> also be easier to apply such a series if we clean up more things first and
> then move towards a clean base to add more accelerators.
>
> I agree with many ideas in this series, but I may address them slightly
> differently (e.g., I want to avoid send(), but you can consider that in the
> fixed-ram series instead); it'll also come after some other cleanups I plan
> to take a stab at which are not yet covered in this series.  I hope I can
> add your "Co-developed-by" in some of the patches there.  If you haven't
> spent more time on a new version of this series, please wait 1-2 days so I
> can post my thoughts.

Sure, go ahead.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [External] Re: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work
  2024-01-31 13:19         ` Fabiano Rosas
@ 2024-02-01  1:11           ` Hao Xiang
  2024-02-01 13:23             ` Fabiano Rosas
  0 siblings, 1 reply; 20+ messages in thread
From: Hao Xiang @ 2024-02-01  1:11 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Peter Xu, Liu, Yuan1, qemu-devel, Bryan Zhang

On Wed, Jan 31, 2024 at 5:19 AM Fabiano Rosas <farosas@suse.de> wrote:
>
> Peter Xu <peterx@redhat.com> writes:
>
> > On Mon, Jan 29, 2024 at 09:51:06AM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >>
> >> > On Mon, Jan 29, 2024 at 01:41:01AM +0000, Liu, Yuan1 wrote:
> >> >> Because this change has an impact on the previous live migration
> >> >> with the IAA patch, does the next version need
> >> >> to be submitted based on this change?
> >> >
> >> > I'd say hold off a little while until we're more certain on the planned
> >> > interface changes, to avoid you rebasing your code back and forth; unless
> >> > you're pretty confident that this will be the right approach.
> >> >
> >> > I apologize for not having looked at any of the QAT/IAA compression / zero
> >> > detection series posted on the list; I do plan to read them very soon,
> >> > right after Fabiano's.  So I may not have a full picture here yet, please
> >> > bear with me.
> >> >
> >> > If this series is trying to provide a common base for all the efforts,
> >> > it'll be great if we can thoroughly discuss here and settle an approach
> >> > soon that will satisfy everyone.
> >>
> >> Just a summary if it helps:
> >>
> >> For compression work (IAA/QPL, QAT) the discussion is around having a
> >> new "compression acceleration" option that enables the accelerators and
> >> is complementary to the existing zlib compression method. We'd choose
> >> those automatically based on availability and we'd make HW accelerated
> >> compression produce a stream that is compatible with QEMU's zlib stream
> >> so we could migrate between solutions.
> >>
> >> For zero page work and zero page acceleration (DSA), the question is how
> >> to fit zero page detection into multifd and whether we need a new hook
> >> multifd_ops->zero_page_detect() (or similar) to allow client code to
> >> provide its own zero page detection methods. My worry here is that
> >> teaching multifd to recognize zero pages is one more coupling to the
> >> "pages" data type. Ideallly we'd find a way to include that operation as
> >> a prepare() responsibility and the client code would deal with it.
> >
> > Thanks Fabiano.

Hi Fabiano,

Your current refactoring assumes that compression ops and multifd
socket ops are mutually exclusive. Both of them need to implement the
entire MultiFDMethods interface. I think this works fine for now. Once
we introduce multifd zero page checking and we add a new interface for
that, we are adding a new method zero_page_detect() on the
MultiFDMethods interface. If we do that, zero_page_detect() needs to
be implemented in multifd_socket_ops and it also needs to be
implemented in zlib and zstd. On top of that, if we add an accelerator
to offload zero_page_detect(), that accelerator configuration can
co-exist with compression or socket. That makes things quite
complicated in my opinion.

Can we create an instance of MultiFDMethods at runtime and fill in each
method depending on the configuration? If a method is not filled in, we
fall back to the default implementation (like what
socket.c provides). For instance, if zstd is enabled and zero page
checking uses the CPU, the interface will be filled with all the
functions zstd currently implements, and since zstd doesn't implement
zero_page_detect(), we will fall back to filling zero_page_detect() with
the default multifd zero page checking implementation.
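
Hand-wavy sketch of what I mean (the zero_page_detect() hook and the
helper names below are hypothetical):

    /* hypothetical: compose the ops at setup time, with defaults */
    static void multifd_fill_methods(MultiFDMethods *ops)
    {
        MultiFDMethods *def = &multifd_socket_ops;

        if (!ops->send_setup) {
            ops->send_setup = def->send_setup;
        }
        if (!ops->send_prepare) {
            ops->send_prepare = def->send_prepare;
        }
        if (!ops->send_cleanup) {
            ops->send_cleanup = def->send_cleanup;
        }
        if (!ops->zero_page_detect) {
            /* fall back to the CPU implementation */
            ops->zero_page_detect = multifd_zero_page_detect_cpu;
        }
    }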

> >
> > Since I'm preparing the old series to post for some fundamental cleanups
> > around multifd, while looking around the code I noticed that _maybe_ it'll
> > also be easier to apply such a series if we clean up more things first and
> > then move towards a clean base to add more accelerators.
> >
> > I agree with many ideas in this series, but I may address them slightly
> > differently (e.g., I want to avoid send(), but you can consider that in the
> > fixed-ram series instead); it'll also come after some other cleanups I plan
> > to take a stab at which are not yet covered in this series.  I hope I can
> > add your "Co-developed-by" in some of the patches there.  If you haven't
> > spent more time on a new version of this series, please wait 1-2 days so I
> > can post my thoughts.
>
> Sure, go ahead.
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
  2024-01-31 13:14             ` Fabiano Rosas
@ 2024-02-01  3:25               ` Peter Xu
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Xu @ 2024-02-01  3:25 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Hao Xiang, Yuan Liu, Bryan Zhang, Avihai Horon

On Wed, Jan 31, 2024 at 10:14:58AM -0300, Fabiano Rosas wrote:
> > I am thinking p->normal is mostly redundant.. at least on the sender
> > side, which I just read.  Since I'll be preparing a new spin of the multifd
> > cleanup series I posted, maybe I can append one more to try dropping
> > p->normal[] completely.
> 
> Just for reference, you don't have to use it, but I have this patch:
> 
> https://gitlab.com/farosas/qemu/-/commit/4316e145ae7e7bf378ef7fde64c2b02260362847

Oops, I missed that even though I did have a glance over your branch (only
a final look, though), otherwise I could have picked it up indeed, sorry.
But it's also good news, then: it means it's probably the right thing to do.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [External] Re: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work
  2024-02-01  1:11           ` [External] " Hao Xiang
@ 2024-02-01 13:23             ` Fabiano Rosas
  0 siblings, 0 replies; 20+ messages in thread
From: Fabiano Rosas @ 2024-02-01 13:23 UTC (permalink / raw)
  To: Hao Xiang; +Cc: Peter Xu, Liu, Yuan1, qemu-devel, Bryan Zhang

Hao Xiang <hao.xiang@bytedance.com> writes:

> On Wed, Jan 31, 2024 at 5:19 AM Fabiano Rosas <farosas@suse.de> wrote:
>>
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Mon, Jan 29, 2024 at 09:51:06AM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >>
>> >> > On Mon, Jan 29, 2024 at 01:41:01AM +0000, Liu, Yuan1 wrote:
>> >> >> Because this change has an impact on the previous live migration
>> >> >> with the IAA patch, does the next version need
>> >> >> to be submitted based on this change?
>> >> >
>> >> > I'd say hold off a little while until we're more certain on the planned
>> >> > interface changes, to avoid you rebasing your code back and forth; unless
>> >> > you're pretty confident that this will be the right approach.
>> >> >
>> >> > I apologize for not having looked at any of the QAT/IAA compression / zero
>> >> > detection series posted on the list; I do plan to read them very soon,
>> >> > right after Fabiano's.  So I may not have a full picture here yet, please
>> >> > bear with me.
>> >> >
>> >> > If this series is trying to provide a common base for all the efforts,
>> >> > it'll be great if we can thoroughly discuss here and settle an approach
>> >> > soon that will satisfy everyone.
>> >>
>> >> Just a summary if it helps:
>> >>
>> >> For compression work (IAA/QPL, QAT) the discussion is around having a
>> >> new "compression acceleration" option that enables the accelerators and
>> >> is complementary to the existing zlib compression method. We'd choose
>> >> those automatically based on availability and we'd make HW accelerated
>> >> compression produce a stream that is compatible with QEMU's zlib stream
>> >> so we could migrate between solutions.
>> >>
>> >> For zero page work and zero page acceleration (DSA), the question is how
>> >> to fit zero page detection into multifd and whether we need a new hook
>> >> multifd_ops->zero_page_detect() (or similar) to allow client code to
>> >> provide its own zero page detection methods. My worry here is that
>> >> teaching multifd to recognize zero pages is one more coupling to the
>> >> "pages" data type. Ideallly we'd find a way to include that operation as
>> >> a prepare() responsibility and the client code would deal with it.
>> >
>> > Thanks Fabiano.
>
> Hi Fabiano,
>
> Your current refactoring assumes that compression ops and multifd
> socket ops are mutually exclusive. Both of them need to implement the
> entire MultiFDMethods interface. I think this works fine for now. Once
> we introduce multifd zero page checking and we add a new interface for
> that, we are adding a new method zero_page_detect() on the
> MultiFDMethods interface. If we do that, zero_page_detect() needs to
> be implemented in multifd_socket_ops and it also needs to be
> implemented in zlib and zstd. On top of that, if we add an accelerator
> to offload zero_page_detect(), that accelerator configuration can
> co-exist with compression or socket. That makes things quite
> complicated in my opinion.

Peter has proposed an alternate scheme. Take a look at his series. But
it basically keeps the compression as is and moves some code into the
prepare() phase:

https://lore.kernel.org/r/20240131103111.306523-1-peterx@redhat.com

> Can we create an instance of MultiFDMethods at runtime and fill in each
> method depending on the configuration? If a method is not filled in, we
> fall back to the default implementation (like what
> socket.c provides). For instance, if zstd is enabled and zero page
> checking uses the CPU, the interface will be filled with all the
> functions zstd currently implements, and since zstd doesn't implement
> zero_page_detect(), we will fall back to filling zero_page_detect() with
> the default multifd zero page checking implementation.

Take a look at whether incorporating zero_page_detect() in the prepare()
phase would work. We're trying to walk toward a multifd_ops model that
is not tied to the pages concept.
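
For instance (hand-wavy; multifd_detect_zero_pages() doesn't exist,
it's only here to show the shape):

    static int multifd_socket_send_prepare(MultiFDSendParams *p,
                                           Error **errp)
    {
        /* hypothetical helper: splits p->pages into zero/normal */
        multifd_detect_zero_pages(p);

        for (int i = 0; i < p->normal_num; i++) {
            p->iov[p->iovs_num].iov_base = p->pages->block->host + p->normal[i];
            p->iov[p->iovs_num].iov_len = p->page_size;
            p->iovs_num++;
        }
        return 0;
    }

i.e. whatever does the detection (CPU or an accelerator) plugs in
before the IOVs are built and the multifd core doesn't need to know
about it.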

>> >
>> > Since I'm preparing the old series to post for some fundamental cleanups
>> > around multifd, while looking around the code I noticed that _maybe_ it'll
>> > also be easier to apply such a series if we clean up more things first and
>> > then move towards a clean base to add more accelerators.
>> >
>> > I agree with many ideas in this series, but I may address them slightly
>> > differently (e.g., I want to avoid send(), but you can consider that in the
>> > fixed-ram series instead); it'll also come after some other cleanups I plan
>> > to take a stab at which are not yet covered in this series.  I hope I can
>> > add your "Co-developed-by" in some of the patches there.  If you haven't
>> > spent more time on a new version of this series, please wait 1-2 days so I
>> > can post my thoughts.
>>
>> Sure, go ahead.
>>


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread

Thread overview: 20+ messages
2024-01-26 22:19 [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Fabiano Rosas
2024-01-26 22:19 ` [PATCH 1/5] migration/multifd: Separate compression ops from non-compression Fabiano Rosas
2024-01-29  6:29   ` Peter Xu
2024-01-29 12:42     ` Fabiano Rosas
2024-01-30  8:42       ` Peter Xu
2024-01-30 15:11         ` Fabiano Rosas
2024-01-31  7:24           ` Peter Xu
2024-01-31 13:14             ` Fabiano Rosas
2024-02-01  3:25               ` Peter Xu
2024-01-26 22:19 ` [PATCH 2/5] migration/multifd: Move multifd_socket_ops to socket.c Fabiano Rosas
2024-01-26 22:19 ` [PATCH 3/5] migration/multifd: Add multifd_ops->send Fabiano Rosas
2024-01-26 22:19 ` [PATCH 4/5] migration/multifd: Simplify zero copy send Fabiano Rosas
2024-01-26 22:19 ` [PATCH 5/5] migration/multifd: Move zero copy flag into multifd_socket_setup Fabiano Rosas
2024-01-29  1:41 ` [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Liu, Yuan1
2024-01-29  7:36   ` Peter Xu
2024-01-29 12:51     ` Fabiano Rosas
2024-01-31  9:29       ` Peter Xu
2024-01-31 13:19         ` Fabiano Rosas
2024-02-01  1:11           ` [External] " Hao Xiang
2024-02-01 13:23             ` Fabiano Rosas
