* [PULL 00/11] migration queue
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC
  To: qemu-devel, quintela, peterx, leobras, berrange

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The following changes since commit cf6f26d6f9b2015ee12b4604b79359e76784163a:

  Merge tag 'kraxel-20220427-pull-request' of git://git.kraxel.org/qemu into staging (2022-04-27 10:49:28 -0700)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220428a

for you to fetch changes up to 62c49432662815dc520a41cd9f2adbd7ec1e22f4:

  multifd: Implement zero copy write in multifd migration (multifd-zero-copy) (2022-04-28 14:55:24 +0100)

----------------------------------------------------------------
Migration pull 2022-04-28

Zero copy send features from Leo
Test fixes from Dan

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

----------------------------------------------------------------
Daniel P. Berrangé (4):
      tests: fix encoding of IP addresses in x509 certs
      tests: convert XBZRLE migration test to use common helper
      tests: convert multifd migration tests to use common helper
      tests: ensure migration status isn't reported as failed

Leonardo Bras (7):
      QIOChannel: Add flags on io_writev and introduce io_flush callback
      QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
      migration: Add zero-copy-send parameter for QMP/HMP for Linux
      migration: Add migrate_use_tls() helper
      multifd: multifd_send_sync_main now returns negative on error
      multifd: Send header packet without flags if zero-copy-send is enabled
      multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

 chardev/char-io.c                    |   2 +-
 hw/remote/mpqemu-link.c              |   2 +-
 include/io/channel-socket.h          |   2 +
 include/io/channel.h                 |  38 ++++++++-
 io/channel-buffer.c                  |   1 +
 io/channel-command.c                 |   1 +
 io/channel-file.c                    |   1 +
 io/channel-socket.c                  | 110 ++++++++++++++++++++++++-
 io/channel-tls.c                     |   1 +
 io/channel-websock.c                 |   1 +
 io/channel.c                         |  49 +++++++++---
 migration/channel.c                  |   3 +-
 migration/migration.c                |  52 +++++++++++-
 migration/migration.h                |   6 ++
 migration/multifd.c                  |  74 ++++++++++++++---
 migration/multifd.h                  |   4 +-
 migration/ram.c                      |  29 +++++--
 migration/rdma.c                     |   1 +
 migration/socket.c                   |  12 ++-
 monitor/hmp-cmds.c                   |   6 ++
 qapi/migration.json                  |  24 ++++++
 scsi/pr-manager-helper.c             |   2 +-
 tests/qtest/migration-helpers.c      |  13 +++
 tests/qtest/migration-helpers.h      |   1 +
 tests/qtest/migration-test.c         | 150 ++++++++++++++++-------------------
 tests/unit/crypto-tls-x509-helpers.c |  16 +++-
 tests/unit/test-crypto-tlssession.c  |  11 ++-
 tests/unit/test-io-channel-socket.c  |   1 +
 28 files changed, 482 insertions(+), 131 deletions(-)




* [PULL 01/11] tests: fix encoding of IP addresses in x509 certs
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Daniel P. Berrangé <berrange@redhat.com>

We need to encode just the address bytes, not the whole struct sockaddr
data. Add a test case to validate that we're matching on SAN IP
addresses correctly.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-2-berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
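Illustration only, not part of the patch: a minimal sketch of the idea in
the commit message above, namely copying only the raw address bytes (4 for
IPv4, 16 for IPv6) rather than the whole struct sockaddr. The helper name
ip_to_raw_bytes() and its return convention are assumptions made for this
sketch; the real helper is test_tls_get_ipaddr() in the diff below, which
uses GLib allocation and assertions instead.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not the QEMU function): parse a numeric IP string
 * and return a malloc'd copy of only the raw address bytes.
 * Returns 0 on success, -1 on failure. The caller frees *data.
 */
static int ip_to_raw_bytes(const char *addrstr,
                           unsigned char **data, size_t *datalen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    const void *src;

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_NUMERICHOST;    /* numeric input only, no DNS */
    if (getaddrinfo(addrstr, NULL, &hints, &res) != 0) {
        return -1;
    }

    if (res->ai_family == AF_INET) {
        struct sockaddr_in *in = (struct sockaddr_in *)res->ai_addr;
        src = &in->sin_addr;            /* 4 bytes, network byte order */
        *datalen = sizeof(in->sin_addr);
    } else if (res->ai_family == AF_INET6) {
        struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)res->ai_addr;
        src = &in6->sin6_addr;          /* 16 bytes */
        *datalen = sizeof(in6->sin6_addr);
    } else {
        freeaddrinfo(res);
        return -1;
    }

    *data = malloc(*datalen);
    if (*data == NULL) {
        freeaddrinfo(res);
        return -1;
    }
    memcpy(*data, src, *datalen);
    freeaddrinfo(res);
    return 0;
}
```

Copying res->ai_addrlen bytes from res->ai_addr, as the old code did, would
also include the address family and port fields of the sockaddr, so the
encoded value would not be a plain 4-byte or 16-byte IP address.
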
 tests/unit/crypto-tls-x509-helpers.c | 16 +++++++++++++---
 tests/unit/test-crypto-tlssession.c  | 11 +++++++++--
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/tests/unit/crypto-tls-x509-helpers.c b/tests/unit/crypto-tls-x509-helpers.c
index fc609b3fd4..e9937f60d8 100644
--- a/tests/unit/crypto-tls-x509-helpers.c
+++ b/tests/unit/crypto-tls-x509-helpers.c
@@ -168,9 +168,19 @@ test_tls_get_ipaddr(const char *addrstr,
     hints.ai_flags = AI_NUMERICHOST;
     g_assert(getaddrinfo(addrstr, NULL, &hints, &res) == 0);
 
-    *datalen = res->ai_addrlen;
-    *data = g_new(char, *datalen);
-    memcpy(*data, res->ai_addr, *datalen);
+    if (res->ai_family == AF_INET) {
+        struct sockaddr_in *in = (struct sockaddr_in *)res->ai_addr;
+        *datalen = sizeof(in->sin_addr);
+        *data = g_new(char, *datalen);
+        memcpy(*data, &in->sin_addr, *datalen);
+    } else if (res->ai_family == AF_INET6) {
+        struct sockaddr_in6 *in = (struct sockaddr_in6 *)res->ai_addr;
+        *datalen = sizeof(in->sin6_addr);
+        *data = g_new(char, *datalen);
+        memcpy(*data, &in->sin6_addr, *datalen);
+    } else {
+        g_assert_not_reached();
+    }
     freeaddrinfo(res);
 }
 
diff --git a/tests/unit/test-crypto-tlssession.c b/tests/unit/test-crypto-tlssession.c
index 5f0da9192c..a6935d8497 100644
--- a/tests/unit/test-crypto-tlssession.c
+++ b/tests/unit/test-crypto-tlssession.c
@@ -512,12 +512,19 @@ int main(int argc, char **argv)
                   false, true, "wiki.qemu.org", NULL);
 
     TEST_SESS_REG(altname4, cacertreq.filename,
+                  servercertalt1req.filename, clientcertreq.filename,
+                  false, false, "192.168.122.1", NULL);
+    TEST_SESS_REG(altname5, cacertreq.filename,
+                  servercertalt1req.filename, clientcertreq.filename,
+                  false, false, "fec0::dead:beaf", NULL);
+
+    TEST_SESS_REG(altname6, cacertreq.filename,
                   servercertalt2req.filename, clientcertreq.filename,
                   false, true, "qemu.org", NULL);
-    TEST_SESS_REG(altname5, cacertreq.filename,
+    TEST_SESS_REG(altname7, cacertreq.filename,
                   servercertalt2req.filename, clientcertreq.filename,
                   false, false, "www.qemu.org", NULL);
-    TEST_SESS_REG(altname6, cacertreq.filename,
+    TEST_SESS_REG(altname8, cacertreq.filename,
                   servercertalt2req.filename, clientcertreq.filename,
                   false, false, "wiki.qemu.org", NULL);
 
-- 
2.35.1




* [PULL 02/11] tests: convert XBZRLE migration test to use common helper
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Daniel P. Berrangé <berrange@redhat.com>

Most of the XBZRLE migration test logic is common with the rest of the
precopy tests, so it can use the helper with just one small tweak.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-6-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Manual merge since I skipped some of the other patches
---
 tests/qtest/migration-test.c | 67 ++++++++++++++----------------------
 1 file changed, 25 insertions(+), 42 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 2af36c16a3..359eb2cbb7 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -845,6 +845,9 @@ typedef struct {
         /* This test should fail, dest qemu should fail with abnormal status */
         MIG_TEST_FAIL_DEST_QUIT_ERR,
     } result;
+
+    /* Optional: set number of migration passes to wait for */
+    unsigned int iterations;
 } MigrateCommon;
 
 static void test_precopy_common(MigrateCommon *args)
@@ -890,7 +893,13 @@ static void test_precopy_common(MigrateCommon *args)
             qtest_set_expected_status(to, 1);
         }
     } else {
-        wait_for_migration_pass(from);
+        if (args->iterations) {
+            while (args->iterations--) {
+                wait_for_migration_pass(from);
+            }
+        } else {
+            wait_for_migration_pass(from);
+        }
 
         migrate_set_parameter_int(from, "downtime-limit", CONVERGE_DOWNTIME);
 
@@ -973,57 +982,31 @@ static void test_ignore_shared(void)
 }
 #endif
 
-static void test_xbzrle(const char *uri)
+static void *
+test_migrate_xbzrle_start(QTestState *from,
+                          QTestState *to)
 {
-    MigrateStart args = {};
-    QTestState *from, *to;
-
-    if (test_migrate_start(&from, &to, uri, &args)) {
-        return;
-    }
-
-    /*
-     * We want to pick a speed slow enough that the test completes
-     * quickly, but that it doesn't complete precopy even on a slow
-     * machine, so also set the downtime.
-     */
-    /* 1 ms should make it not converge*/
-    migrate_set_parameter_int(from, "downtime-limit", 1);
-    /* 1GB/s */
-    migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
-
     migrate_set_parameter_int(from, "xbzrle-cache-size", 33554432);
 
     migrate_set_capability(from, "xbzrle", true);
     migrate_set_capability(to, "xbzrle", true);
-    /* Wait for the first serial output from the source */
-    wait_for_serial("src_serial");
 
-    migrate_qmp(from, uri, "{}");
-
-    wait_for_migration_pass(from);
-    /* Make sure we have 2 passes, so the xbzrle cache gets a workout */
-    wait_for_migration_pass(from);
-
-    /* 1000ms should converge */
-    migrate_set_parameter_int(from, "downtime-limit", 1000);
-
-    if (!got_stop) {
-        qtest_qmp_eventwait(from, "STOP");
-    }
-    qtest_qmp_eventwait(to, "RESUME");
-
-    wait_for_serial("dest_serial");
-    wait_for_migration_complete(from);
-
-    test_migrate_end(from, to, true);
+    return NULL;
 }
 
-static void test_xbzrle_unix(void)
+static void test_precopy_unix_xbzrle(void)
 {
     g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = uri,
+
+        .start_hook = test_migrate_xbzrle_start,
 
-    test_xbzrle(uri);
+        .iterations = 2,
+    };
+
+    test_precopy_common(&args);
 }
 
 static void test_precopy_tcp(void)
@@ -1498,9 +1481,9 @@ int main(int argc, char **argv)
     qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
     qtest_add_func("/migration/bad_dest", test_baddest);
     qtest_add_func("/migration/precopy/unix", test_precopy_unix);
+    qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
     qtest_add_func("/migration/precopy/tcp", test_precopy_tcp);
     /* qtest_add_func("/migration/ignore_shared", test_ignore_shared); */
-    qtest_add_func("/migration/xbzrle/unix", test_xbzrle_unix);
     qtest_add_func("/migration/fd_proto", test_migrate_fd_proto);
     qtest_add_func("/migration/validate_uuid", test_validate_uuid);
     qtest_add_func("/migration/validate_uuid_error", test_validate_uuid_error);
-- 
2.35.1




* [PULL 03/11] tests: convert multifd migration tests to use common helper
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Daniel P. Berrangé <berrange@redhat.com>

Most of the multifd migration test logic is common with the rest of the
precopy tests, so it can use the helper without difficulty. The only
exception is the multifd cancellation test, which tries to run multiple
migrations in a row.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-7-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tests/qtest/migration-test.c | 77 +++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 359eb2cbb7..fc399e887f 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1244,26 +1244,12 @@ static void test_migrate_auto_converge(void)
     test_migrate_end(from, to, true);
 }
 
-static void test_multifd_tcp(const char *method)
+static void *
+test_migrate_precopy_tcp_multifd_start_common(QTestState *from,
+                                              QTestState *to,
+                                              const char *method)
 {
-    MigrateStart args = {};
-    QTestState *from, *to;
     QDict *rsp;
-    g_autofree char *uri = NULL;
-
-    if (test_migrate_start(&from, &to, "defer", &args)) {
-        return;
-    }
-
-    /*
-     * We want to pick a speed slow enough that the test completes
-     * quickly, but that it doesn't complete precopy even on a slow
-     * machine, so also set the downtime.
-     */
-    /* 1 ms should make it not converge*/
-    migrate_set_parameter_int(from, "downtime-limit", 1);
-    /* 1GB/s */
-    migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
 
     migrate_set_parameter_int(from, "multifd-channels", 16);
     migrate_set_parameter_int(to, "multifd-channels", 16);
@@ -1279,41 +1265,58 @@ static void test_multifd_tcp(const char *method)
                            "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
     qobject_unref(rsp);
 
-    /* Wait for the first serial output from the source */
-    wait_for_serial("src_serial");
-
-    uri = migrate_get_socket_address(to, "socket-address");
-
-    migrate_qmp(from, uri, "{}");
-
-    wait_for_migration_pass(from);
+    return NULL;
+}
 
-    migrate_set_parameter_int(from, "downtime-limit", CONVERGE_DOWNTIME);
+static void *
+test_migrate_precopy_tcp_multifd_start(QTestState *from,
+                                       QTestState *to)
+{
+    return test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+}
 
-    if (!got_stop) {
-        qtest_qmp_eventwait(from, "STOP");
-    }
-    qtest_qmp_eventwait(to, "RESUME");
+static void *
+test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
+                                            QTestState *to)
+{
+    return test_migrate_precopy_tcp_multifd_start_common(from, to, "zlib");
+}
 
-    wait_for_serial("dest_serial");
-    wait_for_migration_complete(from);
-    test_migrate_end(from, to, true);
+#ifdef CONFIG_ZSTD
+static void *
+test_migrate_precopy_tcp_multifd_zstd_start(QTestState *from,
+                                            QTestState *to)
+{
+    return test_migrate_precopy_tcp_multifd_start_common(from, to, "zstd");
 }
+#endif /* CONFIG_ZSTD */
 
 static void test_multifd_tcp_none(void)
 {
-    test_multifd_tcp("none");
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .start_hook = test_migrate_precopy_tcp_multifd_start,
+    };
+    test_precopy_common(&args);
 }
 
 static void test_multifd_tcp_zlib(void)
 {
-    test_multifd_tcp("zlib");
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .start_hook = test_migrate_precopy_tcp_multifd_zlib_start,
+    };
+    test_precopy_common(&args);
 }
 
 #ifdef CONFIG_ZSTD
 static void test_multifd_tcp_zstd(void)
 {
-    test_multifd_tcp("zstd");
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .start_hook = test_migrate_precopy_tcp_multifd_zstd_start,
+    };
+    test_precopy_common(&args);
 }
 #endif
 
-- 
2.35.1




* [PULL 04/11] tests: ensure migration status isn't reported as failed
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Daniel P. Berrangé <berrange@redhat.com>

Various methods in the migration test call 'query_migrate' to fetch the
current status and then access a particular field. Almost all of these
cases expect the migration to be in a non-failed state. In the case of
'wait_for_migration_pass' in particular, if the status is 'failed' then
it will get into an infinite loop. By validating that the status is
not 'failed' the test suite will assert rather than hang when getting
into an unexpected state.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-10-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
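Illustration only, not part of the patch: a minimal sketch of why checking
for the 'failed' status turns a hang into a detectable error. Everything
here (FakeMigrateInfo, FakeSource, fake_query(), wait_for_pass_checked())
is a hypothetical stand-in for the QMP plumbing, and the sketch returns -1
where the real helper, migrate_query_not_failed() in the diff below,
asserts.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the fields of a query-migrate reply. */
typedef struct {
    const char *status;
    long dirty_sync_count;
} FakeMigrateInfo;

/* Hypothetical reply source: replays a script, then repeats the last entry. */
typedef struct {
    const FakeMigrateInfo *replies;
    size_t nreplies;
    size_t next;
} FakeSource;

static FakeMigrateInfo fake_query(FakeSource *src)
{
    size_t i = src->next < src->nreplies ? src->next++ : src->nreplies - 1;
    return src->replies[i];
}

/*
 * Poll until at least one pass has been counted. Without the status
 * check, a migration stuck in "failed" with a counter that never
 * advances would keep this loop spinning forever.
 * Returns 0 once a pass is seen, -1 if the migration failed.
 */
static int wait_for_pass_checked(FakeSource *src, long *passes)
{
    for (;;) {
        FakeMigrateInfo info = fake_query(src);

        if (strcmp(info.status, "failed") == 0) {
            fprintf(stderr, "query shows failed migration\n");
            return -1;
        }
        if (info.dirty_sync_count > 0) {
            *passes = info.dirty_sync_count;
            return 0;
        }
    }
}
```
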
 tests/qtest/migration-helpers.c | 13 +++++++++++++
 tests/qtest/migration-helpers.h |  1 +
 tests/qtest/migration-test.c    |  6 +++---
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 4ee26014b7..a6aa59e4e6 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -107,6 +107,19 @@ QDict *migrate_query(QTestState *who)
     return wait_command(who, "{ 'execute': 'query-migrate' }");
 }
 
+QDict *migrate_query_not_failed(QTestState *who)
+{
+    const char *status;
+    QDict *rsp = migrate_query(who);
+    status = qdict_get_str(rsp, "status");
+    if (g_str_equal(status, "failed")) {
+        g_printerr("query-migrate shows failed migration: %s\n",
+                   qdict_get_str(rsp, "error-desc"));
+    }
+    g_assert(!g_str_equal(status, "failed"));
+    return rsp;
+}
+
 /*
  * Note: caller is responsible to free the returned object via
  * g_free() after use
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 555adafce1..d07e0fb748 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -26,6 +26,7 @@ G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
 QDict *migrate_query(QTestState *who);
+QDict *migrate_query_not_failed(QTestState *who);
 
 void wait_for_migration_status(QTestState *who,
                                const char *goal, const char **ungoals);
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index fc399e887f..e8fcdeee8b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -174,7 +174,7 @@ static int64_t read_ram_property_int(QTestState *who, const char *property)
     QDict *rsp_return, *rsp_ram;
     int64_t result;
 
-    rsp_return = migrate_query(who);
+    rsp_return = migrate_query_not_failed(who);
     if (!qdict_haskey(rsp_return, "ram")) {
         /* Still in setup */
         result = 0;
@@ -191,7 +191,7 @@ static int64_t read_migrate_property_int(QTestState *who, const char *property)
     QDict *rsp_return;
     int64_t result;
 
-    rsp_return = migrate_query(who);
+    rsp_return = migrate_query_not_failed(who);
     result = qdict_get_try_int(rsp_return, property, 0);
     qobject_unref(rsp_return);
     return result;
@@ -206,7 +206,7 @@ static void read_blocktime(QTestState *who)
 {
     QDict *rsp_return;
 
-    rsp_return = migrate_query(who);
+    rsp_return = migrate_query_not_failed(who);
     g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
     qobject_unref(rsp_return);
 }
-- 
2.35.1




* [PULL 05/11] QIOChannel: Add flags on io_writev and introduce io_flush callback
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Leonardo Bras <leobras@redhat.com>

Add flags to io_writev and introduce io_flush as optional callback to
QIOChannelClass, allowing the implementation of zero copy writes by
subclasses.

How to use them:
- Write data using qio_channel_writev*(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY),
- Wait for write completion with qio_channel_flush().

Notes:
As some zero copy write implementations work asynchronously, it's
recommended to keep the write buffer untouched until the return of
qio_channel_flush(), to avoid the risk of sending an updated buffer
instead of the buffer state during write.

As the io_flush callback is optional, if a subclass does not implement it,
qio_channel_flush() will return 0 without changing anything.

Also, some functions like qio_channel_writev_full_all() were adapted to
receive a flag parameter. That allows shared code between zero copy and
non-zero copy writev, and makes new flags easier to implement.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220426230654.637939-2-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
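Illustration only, not part of the patch: a toy model of the calling
pattern described above (write with a zero-copy flag, then flush), with an
optional flush callback that defaults to returning 0. All Toy* and
backend_* names are assumptions made for this sketch; it does not use the
QIOChannel API and performs no real zero-copy I/O.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

#define TOY_WRITE_FLAG_ZERO_COPY 0x1

typedef struct ToyChannel ToyChannel;

struct ToyChannel {
    int supports_zero_copy;             /* feature bit of this channel */
    ssize_t (*io_writev)(ToyChannel *c, const struct iovec *iov,
                         size_t niov, int flags);
    int (*io_flush)(ToyChannel *c);     /* optional, may be NULL */
    size_t pending;                     /* queued zero-copy writes */
};

/* Reject the zero-copy flag up front if the channel lacks the feature. */
static ssize_t toy_writev(ToyChannel *c, const struct iovec *iov,
                          size_t niov, int flags)
{
    if ((flags & TOY_WRITE_FLAG_ZERO_COPY) && !c->supports_zero_copy) {
        errno = EINVAL;
        return -1;
    }
    return c->io_writev(c, iov, niov, flags);
}

/* A channel without a flush callback behaves as a no-op returning 0. */
static int toy_flush(ToyChannel *c)
{
    if (c->io_flush == NULL) {
        return 0;
    }
    return c->io_flush(c);
}

/* Toy backend: counts the bytes and remembers queued zero-copy writes. */
static ssize_t backend_writev(ToyChannel *c, const struct iovec *iov,
                              size_t niov, int flags)
{
    ssize_t total = 0;

    for (size_t i = 0; i < niov; i++) {
        total += (ssize_t)iov[i].iov_len;
    }
    if (flags & TOY_WRITE_FLAG_ZERO_COPY) {
        c->pending++;   /* the buffer must stay untouched until flush */
    }
    return total;
}

/* Toy backend: pretend every queued write has now been sent. */
static int backend_flush(ToyChannel *c)
{
    c->pending = 0;
    return 0;
}
```

In this model a caller would issue toy_writev() with
TOY_WRITE_FLAG_ZERO_COPY, leave the buffer alone, and reuse it only after
toy_flush() returns, mirroring the note above about keeping the write
buffer untouched until qio_channel_flush() returns.
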
 chardev/char-io.c                   |  2 +-
 hw/remote/mpqemu-link.c             |  2 +-
 include/io/channel.h                | 38 +++++++++++++++++++++-
 io/channel-buffer.c                 |  1 +
 io/channel-command.c                |  1 +
 io/channel-file.c                   |  1 +
 io/channel-socket.c                 |  2 ++
 io/channel-tls.c                    |  1 +
 io/channel-websock.c                |  1 +
 io/channel.c                        | 49 +++++++++++++++++++++++------
 migration/rdma.c                    |  1 +
 scsi/pr-manager-helper.c            |  2 +-
 tests/unit/test-io-channel-socket.c |  1 +
 13 files changed, 88 insertions(+), 14 deletions(-)

diff --git a/chardev/char-io.c b/chardev/char-io.c
index 8ced184160..4451128cba 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -122,7 +122,7 @@ int io_channel_send_full(QIOChannel *ioc,
 
         ret = qio_channel_writev_full(
             ioc, &iov, 1,
-            fds, nfds, NULL);
+            fds, nfds, 0, NULL);
         if (ret == QIO_CHANNEL_ERR_BLOCK) {
             if (offset) {
                 return offset;
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index 2a4aa651ca..9bd98e8219 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -68,7 +68,7 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
     }
 
     if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
-                                    fds, nfds, errp)) {
+                                    fds, nfds, 0, errp)) {
         ret = true;
     } else {
         trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds);
diff --git a/include/io/channel.h b/include/io/channel.h
index 88988979f8..c680ee7480 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 #define QIO_CHANNEL_ERR_BLOCK -2
 
+#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
+
 typedef enum QIOChannelFeature QIOChannelFeature;
 
 enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_FD_PASS,
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
+    QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
 };
 
 
@@ -104,6 +107,7 @@ struct QIOChannelClass {
                          size_t niov,
                          int *fds,
                          size_t nfds,
+                         int flags,
                          Error **errp);
     ssize_t (*io_readv)(QIOChannel *ioc,
                         const struct iovec *iov,
@@ -136,6 +140,8 @@ struct QIOChannelClass {
                                   IOHandler *io_read,
                                   IOHandler *io_write,
                                   void *opaque);
+    int (*io_flush)(QIOChannel *ioc,
+                    Error **errp);
 };
 
 /* General I/O handling functions */
@@ -228,6 +234,7 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Write data to the IO channel, reading it from the
@@ -260,6 +267,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
                                 size_t niov,
                                 int *fds,
                                 size_t nfds,
+                                int flags,
                                 Error **errp);
 
 /**
@@ -837,6 +845,7 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  *
@@ -846,6 +855,14 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * to be written, yielding from the current coroutine
  * if required.
  *
+ * If QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is passed in flags,
+ * instead of waiting for all requested data to be written,
+ * this function will wait until it's all queued for writing.
+ * In this case, if the buffer gets changed between queueing and
+ * sending, the updated buffer will be sent. If this is not a
+ * desired behavior, it's suggested to call qio_channel_flush()
+ * before reusing the buffer.
+ *
  * Returns: 0 if all bytes were written, or -1 on error
  */
 
@@ -853,6 +870,25 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
                                 const struct iovec *iov,
                                 size_t niov,
                                 int *fds, size_t nfds,
-                                Error **errp);
+                                int flags, Error **errp);
+
+/**
+ * qio_channel_flush:
+ * @ioc: the channel object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Will block until every packet queued with
+ * qio_channel_writev_full() + QIO_CHANNEL_WRITE_FLAG_ZERO_COPY
+ * is sent, or return in case of any error.
+ *
+ * If not implemented, acts as a no-op, and returns 0.
+ *
+ * Returns -1 if any error is found,
+ *          1 if every send failed to use zero copy.
+ *          0 otherwise.
+ */
+
+int qio_channel_flush(QIOChannel *ioc,
+                      Error **errp);
 
 #endif /* QIO_CHANNEL_H */
diff --git a/io/channel-buffer.c b/io/channel-buffer.c
index baa4e2b089..bf52011be2 100644
--- a/io/channel-buffer.c
+++ b/io/channel-buffer.c
@@ -81,6 +81,7 @@ static ssize_t qio_channel_buffer_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(ioc);
diff --git a/io/channel-command.c b/io/channel-command.c
index 338da73ade..54560464ae 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -258,6 +258,7 @@ static ssize_t qio_channel_command_writev(QIOChannel *ioc,
                                           size_t niov,
                                           int *fds,
                                           size_t nfds,
+                                          int flags,
                                           Error **errp)
 {
     QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
diff --git a/io/channel-file.c b/io/channel-file.c
index d7cf6d278f..ef6807a6be 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -114,6 +114,7 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
                                        size_t niov,
                                        int *fds,
                                        size_t nfds,
+                                       int flags,
                                        Error **errp)
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 9f5ddf68b6..696a04dc9c 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -524,6 +524,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
@@ -619,6 +620,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 2ae1b92fc0..4ce890a538 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -301,6 +301,7 @@ static ssize_t qio_channel_tls_writev(QIOChannel *ioc,
                                       size_t niov,
                                       int *fds,
                                       size_t nfds,
+                                      int flags,
                                       Error **errp)
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
diff --git a/io/channel-websock.c b/io/channel-websock.c
index 55145a6a8c..9619906ac3 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -1127,6 +1127,7 @@ static ssize_t qio_channel_websock_writev(QIOChannel *ioc,
                                           size_t niov,
                                           int *fds,
                                           size_t nfds,
+                                          int flags,
                                           Error **errp)
 {
     QIOChannelWebsock *wioc = QIO_CHANNEL_WEBSOCK(ioc);
diff --git a/io/channel.c b/io/channel.c
index e8b019dc36..0640941ac5 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -72,18 +72,32 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
                                 size_t niov,
                                 int *fds,
                                 size_t nfds,
+                                int flags,
                                 Error **errp)
 {
     QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
 
-    if ((fds || nfds) &&
-        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
+    if (fds || nfds) {
+        if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
+            error_setg_errno(errp, EINVAL,
+                             "Channel does not support file descriptor passing");
+            return -1;
+        }
+        if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+            error_setg_errno(errp, EINVAL,
+                             "Zero Copy does not support file descriptor passing");
+            return -1;
+        }
+    }
+
+    if ((flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) &&
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
         error_setg_errno(errp, EINVAL,
-                         "Channel does not support file descriptor passing");
+                         "Requested Zero Copy feature is not available");
         return -1;
     }
 
-    return klass->io_writev(ioc, iov, niov, fds, nfds, errp);
+    return klass->io_writev(ioc, iov, niov, fds, nfds, flags, errp);
 }
 
 
@@ -217,14 +231,14 @@ int qio_channel_writev_all(QIOChannel *ioc,
                            size_t niov,
                            Error **errp)
 {
-    return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
+    return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, 0, errp);
 }
 
 int qio_channel_writev_full_all(QIOChannel *ioc,
                                 const struct iovec *iov,
                                 size_t niov,
                                 int *fds, size_t nfds,
-                                Error **errp)
+                                int flags, Error **errp)
 {
     int ret = -1;
     struct iovec *local_iov = g_new(struct iovec, niov);
@@ -237,8 +251,10 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
 
     while (nlocal_iov > 0) {
         ssize_t len;
-        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
-                                      errp);
+
+        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds,
+                                            nfds, flags, errp);
+
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_OUT);
@@ -277,7 +293,7 @@ ssize_t qio_channel_writev(QIOChannel *ioc,
                            size_t niov,
                            Error **errp)
 {
-    return qio_channel_writev_full(ioc, iov, niov, NULL, 0, errp);
+    return qio_channel_writev_full(ioc, iov, niov, NULL, 0, 0, errp);
 }
 
 
@@ -297,7 +313,7 @@ ssize_t qio_channel_write(QIOChannel *ioc,
                           Error **errp)
 {
     struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
-    return qio_channel_writev_full(ioc, &iov, 1, NULL, 0, errp);
+    return qio_channel_writev_full(ioc, &iov, 1, NULL, 0, 0, errp);
 }
 
 
@@ -473,6 +489,19 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
     return klass->io_seek(ioc, offset, whence, errp);
 }
 
+int qio_channel_flush(QIOChannel *ioc,
+                                Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_flush ||
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
+        return 0;
+    }
+
+    return klass->io_flush(ioc, errp);
+}
+
 
 static void qio_channel_restart_read(void *opaque)
 {
diff --git a/migration/rdma.c b/migration/rdma.c
index ef1e65ec36..672d1958a9 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2840,6 +2840,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
                                        size_t niov,
                                        int *fds,
                                        size_t nfds,
+                                       int flags,
                                        Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
diff --git a/scsi/pr-manager-helper.c b/scsi/pr-manager-helper.c
index 451c7631b7..3be52a98d5 100644
--- a/scsi/pr-manager-helper.c
+++ b/scsi/pr-manager-helper.c
@@ -77,7 +77,7 @@ static int pr_manager_helper_write(PRManagerHelper *pr_mgr,
         iov.iov_base = (void *)buf;
         iov.iov_len = sz;
         n_written = qio_channel_writev_full(QIO_CHANNEL(pr_mgr->ioc), &iov, 1,
-                                            nfds ? &fd : NULL, nfds, errp);
+                                            nfds ? &fd : NULL, nfds, 0, errp);
 
         if (n_written <= 0) {
             assert(n_written != QIO_CHANNEL_ERR_BLOCK);
diff --git a/tests/unit/test-io-channel-socket.c b/tests/unit/test-io-channel-socket.c
index c49eec1f03..6713886d02 100644
--- a/tests/unit/test-io-channel-socket.c
+++ b/tests/unit/test-io-channel-socket.c
@@ -444,6 +444,7 @@ static void test_io_channel_unix_fd_pass(void)
                             G_N_ELEMENTS(iosend),
                             fdsend,
                             G_N_ELEMENTS(fdsend),
+                            0,
                             &error_abort);
 
     qio_channel_readv_full(dst,
-- 
2.35.1




* [PULL 06/11] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-04-28 14:40 [PULL 00/11] migration queue Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2022-04-28 14:40 ` [PULL 05/11] QIOChannel: Add flags on io_writev and introduce io_flush callback Dr. David Alan Gilbert (git)
@ 2022-04-28 14:40 ` Dr. David Alan Gilbert (git)
  2022-04-28 16:20   ` Dr. David Alan Gilbert
  2022-04-28 14:40 ` [PULL 07/11] migration: Add zero-copy-send parameter for QMP/HMP for Linux Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Leonardo Bras <leobras@redhat.com>

For CONFIG_LINUX, implement the new zero copy flag and the optional callback
io_flush on QIOChannelSocket, but enable it only when the MSG_ZEROCOPY
feature is available in the host kernel, which is checked in
qio_channel_socket_connect_sync().

qio_channel_socket_flush() is implemented by counting how many times
sendmsg(..., MSG_ZEROCOPY) was successfully called, and then reading the
socket's error queue to find out how many of those calls have finished sending.
Flush loops until the two counters are equal, or until an error occurs.
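
The counter accounting behind flush can be sketched in isolation. This is a
minimal illustration, not the QEMU code: struct zc_counters, zc_range_count()
and zc_account() are hypothetical names, and one error-queue notification is
modelled as a plain inclusive range [lo, hi] of completed sendmsg() calls (the
patch takes these two values from ee_info and ee_data).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters added to QIOChannelSocket. */
struct zc_counters {
    int64_t queued; /* successful sendmsg(..., MSG_ZEROCOPY) calls */
    int64_t sent;   /* calls reported as completed via the error queue */
};

/*
 * One notification covers the inclusive range [lo, hi] of sendmsg() calls.
 * Unsigned arithmetic keeps the count correct if the 32-bit ids wrap.
 */
static uint32_t zc_range_count(uint32_t lo, uint32_t hi)
{
    return hi - lo + 1;
}

/* Account for one notification; returns true once nothing is pending. */
static bool zc_account(struct zc_counters *c, uint32_t lo, uint32_t hi)
{
    c->sent += zc_range_count(lo, hi);
    assert(c->sent <= c->queued); /* cannot complete more than was queued */
    return c->sent == c->queued;
}
```

In this model, a flush would keep reading notifications and calling
zc_account() until it returns true.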

Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
some caution is necessary to avoid overwriting any buffer before it's sent.
If this happens, a newer version of the buffer may be sent instead.
- If this is a problem, it's recommended to call qio_channel_flush() before freeing
or re-using the buffer.

2: Locked memory
- When using MSG_ZEROCOPY, the buffer memory will be locked after it is queued, and
unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may require
a larger amount of locked memory than is usually available to a non-root user.
- If the required amount of locked memory is not available, writev() with
QIO_CHANNEL_WRITE_FLAG_ZERO_COPY will return an error, which can abort an
operation like migration.
- Because of this, when user code wants to add zero copy as a feature, it
requires a mechanism to disable it, so it can still be accessible to less
privileged users.
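
Whether the locked-memory limit is likely to be a problem can be checked up
front. A minimal sketch, assuming a POSIX system that defines RLIMIT_MEMLOCK
(Linux does); memlock_limit_allows() is a hypothetical helper and not part of
QEMU, and the amount of memory actually pinned by MSG_ZEROCOPY depends on the
kernel and on how many buffers are in flight.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: returns true if the soft RLIMIT_MEMLOCK limit
 * is at least `needed` bytes, false otherwise or if the query fails.
 */
static bool memlock_limit_allows(uint64_t needed)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return false;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return true; /* no limit configured */
    }
    return (uint64_t)rl.rlim_cur >= needed;
}
```

A caller could use such a check to decide whether to offer zero copy at all,
while still handling a runtime error from the write path.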

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220426230654.637939-3-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/io/channel-socket.h |   2 +
 io/channel-socket.c         | 108 ++++++++++++++++++++++++++++++++++--
 2 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e63514..513c428fe4 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -47,6 +47,8 @@ struct QIOChannelSocket {
     socklen_t localAddrLen;
     struct sockaddr_storage remoteAddr;
     socklen_t remoteAddrLen;
+    ssize_t zero_copy_queued;
+    ssize_t zero_copy_sent;
 };
 
 
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 696a04dc9c..1dd85fc1ef 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -25,6 +25,10 @@
 #include "io/channel-watch.h"
 #include "trace.h"
 #include "qapi/clone-visitor.h"
+#ifdef CONFIG_LINUX
+#include <linux/errqueue.h>
+#include <bits/socket.h>
+#endif
 
 #define SOCKET_MAX_FDS 16
 
@@ -54,6 +58,8 @@ qio_channel_socket_new(void)
 
     sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
     sioc->fd = -1;
+    sioc->zero_copy_queued = 0;
+    sioc->zero_copy_sent = 0;
 
     ioc = QIO_CHANNEL(sioc);
     qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -153,6 +159,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
         return -1;
     }
 
+#ifdef CONFIG_LINUX
+    int ret, v = 1;
+    ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+    if (ret == 0) {
+        /* Zero copy available on host */
+        qio_channel_set_feature(QIO_CHANNEL(ioc),
+                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
+    }
+#endif
+
     return 0;
 }
 
@@ -533,6 +549,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
     char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
     size_t fdsize = sizeof(int) * nfds;
     struct cmsghdr *cmsg;
+    int sflags = 0;
 
     memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
 
@@ -557,15 +574,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
         memcpy(CMSG_DATA(cmsg), fds, fdsize);
     }
 
+    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+        sflags = MSG_ZEROCOPY;
+    }
+
  retry:
-    ret = sendmsg(sioc->fd, &msg, 0);
+    ret = sendmsg(sioc->fd, &msg, sflags);
     if (ret <= 0) {
-        if (errno == EAGAIN) {
+        switch (errno) {
+        case EAGAIN:
             return QIO_CHANNEL_ERR_BLOCK;
-        }
-        if (errno == EINTR) {
+        case EINTR:
             goto retry;
+        case ENOBUFS:
+            if (sflags & MSG_ZEROCOPY) {
+                error_setg_errno(errp, errno,
+                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
+                return -1;
+            }
+            break;
         }
+
         error_setg_errno(errp, errno,
                          "Unable to write to socket");
         return -1;
@@ -659,6 +688,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 }
 #endif /* WIN32 */
 
+
+#ifdef CONFIG_LINUX
+static int qio_channel_socket_flush(QIOChannel *ioc,
+                                    Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    struct msghdr msg = {};
+    struct sock_extended_err *serr;
+    struct cmsghdr *cm;
+    char control[CMSG_SPACE(sizeof(*serr))];
+    int received;
+    int ret = 1;
+
+    msg.msg_control = control;
+    msg.msg_controllen = sizeof(control);
+    memset(control, 0, sizeof(control));
+
+    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
+        received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
+        if (received < 0) {
+            switch (errno) {
+            case EAGAIN:
+                /* Nothing on errqueue, wait until something is available */
+                qio_channel_wait(ioc, G_IO_ERR);
+                continue;
+            case EINTR:
+                continue;
+            default:
+                error_setg_errno(errp, errno,
+                                 "Unable to read errqueue");
+                return -1;
+            }
+        }
+
+        cm = CMSG_FIRSTHDR(&msg);
+        if (cm->cmsg_level != SOL_IP &&
+            cm->cmsg_type != IP_RECVERR) {
+            error_setg_errno(errp, EPROTOTYPE,
+                             "Wrong cmsg in errqueue");
+            return -1;
+        }
+
+        serr = (void *) CMSG_DATA(cm);
+        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
+            error_setg_errno(errp, serr->ee_errno,
+                             "Error on socket");
+            return -1;
+        }
+        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+            error_setg_errno(errp, serr->ee_origin,
+                             "Error not from zero copy");
+            return -1;
+        }
+
+        /* No errors, count successfully finished sendmsg()*/
+        sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
+
+        /* If any sendmsg() succeeded using zero copy, return 0 at the end */
+        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
+            ret = 0;
+        }
+    }
+
+    return ret;
+}
+
+#endif /* CONFIG_LINUX */
+
 static int
 qio_channel_socket_set_blocking(QIOChannel *ioc,
                                 bool enabled,
@@ -789,6 +886,9 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
     ioc_klass->io_set_delay = qio_channel_socket_set_delay;
     ioc_klass->io_create_watch = qio_channel_socket_create_watch;
     ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
+#ifdef CONFIG_LINUX
+    ioc_klass->io_flush = qio_channel_socket_flush;
+#endif
 }
 
 static const TypeInfo qio_channel_socket_info = {
-- 
2.35.1




* [PULL 07/11] migration: Add zero-copy-send parameter for QMP/HMP for Linux
  2022-04-28 14:40 [PULL 00/11] migration queue Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2022-04-28 14:40 ` [PULL 06/11] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Dr. David Alan Gilbert (git)
@ 2022-04-28 14:40 ` Dr. David Alan Gilbert (git)
  2022-04-28 14:40 ` [PULL 08/11] migration: Add migrate_use_tls() helper Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Leonardo Bras <leobras@redhat.com>

Add a property that allows zero-copy migration of memory pages
on the sending side, and also include a helper function
migrate_use_zero_copy_send() to check if it's enabled.

No code is introduced to actually do the migration, but it allows
future implementations to enable/disable this feature.

On non-Linux builds this parameter is compiled out.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220426230654.637939-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 32 ++++++++++++++++++++++++++++++++
 migration/migration.h |  5 +++++
 migration/socket.c    | 11 +++++++++--
 monitor/hmp-cmds.c    |  6 ++++++
 qapi/migration.json   | 24 ++++++++++++++++++++++++
 5 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5a31b23bd6..3e91f4b5e2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -910,6 +910,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->multifd_zlib_level = s->parameters.multifd_zlib_level;
     params->has_multifd_zstd_level = true;
     params->multifd_zstd_level = s->parameters.multifd_zstd_level;
+#ifdef CONFIG_LINUX
+    params->has_zero_copy_send = true;
+    params->zero_copy_send = s->parameters.zero_copy_send;
+#endif
     params->has_xbzrle_cache_size = true;
     params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
     params->has_max_postcopy_bandwidth = true;
@@ -1567,6 +1571,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_multifd_compression) {
         dest->multifd_compression = params->multifd_compression;
     }
+#ifdef CONFIG_LINUX
+    if (params->has_zero_copy_send) {
+        dest->zero_copy_send = params->zero_copy_send;
+    }
+#endif
     if (params->has_xbzrle_cache_size) {
         dest->xbzrle_cache_size = params->xbzrle_cache_size;
     }
@@ -1679,6 +1688,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_multifd_compression) {
         s->parameters.multifd_compression = params->multifd_compression;
     }
+#ifdef CONFIG_LINUX
+    if (params->has_zero_copy_send) {
+        s->parameters.zero_copy_send = params->zero_copy_send;
+    }
+#endif
     if (params->has_xbzrle_cache_size) {
         s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
         xbzrle_cache_resize(params->xbzrle_cache_size, errp);
@@ -2563,6 +2577,17 @@ int migrate_multifd_zstd_level(void)
     return s->parameters.multifd_zstd_level;
 }
 
+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy_send(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.zero_copy_send;
+}
+#endif
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
@@ -4206,6 +4231,10 @@ static Property migration_properties[] = {
     DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState,
                       parameters.multifd_zstd_level,
                       DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL),
+#ifdef CONFIG_LINUX
+    DEFINE_PROP_BOOL("zero_copy_send", MigrationState,
+                      parameters.zero_copy_send, false),
+#endif
     DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState,
                       parameters.xbzrle_cache_size,
                       DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE),
@@ -4303,6 +4332,9 @@ static void migration_instance_init(Object *obj)
     params->has_multifd_compression = true;
     params->has_multifd_zlib_level = true;
     params->has_multifd_zstd_level = true;
+#ifdef CONFIG_LINUX
+    params->has_zero_copy_send = true;
+#endif
     params->has_xbzrle_cache_size = true;
     params->has_max_postcopy_bandwidth = true;
     params->has_max_cpu_throttle = true;
diff --git a/migration/migration.h b/migration/migration.h
index a863032b71..e8f2941a55 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -375,6 +375,11 @@ MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
 int migrate_multifd_zstd_level(void);
 
+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy_send(void);
+#else
+#define migrate_use_zero_copy_send() (false)
+#endif
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/socket.c b/migration/socket.c
index 05705a32d8..3754d8f72c 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -74,9 +74,16 @@ static void socket_outgoing_migration(QIOTask *task,
 
     if (qio_task_propagate_error(task, &err)) {
         trace_migration_socket_outgoing_error(error_get_pretty(err));
-    } else {
-        trace_migration_socket_outgoing_connected(data->hostname);
+           goto out;
     }
+
+    trace_migration_socket_outgoing_connected(data->hostname);
+
+    if (migrate_use_zero_copy_send()) {
+        error_setg(&err, "Zero copy send not available in migration");
+    }
+
+out:
     migration_channel_connect(data->s, sioc, data->hostname, err);
     object_unref(OBJECT(sioc));
 }
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 93061a11af..622c783c32 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1309,6 +1309,12 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_multifd_zstd_level = true;
         visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
         break;
+#ifdef CONFIG_LINUX
+    case MIGRATION_PARAMETER_ZERO_COPY_SEND:
+        p->has_zero_copy_send = true;
+        visit_type_bool(v, param, &p->zero_copy_send, &err);
+        break;
+#endif
     case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
         p->has_xbzrle_cache_size = true;
         if (!visit_type_size(v, param, &cache_size, &err)) {
diff --git a/qapi/migration.json b/qapi/migration.json
index 409eb086a2..04246481ce 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -741,6 +741,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#                  When true, enables a zero-copy mechanism for sending memory
+#                  pages, if host supports it.
+#                  Requires that QEMU be permitted to use locked memory for guest
+#                  RAM pages.
+#                  Defaults to false. (Since 7.1)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -780,6 +787,7 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
+           { 'name': 'zero-copy-send', 'if' : 'CONFIG_LINUX'},
            'block-bitmap-mapping' ] }
 
 ##
@@ -906,6 +914,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#                  When true, enables a zero-copy mechanism for sending memory
+#                  pages, if host supports it.
+#                  Requires that QEMU be permitted to use locked memory for guest
+#                  RAM pages.
+#                  Defaults to false. (Since 7.1)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -960,6 +975,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
@@ -1106,6 +1122,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#                  When true, enables a zero-copy mechanism for sending memory
+#                  pages, if host supports it.
+#                  Requires that QEMU be permitted to use locked memory for guest
+#                  RAM pages.
+#                  Defaults to false. (Since 7.1)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -1158,6 +1181,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
-- 
2.35.1




* [PULL 08/11] migration: Add migrate_use_tls() helper
  2022-04-28 14:40 [PULL 00/11] migration queue Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2022-04-28 14:40 ` [PULL 07/11] migration: Add zero-copy-send parameter for QMP/HMP for Linux Dr. David Alan Gilbert (git)
@ 2022-04-28 14:40 ` Dr. David Alan Gilbert (git)
  2022-04-28 14:40 ` [PULL 09/11] multifd: multifd_send_sync_main now returns negative on error Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Leonardo Bras <leobras@redhat.com>

A lot of places check parameters.tls_creds in order to evaluate if TLS is
in use, and sometimes call migrate_get_current() just for that test.

Add new helper function migrate_use_tls() in order to simplify testing
for TLS usage.
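
The check being centralized is small enough to show on its own. A minimal
sketch, assuming a cut-down parameter struct: struct params and
params_use_tls() are hypothetical names for illustration, while the real
helper reads the field from the current MigrationState.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the migration parameters. */
struct params {
    const char *tls_creds; /* NULL or "" means TLS is not configured */
};

/* TLS counts as in use only when the credentials id is set and non-empty. */
static bool params_use_tls(const struct params *p)
{
    return p->tls_creds && *p->tls_creds;
}
```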

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426230654.637939-5-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/channel.c   | 3 +--
 migration/migration.c | 9 +++++++++
 migration/migration.h | 1 +
 migration/multifd.c   | 5 +----
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index c6a8dcf1d7..a162d00fea 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -38,8 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc)
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));
 
-    if (s->parameters.tls_creds &&
-        *s->parameters.tls_creds &&
+    if (migrate_use_tls() &&
         !object_dynamic_cast(OBJECT(ioc),
                              TYPE_QIO_CHANNEL_TLS)) {
         migration_tls_channel_process_incoming(s, ioc, &local_err);
diff --git a/migration/migration.c b/migration/migration.c
index 3e91f4b5e2..4b6df2eb5e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2588,6 +2588,15 @@ bool migrate_use_zero_copy_send(void)
 }
 #endif
 
+int migrate_use_tls(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.tls_creds && *s->parameters.tls_creds;
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index e8f2941a55..485d58b95f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -380,6 +380,7 @@ bool migrate_use_zero_copy_send(void);
 #else
 #define migrate_use_zero_copy_send() (false)
 #endif
+int migrate_use_tls(void);
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/multifd.c b/migration/multifd.c
index 9ea4f581e2..2a8c8570c3 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -782,15 +782,12 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
                                     QIOChannel *ioc,
                                     Error *error)
 {
-    MigrationState *s = migrate_get_current();
-
     trace_multifd_set_outgoing_channel(
         ioc, object_get_typename(OBJECT(ioc)),
         migrate_get_current()->hostname, error);
 
     if (!error) {
-        if (s->parameters.tls_creds &&
-            *s->parameters.tls_creds &&
+        if (migrate_use_tls() &&
             !object_dynamic_cast(OBJECT(ioc),
                                  TYPE_QIO_CHANNEL_TLS)) {
             multifd_tls_channel_connect(p, ioc, &error);
-- 
2.35.1




* [PULL 09/11] multifd: multifd_send_sync_main now returns negative on error
  2022-04-28 14:40 [PULL 00/11] migration queue Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2022-04-28 14:40 ` [PULL 08/11] migration: Add migrate_use_tls() helper Dr. David Alan Gilbert (git)
@ 2022-04-28 14:40 ` Dr. David Alan Gilbert (git)
  2022-04-28 14:40 ` [PULL 10/11] multifd: Send header packet without flags if zero-copy-send is enabled Dr. David Alan Gilbert (git)
  2022-04-28 14:40 ` [PULL 11/11] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Dr. David Alan Gilbert (git)
  10 siblings, 0 replies; 20+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Leonardo Bras <leobras@redhat.com>

Even though multifd_send_sync_main() currently emits error_reports, its
callers don't really check it before continuing.

Change multifd_send_sync_main() to return -1 on error and 0 on success.
Also change all its callers to make use of this change and possibly fail
earlier.

(This change is important to the next patch on the multifd zero copy
implementation, to make sure an error in zero-copy flush does not go
unnoticed.)
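
The caller-side pattern can be sketched without any QEMU types. This is an
illustration under stated assumptions: sync_main() and save_complete() are
hypothetical stand-ins for multifd_send_sync_main() and a caller such as
ram_save_complete(), and the eos_written flag only models whether the
end-of-section step was reached.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sync call: 0 on success, -1 on error. */
static int sync_main(bool fail)
{
    return fail ? -1 : 0;
}

/*
 * Hypothetical caller: stop at the first failure instead of writing the
 * end-of-section marker after a failed sync.
 */
static int save_complete(bool sync_fails, bool *eos_written)
{
    int ret;

    *eos_written = false;

    ret = sync_main(sync_fails);
    if (ret < 0) {
        return ret; /* propagate the error, do not continue */
    }

    *eos_written = true; /* models writing the end-of-section marker */
    return 0;
}
```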

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220426230654.637939-6-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/multifd.c | 10 ++++++----
 migration/multifd.h |  2 +-
 migration/ram.c     | 29 ++++++++++++++++++++++-------
 3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 2a8c8570c3..15fb668e64 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -566,17 +566,17 @@ void multifd_save_cleanup(void)
     multifd_send_state = NULL;
 }
 
-void multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(QEMUFile *f)
 {
     int i;
 
     if (!migrate_use_multifd()) {
-        return;
+        return 0;
     }
     if (multifd_send_state->pages->num) {
         if (multifd_send_pages(f) < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
-            return;
+            return -1;
         }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -589,7 +589,7 @@ void multifd_send_sync_main(QEMUFile *f)
         if (p->quit) {
             error_report("%s: channel %d has already quit", __func__, i);
             qemu_mutex_unlock(&p->mutex);
-            return;
+            return -1;
         }
 
         p->packet_num = multifd_send_state->packet_num++;
@@ -608,6 +608,8 @@ void multifd_send_sync_main(QEMUFile *f)
         qemu_sem_wait(&p->sem_sync);
     }
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
+
+    return 0;
 }
 
 static void *multifd_send_thread(void *opaque)
diff --git a/migration/multifd.h b/migration/multifd.h
index 7d0effcb03..bcf5992945 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -20,7 +20,7 @@ int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
 bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-void multifd_send_sync_main(QEMUFile *f);
+int multifd_send_sync_main(QEMUFile *f);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
diff --git a/migration/ram.c b/migration/ram.c
index a2489a2699..5f5e37f64d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2909,6 +2909,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMState **rsp = opaque;
     RAMBlock *block;
+    int ret;
 
     if (compress_threads_save_setup()) {
         return -1;
@@ -2943,7 +2944,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-    multifd_send_sync_main(f);
+    ret =  multifd_send_sync_main(f);
+    if (ret < 0) {
+        return ret;
+    }
+
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     qemu_fflush(f);
 
@@ -3052,7 +3057,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
-        multifd_send_sync_main(rs->f);
+        ret = multifd_send_sync_main(rs->f);
+        if (ret < 0) {
+            return ret;
+        }
+
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
         qemu_fflush(f);
         ram_transferred_add(8);
@@ -3112,13 +3121,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     }
 
-    if (ret >= 0) {
-        multifd_send_sync_main(rs->f);
-        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-        qemu_fflush(f);
+    if (ret < 0) {
+        return ret;
     }
 
-    return ret;
+    ret = multifd_send_sync_main(rs->f);
+    if (ret < 0) {
+        return ret;
+    }
+
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_fflush(f);
+
+    return 0;
 }
 
 static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PULL 10/11] multifd: Send header packet without flags if zero-copy-send is enabled
  2022-04-28 14:40 [PULL 00/11] migration queue Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2022-04-28 14:40 ` [PULL 09/11] multifd: multifd_send_sync_main now returns negative on error Dr. David Alan Gilbert (git)
@ 2022-04-28 14:40 ` Dr. David Alan Gilbert (git)
  2022-04-28 14:40 ` [PULL 11/11] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Dr. David Alan Gilbert (git)
  10 siblings, 0 replies; 20+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Leonardo Bras <leobras@redhat.com>

Since d48c3a0445 ("multifd: Use a single writev on the send side"),
sending the header packet and the memory pages happens in the same
writev, which can potentially make the migration faster.

Using channel-socket as an example, this works well with the default
copying mechanism of sendmsg(), but with zero-copy-send=true it will often
cause the migration to break.

This happens because the header packet buffer gets reused quite often,
and there is a high chance that by the time the MSG_ZEROCOPY mechanism gets
to send the buffer, it has already changed, sending the wrong data and
causing the migration to abort.

It means that, as it is, the buffer for the header packet is not suitable
for sending with MSG_ZEROCOPY.

In order to enable zero copy for multifd, send the header packet on an
individual write(), without any flags, and the remaining pages with a
writev(), as it was happening before. This only changes how a migration
with zero-copy-send=true works, not changing any current behavior for
migrations with zero-copy-send=false.
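
As a rough illustration of this split (not the QEMU code itself), the
sketch below sends the header with a plain copying write when zero copy is
requested, and folds it into the vectored write otherwise. write_all() and
send_packet() are hypothetical names, the descriptor can be any writable
fd, and the MSG_ZEROCOPY flag itself is left out so the example also runs
on a pipe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes, looping over short writes. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical stand-in for the multifd send step. The caller has filled
 * iov[1..npages] with the pages and left iov[0] free for the header.
 * Returns 0 on success, -1 on error or on a short vectored write.
 */
static int send_packet(int fd, void *hdr, size_t hdr_len,
                       struct iovec *iov, int npages, bool zero_copy)
{
    struct iovec *first = iov + 1;
    int count = npages;
    size_t total = 0;
    ssize_t n;

    assert(npages >= 0);

    if (zero_copy) {
        /* The header buffer is reused often, so send it by copy, alone. */
        if (write_all(fd, hdr, hdr_len) < 0) {
            return -1;
        }
    } else {
        /* Copying path: the header rides in the same vectored write. */
        iov[0].iov_base = hdr;
        iov[0].iov_len = hdr_len;
        first = iov;
        count = npages + 1;
    }

    if (count == 0) {
        return 0;
    }
    for (int i = 0; i < count; i++) {
        total += first[i].iov_len;
    }

    /* A real zero-copy sender would use sendmsg(..., MSG_ZEROCOPY) here. */
    n = writev(fd, first, count);
    return (n >= 0 && (size_t)n == total) ? 0 : -1;
}
```

Either way the receiver sees the same byte stream: header first, then the
pages. Only the header's path through the kernel differs.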

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220426230654.637939-7-leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Removed blank line
---
 migration/multifd.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 15fb668e64..2541cd2322 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -617,6 +617,7 @@ static void *multifd_send_thread(void *opaque)
     MultiFDSendParams *p = opaque;
     Error *local_err = NULL;
     int ret = 0;
+    bool use_zero_copy_send = migrate_use_zero_copy_send();
 
     trace_multifd_send_thread_start(p->id);
     rcu_register_thread();
@@ -639,9 +640,14 @@ static void *multifd_send_thread(void *opaque)
         if (p->pending_job) {
             uint64_t packet_num = p->packet_num;
             uint32_t flags = p->flags;
-            p->iovs_num = 1;
             p->normal_num = 0;
 
+            if (use_zero_copy_send) {
+                p->iovs_num = 0;
+            } else {
+                p->iovs_num = 1;
+            }
+
             for (int i = 0; i < p->pages->num; i++) {
                 p->normal[p->normal_num] = p->pages->offset[i];
                 p->normal_num++;
@@ -665,8 +671,18 @@ static void *multifd_send_thread(void *opaque)
             trace_multifd_send(p->id, packet_num, p->normal_num, flags,
                                p->next_packet_size);
 
-            p->iov[0].iov_len = p->packet_len;
-            p->iov[0].iov_base = p->packet;
+            if (use_zero_copy_send) {
+                /* Send header first, without zerocopy */
+                ret = qio_channel_write_all(p->c, (void *)p->packet,
+                                            p->packet_len, &local_err);
+                if (ret != 0) {
+                    break;
+                }
+            } else {
+                /* Send header using the same writev call */
+                p->iov[0].iov_len = p->packet_len;
+                p->iov[0].iov_base = p->packet;
+            }
 
             ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
                                          &local_err);
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PULL 11/11] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)
  2022-04-28 14:40 [PULL 00/11] migration queue Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2022-04-28 14:40 ` [PULL 10/11] multifd: Send header packet without flags if zero-copy-send is enabled Dr. David Alan Gilbert (git)
@ 2022-04-28 14:40 ` Dr. David Alan Gilbert (git)
  10 siblings, 0 replies; 20+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-04-28 14:40 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, leobras, berrange

From: Leonardo Bras <leobras@redhat.com>

Implement zero copy send on nocomp_send_write(), by making use of QIOChannel
writev + flags & flush interface.

Change multifd_send_sync_main() so flush_zero_copy() can be called
after each iteration in order to make sure all dirty pages are sent before
a new iteration is started. It will also flush at the beginning and at the
end of migration.

Also make it return -1 if flush_zero_copy() fails, in order to cancel
the migration process, and avoid resuming the guest in the target host
without receiving all current RAM.

This will work fine on RAM migration because the RAM pages are not usually
freed, and there is no problem with changing a page's content between
writev_zero_copy() and the actual sending of the buffer, because this change
will dirty the page and cause it to be re-sent in a later iteration anyway.
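
The flush this relies on can be pictured as simple counter bookkeeping. The
sketch below is only a model under stated assumptions: struct zc_counters
and the zc_* helpers are hypothetical names, and the inclusive completion
range corresponds to the ee_info and ee_data fields that Linux reports on
the socket error queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-channel bookkeeping for zero-copy sends. */
struct zc_counters {
    uint64_t queued;   /* zero-copy sends handed to the kernel */
    uint64_t sent;     /* sends the kernel has reported as finished */
};

/* Record one successfully queued zero-copy send. */
static void zc_note_queued(struct zc_counters *c)
{
    assert(c);
    c->queued++;
}

/*
 * Record one completion notification. lo and hi are the inclusive ends of
 * the range of finished sends; unsigned subtraction keeps the count right
 * even if the 32-bit sequence numbers wrap.
 */
static void zc_note_completed(struct zc_counters *c, uint32_t lo, uint32_t hi)
{
    assert(c);
    c->sent += (uint64_t)(uint32_t)(hi - lo) + 1;
}

/* A flush has to keep waiting while this returns true. */
static bool zc_flush_pending(const struct zc_counters *c)
{
    assert(c);
    return c->sent < c->queued;
}
```

Once zc_flush_pending() turns false, every buffer queued so far has left
the host, so a page dirtied afterwards can only arrive later than its old
copy.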

A lot of locked memory may be needed in order to use multifd migration
with zero-copy enabled, so low-privileged users trying to perform multifd
migrations may need to disable the feature.
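
One way a caller could check this up front is to compare an estimate of the
pinned memory against the process limit. This is a sketch, assuming the
pages pinned for zero copy are charged against RLIMIT_MEMLOCK; the helper
names are hypothetical and the estimate itself is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>

/* Hypothetical check: would `need` bytes of pinned memory fit the limit? */
static bool memlock_fits(rlim_t soft_limit, rlim_t need)
{
    if (soft_limit == RLIM_INFINITY) {
        return true;
    }
    return need <= soft_limit;
}

/* Read the current soft RLIMIT_MEMLOCK; returns 0 on success, -1 on error. */
static int memlock_soft_limit(rlim_t *out)
{
    struct rlimit rl;

    assert(out);
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return -1;
    }
    *out = rl.rlim_cur;
    return 0;
}
```

A check like this is advisory only: the limit can change between the check
and the send, so the error path on the write itself still has to be
handled.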

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220426230654.637939-8-leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 11 ++++++++++-
 migration/multifd.c   | 37 +++++++++++++++++++++++++++++++++++--
 migration/multifd.h   |  2 ++
 migration/socket.c    |  5 +++--
 4 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4b6df2eb5e..31739b2af9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1497,7 +1497,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
         error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
         return false;
     }
-
+#ifdef CONFIG_LINUX
+    if (params->zero_copy_send &&
+        (!migrate_use_multifd() ||
+         params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
+         (params->tls_creds && *params->tls_creds))) {
+        error_setg(errp,
+                   "Zero copy only available for non-compressed non-TLS multifd migration");
+        return false;
+    }
+#endif
     return true;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 2541cd2322..9282ab6aa4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -569,6 +569,7 @@ void multifd_save_cleanup(void)
 int multifd_send_sync_main(QEMUFile *f)
 {
     int i;
+    bool flush_zero_copy;
 
     if (!migrate_use_multifd()) {
         return 0;
@@ -579,6 +580,20 @@ int multifd_send_sync_main(QEMUFile *f)
             return -1;
         }
     }
+
+    /*
+     * When using zero-copy, it's necessary to flush the pages before any of
+     * the pages can be sent again, so we'll make sure the new version of the
+     * pages will always arrive _later_ than the old pages.
+     *
+     * Currently we achieve this by flushing the zero-page requested writes
+     * per ram iteration, but in the future we could potentially optimize it
+     * to be less frequent, e.g. only after we finished one whole scanning of
+     * all the dirty bitmaps.
+     */
+
+    flush_zero_copy = migrate_use_zero_copy_send();
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -600,6 +615,17 @@ int multifd_send_sync_main(QEMUFile *f)
         ram_counters.transferred += p->packet_len;
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
+
+        if (flush_zero_copy && p->c) {
+            int ret;
+            Error *err = NULL;
+
+            ret = qio_channel_flush(p->c, &err);
+            if (ret < 0) {
+                error_report_err(err);
+                return -1;
+            }
+        }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -684,8 +710,8 @@ static void *multifd_send_thread(void *opaque)
                 p->iov[0].iov_base = p->packet;
             }
 
-            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
-                                         &local_err);
+            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
+                                              0, p->write_flags, &local_err);
             if (ret != 0) {
                 break;
             }
@@ -913,6 +939,13 @@ int multifd_save_setup(Error **errp)
         /* We need one extra place for the packet header */
         p->iov = g_new0(struct iovec, page_count + 1);
         p->normal = g_new0(ram_addr_t, page_count);
+
+        if (migrate_use_zero_copy_send()) {
+            p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
+        } else {
+            p->write_flags = 0;
+        }
+
         socket_send_channel_create(multifd_new_send_channel_async, p);
     }
 
diff --git a/migration/multifd.h b/migration/multifd.h
index bcf5992945..4d8d89e5e5 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -92,6 +92,8 @@ typedef struct {
     uint32_t packet_len;
     /* pointer to the packet */
     MultiFDPacket_t *packet;
+    /* multifd flags for sending ram */
+    int write_flags;
     /* multifd flags for each packet */
     uint32_t flags;
     /* size of the next packet that contains pages */
diff --git a/migration/socket.c b/migration/socket.c
index 3754d8f72c..4fd5e85f50 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -79,8 +79,9 @@ static void socket_outgoing_migration(QIOTask *task,
 
     trace_migration_socket_outgoing_connected(data->hostname);
 
-    if (migrate_use_zero_copy_send()) {
-        error_setg(&err, "Zero copy send not available in migration");
+    if (migrate_use_zero_copy_send() &&
+        !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
+        error_setg(&err, "Zero copy send feature not detected in host kernel");
     }
 
 out:
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PULL 06/11] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-04-28 14:40 ` [PULL 06/11] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Dr. David Alan Gilbert (git)
@ 2022-04-28 16:20   ` Dr. David Alan Gilbert
  2022-04-30  2:40     ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2022-04-28 16:20 UTC (permalink / raw)
  To: qemu-devel, berrange, quintela, peterx

Leo:
  Unfortunately this is failing a couple of CI tests; the MSG_ZEROCOPY
one I guess is the simpler one; I think Stefanha managed to find the
liburing fix for the __kernel_timespec case, but that looks like a bit
more fun!

Dave


Job #2390848140 ( https://gitlab.com/dagrh/qemu/-/jobs/2390848140/raw )
Name: build-system-alpine
In file included from /usr/include/linux/errqueue.h:6,
                 from ../io/channel-socket.c:29:
/usr/include/linux/time_types.h:7:8: error: redefinition of 'struct __kernel_timespec'
    7 | struct __kernel_timespec {
      |        ^~~~~~~~~~~~~~~~~
In file included from /usr/include/liburing.h:19,
                 from /builds/dagrh/qemu/include/block/aio.h:18,
                 from /builds/dagrh/qemu/include/io/channel.h:26,
                 from /builds/dagrh/qemu/include/io/channel-socket.h:24,
                 from ../io/channel-socket.c:24:
/usr/include/liburing/compat.h:9:8: note: originally defined here
    9 | struct __kernel_timespec {
      |        ^~~~~~~~~~~~~~~~~

----
Name: build-system-opensuse

https://gitlab.com/dagrh/qemu/-/jobs/2390848160/raw
../io/channel-socket.c: In function ‘qio_channel_socket_writev’:
../io/channel-socket.c:578:18: error: ‘MSG_ZEROCOPY’ undeclared (first use in this function); did you mean ‘SO_ZEROCOPY’?
         sflags = MSG_ZEROCOPY;
                  ^~~~~~~~~~~~
                  SO_ZEROCOPY
../io/channel-socket.c:578:18: note: each undeclared identifier is reported only once for each function it appears in

* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: Leonardo Bras <leobras@redhat.com>
> 
> For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> io_flush on QIOChannelSocket, but enable it only when the MSG_ZEROCOPY
> feature is available in the host kernel, which is checked on
> qio_channel_socket_connect_sync().
> 
> qio_channel_socket_flush() was implemented by counting how many times
> sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> socket's error queue, in order to find how many of them finished sending.
> Flush will loop until those counters are the same, or until some error occurs.
> 
> Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> 1: Buffer
> - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> some caution is necessary to avoid overwriting any buffer before it's sent.
> If something like this happens, a newer version of the buffer may be sent instead.
> - If this is a problem, it's recommended to call qio_channel_flush() before freeing
> or re-using the buffer.
> 
> 2: Locked memory
> - When using MSG_ZEROCOPY, the buffer memory will be locked once queued, and
> unlocked after it's sent.
> - Depending on the size of each buffer, and how often it's sent, it may require
> a larger amount of locked memory than usually available to a non-root user.
> - If the required amount of locked memory is not available, writev_zero_copy
> will return an error, which can abort an operation like migration.
> - Because of this, when user code wants to add zero copy as a feature, it
> requires a mechanism to disable it, so it can still be accessible to less
> privileged users.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Message-Id: <20220426230654.637939-3-leobras@redhat.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/io/channel-socket.h |   2 +
>  io/channel-socket.c         | 108 ++++++++++++++++++++++++++++++++++--
>  2 files changed, 106 insertions(+), 4 deletions(-)
> 
> diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> index e747e63514..513c428fe4 100644
> --- a/include/io/channel-socket.h
> +++ b/include/io/channel-socket.h
> @@ -47,6 +47,8 @@ struct QIOChannelSocket {
>      socklen_t localAddrLen;
>      struct sockaddr_storage remoteAddr;
>      socklen_t remoteAddrLen;
> +    ssize_t zero_copy_queued;
> +    ssize_t zero_copy_sent;
>  };
>  
>  
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 696a04dc9c..1dd85fc1ef 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -25,6 +25,10 @@
>  #include "io/channel-watch.h"
>  #include "trace.h"
>  #include "qapi/clone-visitor.h"
> +#ifdef CONFIG_LINUX
> +#include <linux/errqueue.h>
> +#include <bits/socket.h>
> +#endif
>  
>  #define SOCKET_MAX_FDS 16
>  
> @@ -54,6 +58,8 @@ qio_channel_socket_new(void)
>  
>      sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
>      sioc->fd = -1;
> +    sioc->zero_copy_queued = 0;
> +    sioc->zero_copy_sent = 0;
>  
>      ioc = QIO_CHANNEL(sioc);
>      qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
> @@ -153,6 +159,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
>          return -1;
>      }
>  
> +#ifdef CONFIG_LINUX
> +    int ret, v = 1;
> +    ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> +    if (ret == 0) {
> +        /* Zero copy available on host */
> +        qio_channel_set_feature(QIO_CHANNEL(ioc),
> +                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
> +    }
> +#endif
> +
>      return 0;
>  }
>  
> @@ -533,6 +549,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
>      char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
>      size_t fdsize = sizeof(int) * nfds;
>      struct cmsghdr *cmsg;
> +    int sflags = 0;
>  
>      memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
>  
> @@ -557,15 +574,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
>          memcpy(CMSG_DATA(cmsg), fds, fdsize);
>      }
>  
> +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> +        sflags = MSG_ZEROCOPY;
> +    }
> +
>   retry:
> -    ret = sendmsg(sioc->fd, &msg, 0);
> +    ret = sendmsg(sioc->fd, &msg, sflags);
>      if (ret <= 0) {
> -        if (errno == EAGAIN) {
> +        switch (errno) {
> +        case EAGAIN:
>              return QIO_CHANNEL_ERR_BLOCK;
> -        }
> -        if (errno == EINTR) {
> +        case EINTR:
>              goto retry;
> +        case ENOBUFS:
> +            if (sflags & MSG_ZEROCOPY) {
> +                error_setg_errno(errp, errno,
> +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> +                return -1;
> +            }
> +            break;
>          }
> +
>          error_setg_errno(errp, errno,
>                           "Unable to write to socket");
>          return -1;
> @@ -659,6 +688,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
>  }
>  #endif /* WIN32 */
>  
> +
> +#ifdef CONFIG_LINUX
> +static int qio_channel_socket_flush(QIOChannel *ioc,
> +                                    Error **errp)
> +{
> +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> +    struct msghdr msg = {};
> +    struct sock_extended_err *serr;
> +    struct cmsghdr *cm;
> +    char control[CMSG_SPACE(sizeof(*serr))];
> +    int received;
> +    int ret = 1;
> +
> +    msg.msg_control = control;
> +    msg.msg_controllen = sizeof(control);
> +    memset(control, 0, sizeof(control));
> +
> +    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
> +        received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> +        if (received < 0) {
> +            switch (errno) {
> +            case EAGAIN:
> +                /* Nothing on errqueue, wait until something is available */
> +                qio_channel_wait(ioc, G_IO_ERR);
> +                continue;
> +            case EINTR:
> +                continue;
> +            default:
> +                error_setg_errno(errp, errno,
> +                                 "Unable to read errqueue");
> +                return -1;
> +            }
> +        }
> +
> +        cm = CMSG_FIRSTHDR(&msg);
> +        if (cm->cmsg_level != SOL_IP &&
> +            cm->cmsg_type != IP_RECVERR) {
> +            error_setg_errno(errp, EPROTOTYPE,
> +                             "Wrong cmsg in errqueue");
> +            return -1;
> +        }
> +
> +        serr = (void *) CMSG_DATA(cm);
> +        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
> +            error_setg_errno(errp, serr->ee_errno,
> +                             "Error on socket");
> +            return -1;
> +        }
> +        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
> +            error_setg_errno(errp, serr->ee_origin,
> +                             "Error not from zero copy");
> +            return -1;
> +        }
> +
> +        /* No errors, count successfully finished sendmsg()*/
> +        sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> +
> +        /* If any sendmsg() succeeded using zero copy, return 0 at the end */
> +        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> +            ret = 0;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +#endif /* CONFIG_LINUX */
> +
>  static int
>  qio_channel_socket_set_blocking(QIOChannel *ioc,
>                                  bool enabled,
> @@ -789,6 +886,9 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
>      ioc_klass->io_set_delay = qio_channel_socket_set_delay;
>      ioc_klass->io_create_watch = qio_channel_socket_create_watch;
>      ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
> +#ifdef CONFIG_LINUX
> +    ioc_klass->io_flush = qio_channel_socket_flush;
> +#endif
>  }
>  
>  static const TypeInfo qio_channel_socket_info = {
> -- 
> 2.35.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PULL 06/11] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-04-28 16:20   ` Dr. David Alan Gilbert
@ 2022-04-30  2:40     ` Leonardo Bras Soares Passos
  2022-05-02 23:51       ` Peter Xu
  2022-05-03  8:30       ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 20+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-04-30  2:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Daniel Berrange, qemu-devel, Peter Xu, Juan Quintela

Hello Dave,

On Thu, Apr 28, 2022 at 1:20 PM Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> Leo:
>   Unfortunately this is failing a couple of CI tests; the MSG_ZEROCOPY
> one I guess is the simpler one; I think Stefanha managed to find the
> liburing fix for the __kernel_timespec case, but that looks like a bit
> more fun!
>
> Dave

About MSG_ZEROCOPY error:

I tracked down how the test was run, downloaded the same docker image used by
the tests (opensuse-leap-15.2), and searched the filesystem for the
MSG_ZEROCOPY define, which I could not find anywhere.

Then I took a look at /usr/include/bits/socket.h, which is where RHEL has
MSG_ZEROCOPY defined. Zypper reports that file as being provided by
glibc-devel, which is versioned at 2.26-lp152.26.12.1.

I then took a look at https://sourceware.org/git/glibc.git, and found commit
78cde19f62 that introduces MSG_ZEROCOPY. The first version that has this commit
is glibc-2.27.

So, basically, this means the opensuse-leap-15.2 glibc version does not support
MSG_ZEROCOPY. Based on that, I had a few ideas on how to solve the CI bug:
1 - Propose a backport of this patch (a few comments + a single define) for
leap-15.x, wait for them to accept it and update the version in qemu CI.
(TBH I have no idea how the opensuse community works, I just suppose it could
be a way of tackling this.)
2 - Include an #ifndef MSG_ZEROCOPY #define MSG_ZEROCOPY 0x4000000 #endif in
the code, which is ugly IMHO, but will be fast and clean.
3 - In CI, patch /usr/include/bits/socket.h before building, which will also
work fine, but defeats the purpose of keeping qemu building on the platform.

Among the above, I would go with (2), as it seems a reasonable way of dealing
with this.
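
Spelled out, option (2) would look roughly like the sketch below. The
guard and the value are the ones from option (2); zero_copy_send_flags() is
a hypothetical helper added only to show the macro in use.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Fallback for C library headers that predate MSG_ZEROCOPY (glibc < 2.27). */
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

/* Hypothetical helper: pick the sendmsg() flags for one write. */
static int zero_copy_send_flags(bool zero_copy)
{
    return zero_copy ? MSG_ZEROCOPY : 0;
}
```

The define only makes the build pass on old headers. Whether the running
kernel actually supports the feature is still decided at runtime by the
SO_ZEROCOPY setsockopt() check.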

Does anyone else have any further suggestions, or know how this kind of issue
is generally solved in qemu?

Best regards,
Leo


>
>
> Job #2390848140 ( https://gitlab.com/dagrh/qemu/-/jobs/2390848140/raw )
> Name: build-system-alpine
> In file included from /usr/include/linux/errqueue.h:6,
>                  from ../io/channel-socket.c:29:
> /usr/include/linux/time_types.h:7:8: error: redefinition of 'struct __kernel_timespec'
>     7 | struct __kernel_timespec {
>       |        ^~~~~~~~~~~~~~~~~
> In file included from /usr/include/liburing.h:19,
>                  from /builds/dagrh/qemu/include/block/aio.h:18,
>                  from /builds/dagrh/qemu/include/io/channel.h:26,
>                  from /builds/dagrh/qemu/include/io/channel-socket.h:24,
>                  from ../io/channel-socket.c:24:
> /usr/include/liburing/compat.h:9:8: note: originally defined here
>     9 | struct __kernel_timespec {
>       |        ^~~~~~~~~~~~~~~~~
>
> ----
> Name: build-system-opensuse
>
> https://gitlab.com/dagrh/qemu/-/jobs/2390848160/raw
> ../io/channel-socket.c: In function ‘qio_channel_socket_writev’:
> ../io/channel-socket.c:578:18: error: ‘MSG_ZEROCOPY’ undeclared (first use in this function); did you mean ‘SO_ZEROCOPY’?
>          sflags = MSG_ZEROCOPY;
>                   ^~~~~~~~~~~~
>                   SO_ZEROCOPY
> ../io/channel-socket.c:578:18: note: each undeclared identifier is reported only once for each function it appears in
>
> * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> > From: Leonardo Bras <leobras@redhat.com>
> >
> > For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> > io_flush on QIOChannelSocket, but enable it only when the MSG_ZEROCOPY
> > feature is available in the host kernel, which is checked on
> > qio_channel_socket_connect_sync().
> >
> > qio_channel_socket_flush() was implemented by counting how many times
> > sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> > socket's error queue, in order to find how many of them finished sending.
> > Flush will loop until those counters are the same, or until some error occurs.
> >
> > Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> > 1: Buffer
> > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> > some caution is necessary to avoid overwriting any buffer before it's sent.
> > If something like this happens, a newer version of the buffer may be sent instead.
> > - If this is a problem, it's recommended to call qio_channel_flush() before freeing
> > or re-using the buffer.
> >
> > 2: Locked memory
> > - When using MSG_ZEROCOPY, the buffer memory will be locked once queued, and
> > unlocked after it's sent.
> > - Depending on the size of each buffer, and how often it's sent, it may require
> > a larger amount of locked memory than usually available to a non-root user.
> > - If the required amount of locked memory is not available, writev_zero_copy
> > will return an error, which can abort an operation like migration.
> > - Because of this, when user code wants to add zero copy as a feature, it
> > requires a mechanism to disable it, so it can still be accessible to less
> > privileged users.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > Message-Id: <20220426230654.637939-3-leobras@redhat.com>
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/io/channel-socket.h |   2 +
> >  io/channel-socket.c         | 108 ++++++++++++++++++++++++++++++++++--
> >  2 files changed, 106 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> > index e747e63514..513c428fe4 100644
> > --- a/include/io/channel-socket.h
> > +++ b/include/io/channel-socket.h
> > @@ -47,6 +47,8 @@ struct QIOChannelSocket {
> >      socklen_t localAddrLen;
> >      struct sockaddr_storage remoteAddr;
> >      socklen_t remoteAddrLen;
> > +    ssize_t zero_copy_queued;
> > +    ssize_t zero_copy_sent;
> >  };
> >
> >
> > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > index 696a04dc9c..1dd85fc1ef 100644
> > --- a/io/channel-socket.c
> > +++ b/io/channel-socket.c
> > @@ -25,6 +25,10 @@
> >  #include "io/channel-watch.h"
> >  #include "trace.h"
> >  #include "qapi/clone-visitor.h"
> > +#ifdef CONFIG_LINUX
> > +#include <linux/errqueue.h>
> > +#include <bits/socket.h>
> > +#endif
> >
> >  #define SOCKET_MAX_FDS 16
> >
> > @@ -54,6 +58,8 @@ qio_channel_socket_new(void)
> >
> >      sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
> >      sioc->fd = -1;
> > +    sioc->zero_copy_queued = 0;
> > +    sioc->zero_copy_sent = 0;
> >
> >      ioc = QIO_CHANNEL(sioc);
> >      qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
> > @@ -153,6 +159,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
> >          return -1;
> >      }
> >
> > +#ifdef CONFIG_LINUX
> > +    int ret, v = 1;
> > +    ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> > +    if (ret == 0) {
> > +        /* Zero copy available on host */
> > +        qio_channel_set_feature(QIO_CHANNEL(ioc),
> > +                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
> > +    }
> > +#endif
> > +
> >      return 0;
> >  }
> >
> > @@ -533,6 +549,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> >      char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
> >      size_t fdsize = sizeof(int) * nfds;
> >      struct cmsghdr *cmsg;
> > +    int sflags = 0;
> >
> >      memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
> >
> > @@ -557,15 +574,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> >      }
> >
> > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > +        sflags = MSG_ZEROCOPY;
> > +    }
> > +
> >   retry:
> > -    ret = sendmsg(sioc->fd, &msg, 0);
> > +    ret = sendmsg(sioc->fd, &msg, sflags);
> >      if (ret <= 0) {
> > -        if (errno == EAGAIN) {
> > +        switch (errno) {
> > +        case EAGAIN:
> >              return QIO_CHANNEL_ERR_BLOCK;
> > -        }
> > -        if (errno == EINTR) {
> > +        case EINTR:
> >              goto retry;
> > +        case ENOBUFS:
> > +            if (sflags & MSG_ZEROCOPY) {
> > +                error_setg_errno(errp, errno,
> > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > +                return -1;
> > +            }
> > +            break;
> >          }
> > +
> >          error_setg_errno(errp, errno,
> >                           "Unable to write to socket");
> >          return -1;
> > @@ -659,6 +688,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> >  }
> >  #endif /* WIN32 */
> >
> > +
> > +#ifdef CONFIG_LINUX
> > +static int qio_channel_socket_flush(QIOChannel *ioc,
> > +                                    Error **errp)
> > +{
> > +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> > +    struct msghdr msg = {};
> > +    struct sock_extended_err *serr;
> > +    struct cmsghdr *cm;
> > +    char control[CMSG_SPACE(sizeof(*serr))];
> > +    int received;
> > +    int ret = 1;
> > +
> > +    msg.msg_control = control;
> > +    msg.msg_controllen = sizeof(control);
> > +    memset(control, 0, sizeof(control));
> > +
> > +    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
> > +        received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> > +        if (received < 0) {
> > +            switch (errno) {
> > +            case EAGAIN:
> > +                /* Nothing on errqueue, wait until something is available */
> > +                qio_channel_wait(ioc, G_IO_ERR);
> > +                continue;
> > +            case EINTR:
> > +                continue;
> > +            default:
> > +                error_setg_errno(errp, errno,
> > +                                 "Unable to read errqueue");
> > +                return -1;
> > +            }
> > +        }
> > +
> > +        cm = CMSG_FIRSTHDR(&msg);
> > +        if (cm->cmsg_level != SOL_IP &&
> > +            cm->cmsg_type != IP_RECVERR) {
> > +            error_setg_errno(errp, EPROTOTYPE,
> > +                             "Wrong cmsg in errqueue");
> > +            return -1;
> > +        }
> > +
> > +        serr = (void *) CMSG_DATA(cm);
> > +        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
> > +            error_setg_errno(errp, serr->ee_errno,
> > +                             "Error on socket");
> > +            return -1;
> > +        }
> > +        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
> > +            error_setg_errno(errp, serr->ee_origin,
> > +                             "Error not from zero copy");
> > +            return -1;
> > +        }
> > +
> > +        /* No errors, count successfully finished sendmsg()*/
> > +        sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > +
> > +        /* If any sendmsg() succeeded using zero copy, return 0 at the end */
> > +        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > +            ret = 0;
> > +        }
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +#endif /* CONFIG_LINUX */
> > +
> >  static int
> >  qio_channel_socket_set_blocking(QIOChannel *ioc,
> >                                  bool enabled,
> > @@ -789,6 +886,9 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
> >      ioc_klass->io_set_delay = qio_channel_socket_set_delay;
> >      ioc_klass->io_create_watch = qio_channel_socket_create_watch;
> >      ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
> > +#ifdef CONFIG_LINUX
> > +    ioc_klass->io_flush = qio_channel_socket_flush;
> > +#endif
> >  }
> >
> >  static const TypeInfo qio_channel_socket_info = {
> > --
> > 2.35.1
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>




* Re: [PULL 06/11] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-04-30  2:40     ` Leonardo Bras Soares Passos
@ 2022-05-02 23:51       ` Peter Xu
  2022-05-03  0:12         ` Leonardo Bras Soares Passos
  2022-05-03  8:30       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Xu @ 2022-05-02 23:51 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Dr. David Alan Gilbert, qemu-devel, Daniel Berrange, Juan Quintela

Leo,

On Fri, Apr 29, 2022 at 11:40:44PM -0300, Leonardo Bras Soares Passos wrote:
> Does anyone else have any further suggestions, or know how this kind of issue
> is generally solved in qemu?

I've no solid idea why it can't see MSG_ZEROCOPY defined in the specific
environment, but when I was looking at bits/socket.h I saw this:

#ifndef _SYS_SOCKET_H
# error "Never include <bits/socket.h> directly; use <sys/socket.h> instead."
#endif

Maybe worth a shot to do a replacement in all cases?

-- 
Peter Xu




* Re: [PULL 06/11] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-05-02 23:51       ` Peter Xu
@ 2022-05-03  0:12         ` Leonardo Bras Soares Passos
  2022-05-03  1:22           ` Peter Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-05-03  0:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert, qemu-devel, Daniel Berrange, Juan Quintela

Hello Peter,

On Mon, May 2, 2022 at 8:52 PM Peter Xu <peterx@redhat.com> wrote:
>
> Leo,
>
> On Fri, Apr 29, 2022 at 11:40:44PM -0300, Leonardo Bras Soares Passos wrote:
> > Does anyone else have any further suggestions, or know how this kind of issue
> > is generally solved in qemu?
>
> I've no solid idea why it can't see MSG_ZEROCOPY defined in the specific
> environment, but when I was looking at bits/socket.h I saw this:
>
> #ifndef _SYS_SOCKET_H
> # error "Never include <bits/socket.h> directly; use <sys/socket.h> instead."
> #endif
>
> Maybe worth a shot to do a replacement in all cases?
>

Sure, no problem with this; I will update it for v11.
(Or should I send a separate patch, since Dave has already merged it into his tree?)

But it should not affect whether MSG_ZEROCOPY is defined:

> > I tracked down how the test happened, downloaded the same docker image from the
> > tests(opensuse-leap-15.2), and took a look at the filesystem for the
> > MSG_ZEROCOPY define, which I could not find anywhere.

By this, I mean I ran 'grep MSG_ZEROCOPY -r /' and found nothing, so
it's probably not defined anywhere in the filesystem.

> --
> Peter Xu
>

Thanks Peter!

Best regards,
Leo




* Re: [PULL 06/11] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-05-03  0:12         ` Leonardo Bras Soares Passos
@ 2022-05-03  1:22           ` Peter Xu
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Xu @ 2022-05-03  1:22 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Dr. David Alan Gilbert, qemu-devel, Daniel Berrange, Juan Quintela

On Mon, May 02, 2022 at 09:12:53PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
> 
> On Mon, May 2, 2022 at 8:52 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Leo,
> >
> > On Fri, Apr 29, 2022 at 11:40:44PM -0300, Leonardo Bras Soares Passos wrote:
> > > Does anyone else have any further suggestions, or know how this kind of issue
> > > is generally solved in qemu?
> >
> > I've no solid idea why it can't see MSG_ZEROCOPY defined in the specific
> > environment, but when I was looking at bits/socket.h I saw this:
> >
> > #ifndef _SYS_SOCKET_H
> > # error "Never include <bits/socket.h> directly; use <sys/socket.h> instead."
> > #endif
> >
> > Maybe worth a shot to do a replacement in all cases?
> >
> 
> Sure, no problem with this, I will update for v11.
> (Or should I send a different patch since Dave has already merged in his tree?)
> 
> But it should not interfere in MSG_ZEROCOPY definition:
> 
> > > I tracked down how the test happened, downloaded the same docker image from the
> > > tests(opensuse-leap-15.2), and took a look at the filesystem for the
> > > MSG_ZEROCOPY define, which I could not find anywhere.
> 
> By this, I mean I did a 'grep MSG_ZEROCOPY -r /' and could not find anything, so
> it's probably not defined anywhere in the fs.

What you described gives me the feeling that the distro has mismatched
versions of asm-generic/socket.h (which should define SO_ZEROCOPY) and
bits/socket.h (which should define MSG_ZEROCOPY).

Let's first replace it with sys/socket.h. Then one trick you could consider
playing with (in case any environment has broken headers) is to put your
code into:

#if defined(MSG_ZEROCOPY) && defined(SO_ZEROCOPY)
...
#endif

blocks, just to avoid assuming that CONFIG_LINUX guarantees both are defined.
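
A minimal sketch of such a guard, assuming the code includes only the public
<sys/socket.h> header. HAVE_ZERO_COPY_SEND and zero_copy_send_flag() are
hypothetical names used for illustration; they are not part of the patch.

```c
#include <sys/socket.h>   /* public header; never include <bits/socket.h> directly */

/* Enable the zero-copy path only when the build headers provide both names. */
#if defined(MSG_ZEROCOPY) && defined(SO_ZEROCOPY)
#define HAVE_ZERO_COPY_SEND 1
#else
#define HAVE_ZERO_COPY_SEND 0
#endif

/*
 * Hypothetical helper: returns the sendmsg() flag for a zero-copy write,
 * or 0 when the caller did not ask for it or the headers lack support.
 */
static int zero_copy_send_flag(int want_zero_copy)
{
#if HAVE_ZERO_COPY_SEND
    return want_zero_copy ? MSG_ZEROCOPY : 0;
#else
    (void)want_zero_copy;
    return 0;
#endif
}
```

With this shape, a host with mismatched headers still builds; it just never
sets the flag, so writes fall back to the normal copying path.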

-- 
Peter Xu




* Re: [PULL 06/11] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-04-30  2:40     ` Leonardo Bras Soares Passos
  2022-05-02 23:51       ` Peter Xu
@ 2022-05-03  8:30       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2022-05-03  8:30 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: qemu-devel, Daniel Berrange, Juan Quintela, Peter Xu

* Leonardo Bras Soares Passos (lsoaresp@redhat.com) wrote:
> Hello Dave,
> 
> On Thu, Apr 28, 2022 at 1:20 PM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > Leo:
> >   Unfortunately this is failing a couple of CI tests; the MSG_ZEROCOPY
> > one I guess is the simpler one; I think Stefanha managed to find the
> > liburing fix for the __kernel_timespec case, but that looks like a bit
> > more fun!
> >
> > Dave
> 
> About MSG_ZEROCOPY error:
> 
> I tracked down how the test happened, downloaded the same docker image from the
> tests(opensuse-leap-15.2), and took a look at the filesystem for the
> MSG_ZEROCOPY define, which I could not find anywhere.
> 
> Then I took a look into /usr/include/bits/socket.h, which is where RHEL has
> MSG_ZEROCOPY defined. Zypper defines it as been provided by glibc-devel, which
> is versioned at 2.26-lp152.26.12.1.
> 
> I then took a look at https://sourceware.org/git/glibc.git, and found commit
> 78cde19f62 that introduces MSG_ZEROCOPY. The first version that has this commit
> is glibc-2.27.
> 
> So, basically, this means opensuse-leap-15.2 glibc version does not support
> MSG_ZEROCOPY. Based on that, I had a few ideas on how to solve the CI bug:
> 1 - Propose a backport of this patch (few comments +  single define) for
> leap-15.x, wait for them to accept and update the version in qemu CI.
> (TBH I have no idea how the opensuse community works, I just suppose it could
> be a way of tackling this.)
> 2 - include an #ifndef MSG_ZEROCOPY #define MSG_ZEROCOPY 0x4000000 #endif in
> code, which is ugly IMHO, but will be fast and clean.
> 3 - In CI, patch /usr/include/bits/socket.h before building, which will also
> work fine, but defeats the purpose of keeping qemu building on the platform.
> 
> Among the above, I would go with (2), as it seems a reasonable way of dealing
> with this.

Right, we need to run on the current set of distros, so we need to do
(2); it's not clear whether an attempt to enable the feature needs to
fail when the host doesn't support it.

Now, having said that, you might also want to file an openSUSE bug
suggesting they do (1).
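
For reference, a rough sketch of what option (2) could look like, combined
with a runtime check. The fallback value is the one quoted in Leo's list
below; host_supports_zero_copy() is a hypothetical name, and the probe only
mirrors the setsockopt(SO_ZEROCOPY) test the patch already performs in
qio_channel_socket_connect_sync().

```c
#include <sys/socket.h>
#include <unistd.h>

/* Fallback for C libraries whose headers predate MSG_ZEROCOPY (e.g. glibc 2.26). */
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

/*
 * Hypothetical probe: returns 1 if the running kernel accepts SO_ZEROCOPY
 * on a TCP socket, 0 otherwise (including when SO_ZEROCOPY is not defined
 * at build time or no socket can be created).
 */
static int host_supports_zero_copy(void)
{
#ifdef SO_ZEROCOPY
    int v = 1;
    int ok;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return 0;
    }
    ok = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v)) == 0;
    close(fd);
    return ok;
#else
    return 0;
#endif
}
```

The define only makes the code compile; whether enabling the feature should
then fail on an unsupported host would still have to be decided from a
runtime check like the one above.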

Dave

> Does anyone else have any further suggestions, or know how this kind of issue
> is generally solved in qemu?
> 
> Best regards,
> Leo
> 
> 
> >
> >
> > Job #2390848140 ( https://gitlab.com/dagrh/qemu/-/jobs/2390848140/raw )
> > Name: build-system-alpine
> > In file included from /usr/include/linux/errqueue.h:6,
> >                  from ../io/channel-socket.c:29:
> > /usr/include/linux/time_types.h:7:8: error: redefinition of 'struct __kernel_timespec'
> >     7 | struct __kernel_timespec {
> >       |        ^~~~~~~~~~~~~~~~~
> > In file included from /usr/include/liburing.h:19,
> >                  from /builds/dagrh/qemu/include/block/aio.h:18,
> >                  from /builds/dagrh/qemu/include/io/channel.h:26,
> >                  from /builds/dagrh/qemu/include/io/channel-socket.h:24,
> >                  from ../io/channel-socket.c:24:
> > /usr/include/liburing/compat.h:9:8: note: originally defined here
> >     9 | struct __kernel_timespec {
> >       |        ^~~~~~~~~~~~~~~~~
> >
> > ----
> > Name: build-system-opensuse
> >
> > https://gitlab.com/dagrh/qemu/-/jobs/2390848160/raw
> > ../io/channel-socket.c: In function ‘qio_channel_socket_writev’:
> > ../io/channel-socket.c:578:18: error: ‘MSG_ZEROCOPY’ undeclared (first use in this function); did you mean ‘SO_ZEROCOPY’?
> >          sflags = MSG_ZEROCOPY;
> >                   ^~~~~~~~~~~~
> >                   SO_ZEROCOPY
> > ../io/channel-socket.c:578:18: note: each undeclared identifier is reported only once for each function it appears in
> >
> > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> > > From: Leonardo Bras <leobras@redhat.com>
> > >
> > > For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> > > io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> > > feature is available in the host kernel, which is checked on
> > > qio_channel_socket_connect_sync()
> > >
> > > qio_channel_socket_flush() was implemented by counting how many times
> > > sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> > > socket's error queue, in order to find how many of them finished sending.
> > > Flush will loop until those counters are the same, or until some error occurs.
> > >
> > > Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> > > 1: Buffer
> > > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> > > some caution is necessary to avoid overwriting any buffer before it's sent.
> > > If something like this happen, a newer version of the buffer may be sent instead.
> > > - If this is a problem, it's recommended to call qio_channel_flush() before freeing
> > > or re-using the buffer.
> > >
> > > 2: Locked memory
> > > > - When using MSG_ZEROCOPY, the buffer memory will be locked after queued, and
> > > unlocked after it's sent.
> > > - Depending on the size of each buffer, and how often it's sent, it may require
> > > a larger amount of locked memory than usually available to non-root user.
> > > - If the required amount of locked memory is not available, writev_zero_copy
> > > will return an error, which can abort an operation like migration,
> > > - Because of this, when an user code wants to add zero copy as a feature, it
> > > requires a mechanism to disable it, so it can still be accessible to less
> > > privileged users.
> > >
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> > > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > > Message-Id: <20220426230654.637939-3-leobras@redhat.com>
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  include/io/channel-socket.h |   2 +
> > >  io/channel-socket.c         | 108 ++++++++++++++++++++++++++++++++++--
> > >  2 files changed, 106 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> > > index e747e63514..513c428fe4 100644
> > > --- a/include/io/channel-socket.h
> > > +++ b/include/io/channel-socket.h
> > > @@ -47,6 +47,8 @@ struct QIOChannelSocket {
> > >      socklen_t localAddrLen;
> > >      struct sockaddr_storage remoteAddr;
> > >      socklen_t remoteAddrLen;
> > > +    ssize_t zero_copy_queued;
> > > +    ssize_t zero_copy_sent;
> > >  };
> > >
> > >
> > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > index 696a04dc9c..1dd85fc1ef 100644
> > > --- a/io/channel-socket.c
> > > +++ b/io/channel-socket.c
> > > @@ -25,6 +25,10 @@
> > >  #include "io/channel-watch.h"
> > >  #include "trace.h"
> > >  #include "qapi/clone-visitor.h"
> > > +#ifdef CONFIG_LINUX
> > > +#include <linux/errqueue.h>
> > > +#include <bits/socket.h>
> > > +#endif
> > >
> > >  #define SOCKET_MAX_FDS 16
> > >
> > > @@ -54,6 +58,8 @@ qio_channel_socket_new(void)
> > >
> > >      sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
> > >      sioc->fd = -1;
> > > +    sioc->zero_copy_queued = 0;
> > > +    sioc->zero_copy_sent = 0;
> > >
> > >      ioc = QIO_CHANNEL(sioc);
> > >      qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
> > > @@ -153,6 +159,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
> > >          return -1;
> > >      }
> > >
> > > +#ifdef CONFIG_LINUX
> > > +    int ret, v = 1;
> > > +    ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> > > +    if (ret == 0) {
> > > +        /* Zero copy available on host */
> > > +        qio_channel_set_feature(QIO_CHANNEL(ioc),
> > > +                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
> > > +    }
> > > +#endif
> > > +
> > >      return 0;
> > >  }
> > >
> > > @@ -533,6 +549,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > >      char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
> > >      size_t fdsize = sizeof(int) * nfds;
> > >      struct cmsghdr *cmsg;
> > > +    int sflags = 0;
> > >
> > >      memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
> > >
> > > @@ -557,15 +574,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > >      }
> > >
> > > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > +        sflags = MSG_ZEROCOPY;
> > > +    }
> > > +
> > >   retry:
> > > -    ret = sendmsg(sioc->fd, &msg, 0);
> > > +    ret = sendmsg(sioc->fd, &msg, sflags);
> > >      if (ret <= 0) {
> > > -        if (errno == EAGAIN) {
> > > +        switch (errno) {
> > > +        case EAGAIN:
> > >              return QIO_CHANNEL_ERR_BLOCK;
> > > -        }
> > > -        if (errno == EINTR) {
> > > +        case EINTR:
> > >              goto retry;
> > > +        case ENOBUFS:
> > > +            if (sflags & MSG_ZEROCOPY) {
> > > +                error_setg_errno(errp, errno,
> > > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > > +                return -1;
> > > +            }
> > > +            break;
> > >          }
> > > +
> > >          error_setg_errno(errp, errno,
> > >                           "Unable to write to socket");
> > >          return -1;
> > > @@ -659,6 +688,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > >  }
> > >  #endif /* WIN32 */
> > >
> > > +
> > > +#ifdef CONFIG_LINUX
> > > +static int qio_channel_socket_flush(QIOChannel *ioc,
> > > +                                    Error **errp)
> > > +{
> > > +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> > > +    struct msghdr msg = {};
> > > +    struct sock_extended_err *serr;
> > > +    struct cmsghdr *cm;
> > > +    char control[CMSG_SPACE(sizeof(*serr))];
> > > +    int received;
> > > +    int ret = 1;
> > > +
> > > +    msg.msg_control = control;
> > > +    msg.msg_controllen = sizeof(control);
> > > +    memset(control, 0, sizeof(control));
> > > +
> > > +    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
> > > +        received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> > > +        if (received < 0) {
> > > +            switch (errno) {
> > > +            case EAGAIN:
> > > +                /* Nothing on errqueue, wait until something is available */
> > > +                qio_channel_wait(ioc, G_IO_ERR);
> > > +                continue;
> > > +            case EINTR:
> > > +                continue;
> > > +            default:
> > > +                error_setg_errno(errp, errno,
> > > +                                 "Unable to read errqueue");
> > > +                return -1;
> > > +            }
> > > +        }
> > > +
> > > +        cm = CMSG_FIRSTHDR(&msg);
> > > +        if (cm->cmsg_level != SOL_IP &&
> > > +            cm->cmsg_type != IP_RECVERR) {
> > > +            error_setg_errno(errp, EPROTOTYPE,
> > > +                             "Wrong cmsg in errqueue");
> > > +            return -1;
> > > +        }
> > > +
> > > +        serr = (void *) CMSG_DATA(cm);
> > > +        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
> > > +            error_setg_errno(errp, serr->ee_errno,
> > > +                             "Error on socket");
> > > +            return -1;
> > > +        }
> > > +        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
> > > +            error_setg_errno(errp, serr->ee_origin,
> > > +                             "Error not from zero copy");
> > > +            return -1;
> > > +        }
> > > +
> > > +        /* No errors, count successfully finished sendmsg()*/
> > > +        sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
> > > +
> > > +        /* If any sendmsg() succeeded using zero copy, return 0 at the end */
> > > +        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
> > > +            ret = 0;
> > > +        }
> > > +    }
> > > +
> > > +    return ret;
> > > +}
> > > +
> > > +#endif /* CONFIG_LINUX */
> > > +
> > >  static int
> > >  qio_channel_socket_set_blocking(QIOChannel *ioc,
> > >                                  bool enabled,
> > > @@ -789,6 +886,9 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
> > >      ioc_klass->io_set_delay = qio_channel_socket_set_delay;
> > >      ioc_klass->io_create_watch = qio_channel_socket_create_watch;
> > >      ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
> > > +#ifdef CONFIG_LINUX
> > > +    ioc_klass->io_flush = qio_channel_socket_flush;
> > > +#endif
> > >  }
> > >
> > >  static const TypeInfo qio_channel_socket_info = {
> > > --
> > > 2.35.1
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PULL 00/11] migration queue
  2020-11-12 18:37 [PULL 00/11] migration queue Dr. David Alan Gilbert (git)
@ 2020-11-13 10:49 ` Peter Maydell
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2020-11-13 10:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: Peng Liang, Juan Quintela, QEMU Developers, lihaotian9,
	zhengchuan, Stefan Hajnoczi, Chenqun (kuhn),
	Longpeng, Philippe Mathieu-Daudé,
	liuzhiqiang26

On Thu, 12 Nov 2020 at 18:41, Dr. David Alan Gilbert (git)
<dgilbert@redhat.com> wrote:
>
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The following changes since commit cb5d19e8294486551c422759260883ed290226d9:
>
>   Merge remote-tracking branch 'remotes/mcayland/tags/qemu-macppc-20201112' into staging (2020-11-12 11:33:26 +0000)
>
> are available in the Git repository at:
>
>   git://github.com/dagrh/qemu.git tags/pull-migration-20201112a
>
> for you to fetch changes up to 7632b56c8f880a8f86cf049a3785069e1ffd2997:
>
>   virtiofsd: check whether strdup lo.source return NULL in main func (2020-11-12 16:25:38 +0000)
>
> ----------------------------------------------------------------
> Migration & virtiofs fixes for 5.2
>
> A bunch of small fixes.
>



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2
for any user-visible changes.

-- PMM



* [PULL 00/11] migration queue
@ 2020-11-12 18:37 Dr. David Alan Gilbert (git)
  2020-11-13 10:49 ` Peter Maydell
  0 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2020-11-12 18:37 UTC (permalink / raw)
  To: qemu-devel, kuhn.chenqun, zhengchuan, lihaotian9, longpeng2,
	liangpeng10, philmd, liuzhiqiang26
  Cc: stefanha, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The following changes since commit cb5d19e8294486551c422759260883ed290226d9:

  Merge remote-tracking branch 'remotes/mcayland/tags/qemu-macppc-20201112' into staging (2020-11-12 11:33:26 +0000)

are available in the Git repository at:

  git://github.com/dagrh/qemu.git tags/pull-migration-20201112a

for you to fetch changes up to 7632b56c8f880a8f86cf049a3785069e1ffd2997:

  virtiofsd: check whether strdup lo.source return NULL in main func (2020-11-12 16:25:38 +0000)

----------------------------------------------------------------
Migration & virtiofs fixes for 5.2

A bunch of small fixes.

----------------------------------------------------------------
Chen Qun (1):
      migration: fix uninitialized variable warning in migrate_send_rp_req_pages()

Chuan Zheng (3):
      migration/multifd: fix hangup with TLS-Multifd due to blocking handshake
      migration/dirtyrate: simplify includes in dirtyrate.c
      multifd/tls: fix memoryleak of the QIOChannelSocket object when cancelling migration

Haotian Li (3):
      tools/virtiofsd/buffer.c: check whether buf is NULL in fuse_bufvec_advance func
      virtiofsd: check whether lo_map_reserve returns NULL in, main func
      virtiofsd: check whether strdup lo.source return NULL in main func

Longpeng (Mike) (1):
      migration: handle CANCELLING state in migration_completion()

Max Reitz (1):
      virtiofsd: Announce submounts even without statx()

Peng Liang (1):
      ACPI: Avoid infinite recursion when dump-vmstate

Philippe Mathieu-Daudé (1):
      migration/ram: Fix hexadecimal format string specifier

 hw/acpi/generic_event_device.c   | 12 +++++++++++-
 migration/dirtyrate.c            |  5 -----
 migration/migration.c            |  4 +++-
 migration/multifd.c              | 24 ++++++++++++++++++------
 migration/ram.c                  |  2 +-
 tools/virtiofsd/buffer.c         |  4 ++++
 tools/virtiofsd/passthrough_ll.c | 24 +++++++++++++++---------
 7 files changed, 52 insertions(+), 23 deletions(-)



