* [PATCH v7 0/5] MSG_ZEROCOPY + multifd
@ 2022-01-06 22:13 Leonardo Bras
  2022-01-06 22:13 ` [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback Leonardo Bras
                   ` (4 more replies)
  0 siblings, 5 replies; 40+ messages in thread
From: Leonardo Bras @ 2022-01-06 22:13 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

This patch series enables MSG_ZEROCOPY in QIOChannel and makes use of it
to improve multifd migration performance by reducing CPU usage.

Patch #1 creates new callbacks for QIOChannel, allowing the implementation
of zero copy writing.

Patch #2 implements io_writev flags and io_flush() on QIOChannelSocket,
making use of MSG_ZEROCOPY on Linux.

Patch #3 adds a "zero_copy" migration property, only available with
CONFIG_LINUX, and compiled out on any other platform.
This migration property has to be enabled before multifd migration starts.

Patch #4 adds a helper function that allows checking whether TLS is going
to be used. This helper will later be used in patch #5.

Patch #5 makes use of the QIOChannelSocket zero_copy implementation in
nocomp multifd migration.

Results:
In preliminary tests, the resource usage of __sys_sendmsg() was reduced 15
times, and the overall migration took 13-22% less time, based on a
synthetic CPU workload.

In further tests, it was noted that, on multifd migration with 8 channels:
- On idle hosts, migration time was reduced by 10% to 21%.
- On hosts busy with heavy CPU stress (1 stress thread per CPU, but
  not CPU-pinned), enabling zero-copy reduced migration time by ~25%.
- On hosts with heavy CPU-pinned workloads (1 stress thread per CPU,
  CPU-pinned), enabling zero-copy reduced migration time by ~66%.

Setup for the above tests:
- Sending and Receiving hosts:
  - CPU : Intel(R) Xeon(R) Platinum 8276L CPU @ 2.20GHz (448 CPUS)
  - Network card: E810-C (100Gbps)
  - >1TB RAM
  - QEMU: Upstream master branch + This patchset
  - Linux: Upstream v5.15 
- VM configuration:
  - 28 VCPUs
  - 512GB RAM


---
Changes since v6:
- Removed io_writev_zero_copy(), making use of new io_writev() flags
  to achieve the same result.
- Renamed io_flush_zero_copy() to io_flush()
- Previous patch #2 became too small, so it was squashed into previous
  patch #3 (now patch #2)

Changes since v5:
- flush_zero_copy now returns -1 on failure, 0 on success, and 1 when none
  of the processed writes was able to use zerocopy in the kernel.
- qio_channel_socket_poll() removed, using qio_channel_wait() instead
- ENOBUFS is now processed inside qio_channel_socket_writev_flags()
- Most zerocopy parameter validation moved to migrate_params_check(),
  leaving only feature test to socket_outgoing_migration() callback
- Naming went from *zerocopy to *zero_copy or *zero-copy, due to QAPI/QMP
  preferences
- Improved docs

Changes since v4:
- 3 patches were split into 6
- Flush is used for syncing after each iteration, instead of only at the end
- If zerocopy is not available, fail in connect instead of failing on write
- 'multifd-zerocopy' property renamed to 'zerocopy'
- Fail migrations that don't support zerocopy, if it's enabled.
- Instead of checking for zerocopy at each write, save the flags in
  MultiFDSendParams->write_flags and use them on write
- Reorganized flag usage in QIOChannelSocket 
- A lot of typos fixed
- More doc on buffer restrictions

Changes since v3:
- QIOChannel interface names changed from io_async_{writev,flush} to
  io_{writev,flush}_zerocopy
- Instead of falling back in case zerocopy is not implemented, return an
  error and abort the operation.
- Flush now waits as long as needed, or returns an error in case anything
  goes wrong, aborting the operation.
- Zerocopy is now conditional in multifd, being set by parameter
  multifd-zerocopy
- Moved zerocopy_flush from multifd_save_cleanup() to multifd_send_sync_main(),
  so migration can abort if the flush goes wrong.
- Several other small improvements

Changes since v2:
- Patch #1: One more fallback
- Patch #2: Fall back to sync if locking buffer memory fails in a MSG_ZEROCOPY send.

Changes since v1:
- Reimplemented the patchset using async_write + async_flush approach.
- Implemented a flush to be able to tell when all data has been written.


Leonardo Bras (5):
  QIOChannel: Add flags on io_writev and introduce io_flush callback
  QIOChannelSocket: Implement io_writev zero copy flag & io_flush for
    CONFIG_LINUX
  migration: Add zero-copy parameter for QMP/HMP for Linux
  migration: Add migrate_use_tls() helper
  multifd: Implement zero copy write in multifd migration
    (multifd-zero-copy)

 qapi/migration.json         |  24 ++++++++
 include/io/channel-socket.h |   2 +
 include/io/channel.h        |  67 +++++++++++++++++-----
 migration/migration.h       |   6 ++
 migration/multifd.h         |   4 +-
 io/channel-buffer.c         |   1 +
 io/channel-command.c        |   1 +
 io/channel-file.c           |   1 +
 io/channel-socket.c         | 109 ++++++++++++++++++++++++++++++++++--
 io/channel-tls.c            |   1 +
 io/channel-websock.c        |   1 +
 io/channel.c                |  51 ++++++++++++-----
 migration/channel.c         |   6 +-
 migration/migration.c       |  52 ++++++++++++++++-
 migration/multifd.c         |  45 ++++++++++++---
 migration/ram.c             |  29 +++++++---
 migration/rdma.c            |   1 +
 migration/socket.c          |   6 ++
 monitor/hmp-cmds.c          |   6 ++
 19 files changed, 360 insertions(+), 53 deletions(-)

-- 
2.34.1




^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback
  2022-01-06 22:13 [PATCH v7 0/5] MSG_ZEROCOPY + multifd Leonardo Bras
@ 2022-01-06 22:13 ` Leonardo Bras
  2022-01-13  6:28   ` Peter Xu
  2022-01-13 10:52   ` Daniel P. Berrangé
  2022-01-06 22:13 ` [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Leonardo Bras
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 40+ messages in thread
From: Leonardo Bras @ 2022-01-06 22:13 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

Add flags to io_writev and introduce io_flush as an optional callback in
QIOChannelClass, allowing subclasses to implement zero copy writes.

How to use them:
- Write data using qio_channel_writev(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY),
- Wait for write completion with qio_channel_flush().

Notes:
As some zero copy implementations work asynchronously, it's
recommended to keep the write buffer untouched until qio_channel_flush()
returns, to avoid the risk of sending an updated buffer instead of the
buffer's state at write time.

As the io_flush callback is optional, if a subclass does not implement it, then:
- io_flush will return 0 without changing anything.

Also, some functions like qio_channel_writev_full_all() were adapted to
receive a flags parameter. That allows code to be shared between the zero
copy and non-zero copy writev paths, and makes it easier to add new flags.
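
Putting the pieces together, the intended call pattern looks roughly like
the following (an illustrative sketch against the interfaces added in this
patch, not standalone-compilable QEMU code; error handling abbreviated):

```c
/* Sketch only: assumes 'ioc' is a QIOChannel whose subclass supports
 * QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY, and 'buf'/'len' describe the data. */
struct iovec iov = { .iov_base = buf, .iov_len = len };
Error *err = NULL;
int ret;

/* Queue the buffer; the kernel may still read from 'buf' after return. */
if (qio_channel_writev_full_all_flags(ioc, &iov, 1, NULL, 0,
                                      QIO_CHANNEL_WRITE_FLAG_ZERO_COPY,
                                      &err) < 0) {
    /* report 'err' and abort */
}

/* Wait until everything queued has actually been sent. */
ret = qio_channel_flush(ioc, &err);
if (ret < 0) {
    /* report 'err' and abort */
} else if (ret == 1) {
    /* every send fell back to copying */
}
/* Only now is it safe to modify or free 'buf'. */
```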

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/io/channel.h | 67 +++++++++++++++++++++++++++++++++++---------
 io/channel-buffer.c  |  1 +
 io/channel-command.c |  1 +
 io/channel-file.c    |  1 +
 io/channel-socket.c  |  2 ++
 io/channel-tls.c     |  1 +
 io/channel-websock.c |  1 +
 io/channel.c         | 51 +++++++++++++++++++++++----------
 migration/rdma.c     |  1 +
 9 files changed, 98 insertions(+), 28 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 88988979f8..343766ce5b 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 #define QIO_CHANNEL_ERR_BLOCK -2
 
+#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
+
 typedef enum QIOChannelFeature QIOChannelFeature;
 
 enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_FD_PASS,
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
+    QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
 };
 
 
@@ -104,6 +107,7 @@ struct QIOChannelClass {
                          size_t niov,
                          int *fds,
                          size_t nfds,
+                         int flags,
                          Error **errp);
     ssize_t (*io_readv)(QIOChannel *ioc,
                         const struct iovec *iov,
@@ -136,6 +140,8 @@ struct QIOChannelClass {
                                   IOHandler *io_read,
                                   IOHandler *io_write,
                                   void *opaque);
+    int (*io_flush)(QIOChannel *ioc,
+                    Error **errp);
 };
 
 /* General I/O handling functions */
@@ -222,12 +228,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
 
 
 /**
- * qio_channel_writev_full:
+ * qio_channel_writev_full_flags:
  * @ioc: the channel object
  * @iov: the array of memory regions to write data from
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Write data to the IO channel, reading it from the
@@ -255,12 +262,16 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
  * or QIO_CHANNEL_ERR_BLOCK if no data is can be sent
  * and the channel is non-blocking
  */
-ssize_t qio_channel_writev_full(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds,
-                                size_t nfds,
-                                Error **errp);
+ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds,
+                                      size_t nfds,
+                                      int flags,
+                                      Error **errp);
+
+#define qio_channel_writev_full(ioc, iov, niov, fds, nfds, errp) \
+    qio_channel_writev_full_flags(ioc, iov, niov, fds, nfds, 0, errp)
 
 /**
  * qio_channel_readv_all_eof:
@@ -831,12 +842,13 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
                                Error **errp);
 
 /**
- * qio_channel_writev_full_all:
+ * qio_channel_writev_full_all_flags:
  * @ioc: the channel object
  * @iov: the array of memory regions to write data from
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  *
@@ -846,13 +858,42 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * to be written, yielding from the current coroutine
  * if required.
  *
+ * If QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is passed in flags,
+ * instead of waiting for all requested data to be written,
+ * this function will wait until it's all queued for writing.
+ * In this case, if the buffer gets changed between queueing and
+ * sending, the updated buffer will be sent. If this is not a
+ * desired behavior, it's suggested to call qio_channel_flush()
+ * before reusing the buffer.
+ *
  * Returns: 0 if all bytes were written, or -1 on error
  */
 
-int qio_channel_writev_full_all(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds, size_t nfds,
-                                Error **errp);
+int qio_channel_writev_full_all_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds, size_t nfds,
+                                      int flags, Error **errp);
+#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
+    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
+
+/**
+ * qio_channel_flush:
+ * @ioc: the channel object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Will block until every packet queued with
+ * qio_channel_writev_full_flags() + QIO_CHANNEL_WRITE_FLAG_ZERO_COPY
+ * is sent, or return in case of any error.
+ *
+ * If not implemented, acts as a no-op, and returns 0.
+ *
+ * Returns -1 if any error is found,
+ *          1 if every send failed to use zero copy.
+ *          0 otherwise.
+ */
+
+int qio_channel_flush(QIOChannel *ioc,
+                      Error **errp);
 
 #endif /* QIO_CHANNEL_H */
diff --git a/io/channel-buffer.c b/io/channel-buffer.c
index baa4e2b089..bf52011be2 100644
--- a/io/channel-buffer.c
+++ b/io/channel-buffer.c
@@ -81,6 +81,7 @@ static ssize_t qio_channel_buffer_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(ioc);
diff --git a/io/channel-command.c b/io/channel-command.c
index b2a9e27138..5ff1691bad 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -258,6 +258,7 @@ static ssize_t qio_channel_command_writev(QIOChannel *ioc,
                                           size_t niov,
                                           int *fds,
                                           size_t nfds,
+                                          int flags,
                                           Error **errp)
 {
     QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
diff --git a/io/channel-file.c b/io/channel-file.c
index c4bf799a80..348a48545e 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -114,6 +114,7 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
                                        size_t niov,
                                        int *fds,
                                        size_t nfds,
+                                       int flags,
                                        Error **errp)
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 606ec97cf7..bfbd64787e 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -525,6 +525,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
@@ -620,6 +621,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 2ae1b92fc0..4ce890a538 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -301,6 +301,7 @@ static ssize_t qio_channel_tls_writev(QIOChannel *ioc,
                                       size_t niov,
                                       int *fds,
                                       size_t nfds,
+                                      int flags,
                                       Error **errp)
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
diff --git a/io/channel-websock.c b/io/channel-websock.c
index 70889bb54d..035dd6075b 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -1127,6 +1127,7 @@ static ssize_t qio_channel_websock_writev(QIOChannel *ioc,
                                           size_t niov,
                                           int *fds,
                                           size_t nfds,
+                                          int flags,
                                           Error **errp)
 {
     QIOChannelWebsock *wioc = QIO_CHANNEL_WEBSOCK(ioc);
diff --git a/io/channel.c b/io/channel.c
index e8b019dc36..904855e16e 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -67,12 +67,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
 }
 
 
-ssize_t qio_channel_writev_full(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds,
-                                size_t nfds,
-                                Error **errp)
+ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds,
+                                      size_t nfds,
+                                      int flags,
+                                      Error **errp)
 {
     QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
 
@@ -83,7 +84,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
         return -1;
     }
 
-    return klass->io_writev(ioc, iov, niov, fds, nfds, errp);
+    return klass->io_writev(ioc, iov, niov, fds, nfds, flags, errp);
 }
 
 
@@ -217,14 +218,15 @@ int qio_channel_writev_all(QIOChannel *ioc,
                            size_t niov,
                            Error **errp)
 {
-    return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
+    return qio_channel_writev_full_all_flags(ioc, iov, niov, NULL, 0, 0,
+                                             errp);
 }
 
-int qio_channel_writev_full_all(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds, size_t nfds,
-                                Error **errp)
+int qio_channel_writev_full_all_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds, size_t nfds,
+                                      int flags, Error **errp)
 {
     int ret = -1;
     struct iovec *local_iov = g_new(struct iovec, niov);
@@ -235,10 +237,16 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
                           iov, niov,
                           0, iov_size(iov, niov));
 
+    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+        assert(fds == NULL && nfds == 0);
+    }
+
     while (nlocal_iov > 0) {
         ssize_t len;
-        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
-                                      errp);
+
+        len = qio_channel_writev_full_flags(ioc, local_iov, nlocal_iov, fds,
+                                            nfds, flags, errp);
+
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_OUT);
@@ -473,6 +481,19 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
     return klass->io_seek(ioc, offset, whence, errp);
 }
 
+int qio_channel_flush(QIOChannel *ioc,
+                                Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_flush ||
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
+        return 0;
+    }
+
+    return klass->io_flush(ioc, errp);
+}
+
 
 static void qio_channel_restart_read(void *opaque)
 {
diff --git a/migration/rdma.c b/migration/rdma.c
index f5d3bbe7e9..54acd2000e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2833,6 +2833,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
                                        size_t niov,
                                        int *fds,
                                        size_t nfds,
+                                       int flags,
                                        Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
-- 
2.34.1




* [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-06 22:13 [PATCH v7 0/5] MSG_ZEROCOPY + multifd Leonardo Bras
  2022-01-06 22:13 ` [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback Leonardo Bras
@ 2022-01-06 22:13 ` Leonardo Bras
  2022-01-13  6:48   ` Peter Xu
  2022-01-13 13:05   ` Daniel P. Berrangé
  2022-01-06 22:13 ` [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux Leonardo Bras
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 40+ messages in thread
From: Leonardo Bras @ 2022-01-06 22:13 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

For CONFIG_LINUX, implement the new zero copy flag and the optional callback
io_flush on QIOChannelSocket, but enable it only when the MSG_ZEROCOPY
feature is available in the host kernel, which is checked in
qio_channel_socket_connect_sync().

qio_channel_socket_flush() is implemented by counting how many times
sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
socket's error queue to find out how many of them finished sending.
Flush loops until those counters match, or until an error occurs.

Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
some caution is necessary to avoid overwriting any buffer before it's sent.
If something like this happens, a newer version of the buffer may be sent instead.
- If this is a problem, it's recommended to call qio_channel_flush() before freeing
or re-using the buffer.

2: Locked memory
- When using MSG_ZEROCOPY, the buffer memory will be locked after being queued,
and unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, this may require
a larger amount of locked memory than is usually available to a non-root user.
- If the required amount of locked memory is not available, writev_zero_copy
will return an error, which can abort an operation like migration.
- Because of this, when user code wants to add zero copy as a feature, it
requires a mechanism to disable it, so it can remain accessible to less
privileged users.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/io/channel-socket.h |   2 +
 io/channel-socket.c         | 107 ++++++++++++++++++++++++++++++++++--
 2 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e63514..513c428fe4 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -47,6 +47,8 @@ struct QIOChannelSocket {
     socklen_t localAddrLen;
     struct sockaddr_storage remoteAddr;
     socklen_t remoteAddrLen;
+    ssize_t zero_copy_queued;
+    ssize_t zero_copy_sent;
 };
 
 
diff --git a/io/channel-socket.c b/io/channel-socket.c
index bfbd64787e..fb1e210ec5 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -26,6 +26,10 @@
 #include "io/channel-watch.h"
 #include "trace.h"
 #include "qapi/clone-visitor.h"
+#ifdef CONFIG_LINUX
+#include <linux/errqueue.h>
+#include <bits/socket.h>
+#endif
 
 #define SOCKET_MAX_FDS 16
 
@@ -55,6 +59,8 @@ qio_channel_socket_new(void)
 
     sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
     sioc->fd = -1;
+    sioc->zero_copy_queued = 0;
+    sioc->zero_copy_sent = 0;
 
     ioc = QIO_CHANNEL(sioc);
     qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -154,6 +160,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
         return -1;
     }
 
+#ifdef CONFIG_LINUX
+    int ret, v = 1;
+    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+    if (ret == 0) {
+        /* Zero copy available on host */
+        qio_channel_set_feature(QIO_CHANNEL(ioc),
+                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
+    }
+#endif
+
     return 0;
 }
 
@@ -534,6 +550,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
     char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
     size_t fdsize = sizeof(int) * nfds;
     struct cmsghdr *cmsg;
+    int sflags = 0;
 
     memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
 
@@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
         memcpy(CMSG_DATA(cmsg), fds, fdsize);
     }
 
+    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+        sflags = MSG_ZEROCOPY;
+    }
+
  retry:
-    ret = sendmsg(sioc->fd, &msg, 0);
+    ret = sendmsg(sioc->fd, &msg, sflags);
     if (ret <= 0) {
-        if (errno == EAGAIN) {
+        switch (errno) {
+        case EAGAIN:
             return QIO_CHANNEL_ERR_BLOCK;
-        }
-        if (errno == EINTR) {
+        case EINTR:
             goto retry;
+        case ENOBUFS:
+            if (sflags & MSG_ZEROCOPY) {
+                error_setg_errno(errp, errno,
+                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
+                return -1;
+            }
         }
+
         error_setg_errno(errp, errno,
                          "Unable to write to socket");
         return -1;
@@ -660,6 +688,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 }
 #endif /* WIN32 */
 
+
+#ifdef CONFIG_LINUX
+static int qio_channel_socket_flush(QIOChannel *ioc,
+                                    Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    struct msghdr msg = {};
+    struct sock_extended_err *serr;
+    struct cmsghdr *cm;
+    char control[CMSG_SPACE(sizeof(*serr))];
+    int received;
+    int ret = 1;
+
+    msg.msg_control = control;
+    msg.msg_controllen = sizeof(control);
+    memset(control, 0, sizeof(control));
+
+    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
+        received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
+        if (received < 0) {
+            switch (errno) {
+            case EAGAIN:
+                /* Nothing on errqueue, wait until something is available */
+                qio_channel_wait(ioc, G_IO_ERR);
+                continue;
+            case EINTR:
+                continue;
+            default:
+                error_setg_errno(errp, errno,
+                                 "Unable to read errqueue");
+                return -1;
+            }
+        }
+
+        cm = CMSG_FIRSTHDR(&msg);
+        if (cm->cmsg_level != SOL_IP &&
+            cm->cmsg_type != IP_RECVERR) {
+            error_setg_errno(errp, EPROTOTYPE,
+                             "Wrong cmsg in errqueue");
+            return -1;
+        }
+
+        serr = (void *) CMSG_DATA(cm);
+        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
+            error_setg_errno(errp, serr->ee_errno,
+                             "Error on socket");
+            return -1;
+        }
+        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+            error_setg_errno(errp, serr->ee_origin,
+                             "Error not from zero copy");
+            return -1;
+        }
+
+        /* No errors, count successfully finished sendmsg()*/
+        sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
+
+        /* If any sendmsg() succeeded using zero copy, return 0 at the end */
+        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
+            ret = 0;
+        }
+    }
+
+    return ret;
+}
+
+#endif /* CONFIG_LINUX */
+
 static int
 qio_channel_socket_set_blocking(QIOChannel *ioc,
                                 bool enabled,
@@ -789,6 +885,9 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
     ioc_klass->io_set_delay = qio_channel_socket_set_delay;
     ioc_klass->io_create_watch = qio_channel_socket_create_watch;
     ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
+#ifdef CONFIG_LINUX
+    ioc_klass->io_flush = qio_channel_socket_flush;
+#endif
 }
 
 static const TypeInfo qio_channel_socket_info = {
-- 
2.34.1




* [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
  2022-01-06 22:13 [PATCH v7 0/5] MSG_ZEROCOPY + multifd Leonardo Bras
  2022-01-06 22:13 ` [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback Leonardo Bras
  2022-01-06 22:13 ` [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Leonardo Bras
@ 2022-01-06 22:13 ` Leonardo Bras
  2022-01-13  7:00   ` Peter Xu
  2022-01-13 13:09   ` Daniel P. Berrangé
  2022-01-06 22:13 ` [PATCH v7 4/5] migration: Add migrate_use_tls() helper Leonardo Bras
  2022-01-06 22:13 ` [PATCH v7 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Leonardo Bras
  4 siblings, 2 replies; 40+ messages in thread
From: Leonardo Bras @ 2022-01-06 22:13 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

Add a property that allows zero-copy migration of memory pages, and also
include a helper function migrate_use_zero_copy() to check whether it's
enabled.

No code is introduced to actually do the migration, but this allows future
implementations to enable/disable the feature.

On non-Linux builds this parameter is compiled out.
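
For illustration, with this patch applied the parameter would be toggled
like any other migration parameter, e.g. over QMP (a sketch based on the
field name introduced by the schema change in the diff below):

```json
{ "execute": "migrate-set-parameters",
  "arguments": { "zero-copy": true } }
```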

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 qapi/migration.json   | 24 ++++++++++++++++++++++++
 migration/migration.h |  5 +++++
 migration/migration.c | 32 ++++++++++++++++++++++++++++++++
 migration/socket.c    |  5 +++++
 monitor/hmp-cmds.c    |  6 ++++++
 5 files changed, 72 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index bbfd48cf0b..2e62ea6ebd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -730,6 +730,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy: Controls behavior on sending memory pages on migration.
+#             When true, enables a zero-copy mechanism for sending memory
+#             pages, if host supports it.
+#             Requires that QEMU be permitted to use locked memory for guest
+#             RAM pages.
+#             Defaults to false. (Since 7.0)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -769,6 +776,7 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
+           { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'},
            'block-bitmap-mapping' ] }
 
 ##
@@ -895,6 +903,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy: Controls behavior on sending memory pages on migration.
+#             When true, enables a zero-copy mechanism for sending memory
+#             pages, if host supports it.
+#             Requires that QEMU be permitted to use locked memory for guest
+#             RAM pages.
+#             Defaults to false. (Since 7.0)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -949,6 +964,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
@@ -1095,6 +1111,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy: Controls behavior on sending memory pages on migration.
+#             When true, enables a zero-copy mechanism for sending memory
+#             pages, if host supports it.
+#             Requires that QEMU be permitted to use locked memory for guest
+#             RAM pages.
+#             Defaults to false. (Since 7.0)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -1147,6 +1170,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
diff --git a/migration/migration.h b/migration/migration.h
index 8130b703eb..1489eeb165 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -339,6 +339,11 @@ MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
 int migrate_multifd_zstd_level(void);
 
+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy(void);
+#else
+#define migrate_use_zero_copy() (false)
+#endif
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/migration.c b/migration/migration.c
index 0652165610..aa8f1dc835 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -893,6 +893,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->multifd_zlib_level = s->parameters.multifd_zlib_level;
     params->has_multifd_zstd_level = true;
     params->multifd_zstd_level = s->parameters.multifd_zstd_level;
+#ifdef CONFIG_LINUX
+    params->has_zero_copy = true;
+    params->zero_copy = s->parameters.zero_copy;
+#endif
     params->has_xbzrle_cache_size = true;
     params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
     params->has_max_postcopy_bandwidth = true;
@@ -1546,6 +1550,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_multifd_compression) {
         dest->multifd_compression = params->multifd_compression;
     }
+#ifdef CONFIG_LINUX
+    if (params->has_zero_copy) {
+        dest->zero_copy = params->zero_copy;
+    }
+#endif
     if (params->has_xbzrle_cache_size) {
         dest->xbzrle_cache_size = params->xbzrle_cache_size;
     }
@@ -1658,6 +1667,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_multifd_compression) {
         s->parameters.multifd_compression = params->multifd_compression;
     }
+#ifdef CONFIG_LINUX
+    if (params->has_zero_copy) {
+        s->parameters.zero_copy = params->zero_copy;
+    }
+#endif
     if (params->has_xbzrle_cache_size) {
         s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
         xbzrle_cache_resize(params->xbzrle_cache_size, errp);
@@ -2548,6 +2562,17 @@ int migrate_multifd_zstd_level(void)
     return s->parameters.multifd_zstd_level;
 }
 
+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.zero_copy;
+}
+#endif
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
@@ -4200,6 +4225,10 @@ static Property migration_properties[] = {
     DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState,
                       parameters.multifd_zstd_level,
                       DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL),
+#ifdef CONFIG_LINUX
+    DEFINE_PROP_BOOL("zero_copy", MigrationState,
+                      parameters.zero_copy, false),
+#endif
     DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState,
                       parameters.xbzrle_cache_size,
                       DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE),
@@ -4297,6 +4326,9 @@ static void migration_instance_init(Object *obj)
     params->has_multifd_compression = true;
     params->has_multifd_zlib_level = true;
     params->has_multifd_zstd_level = true;
+#ifdef CONFIG_LINUX
+    params->has_zero_copy = true;
+#endif
     params->has_xbzrle_cache_size = true;
     params->has_max_postcopy_bandwidth = true;
     params->has_max_cpu_throttle = true;
diff --git a/migration/socket.c b/migration/socket.c
index 05705a32d8..f7a77aafd3 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -77,6 +77,11 @@ static void socket_outgoing_migration(QIOTask *task,
     } else {
         trace_migration_socket_outgoing_connected(data->hostname);
     }
+
+    if (migrate_use_zero_copy()) {
+        error_setg(&err, "Zero copy not available in migration");
+    }
+
     migration_channel_connect(data->s, sioc, data->hostname, err);
     object_unref(OBJECT(sioc));
 }
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 2669156b28..35d47e250d 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1297,6 +1297,12 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_multifd_zstd_level = true;
         visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
         break;
+#ifdef CONFIG_LINUX
+    case MIGRATION_PARAMETER_ZERO_COPY:
+        p->has_zero_copy = true;
+        visit_type_bool(v, param, &p->zero_copy, &err);
+        break;
+#endif
     case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
         p->has_xbzrle_cache_size = true;
         if (!visit_type_size(v, param, &cache_size, &err)) {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v7 4/5] migration: Add migrate_use_tls() helper
  2022-01-06 22:13 [PATCH v7 0/5] MSG_ZEROCOPY + multifd Leonardo Bras
                   ` (2 preceding siblings ...)
  2022-01-06 22:13 ` [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux Leonardo Bras
@ 2022-01-06 22:13 ` Leonardo Bras
  2022-01-13  7:02   ` Peter Xu
  2022-01-13 13:10   ` Daniel P. Berrangé
  2022-01-06 22:13 ` [PATCH v7 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Leonardo Bras
  4 siblings, 2 replies; 40+ messages in thread
From: Leonardo Bras @ 2022-01-06 22:13 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

A lot of places check parameters.tls_creds in order to evaluate if TLS is
in use, and sometimes call migrate_get_current() just for that test.

Add new helper function migrate_use_tls() in order to simplify testing
for TLS usage.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.h | 1 +
 migration/channel.c   | 6 +++---
 migration/migration.c | 9 +++++++++
 migration/multifd.c   | 5 +----
 4 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 1489eeb165..445d95bbf2 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -344,6 +344,7 @@ bool migrate_use_zero_copy(void);
 #else
 #define migrate_use_zero_copy() (false)
 #endif
+int migrate_use_tls(void);
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/channel.c b/migration/channel.c
index c4fc000a1a..1a45b75d29 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -32,16 +32,16 @@
  */
 void migration_channel_process_incoming(QIOChannel *ioc)
 {
-    MigrationState *s = migrate_get_current();
     Error *local_err = NULL;
 
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));
 
-    if (s->parameters.tls_creds &&
-        *s->parameters.tls_creds &&
+    if (migrate_use_tls() &&
         !object_dynamic_cast(OBJECT(ioc),
                              TYPE_QIO_CHANNEL_TLS)) {
+        MigrationState *s = migrate_get_current();
+
         migration_tls_channel_process_incoming(s, ioc, &local_err);
     } else {
         migration_ioc_register_yank(ioc);
diff --git a/migration/migration.c b/migration/migration.c
index aa8f1dc835..7bcb800890 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2573,6 +2573,15 @@ bool migrate_use_zero_copy(void)
 }
 #endif
 
+int migrate_use_tls(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.tls_creds && *s->parameters.tls_creds;
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
diff --git a/migration/multifd.c b/migration/multifd.c
index 3242f688e5..677e942747 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -796,14 +796,11 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
                                     QIOChannel *ioc,
                                     Error *error)
 {
-    MigrationState *s = migrate_get_current();
-
     trace_multifd_set_outgoing_channel(
         ioc, object_get_typename(OBJECT(ioc)), p->tls_hostname, error);
 
     if (!error) {
-        if (s->parameters.tls_creds &&
-            *s->parameters.tls_creds &&
+        if (migrate_use_tls() &&
             !object_dynamic_cast(OBJECT(ioc),
                                  TYPE_QIO_CHANNEL_TLS)) {
             multifd_tls_channel_connect(p, ioc, &error);
-- 
2.34.1

* [PATCH v7 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)
  2022-01-06 22:13 [PATCH v7 0/5] MSG_ZEROCOPY + multifd Leonardo Bras
                   ` (3 preceding siblings ...)
  2022-01-06 22:13 ` [PATCH v7 4/5] migration: Add migrate_use_tls() helper Leonardo Bras
@ 2022-01-06 22:13 ` Leonardo Bras
  2022-01-13  7:15   ` Peter Xu
  4 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2022-01-06 22:13 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

Implement zero copy on nocomp_send_write(), by making use of QIOChannel
writev + flags & flush interface.

Change multifd_send_sync_main() so it can distinguish the per-iteration sync
from the setup and completion syncs, allowing flush_zero_copy() to be called
after each iteration in order to make sure all dirty pages are sent before a
new iteration starts.

Also make it return -1 if flush_zero_copy() fails, in order to cancel
the migration process and avoid resuming the guest on the target host
without having received all current RAM.

This works fine for RAM migration because RAM pages are not usually freed,
and there is no problem with changing a page's content between
writev_zero_copy() and the actual sending of the buffer: such a change
dirties the page, causing it to be re-sent in a later iteration anyway.

A lot of locked memory may be needed in order to use multifd migration
with zero-copy enabled, so the feature may need to be disabled for
low-privileged users trying to perform multifd migrations.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.h   |  4 +++-
 migration/migration.c | 11 ++++++++++-
 migration/multifd.c   | 40 +++++++++++++++++++++++++++++++++++-----
 migration/ram.c       | 29 ++++++++++++++++++++++-------
 migration/socket.c    |  5 +++--
 5 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index e57adc783b..d9fbccdbe2 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -22,7 +22,7 @@ int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
 bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-void multifd_send_sync_main(QEMUFile *f);
+int multifd_send_sync_main(QEMUFile *f, bool sync);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
@@ -97,6 +97,8 @@ typedef struct {
     uint32_t packet_len;
     /* pointer to the packet */
     MultiFDPacket_t *packet;
+    /* multifd flags for sending ram */
+    int write_flags;
     /* multifd flags for each packet */
     uint32_t flags;
     /* size of the next packet that contains pages */
diff --git a/migration/migration.c b/migration/migration.c
index 7bcb800890..76a3313e66 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1476,7 +1476,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
         error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
         return false;
     }
-
+#ifdef CONFIG_LINUX
+    if (params->zero_copy &&
+        (!migrate_use_multifd() ||
+         params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
+         (params->tls_creds && *params->tls_creds))) {
+        error_setg(errp,
+                   "Zero copy only available for non-compressed non-TLS multifd migration");
+        return false;
+    }
+#endif
     return true;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 677e942747..1b6b7cc1a1 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -104,7 +104,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
  */
 static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
+    return qio_channel_writev_full_all_flags(p->c, p->pages->iov, used,
+                                             NULL, 0, p->write_flags, errp);
 }
 
 /**
@@ -582,19 +583,28 @@ void multifd_save_cleanup(void)
     multifd_send_state = NULL;
 }
 
-void multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(QEMUFile *f, bool sync)
 {
     int i;
+    bool flush_zero_copy;
 
     if (!migrate_use_multifd()) {
-        return;
+        return 0;
     }
     if (multifd_send_state->pages->num) {
         if (multifd_send_pages(f) < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
-            return;
+            return 0;
         }
     }
+
+    /*
+     * When using zero-copy, it's necessary to flush after each iteration to
+     * make sure pages from earlier iterations don't end up replacing newer
+     * pages.
+     */
+    flush_zero_copy = sync && migrate_use_zero_copy();
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -605,7 +615,7 @@ void multifd_send_sync_main(QEMUFile *f)
         if (p->quit) {
             error_report("%s: channel %d has already quit", __func__, i);
             qemu_mutex_unlock(&p->mutex);
-            return;
+            return 0;
         }
 
         p->packet_num = multifd_send_state->packet_num++;
@@ -616,6 +626,17 @@ void multifd_send_sync_main(QEMUFile *f)
         ram_counters.transferred += p->packet_len;
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
+
+        if (flush_zero_copy) {
+            int ret;
+            Error *err = NULL;
+
+            ret = qio_channel_flush(p->c, &err);
+            if (ret < 0) {
+                error_report_err(err);
+                return -1;
+            }
+        }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -624,6 +645,8 @@ void multifd_send_sync_main(QEMUFile *f)
         qemu_sem_wait(&p->sem_sync);
     }
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
+
+    return 0;
 }
 
 static void *multifd_send_thread(void *opaque)
@@ -919,6 +942,13 @@ int multifd_save_setup(Error **errp)
         p->packet->version = cpu_to_be32(MULTIFD_VERSION);
         p->name = g_strdup_printf("multifdsend_%d", i);
         p->tls_hostname = g_strdup(s->hostname);
+
+        if (migrate_use_zero_copy()) {
+            p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
+        } else {
+            p->write_flags = 0;
+        }
+
         socket_send_channel_create(multifd_new_send_channel_async, p);
     }
 
diff --git a/migration/ram.c b/migration/ram.c
index 57efa67f20..a1ae66c50c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2987,6 +2987,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMState **rsp = opaque;
     RAMBlock *block;
+    int ret;
 
     if (compress_threads_save_setup()) {
         return -1;
@@ -3021,7 +3022,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-    multifd_send_sync_main(f);
+    ret =  multifd_send_sync_main(f, false);
+    if (ret < 0) {
+        return ret;
+    }
+
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     qemu_fflush(f);
 
@@ -3130,7 +3135,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
-        multifd_send_sync_main(rs->f);
+        ret = multifd_send_sync_main(rs->f, true);
+        if (ret < 0) {
+            return ret;
+        }
+
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
         qemu_fflush(f);
         ram_counters.transferred += 8;
@@ -3188,13 +3197,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     }
 
-    if (ret >= 0) {
-        multifd_send_sync_main(rs->f);
-        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-        qemu_fflush(f);
+    if (ret < 0) {
+        return ret;
     }
 
-    return ret;
+    ret = multifd_send_sync_main(rs->f, false);
+    if (ret < 0) {
+        return ret;
+    }
+
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_fflush(f);
+
+    return 0;
 }
 
 static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
diff --git a/migration/socket.c b/migration/socket.c
index f7a77aafd3..23b03e6190 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -78,8 +78,9 @@ static void socket_outgoing_migration(QIOTask *task,
         trace_migration_socket_outgoing_connected(data->hostname);
     }
 
-    if (migrate_use_zero_copy()) {
-        error_setg(&err, "Zero copy not available in migration");
+    if (migrate_use_zero_copy() &&
+        !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
+        error_setg(&err, "Zero copy feature not detected in host kernel");
     }
 
     migration_channel_connect(data->s, sioc, data->hostname, err);
-- 
2.34.1

* Re: [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback
  2022-01-06 22:13 ` [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback Leonardo Bras
@ 2022-01-13  6:28   ` Peter Xu
  2022-01-18 20:45     ` Leonardo Bras Soares Passos
  2022-01-13 10:52   ` Daniel P. Berrangé
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Xu @ 2022-01-13  6:28 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Thu, Jan 06, 2022 at 07:13:38PM -0300, Leonardo Bras wrote:
> diff --git a/io/channel.c b/io/channel.c
> index e8b019dc36..904855e16e 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -67,12 +67,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
>  }
>  
>  
> -ssize_t qio_channel_writev_full(QIOChannel *ioc,
> -                                const struct iovec *iov,
> -                                size_t niov,
> -                                int *fds,
> -                                size_t nfds,
> -                                Error **errp)
> +ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
> +                                      const struct iovec *iov,
> +                                      size_t niov,
> +                                      int *fds,
> +                                      size_t nfds,
> +                                      int flags,
> +                                      Error **errp)
>  {
>      QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
>  
> @@ -83,7 +84,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
>          return -1;
>      }

Shouldn't we also check QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY here when
QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is set?  Just like what we do with:

    if ((fds || nfds) &&
        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
        error_setg_errno(errp, EINVAL,
                         "Channel does not support file descriptor passing");
        return -1;
    }

I still think it's better to have the caller be crystal clear about when to
use the zero_copy feature, because it has implications on buffer lifetime.

I might have commented similar things before, but I have missed a few versions
so I could also have missed some previous discussions..

>  
> -    return klass->io_writev(ioc, iov, niov, fds, nfds, errp);
> +    return klass->io_writev(ioc, iov, niov, fds, nfds, flags, errp);
>  }

-- 
Peter Xu

* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-06 22:13 ` [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Leonardo Bras
@ 2022-01-13  6:48   ` Peter Xu
  2022-01-13 10:06     ` Daniel P. Berrangé
                       ` (2 more replies)
  2022-01-13 13:05   ` Daniel P. Berrangé
  1 sibling, 3 replies; 40+ messages in thread
From: Peter Xu @ 2022-01-13  6:48 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
>          memcpy(CMSG_DATA(cmsg), fds, fdsize);
>      }
>  
> +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> +        sflags = MSG_ZEROCOPY;
> +    }
> +
>   retry:
> -    ret = sendmsg(sioc->fd, &msg, 0);
> +    ret = sendmsg(sioc->fd, &msg, sflags);
>      if (ret <= 0) {
> -        if (errno == EAGAIN) {
> +        switch (errno) {
> +        case EAGAIN:
>              return QIO_CHANNEL_ERR_BLOCK;
> -        }
> -        if (errno == EINTR) {
> +        case EINTR:
>              goto retry;
> +        case ENOBUFS:
> +            if (sflags & MSG_ZEROCOPY) {
> +                error_setg_errno(errp, errno,
> +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> +                return -1;
> +            }

I have no idea whether it'll make a real difference, but - should we better add
a "break" here?  If you agree and with that fixed, feel free to add:

Reviewed-by: Peter Xu <peterx@redhat.com>

I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
here it's unlimited by default, but I'm just curious when we should keep an
eye on it.

Thanks,

-- 
Peter Xu

* Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
  2022-01-06 22:13 ` [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux Leonardo Bras
@ 2022-01-13  7:00   ` Peter Xu
  2022-01-19 17:53     ` Leonardo Bras Soares Passos
  2022-01-13 13:09   ` Daniel P. Berrangé
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Xu @ 2022-01-13  7:00 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> Add property that allows zero-copy migration of memory pages,
> and also includes a helper function migrate_use_zero_copy() to check
> if it's enabled.
> 
> No code is introduced to actually do the migration, but it allows
> future implementations to enable/disable this feature.
> 
> On non-Linux builds this parameter is compiled-out.

I feel sad every time seeing a new parameter needs to be mostly duplicated 3
times in the code. :(

> diff --git a/migration/socket.c b/migration/socket.c
> index 05705a32d8..f7a77aafd3 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -77,6 +77,11 @@ static void socket_outgoing_migration(QIOTask *task,
>      } else {
>          trace_migration_socket_outgoing_connected(data->hostname);
>      }
> +
> +    if (migrate_use_zero_copy()) {
> +        error_setg(&err, "Zero copy not available in migration");
> +    }

I got confused the first time looking at it.  I think this is not strongly
needed, but that's okay:

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

-- 
Peter Xu

* Re: [PATCH v7 4/5] migration: Add migrate_use_tls() helper
  2022-01-06 22:13 ` [PATCH v7 4/5] migration: Add migrate_use_tls() helper Leonardo Bras
@ 2022-01-13  7:02   ` Peter Xu
  2022-01-19 18:06     ` Leonardo Bras Soares Passos
  2022-01-13 13:10   ` Daniel P. Berrangé
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Xu @ 2022-01-13  7:02 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Thu, Jan 06, 2022 at 07:13:41PM -0300, Leonardo Bras wrote:
>  void migration_channel_process_incoming(QIOChannel *ioc)
>  {
> -    MigrationState *s = migrate_get_current();
>      Error *local_err = NULL;
>  
>      trace_migration_set_incoming_channel(
>          ioc, object_get_typename(OBJECT(ioc)));
>  
> -    if (s->parameters.tls_creds &&
> -        *s->parameters.tls_creds &&
> +    if (migrate_use_tls() &&
>          !object_dynamic_cast(OBJECT(ioc),
>                               TYPE_QIO_CHANNEL_TLS)) {
> +        MigrationState *s = migrate_get_current();
> +

Trivial nit: I'd rather keep the line there; as the movement offers nothing,
imho..

>          migration_tls_channel_process_incoming(s, ioc, &local_err);
>      } else {
>          migration_ioc_register_yank(ioc);

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

* Re: [PATCH v7 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)
  2022-01-06 22:13 ` [PATCH v7 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Leonardo Bras
@ 2022-01-13  7:15   ` Peter Xu
  2022-01-19 18:23     ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Xu @ 2022-01-13  7:15 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Thu, Jan 06, 2022 at 07:13:42PM -0300, Leonardo Bras wrote:
> Implement zero copy on nocomp_send_write(), by making use of QIOChannel
> writev + flags & flush interface.
> 
> Change multifd_send_sync_main() so it can distinguish each iteration sync from
> the setup and the completion, so a flush_zero_copy() can be called
> after each iteration in order to make sure all dirty pages are sent
> before a new iteration is started.

Leo - could you remind me (since I remembered we've discussed something
similar) on why we can't simply do the sync() unconditionally for zero copy?

I remember why we need the sync(), but I can't remember what the problem would
be if we simply sync() during setup and completion of migration as well.

Another trivial nit here: 

> -void multifd_send_sync_main(QEMUFile *f)
> +int multifd_send_sync_main(QEMUFile *f, bool sync)

I'd name it "bool full" or anything not called "sync", because the function
already has a name that contains "sync", so it's weird to sync(sync==false).

The rest looks good to me.  Thanks.

-- 
Peter Xu

* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13  6:48   ` Peter Xu
@ 2022-01-13 10:06     ` Daniel P. Berrangé
  2022-01-13 10:34       ` Peter Xu
  2022-01-19 17:01     ` Leonardo Bras Soares Passos
  2022-02-01  4:23     ` Leonardo Bras Soares Passos
  2 siblings, 1 reply; 40+ messages in thread
From: Daniel P. Berrangé @ 2022-01-13 10:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, qemu-devel, Markus Armbruster, Leonardo Bras,
	Eric Blake, Dr. David Alan Gilbert

On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> >      }
> >  
> > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > +        sflags = MSG_ZEROCOPY;
> > +    }
> > +
> >   retry:
> > -    ret = sendmsg(sioc->fd, &msg, 0);
> > +    ret = sendmsg(sioc->fd, &msg, sflags);
> >      if (ret <= 0) {
> > -        if (errno == EAGAIN) {
> > +        switch (errno) {
> > +        case EAGAIN:
> >              return QIO_CHANNEL_ERR_BLOCK;
> > -        }
> > -        if (errno == EINTR) {
> > +        case EINTR:
> >              goto retry;
> > +        case ENOBUFS:
> > +            if (sflags & MSG_ZEROCOPY) {
> > +                error_setg_errno(errp, errno,
> > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > +                return -1;
> > +            }
> 
> I have no idea whether it'll make a real difference, but - should we better add
> a "break" here?  If you agree and with that fixed, feel free to add:
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> here it's by default unlimited, but just curious when we should keep an eye.

Fedora doesn't allow unlimited locked memory by default

$ grep "locked memory" /proc/self/limits 
Max locked memory         65536                65536                bytes     

And  regardless of Fedora defaults, libvirt will set a limit
for the guest. It will only be unlimited if requiring certain
things like VFIO.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13 10:06     ` Daniel P. Berrangé
@ 2022-01-13 10:34       ` Peter Xu
  2022-01-13 10:42         ` Daniel P. Berrangé
  2022-01-19 17:16         ` Leonardo Bras Soares Passos
  0 siblings, 2 replies; 40+ messages in thread
From: Peter Xu @ 2022-01-13 10:34 UTC (permalink / raw)
  To: Daniel P. Berrangé, Leonardo Bras Soares Passos
  Cc: Juan Quintela, qemu-devel, Markus Armbruster, Leonardo Bras,
	Eric Blake, Dr. David Alan Gilbert

On Thu, Jan 13, 2022 at 10:06:14AM +0000, Daniel P. Berrangé wrote:
> On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > >      }
> > >  
> > > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > +        sflags = MSG_ZEROCOPY;
> > > +    }
> > > +
> > >   retry:
> > > -    ret = sendmsg(sioc->fd, &msg, 0);
> > > +    ret = sendmsg(sioc->fd, &msg, sflags);
> > >      if (ret <= 0) {
> > > -        if (errno == EAGAIN) {
> > > +        switch (errno) {
> > > +        case EAGAIN:
> > >              return QIO_CHANNEL_ERR_BLOCK;
> > > -        }
> > > -        if (errno == EINTR) {
> > > +        case EINTR:
> > >              goto retry;
> > > +        case ENOBUFS:
> > > +            if (sflags & MSG_ZEROCOPY) {
> > > +                error_setg_errno(errp, errno,
> > > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > > +                return -1;
> > > +            }
> > 
> > I have no idea whether it'll make a real difference, but would it be better to add
> > a "break" here?  If you agree and with that fixed, feel free to add:
> > 
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > 
> > I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> > here it's by default unlimited, but just curious when we should keep an eye.
> 
> Fedora doesn't allow unlimited locked memory by default
> 
> $ grep "locked memory" /proc/self/limits 
> Max locked memory         65536                65536                bytes     
> 
> And regardless of Fedora defaults, libvirt will set a limit
> for the guest. It will only be unlimited if the guest requires
> certain features like VFIO.

Thanks, I obviously checked the wrong host...

Leo, do you know how much locked memory will be needed by zero copy?  Will
there be a limit?  Is it linear in the number of sockets/channels?

It'll be better if we can fail at enabling the feature when we detect that
the specified locked memory limit may not suffice.

-- 
Peter Xu




* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13 10:34       ` Peter Xu
@ 2022-01-13 10:42         ` Daniel P. Berrangé
  2022-01-13 12:12           ` Peter Xu
  2022-01-19 17:22           ` Leonardo Bras Soares Passos
  2022-01-19 17:16         ` Leonardo Bras Soares Passos
  1 sibling, 2 replies; 40+ messages in thread
From: Daniel P. Berrangé @ 2022-01-13 10:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, qemu-devel, Markus Armbruster,
	Dr. David Alan Gilbert, Leonardo Bras,
	Leonardo Bras Soares Passos, Eric Blake

On Thu, Jan 13, 2022 at 06:34:12PM +0800, Peter Xu wrote:
> On Thu, Jan 13, 2022 at 10:06:14AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > >      }
> > > >  
> > > > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > +        sflags = MSG_ZEROCOPY;
> > > > +    }
> > > > +
> > > >   retry:
> > > > -    ret = sendmsg(sioc->fd, &msg, 0);
> > > > +    ret = sendmsg(sioc->fd, &msg, sflags);
> > > >      if (ret <= 0) {
> > > > -        if (errno == EAGAIN) {
> > > > +        switch (errno) {
> > > > +        case EAGAIN:
> > > >              return QIO_CHANNEL_ERR_BLOCK;
> > > > -        }
> > > > -        if (errno == EINTR) {
> > > > +        case EINTR:
> > > >              goto retry;
> > > > +        case ENOBUFS:
> > > > +            if (sflags & MSG_ZEROCOPY) {
> > > > +                error_setg_errno(errp, errno,
> > > > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > > > +                return -1;
> > > > +            }
> > > 
> > > I have no idea whether it'll make a real difference, but would it be better to add
> > > a "break" here?  If you agree and with that fixed, feel free to add:
> > > 
> > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > 
> > > I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> > > here it's by default unlimited, but just curious when we should keep an eye.
> > 
> > Fedora doesn't allow unlimited locked memory by default
> > 
> > $ grep "locked memory" /proc/self/limits 
> > Max locked memory         65536                65536                bytes     
> > 
> > And regardless of Fedora defaults, libvirt will set a limit
> > for the guest. It will only be unlimited if the guest requires
> > certain features like VFIO.
> 
> Thanks, I obviously checked the wrong host...
> 
> Leo, do you know how much locked memory will be needed by zero copy?  Will
> there be a limit?  Is it linear in the number of sockets/channels?

IIRC we decided it would be limited by the socket send buffer size, rather
than guest RAM, because writes will block once the send buffer is full.

This has a default global setting, with a per-socket override. On one box I
have it is 200 KB. With multifd you'll need "num-sockets * send buffer".

> > It'll be better if we can fail at enabling the feature when we detect that
> > the specified locked memory limit may not suffice.

Checking this value against the available locked memory, though, will always
have an error margin, because other things in QEMU can use locked memory
too.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback
  2022-01-06 22:13 ` [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback Leonardo Bras
  2022-01-13  6:28   ` Peter Xu
@ 2022-01-13 10:52   ` Daniel P. Berrangé
  2022-01-19  4:38     ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel P. Berrangé @ 2022-01-13 10:52 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

On Thu, Jan 06, 2022 at 07:13:38PM -0300, Leonardo Bras wrote:
> Add flags to io_writev and introduce io_flush as an optional callback to
> QIOChannelClass, allowing the implementation of zero copy writes by
> subclasses.
> 
> How to use them:
> - Write data using qio_channel_writev(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY),
> - Wait for write completion with qio_channel_flush().
> 
> Notes:
> As some zero copy implementations work asynchronously, it's
> recommended to keep the write buffer untouched until the return of
> qio_channel_flush(), to avoid the risk of sending an updated buffer
> instead of the buffer's state at the time of the write.
> 
> As the io_flush callback is optional, if a subclass does not implement it, then:
> - io_flush will return 0 without changing anything.
> 
> Also, some functions like qio_channel_writev_full_all() were adapted to
> receive a flag parameter. That allows code to be shared between zero-copy
> and non-zero-copy writev, and makes it easier to implement new flags.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  include/io/channel.h | 67 +++++++++++++++++++++++++++++++++++---------
>  io/channel-buffer.c  |  1 +
>  io/channel-command.c |  1 +
>  io/channel-file.c    |  1 +
>  io/channel-socket.c  |  2 ++
>  io/channel-tls.c     |  1 +
>  io/channel-websock.c |  1 +
>  io/channel.c         | 51 +++++++++++++++++++++++----------
>  migration/rdma.c     |  1 +
>  9 files changed, 98 insertions(+), 28 deletions(-)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 88988979f8..343766ce5b 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
>  
>  #define QIO_CHANNEL_ERR_BLOCK -2
>  
> +#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
> +
>  typedef enum QIOChannelFeature QIOChannelFeature;
>  
>  enum QIOChannelFeature {
>      QIO_CHANNEL_FEATURE_FD_PASS,
>      QIO_CHANNEL_FEATURE_SHUTDOWN,
>      QIO_CHANNEL_FEATURE_LISTEN,
> +    QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
>  };
>  
>  
> @@ -104,6 +107,7 @@ struct QIOChannelClass {
>                           size_t niov,
>                           int *fds,
>                           size_t nfds,
> +                         int flags,
>                           Error **errp);
>      ssize_t (*io_readv)(QIOChannel *ioc,
>                          const struct iovec *iov,
> @@ -136,6 +140,8 @@ struct QIOChannelClass {
>                                    IOHandler *io_read,
>                                    IOHandler *io_write,
>                                    void *opaque);
> +    int (*io_flush)(QIOChannel *ioc,
> +                    Error **errp);
>  };
>  
>  /* General I/O handling functions */
> @@ -222,12 +228,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
>  
>  
>  /**
> - * qio_channel_writev_full:
> + * qio_channel_writev_full_flags:
>   * @ioc: the channel object
>   * @iov: the array of memory regions to write data from
>   * @niov: the length of the @iov array
>   * @fds: an array of file handles to send
>   * @nfds: number of file handles in @fds
> + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
>   * @errp: pointer to a NULL-initialized error object
>   *
>   * Write data to the IO channel, reading it from the
> @@ -255,12 +262,16 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> >   * or QIO_CHANNEL_ERR_BLOCK if no data can be sent
>   * and the channel is non-blocking
>   */
> -ssize_t qio_channel_writev_full(QIOChannel *ioc,
> -                                const struct iovec *iov,
> -                                size_t niov,
> -                                int *fds,
> -                                size_t nfds,
> -                                Error **errp);
> +ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
> +                                      const struct iovec *iov,
> +                                      size_t niov,
> +                                      int *fds,
> +                                      size_t nfds,
> +                                      int flags,
> +                                      Error **errp);
> +
> +#define qio_channel_writev_full(ioc, iov, niov, fds, nfds, errp) \
> +    qio_channel_writev_full_flags(ioc, iov, niov, fds, nfds, 0, errp)

Don't introduce yet another API variant here. Just add flags to
all the existing write APIs with "full" in their name. The word
"full" in their name was intended to indicate that they are
accepting all possible parameters, so it doesn't make sense to
add APIs which take even more possible parameters.

> +int qio_channel_writev_full_all_flags(QIOChannel *ioc,
> +                                      const struct iovec *iov,
> +                                      size_t niov,
> +                                      int *fds, size_t nfds,
> +                                      int flags, Error **errp);
> +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
> +    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)

Same note.

> +/**
> + * qio_channel_flush:
> + * @ioc: the channel object
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Will block until every packet queued with
> + * qio_channel_writev_full_flags() + QIO_CHANNEL_WRITE_FLAG_ZERO_COPY
> + * is sent, or return in case of any error.
> + *
> + * If not implemented, acts as a no-op, and returns 0.
> + *
> + * Returns -1 if any error is found,
> + *          1 if every send failed to use zero copy.
> + *          0 otherwise.
> + */
> +
> +int qio_channel_flush(QIOChannel *ioc,
> +                      Error **errp);

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13 10:42         ` Daniel P. Berrangé
@ 2022-01-13 12:12           ` Peter Xu
  2022-01-19 17:23             ` Leonardo Bras Soares Passos
  2022-01-19 17:22           ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Xu @ 2022-01-13 12:12 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, qemu-devel, Markus Armbruster,
	Dr. David Alan Gilbert, Leonardo Bras,
	Leonardo Bras Soares Passos, Eric Blake

On Thu, Jan 13, 2022 at 10:42:39AM +0000, Daniel P. Berrangé wrote:
> On Thu, Jan 13, 2022 at 06:34:12PM +0800, Peter Xu wrote:
> > On Thu, Jan 13, 2022 at 10:06:14AM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > > >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > > >      }
> > > > >  
> > > > > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > > +        sflags = MSG_ZEROCOPY;
> > > > > +    }
> > > > > +
> > > > >   retry:
> > > > > -    ret = sendmsg(sioc->fd, &msg, 0);
> > > > > +    ret = sendmsg(sioc->fd, &msg, sflags);
> > > > >      if (ret <= 0) {
> > > > > -        if (errno == EAGAIN) {
> > > > > +        switch (errno) {
> > > > > +        case EAGAIN:
> > > > >              return QIO_CHANNEL_ERR_BLOCK;
> > > > > -        }
> > > > > -        if (errno == EINTR) {
> > > > > +        case EINTR:
> > > > >              goto retry;
> > > > > +        case ENOBUFS:
> > > > > +            if (sflags & MSG_ZEROCOPY) {
> > > > > +                error_setg_errno(errp, errno,
> > > > > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > > > > +                return -1;
> > > > > +            }
> > > > 
> > > > I have no idea whether it'll make a real difference, but would it be better to add
> > > > a "break" here?  If you agree and with that fixed, feel free to add:
> > > > 
> > > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > > 
> > > > I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> > > > here it's by default unlimited, but just curious when we should keep an eye.
> > > 
> > > Fedora doesn't allow unlimited locked memory by default
> > > 
> > > $ grep "locked memory" /proc/self/limits 
> > > Max locked memory         65536                65536                bytes     
> > > 
> > > And regardless of Fedora defaults, libvirt will set a limit
> > > for the guest. It will only be unlimited if the guest requires
> > > certain features like VFIO.
> > 
> > Thanks, I obviously checked the wrong host...
> > 
> > Leo, do you know how much locked memory will be needed by zero copy?  Will
> > there be a limit?  Is it linear in the number of sockets/channels?
> 
> IIRC we decided it would be limited by the socket send buffer size, rather
> than guest RAM, because writes will block once the send buffer is full.
> 
> This has a default global setting, with a per-socket override. On one box I
> have it is 200 KB. With multifd you'll need "num-sockets * send buffer".
> 
> > It'll be better if we can fail at enabling the feature when we detect that
> > the specified locked memory limit may not suffice.
> 
> Checking this value against the available locked memory, though, will always
> have an error margin, because other things in QEMU can use locked memory
> too.

We could still allow false positives in this check, so we can fail when we have
a solid clue that we'll fail later anyway (e.g. the minimum locked_vm needed
already exceeds the limit).  But no strong opinion; we could have this merged
and see whether that's needed in real life.  Thanks,

-- 
Peter Xu




* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-06 22:13 ` [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Leonardo Bras
  2022-01-13  6:48   ` Peter Xu
@ 2022-01-13 13:05   ` Daniel P. Berrangé
  2022-01-19 17:24     ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel P. Berrangé @ 2022-01-13 13:05 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> io_flush on QIOChannelSocket, but enable it only when the MSG_ZEROCOPY
> feature is available in the host kernel, which is checked in
> qio_channel_socket_connect_sync().
> 
> qio_channel_socket_flush() is implemented by counting how many times
> sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> socket's error queue to find how many of those sends have finished.
> Flush loops until both counters match, or until some error occurs.
> 
> Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> 1: Buffer
> - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> some caution is necessary to avoid overwriting any buffer before it's sent.
> If something like this happens, a newer version of the buffer may be sent instead.
> - If this is a problem, it's recommended to call qio_channel_flush() before freeing
> or re-using the buffer.
> 
> 2: Locked memory
> - When using MSG_ZEROCOPY, the buffer memory will be locked after being
> queued, and unlocked after it's sent.
> - Depending on the size of each buffer, and how often it's sent, it may require
> a larger amount of locked memory than is usually available to a non-root user.
> - If the required amount of locked memory is not available, writev_zero_copy
> will return an error, which can abort an operation like migration.
> - Because of this, when user code wants to add zero copy as a feature, it
> requires a mechanism to disable it, so it can still be accessible to less
> privileged users.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  include/io/channel-socket.h |   2 +
>  io/channel-socket.c         | 107 ++++++++++++++++++++++++++++++++++--
>  2 files changed, 105 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
  2022-01-06 22:13 ` [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux Leonardo Bras
  2022-01-13  7:00   ` Peter Xu
@ 2022-01-13 13:09   ` Daniel P. Berrangé
  2022-01-19 18:03     ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel P. Berrangé @ 2022-01-13 13:09 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> Add a property that allows zero-copy migration of memory pages,
> and also include a helper function migrate_use_zero_copy() to check
> if it's enabled.
> 
> No code is introduced to actually do the migration, but it allows
> future implementations to enable/disable this feature.
> 
> On non-Linux builds this parameter is compiled-out.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  qapi/migration.json   | 24 ++++++++++++++++++++++++
>  migration/migration.h |  5 +++++
>  migration/migration.c | 32 ++++++++++++++++++++++++++++++++
>  migration/socket.c    |  5 +++++
>  monitor/hmp-cmds.c    |  6 ++++++
>  5 files changed, 72 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index bbfd48cf0b..2e62ea6ebd 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -730,6 +730,13 @@
>  #                      will consume more CPU.
>  #                      Defaults to 1. (Since 5.0)
>  #
> +# @zero-copy: Controls behavior on sending memory pages on migration.
> +#             When true, enables a zero-copy mechanism for sending memory
> +#             pages, if host supports it.
> +#             Requires that QEMU be permitted to use locked memory for guest
> +#             RAM pages.
> +#             Defaults to false. (Since 7.0)
> +#
>  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
>  #                        aliases for the purpose of dirty bitmap migration.  Such
>  #                        aliases may for example be the corresponding names on the
> @@ -769,6 +776,7 @@
>             'xbzrle-cache-size', 'max-postcopy-bandwidth',
>             'max-cpu-throttle', 'multifd-compression',
>             'multifd-zlib-level' ,'multifd-zstd-level',
> +           { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'},
>             'block-bitmap-mapping' ] }
>  
>  ##
> @@ -895,6 +903,13 @@
>  #                      will consume more CPU.
>  #                      Defaults to 1. (Since 5.0)
>  #
> +# @zero-copy: Controls behavior on sending memory pages on migration.
> +#             When true, enables a zero-copy mechanism for sending memory
> +#             pages, if host supports it.
> +#             Requires that QEMU be permitted to use locked memory for guest
> +#             RAM pages.
> +#             Defaults to false. (Since 7.0)
> +#
>  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
>  #                        aliases for the purpose of dirty bitmap migration.  Such
>  #                        aliases may for example be the corresponding names on the
> @@ -949,6 +964,7 @@
>              '*multifd-compression': 'MultiFDCompression',
>              '*multifd-zlib-level': 'uint8',
>              '*multifd-zstd-level': 'uint8',
> +            '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
>              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }

The current zerocopy impl is for the send path.

Do you expect we might get zerocopy in the receive path
later?

If so then either call this 'send-zero-copy', or change it
from a bool to an enum taking '["send", "recv", "both"]'.

I'd probably take the former and just rename it.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
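[Editorial note: for readers following along, enabling the parameter proposed in this version of the patch (before any rename along the lines Daniel suggests) would look something like the hypothetical QMP session below; the channel count and destination URI are illustrative only.]

```json
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
      { "capability": "multifd", "state": true } ] } }

{ "execute": "migrate-set-parameters",
  "arguments": { "multifd-channels": 8, "zero-copy": true } }

{ "execute": "migrate",
  "arguments": { "uri": "tcp:destination.example.com:4444" } }
```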




* Re: [PATCH v7 4/5] migration: Add migrate_use_tls() helper
  2022-01-06 22:13 ` [PATCH v7 4/5] migration: Add migrate_use_tls() helper Leonardo Bras
  2022-01-13  7:02   ` Peter Xu
@ 2022-01-13 13:10   ` Daniel P. Berrangé
  2022-01-19 18:07     ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel P. Berrangé @ 2022-01-13 13:10 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

On Thu, Jan 06, 2022 at 07:13:41PM -0300, Leonardo Bras wrote:
> A lot of places check parameters.tls_creds in order to evaluate if TLS is
> in use, and sometimes call migrate_get_current() just for that test.
> 
> Add new helper function migrate_use_tls() in order to simplify testing
> for TLS usage.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> ---
>  migration/migration.h | 1 +
>  migration/channel.c   | 6 +++---
>  migration/migration.c | 9 +++++++++
>  migration/multifd.c   | 5 +----
>  4 files changed, 14 insertions(+), 7 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback
  2022-01-13  6:28   ` Peter Xu
@ 2022-01-18 20:45     ` Leonardo Bras Soares Passos
  2022-01-19  1:58       ` Peter Xu
  0 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-18 20:45 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

Hello Peter,

On Thu, Jan 13, 2022 at 3:28 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:38PM -0300, Leonardo Bras wrote:
> > diff --git a/io/channel.c b/io/channel.c
> > index e8b019dc36..904855e16e 100644
> > --- a/io/channel.c
> > +++ b/io/channel.c
> > @@ -67,12 +67,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> >  }
> >
> >
> > -ssize_t qio_channel_writev_full(QIOChannel *ioc,
> > -                                const struct iovec *iov,
> > -                                size_t niov,
> > -                                int *fds,
> > -                                size_t nfds,
> > -                                Error **errp)
> > +ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
> > +                                      const struct iovec *iov,
> > +                                      size_t niov,
> > +                                      int *fds,
> > +                                      size_t nfds,
> > +                                      int flags,
> > +                                      Error **errp)
> >  {
> >      QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> >
> > @@ -83,7 +84,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
> >          return -1;
> >      }
>
> Should we better also check QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY here when
> QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is set?  Just like what we do with:

Yes, that's correct.
I will also test for fds + zerocopy_flag, which should also fail here.

>
>     if ((fds || nfds) &&
>         !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
>         error_setg_errno(errp, EINVAL,
>                          "Channel does not support file descriptor passing");
>         return -1;
>     }
>
> I still think it's better to have the caller be crystal clear when to use the
> zero_copy feature because it has implications for buffer lifetime.

I don't disagree with that suggestion.

But the buffer lifetime limitation is something on the socket
implementation, right?
There could be some synchronous zerocopy implementation that does not
require a flush, and thus doesn't require the buffer to be treated
specially. Or am I missing something?

>
> I might have commented similar things before, but I have missed a few versions
> so I could also have missed some previous discussions..
>

Those are all great suggestions, Peter!  Thanks for that!

Some of the previous suggestions may have been missed because a lot of
code moved.
Sorry about that.

Best regards,
Leo




* Re: [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback
  2022-01-18 20:45     ` Leonardo Bras Soares Passos
@ 2022-01-19  1:58       ` Peter Xu
  2022-01-19 18:29         ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Xu @ 2022-01-19  1:58 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Tue, Jan 18, 2022 at 05:45:09PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
> 
> On Thu, Jan 13, 2022 at 3:28 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Jan 06, 2022 at 07:13:38PM -0300, Leonardo Bras wrote:
> > > diff --git a/io/channel.c b/io/channel.c
> > > index e8b019dc36..904855e16e 100644
> > > --- a/io/channel.c
> > > +++ b/io/channel.c
> > > @@ -67,12 +67,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> > >  }
> > >
> > >
> > > -ssize_t qio_channel_writev_full(QIOChannel *ioc,
> > > -                                const struct iovec *iov,
> > > -                                size_t niov,
> > > -                                int *fds,
> > > -                                size_t nfds,
> > > -                                Error **errp)
> > > +ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
> > > +                                      const struct iovec *iov,
> > > +                                      size_t niov,
> > > +                                      int *fds,
> > > +                                      size_t nfds,
> > > +                                      int flags,
> > > +                                      Error **errp)
> > >  {
> > >      QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > >
> > > @@ -83,7 +84,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
> > >          return -1;
> > >      }
> >
> > Should we better also check QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY here when
> > QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is set?  Just like what we do with:
> 
> Yes, that's correct.
> I will also test for fds + zerocopy_flag, which should also fail here.
> 
> >
> >     if ((fds || nfds) &&
> >         !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
> >         error_setg_errno(errp, EINVAL,
> >                          "Channel does not support file descriptor passing");
> >         return -1;
> >     }
> >
> > I still think it's better to have the caller be crystal clear when to use the
> > zero_copy feature because it has implications for buffer lifetime.
> 
> I don't disagree with that suggestion.
> 
> But the buffer lifetime limitation is something on the socket
> implementation, right?
> There could be some synchronous zerocopy implementation that does not
> require a flush, and thus doesn't require the buffer to be treated
> specially. Or am I missing something?

Currently the flush() is required for zerocopy and not required for any of the
existing non-zerocopy use cases; that's already an API difference, so the caller
needs to identify it anyway.  Then I think it's simpler if we expose all of it
to the user.

Not to mention, IIUC, if we don't fail here, it will just fail later when the
code unconditionally converts flags=ZEROCOPY into MSG_ZEROCOPY in your
next patch:

    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
        sflags = MSG_ZEROCOPY;
    }

So AFAIU it'll fail anyway, either here with the cap check I mentioned, or
later in sendmsg().

IOW, I think it fails cleaner here, rather than reaching sendmsg().

> 
> >
> > I might have commented similar things before, but I have missed a few versions
> > so I could also have missed some previous discussions..
> >
> 
> That's all great suggestions Peter!  Thanks for that!
> 
> Some of the previous suggestions may have been missed because a lot of
> code moved.
> Sorry about that.

Not a problem at all, I just want to make sure my question still makes
sense. :)

-- 
Peter Xu




* Re: [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback
  2022-01-13 10:52   ` Daniel P. Berrangé
@ 2022-01-19  4:38     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19  4:38 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

Hello Daniel,

On Thu, Jan 13, 2022 at 7:53 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:38PM -0300, Leonardo Bras wrote:
> > Add flags to io_writev and introduce io_flush as optional callback to
> > QIOChannelClass, allowing the implementation of zero copy writes by
> > subclasses.
> >
> > How to use them:
> > - Write data using qio_channel_writev(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY),
> > - Wait write completion with qio_channel_flush().
> >
> > Notes:
> > As some zero copy implementations work asynchronously, it's
> > recommended to keep the write buffer untouched until the return of
> > qio_channel_flush(), to avoid the risk of sending an updated buffer
> > instead of the buffer state during write.
> >
> > As io_flush callback is optional, if a subclass does not implement it, then:
> > - io_flush will return 0 without changing anything.
> >
> > Also, some functions like qio_channel_writev_full_all() were adapted to
> > receive a flag parameter. That allows shared code between zero copy and
> > non-zero copy writev, and also an easier implementation on new flags.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  include/io/channel.h | 67 +++++++++++++++++++++++++++++++++++---------
> >  io/channel-buffer.c  |  1 +
> >  io/channel-command.c |  1 +
> >  io/channel-file.c    |  1 +
> >  io/channel-socket.c  |  2 ++
> >  io/channel-tls.c     |  1 +
> >  io/channel-websock.c |  1 +
> >  io/channel.c         | 51 +++++++++++++++++++++++----------
> >  migration/rdma.c     |  1 +
> >  9 files changed, 98 insertions(+), 28 deletions(-)
> >
> > diff --git a/include/io/channel.h b/include/io/channel.h
> > index 88988979f8..343766ce5b 100644
> > --- a/include/io/channel.h
> > +++ b/include/io/channel.h
> > @@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
> >
> >  #define QIO_CHANNEL_ERR_BLOCK -2
> >
> > +#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
> > +
> >  typedef enum QIOChannelFeature QIOChannelFeature;
> >
> >  enum QIOChannelFeature {
> >      QIO_CHANNEL_FEATURE_FD_PASS,
> >      QIO_CHANNEL_FEATURE_SHUTDOWN,
> >      QIO_CHANNEL_FEATURE_LISTEN,
> > +    QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
> >  };
> >
> >
> > @@ -104,6 +107,7 @@ struct QIOChannelClass {
> >                           size_t niov,
> >                           int *fds,
> >                           size_t nfds,
> > +                         int flags,
> >                           Error **errp);
> >      ssize_t (*io_readv)(QIOChannel *ioc,
> >                          const struct iovec *iov,
> > @@ -136,6 +140,8 @@ struct QIOChannelClass {
> >                                    IOHandler *io_read,
> >                                    IOHandler *io_write,
> >                                    void *opaque);
> > +    int (*io_flush)(QIOChannel *ioc,
> > +                    Error **errp);
> >  };
> >
> >  /* General I/O handling functions */
> > @@ -222,12 +228,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> >
> >
> >  /**
> > - * qio_channel_writev_full:
> > + * qio_channel_writev_full_flags:
> >   * @ioc: the channel object
> >   * @iov: the array of memory regions to write data from
> >   * @niov: the length of the @iov array
> >   * @fds: an array of file handles to send
> >   * @nfds: number of file handles in @fds
> > + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
> >   * @errp: pointer to a NULL-initialized error object
> >   *
> >   * Write data to the IO channel, reading it from the
> > @@ -255,12 +262,16 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> >   * or QIO_CHANNEL_ERR_BLOCK if no data is can be sent
> >   * and the channel is non-blocking
> >   */
> > -ssize_t qio_channel_writev_full(QIOChannel *ioc,
> > -                                const struct iovec *iov,
> > -                                size_t niov,
> > -                                int *fds,
> > -                                size_t nfds,
> > -                                Error **errp);
> > +ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
> > +                                      const struct iovec *iov,
> > +                                      size_t niov,
> > +                                      int *fds,
> > +                                      size_t nfds,
> > +                                      int flags,
> > +                                      Error **errp);
> > +
> > +#define qio_channel_writev_full(ioc, iov, niov, fds, nfds, errp) \
> > +    qio_channel_writev_full_flags(ioc, iov, niov, fds, nfds, 0, errp)
>
> Don't introduce yet another API variant here. Just add flags to
> all the existing write APIs with "full" in their name. The word
> "full" in their name was intended to indicate that they are
> accepting all possible parameters, so it doesn't mean sense to
> add APIs which take even more possible parameters.

Oh, I was not aware of that. Thanks for letting me know!

Sure, I will do this change for v8.


>
> > +int qio_channel_writev_full_all_flags(QIOChannel *ioc,
> > +                                      const struct iovec *iov,
> > +                                      size_t niov,
> > +                                      int *fds, size_t nfds,
> > +                                      int flags, Error **errp);
> > +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
> > +    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
>
> Same note.
>
> > +/**
> > + * qio_channel_flush:
> > + * @ioc: the channel object
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Will block until every packet queued with
> > + * qio_channel_writev_full_flags() + QIO_CHANNEL_WRITE_FLAG_ZERO_COPY
> > + * is sent, or return in case of any error.
> > + *
> > + * If not implemented, acts as a no-op, and returns 0.
> > + *
> > + * Returns -1 if any error is found,
> > + *          1 if every send failed to use zero copy.
> > + *          0 otherwise.
> > + */
> > +
> > +int qio_channel_flush(QIOChannel *ioc,
> > +                      Error **errp);
>
>

Best regards,
Leo




* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13  6:48   ` Peter Xu
  2022-01-13 10:06     ` Daniel P. Berrangé
@ 2022-01-19 17:01     ` Leonardo Bras Soares Passos
  2022-02-01  4:23     ` Leonardo Bras Soares Passos
  2 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 17:01 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

Hello Peter,

On Thu, Jan 13, 2022 at 3:48 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> >      }
> >
> > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > +        sflags = MSG_ZEROCOPY;
> > +    }
> > +
> >   retry:
> > -    ret = sendmsg(sioc->fd, &msg, 0);
> > +    ret = sendmsg(sioc->fd, &msg, sflags);
> >      if (ret <= 0) {
> > -        if (errno == EAGAIN) {
> > +        switch (errno) {
> > +        case EAGAIN:
> >              return QIO_CHANNEL_ERR_BLOCK;
> > -        }
> > -        if (errno == EINTR) {
> > +        case EINTR:
> >              goto retry;
> > +        case ENOBUFS:
> > +            if (sflags & MSG_ZEROCOPY) {
> > +                error_setg_errno(errp, errno,
> > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > +                return -1;
> > +            }
>
> I have no idea whether it'll make a real difference, but - should we better add
> a "break" here?

Here I followed the pattern of the EAGAIN error; that's why I just returned -1.

IIUC a break here would cause errp to be re-set to the default
message after the switch.
Another option would be to add a 'default' clause, move the
default error message there, and return -1
after the switch.

In the end I thought the current way was simpler, but it's no issue
to change it if you think the 'default' idea would be better.

>  If you agree and with that fixed, feel free to add:
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>

Thanks!

> I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> here it's by default unlimited, but just curious when we should keep an eye.

It's unlimited if you run as root IIRC.

>
> Thanks,
>
> --
> Peter Xu
>




* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13 10:34       ` Peter Xu
  2022-01-13 10:42         ` Daniel P. Berrangé
@ 2022-01-19 17:16         ` Leonardo Bras Soares Passos
  1 sibling, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 17:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Thu, Jan 13, 2022 at 7:34 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 13, 2022 at 10:06:14AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > >      }
> > > >
> > > > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > +        sflags = MSG_ZEROCOPY;
> > > > +    }
> > > > +
> > > >   retry:
> > > > -    ret = sendmsg(sioc->fd, &msg, 0);
> > > > +    ret = sendmsg(sioc->fd, &msg, sflags);
> > > >      if (ret <= 0) {
> > > > -        if (errno == EAGAIN) {
> > > > +        switch (errno) {
> > > > +        case EAGAIN:
> > > >              return QIO_CHANNEL_ERR_BLOCK;
> > > > -        }
> > > > -        if (errno == EINTR) {
> > > > +        case EINTR:
> > > >              goto retry;
> > > > +        case ENOBUFS:
> > > > +            if (sflags & MSG_ZEROCOPY) {
> > > > +                error_setg_errno(errp, errno,
> > > > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > > > +                return -1;
> > > > +            }
> > >
> > > I have no idea whether it'll make a real difference, but - should we better add
> > > a "break" here?  If you agree and with that fixed, feel free to add:
> > >
> > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > >
> > > I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> > > here it's by default unlimited, but just curious when we should keep an eye.
> >
> > Fedora doesn't allow unlimited locked memory by default
> >
> > $ grep "locked memory" /proc/self/limits
> > Max locked memory         65536                65536                bytes
> >
> > And  regardless of Fedora defaults, libvirt will set a limit
> > for the guest. It will only be unlimited if requiring certain
> > things like VFIO.
>
> Thanks, I obviously checked up the wrong host..
>
> Leo, do you know how much locked memory will be needed by zero copy?  Will
> there be a limit?  Is it linear to the number of sockets/channels?

It depends on the number of channels, of course, but there are other
influencing factors, like network bandwidth & usage, cpu speed &
usage, network queue size, VM pagesize and so on.

A simple example:
If the cpu is free/fast, but other applications are using the
network, we may enqueue a lot of data for sending, and end up needing
a lot of locked memory.

I don't think it's easy to calculate a good reference value for locked
memory here.

>
> It'll be better if we can fail at enabling the feature when we detected that
> the specified locked memory limit may not be suffice.

I agree it's a good idea, but calculating such a reference value
is not that simple, IIUC.

>
> --
> Peter Xu
>




* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13 10:42         ` Daniel P. Berrangé
  2022-01-13 12:12           ` Peter Xu
@ 2022-01-19 17:22           ` Leonardo Bras Soares Passos
  2022-01-20  1:28             ` Peter Xu
  1 sibling, 1 reply; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 17:22 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

Hello Daniel,

On Thu, Jan 13, 2022 at 7:42 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Jan 13, 2022 at 06:34:12PM +0800, Peter Xu wrote:
> > On Thu, Jan 13, 2022 at 10:06:14AM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > > >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > > >      }
> > > > >
> > > > > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > > +        sflags = MSG_ZEROCOPY;
> > > > > +    }
> > > > > +
> > > > >   retry:
> > > > > -    ret = sendmsg(sioc->fd, &msg, 0);
> > > > > +    ret = sendmsg(sioc->fd, &msg, sflags);
> > > > >      if (ret <= 0) {
> > > > > -        if (errno == EAGAIN) {
> > > > > +        switch (errno) {
> > > > > +        case EAGAIN:
> > > > >              return QIO_CHANNEL_ERR_BLOCK;
> > > > > -        }
> > > > > -        if (errno == EINTR) {
> > > > > +        case EINTR:
> > > > >              goto retry;
> > > > > +        case ENOBUFS:
> > > > > +            if (sflags & MSG_ZEROCOPY) {
> > > > > +                error_setg_errno(errp, errno,
> > > > > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > > > > +                return -1;
> > > > > +            }
> > > >
> > > > I have no idea whether it'll make a real difference, but - should we better add
> > > > a "break" here?  If you agree and with that fixed, feel free to add:
> > > >
> > > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > >
> > > > I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> > > > here it's by default unlimited, but just curious when we should keep an eye.
> > >
> > > Fedora doesn't allow unlimited locked memory by default
> > >
> > > $ grep "locked memory" /proc/self/limits
> > > Max locked memory         65536                65536                bytes
> > >
> > > And  regardless of Fedora defaults, libvirt will set a limit
> > > for the guest. It will only be unlimited if requiring certain
> > > things like VFIO.
> >
> > Thanks, I obviously checked up the wrong host..
> >
> > Leo, do you know how much locked memory will be needed by zero copy?  Will
> > there be a limit?  Is it linear to the number of sockets/channels?
>
> IIRC we decided it would be limited by the socket send buffer size, rather
> than guest RAM, because writes will block once the send buffer is full.
>
> This has a default global setting, with per-socket override. On one box I
> have it is 200 Kb. With multifd you'll need  "num-sockets * send buffer".

Oh, I was not aware there is a send buffer size limit (or maybe I just
can't recall).
That sure makes things much easier.

>
> > It'll be better if we can fail at enabling the feature when we detected that
> > the specified locked memory limit may not be suffice.

sure

>
> Checking this value against available locked memory though will always
> have an error margin because other things in QEMU can use locked memory
> too

We can get the current limit (before zerocopy) as an error margin:
req_lock_mem = num-sockets * send buffer + BASE_LOCKED

Where BASE_LOCKED is the limit libvirt currently sets, or similar.

What do you think?

Best regards,
Leo




* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13 12:12           ` Peter Xu
@ 2022-01-19 17:23             ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 17:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Thu, Jan 13, 2022 at 9:12 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 13, 2022 at 10:42:39AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Jan 13, 2022 at 06:34:12PM +0800, Peter Xu wrote:
> > > On Thu, Jan 13, 2022 at 10:06:14AM +0000, Daniel P. Berrangé wrote:
> > > > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > > > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > > > >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > > > >      }
> > > > > >
> > > > > > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > > > +        sflags = MSG_ZEROCOPY;
> > > > > > +    }
> > > > > > +
> > > > > >   retry:
> > > > > > -    ret = sendmsg(sioc->fd, &msg, 0);
> > > > > > +    ret = sendmsg(sioc->fd, &msg, sflags);
> > > > > >      if (ret <= 0) {
> > > > > > -        if (errno == EAGAIN) {
> > > > > > +        switch (errno) {
> > > > > > +        case EAGAIN:
> > > > > >              return QIO_CHANNEL_ERR_BLOCK;
> > > > > > -        }
> > > > > > -        if (errno == EINTR) {
> > > > > > +        case EINTR:
> > > > > >              goto retry;
> > > > > > +        case ENOBUFS:
> > > > > > +            if (sflags & MSG_ZEROCOPY) {
> > > > > > +                error_setg_errno(errp, errno,
> > > > > > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > > > > > +                return -1;
> > > > > > +            }
> > > > >
> > > > > I have no idea whether it'll make a real difference, but - should we better add
> > > > > a "break" here?  If you agree and with that fixed, feel free to add:
> > > > >
> > > > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > > >
> > > > > I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> > > > > here it's by default unlimited, but just curious when we should keep an eye.
> > > >
> > > > Fedora doesn't allow unlimited locked memory by default
> > > >
> > > > $ grep "locked memory" /proc/self/limits
> > > > Max locked memory         65536                65536                bytes
> > > >
> > > > And  regardless of Fedora defaults, libvirt will set a limit
> > > > for the guest. It will only be unlimited if requiring certain
> > > > things like VFIO.
> > >
> > > Thanks, I obviously checked up the wrong host..
> > >
> > > Leo, do you know how much locked memory will be needed by zero copy?  Will
> > > there be a limit?  Is it linear to the number of sockets/channels?
> >
> > IIRC we decided it would be limited by the socket send buffer size, rather
> > than guest RAM, because writes will block once the send buffer is full.
> >
> > This has a default global setting, with per-socket override. On one box I
> > have it is 200 Kb. With multifd you'll need  "num-sockets * send buffer".
> >
> > > It'll be better if we can fail at enabling the feature when we detected that
> > > the specified locked memory limit may not be suffice.
> >
> > Checking this value against available locked memory though will always
> > have an error margin because other things in QEMU can use locked memory
> > too
>
> We could always still allow false positive in this check, so we can fail if we
> have a solid clue that we'll fail later (e.g. the minimum locked_vm needed is
> already more than the total limit).  But no strong opinion; we could have this merged and
> see whether that's needed in real life.  Thanks,

I agree, this is a good approach.

Leo




* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13 13:05   ` Daniel P. Berrangé
@ 2022-01-19 17:24     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 17:24 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

On Thu, Jan 13, 2022 at 10:06 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> > io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> > feature is available in the host kernel, which is checked on
> > qio_channel_socket_connect_sync()
> >
> > qio_channel_socket_flush() was implemented by counting how many times
> > sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> > socket's error queue, in order to find how many of them finished sending.
> > Flush will loop until those counters are the same, or until some error occurs.
> >
> > Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> > 1: Buffer
> > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> > some caution is necessary to avoid overwriting any buffer before it's sent.
> > If something like this happen, a newer version of the buffer may be sent instead.
> > - If this is a problem, it's recommended to call qio_channel_flush() before freeing
> > or re-using the buffer.
> >
> > 2: Locked memory
> > - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
> > unlocked after it's sent.
> > - Depending on the size of each buffer, and how often it's sent, it may require
> > a larger amount of locked memory than usually available to non-root user.
> > - If the required amount of locked memory is not available, writev_zero_copy
> > will return an error, which can abort an operation like migration,
> > - Because of this, when an user code wants to add zero copy as a feature, it
> > requires a mechanism to disable it, so it can still be accessible to less
> > privileged users.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  include/io/channel-socket.h |   2 +
> >  io/channel-socket.c         | 107 ++++++++++++++++++++++++++++++++++--
> >  2 files changed, 105 insertions(+), 4 deletions(-)
>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>

Thanks!

>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>




* Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
  2022-01-13  7:00   ` Peter Xu
@ 2022-01-19 17:53     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 17:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

Hello Peter,

On Thu, Jan 13, 2022 at 4:00 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> > Add property that allows zero-copy migration of memory pages,
> > and also includes a helper function migrate_use_zero_copy() to check
> > if it's enabled.
> >
> > No code is introduced to actually do the migration, but it allow
> > future implementations to enable/disable this feature.
> >
> > On non-Linux builds this parameter is compiled-out.
>
> I feel sad every time seeing a new parameter needs to be mostly duplicated 3
> times in the code. :(
>
> > diff --git a/migration/socket.c b/migration/socket.c
> > index 05705a32d8..f7a77aafd3 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -77,6 +77,11 @@ static void socket_outgoing_migration(QIOTask *task,
> >      } else {
> >          trace_migration_socket_outgoing_connected(data->hostname);
> >      }
> > +
> > +    if (migrate_use_zero_copy()) {
> > +        error_setg(&err, "Zero copy not available in migration");
> > +    }
>
> I got confused the 1st time looking at it..  I think this is not strongly
> needed, but that's okay:

The idea is to avoid future issues when testing migration during a bisect.

>
> Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks Peter!

>
> Thanks,
>
> --
> Peter Xu
>




* Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
  2022-01-13 13:09   ` Daniel P. Berrangé
@ 2022-01-19 18:03     ` Leonardo Bras Soares Passos
  2022-01-19 18:15       ` Daniel P. Berrangé
  0 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 18:03 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

Hello Daniel,

On Thu, Jan 13, 2022 at 10:10 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> > Add property that allows zero-copy migration of memory pages,
> > and also includes a helper function migrate_use_zero_copy() to check
> > if it's enabled.
> >
> > No code is introduced to actually do the migration, but it allow
> > future implementations to enable/disable this feature.
> >
> > On non-Linux builds this parameter is compiled-out.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  qapi/migration.json   | 24 ++++++++++++++++++++++++
> >  migration/migration.h |  5 +++++
> >  migration/migration.c | 32 ++++++++++++++++++++++++++++++++
> >  migration/socket.c    |  5 +++++
> >  monitor/hmp-cmds.c    |  6 ++++++
> >  5 files changed, 72 insertions(+)
>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks!

>
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index bbfd48cf0b..2e62ea6ebd 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -730,6 +730,13 @@
> >  #                      will consume more CPU.
> >  #                      Defaults to 1. (Since 5.0)
> >  #
> > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > +#             When true, enables a zero-copy mechanism for sending memory
> > +#             pages, if host supports it.
> > +#             Requires that QEMU be permitted to use locked memory for guest
> > +#             RAM pages.
> > +#             Defaults to false. (Since 7.0)
> > +#
> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> >  #                        aliases for the purpose of dirty bitmap migration.  Such
> >  #                        aliases may for example be the corresponding names on the
> > @@ -769,6 +776,7 @@
> >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> >             'max-cpu-throttle', 'multifd-compression',
> >             'multifd-zlib-level' ,'multifd-zstd-level',
> > +           { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'},
> >             'block-bitmap-mapping' ] }
> >
> >  ##
> > @@ -895,6 +903,13 @@
> >  #                      will consume more CPU.
> >  #                      Defaults to 1. (Since 5.0)
> >  #
> > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > +#             When true, enables a zero-copy mechanism for sending memory
> > +#             pages, if host supports it.
> > +#             Requires that QEMU be permitted to use locked memory for guest
> > +#             RAM pages.
> > +#             Defaults to false. (Since 7.0)
> > +#
> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> >  #                        aliases for the purpose of dirty bitmap migration.  Such
> >  #                        aliases may for example be the corresponding names on the
> > @@ -949,6 +964,7 @@
> >              '*multifd-compression': 'MultiFDCompression',
> >              '*multifd-zlib-level': 'uint8',
> >              '*multifd-zstd-level': 'uint8',
> > +            '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>
> The current zerocopy impl is for the send path.
>
> Do you expect we might get zerocopy in the receive path
> later ?

It's possible, but I haven't started the implementation yet.

>
> If so then either call this 'send-zero-copy', or change it
> from a bool to an enum taking '["send", "recv", "both"]'.
>
> I'd probably take the former and just rename it.
>

Well, my rationale:
- I want to set zero copy sending:
zero-copy is set in the sending host, start migration.

- I want to set zero copy receiving:
zero-copy is set in the receiving host, wait for migration.
(Of course host support is checked when setting the parameter).

The problem with the current approach is that trying to enable zero-copy
on receive before it's implemented will 'fail' silently.
A possible solution would be to add a patch that checks in the receiving
path whether zero-copy is enabled, and fails for now.

What do you think?

Best regards,
Leo

>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>




* Re: [PATCH v7 4/5] migration: Add migrate_use_tls() helper
  2022-01-13  7:02   ` Peter Xu
@ 2022-01-19 18:06     ` Leonardo Bras Soares Passos
  2022-01-20  1:37       ` Peter Xu
  0 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 18:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

Hello Peter,

On Thu, Jan 13, 2022 at 4:02 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:41PM -0300, Leonardo Bras wrote:
> >  void migration_channel_process_incoming(QIOChannel *ioc)
> >  {
> > -    MigrationState *s = migrate_get_current();
> >      Error *local_err = NULL;
> >
> >      trace_migration_set_incoming_channel(
> >          ioc, object_get_typename(OBJECT(ioc)));
> >
> > -    if (s->parameters.tls_creds &&
> > -        *s->parameters.tls_creds &&
> > +    if (migrate_use_tls() &&
> >          !object_dynamic_cast(OBJECT(ioc),
> >                               TYPE_QIO_CHANNEL_TLS)) {
> > +        MigrationState *s = migrate_get_current();
> > +
>
> Trivial nit: I'd rather keep the line there; as the movement offers nothing,
> imho..

The idea of moving 's' inside the if block is to make it clear
that it's only used in this case.

But if you think it's better to keep it at the beginning of the
function, sure, I can change that.
Just let me know.

>
> >          migration_tls_channel_process_incoming(s, ioc, &local_err);
> >      } else {
> >          migration_ioc_register_yank(ioc);
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>

Thanks!

> --
> Peter Xu
>

Best regards,
Leo




* Re: [PATCH v7 4/5] migration: Add migrate_use_tls() helper
  2022-01-13 13:10   ` Daniel P. Berrangé
@ 2022-01-19 18:07     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 18:07 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

Hello Daniel,

On Thu, Jan 13, 2022 at 10:11 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:41PM -0300, Leonardo Bras wrote:
> > A lot of places check parameters.tls_creds in order to evaluate if TLS is
> > in use, and sometimes call migrate_get_current() just for that test.
> >
> > Add new helper function migrate_use_tls() in order to simplify testing
> > for TLS usage.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > ---
> >  migration/migration.h | 1 +
> >  migration/channel.c   | 6 +++---
> >  migration/migration.c | 9 +++++++++
> >  migration/multifd.c   | 5 +----
> >  4 files changed, 14 insertions(+), 7 deletions(-)
>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>

Thanks!

> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Best regards,
Leo



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
  2022-01-19 18:03     ` Leonardo Bras Soares Passos
@ 2022-01-19 18:15       ` Daniel P. Berrangé
  2022-01-19 18:46         ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel P. Berrangé @ 2022-01-19 18:15 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

On Wed, Jan 19, 2022 at 03:03:29PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
> 
> On Thu, Jan 13, 2022 at 10:10 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> > > Add property that allows zero-copy migration of memory pages,
> > > and also includes a helper function migrate_use_zero_copy() to check
> > > if it's enabled.
> > >
> > > No code is introduced to actually do the migration, but it allows
> > > future implementations to enable/disable this feature.
> > >
> > > On non-Linux builds this parameter is compiled-out.
> > >
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > ---
> > >  qapi/migration.json   | 24 ++++++++++++++++++++++++
> > >  migration/migration.h |  5 +++++
> > >  migration/migration.c | 32 ++++++++++++++++++++++++++++++++
> > >  migration/socket.c    |  5 +++++
> > >  monitor/hmp-cmds.c    |  6 ++++++
> > >  5 files changed, 72 insertions(+)
> >
> > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> 
> Thanks!
> 
> >
> > >
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index bbfd48cf0b..2e62ea6ebd 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -730,6 +730,13 @@
> > >  #                      will consume more CPU.
> > >  #                      Defaults to 1. (Since 5.0)
> > >  #
> > > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > > +#             When true, enables a zero-copy mechanism for sending memory
> > > +#             pages, if host supports it.
> > > +#             Requires that QEMU be permitted to use locked memory for guest
> > > +#             RAM pages.
> > > +#             Defaults to false. (Since 7.0)
> > > +#
> > >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> > >  #                        aliases for the purpose of dirty bitmap migration.  Such
> > >  #                        aliases may for example be the corresponding names on the
> > > @@ -769,6 +776,7 @@
> > >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> > >             'max-cpu-throttle', 'multifd-compression',
> > >             'multifd-zlib-level' ,'multifd-zstd-level',
> > > +           { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'},
> > >             'block-bitmap-mapping' ] }
> > >
> > >  ##
> > > @@ -895,6 +903,13 @@
> > >  #                      will consume more CPU.
> > >  #                      Defaults to 1. (Since 5.0)
> > >  #
> > > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > > +#             When true, enables a zero-copy mechanism for sending memory
> > > +#             pages, if host supports it.
> > > +#             Requires that QEMU be permitted to use locked memory for guest
> > > +#             RAM pages.
> > > +#             Defaults to false. (Since 7.0)
> > > +#
> > >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> > >  #                        aliases for the purpose of dirty bitmap migration.  Such
> > >  #                        aliases may for example be the corresponding names on the
> > > @@ -949,6 +964,7 @@
> > >              '*multifd-compression': 'MultiFDCompression',
> > >              '*multifd-zlib-level': 'uint8',
> > >              '*multifd-zstd-level': 'uint8',
> > > +            '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
> > >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> >
> > The current zerocopy impl is for the send path.
> >
> > Do you expect we might get zerocopy in the receive path
> > later ?
> 
> It's possible, but I haven't started the implementation yet.
> 
> >
> > If so then either call this 'send-zero-copy', or change it
> > from a bool to an enum taking '["send", "recv", "both"]'.
> >
> > I'd probably take the former and just rename it.
> >
> 
> Well, my rationale:
> - I want to set zero copy sending:
> zero-copy is set in the sending host, start migration.
> 
> - I want to set zero copy receiving:
> zero-copy is set in the receiving host, wait for migration.
> (Of course host support is checked when setting the parameter).
> 
> The problem with the current approach is trying to enable zero-copy on
> receive before it's implemented, which will 'fail' silently.
> A possible solution would be to add a patch that checks in the receiving
> path whether zero-copy is enabled, and fails for now.

That's not good because mgmt apps cannot query the QAPI schema
to find out if this feature is supported or not.

If we want to support zero copy recv, then we need an explicit
flag for it that is distinct from zero copy send, so that apps
can introspect whether the feature is implemented in QEMU or
not.

Distinct named bool flags feel better, and also make it clear
to anyone not familiar with the impl that the current code is
strictly send only.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v7 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)
  2022-01-13  7:15   ` Peter Xu
@ 2022-01-19 18:23     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 18:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

Hello Peter,

On Thu, Jan 13, 2022 at 4:15 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:42PM -0300, Leonardo Bras wrote:
> > Implement zero copy on nocomp_send_write(), by making use of QIOChannel
> > writev + flags & flush interface.
> >
> > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > the setup and the completion, so a flush_zero_copy() can be called
> > after each iteration in order to make sure all dirty pages are sent
> > before a new iteration is started.
>
> Leo - could you remind me (since I remembered we've discussed something
> similar) on why we can't simply do the sync() unconditionally for zero copy?

In previous implementations it would get stuck during setup, since it
was waiting for movement on the error queue before even starting
the sending process.
Back then we would sync only at 'complete', so it only needed to
run once, and running it every iteration seemed wasteful.

Then, after some discussion with Juan, it was decided to sync after each
iteration, so doing it at 'complete' became unnecessary.
But sure, now it would add just 2 syncs in the whole migration, and
those should not even get to the syscall due to queued/sent counters.

>
> I remember why we need the sync(), but I can't remember what's the matter if we
> simply sync() too during setup and complete of migration.
>



> Another trivial nit here:
>
> > -void multifd_send_sync_main(QEMUFile *f)
> > +int multifd_send_sync_main(QEMUFile *f, bool sync)
>
> I'd name it "bool full" or anything not called "sync", because the function
> already has a name that contains "sync", and it's weird to write sync(sync==false).
>

Yeah, I agree.
But if we will flush every time, then there is no need for such parameter :).
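[The queued/sent counters Leo refers to can be sketched as below. The
struct and field names are assumptions for illustration, not QEMU's actual
identifiers; the point is that an unconditional sync at setup/complete is
cheap because the flush can return immediately when nothing is
outstanding.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the queued/sent bookkeeping: each MSG_ZEROCOPY sendmsg()
 * bumps 'zcopy_queued'; each completion read from the socket error
 * queue bumps 'zcopy_sent'. */
typedef struct {
    uint64_t zcopy_queued;
    uint64_t zcopy_sent;
} ZeroCopyState;

/* A flush only has to wait (and touch a syscall) when some zero-copy
 * sends are still outstanding; otherwise it is a no-op. */
static bool flush_needs_to_wait(const ZeroCopyState *z)
{
    return z->zcopy_queued > z->zcopy_sent;
}
```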

> The rest looks good to me.  Thanks.
>

Thanks!

> --
> Peter Xu
>

Best regards,
Leo



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback
  2022-01-19  1:58       ` Peter Xu
@ 2022-01-19 18:29         ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 18:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Tue, Jan 18, 2022 at 10:58 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Jan 18, 2022 at 05:45:09PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Peter,
> >
> > On Thu, Jan 13, 2022 at 3:28 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Thu, Jan 06, 2022 at 07:13:38PM -0300, Leonardo Bras wrote:
> > > > diff --git a/io/channel.c b/io/channel.c
> > > > index e8b019dc36..904855e16e 100644
> > > > --- a/io/channel.c
> > > > +++ b/io/channel.c
> > > > @@ -67,12 +67,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> > > >  }
> > > >
> > > >
> > > > -ssize_t qio_channel_writev_full(QIOChannel *ioc,
> > > > -                                const struct iovec *iov,
> > > > -                                size_t niov,
> > > > -                                int *fds,
> > > > -                                size_t nfds,
> > > > -                                Error **errp)
> > > > +ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
> > > > +                                      const struct iovec *iov,
> > > > +                                      size_t niov,
> > > > +                                      int *fds,
> > > > +                                      size_t nfds,
> > > > +                                      int flags,
> > > > +                                      Error **errp)
> > > >  {
> > > >      QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > > >
> > > > @@ -83,7 +84,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
> > > >          return -1;
> > > >      }
> > >
> > > Should we better also check QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY here when
> > > QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is set?  Just like what we do with:
> >
> > Yes, that's correct.
> > I will also test for fds + zerocopy_flag, which should also fail here.
> >
> > >
> > >     if ((fds || nfds) &&
> > >         !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
> > >         error_setg_errno(errp, EINVAL,
> > >                          "Channel does not support file descriptor passing");
> > >         return -1;
> > >     }
> > >
> > > I still think it's better to have the caller be crystal clear when to use
> > > zero_copy feature because it has implication on buffer lifetime.
> >
> > I don't disagree with that suggestion.
> >
> > But the buffer lifetime limitation is something on the socket
> > implementation, right?
> > There could be some synchronous zerocopy implementation that does not
> > require a flush, and thus
> > doesn't require the buffer to be treated specially. Or am I missing something?
>
> Currently the flush() is required for zerocopy and not required for all the
> existing non-zerocopy use cases, that's already an API difference so the caller
> needs to identify it anyway.  Then I think it's simpler we expose all of it to
> the user.

Yeah, I agree.
Since one ZC implementation uses flush, all should use it, even if
it's a no-op.
It was just an observation that not all ZC implementations have buffer
limitations, but I agree the user should expect them anyway, since
they will exist in some implementations.

>
> Not to mention IIUC if we don't fail here, it will just fail later when the
> code will unconditionally convert the flags=ZEROCOPY into MSG_ZEROCOPY in your
> next patch:
>
>     if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
>         sflags = MSG_ZEROCOPY;
>     }
>

Correct.

> So AFAIU it'll fail anyway, either here with the cap check I mentioned, or
> later in sendmsg().
>
> IOW, I think it fails cleaner here, rather than reaching sendmsg().

I Agree.
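[The early capability check being agreed on here could look like the
sketch below. The flag values and the int-errno return convention are
simplifications of mine -- QEMU's real code uses qio_channel_has_feature()
and an Error** out-parameter -- but the control flow is the one under
discussion: fail cleanly up front rather than inside sendmsg().]

```c
#include <assert.h>
#include <errno.h>

/* Placeholder bit values; the real QEMU constants differ. */
#define QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY (1u << 0)
#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY    (1u << 0)

/* Reject a zero-copy write up front when the channel never advertised
 * the feature, instead of letting sendmsg(MSG_ZEROCOPY) fail later.
 * Returns 0 on success, -EINVAL on an unsupported combination. */
static int check_write_flags(unsigned int channel_features,
                             unsigned int write_flags)
{
    if ((write_flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) &&
        !(channel_features & QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
        /* "Channel does not support zero copy writes" */
        return -EINVAL;
    }
    return 0;
}
```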

>
> >
> > >
> > > I might have commented similar things before, but I have missed a few versions
> > > so I could also have missed some previous discussions..
> > >
> >
> > That's all great suggestions Peter!  Thanks for that!
> >
> > Some of the previous suggestions may have been missed because a lot of
> > code moved.
> > Sorry about that.
>
> Not a problem at all, I just want to make sure my question still makes
> sense. :)

Thanks for asking them!

>
> --
> Peter Xu
>

Best regards,
Leo



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
  2022-01-19 18:15       ` Daniel P. Berrangé
@ 2022-01-19 18:46         ` Leonardo Bras Soares Passos
  2022-02-18 16:31           ` Juan Quintela
  0 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-01-19 18:46 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, qemu-devel, Peter Xu, Markus Armbruster,
	Eric Blake, Dr. David Alan Gilbert

On Wed, Jan 19, 2022 at 3:16 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Wed, Jan 19, 2022 at 03:03:29PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Daniel,
> >
> > On Thu, Jan 13, 2022 at 10:10 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> > > > Add property that allows zero-copy migration of memory pages,
> > > > and also includes a helper function migrate_use_zero_copy() to check
> > > > if it's enabled.
> > > >
> > > > No code is introduced to actually do the migration, but it allows
> > > > future implementations to enable/disable this feature.
> > > >
> > > > On non-Linux builds this parameter is compiled-out.
> > > >
> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > ---
> > > >  qapi/migration.json   | 24 ++++++++++++++++++++++++
> > > >  migration/migration.h |  5 +++++
> > > >  migration/migration.c | 32 ++++++++++++++++++++++++++++++++
> > > >  migration/socket.c    |  5 +++++
> > > >  monitor/hmp-cmds.c    |  6 ++++++
> > > >  5 files changed, 72 insertions(+)
> > >
> > > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> >
> > Thanks!
> >
> > >
> > > >
> > > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > > index bbfd48cf0b..2e62ea6ebd 100644
> > > > --- a/qapi/migration.json
> > > > +++ b/qapi/migration.json
> > > > @@ -730,6 +730,13 @@
> > > >  #                      will consume more CPU.
> > > >  #                      Defaults to 1. (Since 5.0)
> > > >  #
> > > > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > > > +#             When true, enables a zero-copy mechanism for sending memory
> > > > +#             pages, if host supports it.
> > > > +#             Requires that QEMU be permitted to use locked memory for guest
> > > > +#             RAM pages.
> > > > +#             Defaults to false. (Since 7.0)
> > > > +#
> > > >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> > > >  #                        aliases for the purpose of dirty bitmap migration.  Such
> > > >  #                        aliases may for example be the corresponding names on the
> > > > @@ -769,6 +776,7 @@
> > > >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> > > >             'max-cpu-throttle', 'multifd-compression',
> > > >             'multifd-zlib-level' ,'multifd-zstd-level',
> > > > +           { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'},
> > > >             'block-bitmap-mapping' ] }
> > > >
> > > >  ##
> > > > @@ -895,6 +903,13 @@
> > > >  #                      will consume more CPU.
> > > >  #                      Defaults to 1. (Since 5.0)
> > > >  #
> > > > +# @zero-copy: Controls behavior on sending memory pages on migration.
> > > > +#             When true, enables a zero-copy mechanism for sending memory
> > > > +#             pages, if host supports it.
> > > > +#             Requires that QEMU be permitted to use locked memory for guest
> > > > +#             RAM pages.
> > > > +#             Defaults to false. (Since 7.0)
> > > > +#
> > > >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> > > >  #                        aliases for the purpose of dirty bitmap migration.  Such
> > > >  #                        aliases may for example be the corresponding names on the
> > > > @@ -949,6 +964,7 @@
> > > >              '*multifd-compression': 'MultiFDCompression',
> > > >              '*multifd-zlib-level': 'uint8',
> > > >              '*multifd-zstd-level': 'uint8',
> > > > +            '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
> > > >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> > >
> > > The current zerocopy impl is for the send path.
> > >
> > > Do you expect we might get zerocopy in the receive path
> > > later ?
> >
> > It's possible, but I haven't started the implementation yet.
> >
> > >
> > > If so then either call this 'send-zero-copy', or change it
> > > from a bool to an enum taking '["send", "recv", "both"]'.
> > >
> > > I'd probably take the former and just rename it.
> > >
> >
> > Well, my rationale:
> > - I want to set zero copy sending:
> > zero-copy is set in the sending host, start migration.
> >
> > - I want to set zero copy receiving:
> > zero-copy is set in the receiving host, wait for migration.
> > (Of course host support is checked when setting the parameter).
> >
> > The problem with the current approach is trying to enable zero-copy on
> > receive before it's implemented, which will 'fail' silently.
> > A possible solution would be to add a patch that checks in the receiving
> > path whether zero-copy is enabled, and fails for now.
>
> That's not good because mgmt apps cannot query the QAPI schema
> to find out if this feature is supported or not.
>
> If we want to support zero copy recv, then we need an explicit
> flag for it that is distinct from zero copy send, so that apps
> can introspect whether the feature is implemented in QEMU or
> not.
>
> Distinct named bool flags feel better, and also make it clear
> to anyone not familiar with the impl that the current code is
> strictly send only.
>

Ok, I see the point.
I will try to refactor the code changing zero-copy to zero-copy-send
or something like that.

Best regards,
Leo



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-19 17:22           ` Leonardo Bras Soares Passos
@ 2022-01-20  1:28             ` Peter Xu
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Xu @ 2022-01-20  1:28 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Wed, Jan 19, 2022 at 02:22:56PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
> 
> On Thu, Jan 13, 2022 at 7:42 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Thu, Jan 13, 2022 at 06:34:12PM +0800, Peter Xu wrote:
> > > On Thu, Jan 13, 2022 at 10:06:14AM +0000, Daniel P. Berrangé wrote:
> > > > On Thu, Jan 13, 2022 at 02:48:15PM +0800, Peter Xu wrote:
> > > > > On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > > > > > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > > > >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > > > > >      }
> > > > > >
> > > > > > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > > > > > +        sflags = MSG_ZEROCOPY;
> > > > > > +    }
> > > > > > +
> > > > > >   retry:
> > > > > > -    ret = sendmsg(sioc->fd, &msg, 0);
> > > > > > +    ret = sendmsg(sioc->fd, &msg, sflags);
> > > > > >      if (ret <= 0) {
> > > > > > -        if (errno == EAGAIN) {
> > > > > > +        switch (errno) {
> > > > > > +        case EAGAIN:
> > > > > >              return QIO_CHANNEL_ERR_BLOCK;
> > > > > > -        }
> > > > > > -        if (errno == EINTR) {
> > > > > > +        case EINTR:
> > > > > >              goto retry;
> > > > > > +        case ENOBUFS:
> > > > > > +            if (sflags & MSG_ZEROCOPY) {
> > > > > > +                error_setg_errno(errp, errno,
> > > > > > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > > > > > +                return -1;
> > > > > > +            }
> > > > >
> I have no idea whether it'll make a real difference, but - should we add
> > > > > a "break" here?  If you agree and with that fixed, feel free to add:
> > > > >
> > > > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > > >
> > > > > I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> > > > > here it's by default unlimited, but just curious when we should keep an eye.
> > > >
> > > > Fedora doesn't allow unlimited locked memory by default
> > > >
> > > > $ grep "locked memory" /proc/self/limits
> > > > Max locked memory         65536                65536                bytes
> > > >
> > > > And  regardless of Fedora defaults, libvirt will set a limit
> > > > for the guest. It will only be unlimited if requiring certain
> > > > things like VFIO.
> > >
> > > Thanks, I obviously checked up the wrong host..
> > >
> > > Leo, do you know how much locked memory will be needed by zero copy?  Will
> > > there be a limit?  Is it linear to the number of sockets/channels?
> >
> > IIRC we decided it would be limited by the socket send buffer size, rather
> > than guest RAM, because writes will block once the send buffer is full.
> >
> > This has a default global setting, with per-socket override. On one box I
> > have it is 200 Kb. With multifd you'll need  "num-sockets * send buffer".
> 
> Oh, I was not aware there is a send buffer size (or maybe I am unable
> to recall).
> That sure makes things much easier.
> 
> >
> > > It'll be better if we can fail at enabling the feature when we detected that
> > > the specified locked memory limit may not be suffice.
> 
> sure
> 
> >
> > Checking this value against available locked memory though will always
> > have an error margin because other things in QEMU can use locked memory
> > too
> 
> We can get the current limit (before zerocopy) as an error margin:
> req_lock_mem = num-sockets * send buffer + BASE_LOCKED
> 
> Where BASE_LOCKED is the current libvirt value, or so on.

Hmm.. not familiar with libvirt, so I'm curious whether libvirt is actually
enlarging the allowed locked mem on Fedora since the default is 64KB?

I think it'll be great to capture the major going-to-fail scenarios.  For
example, I'm wondering whether a qemu (without libvirt) will simply fail
directly on Fedora using non-root even with 1 channel due to the 64K limit, or
the other extreme case is when the user does not allow locking mem at all in
some container environment (when we see max locked mem is zero).

It's not only about failing early, it's also about failing with a meaningful
error so the user knows what to tune, while I'm not very sure that'll be easily
understandable when we wait until the failure of io_writev().
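[The back-of-the-envelope sizing from this thread -- Daniel's
"num-sockets * send buffer" plus Leo's BASE_LOCKED margin -- can be
sketched as below. The function and variable names are mine, not QEMU's;
the idea is to compare the estimate against RLIMIT_MEMLOCK at
parameter-set time so the feature fails early with a meaningful error
instead of ENOBUFS inside sendmsg().]

```c
#include <assert.h>
#include <stdint.h>

/* Estimated locked memory needed by zero-copy multifd:
 *   required = num_channels * per-socket send buffer + base margin
 * where base_locked is an assumed allowance for other locked memory
 * already used by the process (the "BASE_LOCKED" mentioned above). */
static uint64_t zerocopy_locked_mem_needed(uint64_t num_channels,
                                           uint64_t sndbuf_bytes,
                                           uint64_t base_locked)
{
    return num_channels * sndbuf_bytes + base_locked;
}

/* Returns nonzero when the estimate fits under the RLIMIT_MEMLOCK
 * soft limit (as obtained from getrlimit(2) by the caller). */
static int zerocopy_fits_limit(uint64_t needed, uint64_t memlock_limit)
{
    return needed <= memlock_limit;
}
```

With 8 channels and a 200 KiB send buffer, the estimate already exceeds
the 64 KiB default limit mentioned for Fedora non-root users, which is
exactly the early-failure case Peter is worried about.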

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v7 4/5] migration: Add migrate_use_tls() helper
  2022-01-19 18:06     ` Leonardo Bras Soares Passos
@ 2022-01-20  1:37       ` Peter Xu
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Xu @ 2022-01-20  1:37 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

On Wed, Jan 19, 2022 at 03:06:55PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
> 
> On Thu, Jan 13, 2022 at 4:02 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Jan 06, 2022 at 07:13:41PM -0300, Leonardo Bras wrote:
> > >  void migration_channel_process_incoming(QIOChannel *ioc)
> > >  {
> > > -    MigrationState *s = migrate_get_current();
> > >      Error *local_err = NULL;
> > >
> > >      trace_migration_set_incoming_channel(
> > >          ioc, object_get_typename(OBJECT(ioc)));
> > >
> > > -    if (s->parameters.tls_creds &&
> > > -        *s->parameters.tls_creds &&
> > > +    if (migrate_use_tls() &&
> > >          !object_dynamic_cast(OBJECT(ioc),
> > >                               TYPE_QIO_CHANNEL_TLS)) {
> > > +        MigrationState *s = migrate_get_current();
> > > +
> >
> > Trivial nit: I'd rather keep the line there; as the movement offers nothing,
> > imho..
> 
> The idea of moving 's' inside the if block is to make it clear
> it's only used in this case.

IMHO not necessary; I hardly ever read declarations, unless there's a bug,
e.g. variable shadowing. Moving it downwards makes such bugs easier to introduce. :)

> 
> But if you think it's better to keep it at the beginning of the
> function, sure, I can change that.
> Just let me know.

Since there'll be a new version, that definitely looks nicer.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  2022-01-13  6:48   ` Peter Xu
  2022-01-13 10:06     ` Daniel P. Berrangé
  2022-01-19 17:01     ` Leonardo Bras Soares Passos
@ 2022-02-01  4:23     ` Leonardo Bras Soares Passos
  2 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-02-01  4:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Markus Armbruster, qemu-devel,
	Dr. David Alan Gilbert, Eric Blake

Hello Peter,

Re-reading everything before submitting the next version.
I think I finally got that you are suggesting to just add a break at
the end of the case, after the if :)

Sorry I misunderstood that before,

Best regards,
Leo
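[The error path with the break Peter is asking for can be sketched as a
small dispatch function. The enum and function names are mine, chosen so
the control flow is testable outside QEMU; in the real code the actions
are "return QIO_CHANNEL_ERR_BLOCK", "goto retry", and the error_setg
paths.]

```c
#include <assert.h>
#include <errno.h>

enum send_result {
    SEND_AGAIN,   /* QIO_CHANNEL_ERR_BLOCK in QEMU */
    SEND_RETRY,   /* goto retry */
    SEND_ERROR,   /* report the locked-memory error and return -1 */
    SEND_DEFAULT, /* fall through to the generic error path */
};

/* Sketch of the errno dispatch under discussion.  The point of the
 * missing "break" is that a plain ENOBUFS *without* MSG_ZEROCOPY must
 * fall through to the generic error handling, not into whatever code
 * follows the case label. */
static enum send_result classify_send_errno(int err, int zero_copy)
{
    switch (err) {
    case EAGAIN:
        return SEND_AGAIN;
    case EINTR:
        return SEND_RETRY;
    case ENOBUFS:
        if (zero_copy) {
            /* Process can't lock enough memory for MSG_ZEROCOPY. */
            return SEND_ERROR;
        }
        break; /* the break Peter asked for */
    }
    return SEND_DEFAULT;
}
```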

On Thu, Jan 13, 2022 at 3:48 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> >      }
> >
> > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > +        sflags = MSG_ZEROCOPY;
> > +    }
> > +
> >   retry:
> > -    ret = sendmsg(sioc->fd, &msg, 0);
> > +    ret = sendmsg(sioc->fd, &msg, sflags);
> >      if (ret <= 0) {
> > -        if (errno == EAGAIN) {
> > +        switch (errno) {
> > +        case EAGAIN:
> >              return QIO_CHANNEL_ERR_BLOCK;
> > -        }
> > -        if (errno == EINTR) {
> > +        case EINTR:
> >              goto retry;
> > +        case ENOBUFS:
> > +            if (sflags & MSG_ZEROCOPY) {
> > +                error_setg_errno(errp, errno,
> > +                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
> > +                return -1;
> > +            }
>
> I have no idea whether it'll make a real difference, but - should we add
> a "break" here?  If you agree and with that fixed, feel free to add:
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> here it's by default unlimited, but just curious when we should keep an eye.
>
> Thanks,
>
> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
  2022-01-19 18:46         ` Leonardo Bras Soares Passos
@ 2022-02-18 16:31           ` Juan Quintela
  2022-02-21 16:23             ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 40+ messages in thread
From: Juan Quintela @ 2022-02-18 16:31 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Markus Armbruster, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Eric Blake

Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> On Wed, Jan 19, 2022 at 3:16 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>>
>> On Wed, Jan 19, 2022 at 03:03:29PM -0300, Leonardo Bras Soares Passos wrote:
>> > Hello Daniel,
>> >
>> > On Thu, Jan 13, 2022 at 10:10 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>> > >
>> > > On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
>> > > > Add property that allows zero-copy migration of memory pages,
>> > > > and also includes a helper function migrate_use_zero_copy() to check
>> > > > if it's enabled.
>> > > >
>> > > > No code is introduced to actually do the migration, but it allows
>> > > > future implementations to enable/disable this feature.
>> > > >
>> > > > On non-Linux builds this parameter is compiled-out.
>> > > >
>> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>> > > > ---
>> > > >  qapi/migration.json   | 24 ++++++++++++++++++++++++
>> > > >  migration/migration.h |  5 +++++
>> > > >  migration/migration.c | 32 ++++++++++++++++++++++++++++++++
>> > > >  migration/socket.c    |  5 +++++
>> > > >  monitor/hmp-cmds.c    |  6 ++++++
>> > > >  5 files changed, 72 insertions(+)
>> > >
>> > > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>> >
>> > Thanks!
>>
>
> Ok, I see the point.
> I will try to refactor the code changing zero-copy to zero-copy-send
> or something like that.

Hi

I am late to the party, but I agree with Dan that we need two flags.

The reason is that you can be the target of one migration, and later be
the source of the next one.  If we have a single flag that means different
things on the source and destination sides, things become really
complicated.

Later, Juan.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
  2022-02-18 16:31           ` Juan Quintela
@ 2022-02-21 16:23             ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-02-21 16:23 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Daniel P. Berrangé,
	Markus Armbruster, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Eric Blake

Hello Juan,
Thanks for the feedback!


On Fri, Feb 18, 2022 at 1:31 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> > On Wed, Jan 19, 2022 at 3:16 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>
> >> On Wed, Jan 19, 2022 at 03:03:29PM -0300, Leonardo Bras Soares Passos wrote:
> >> > Hello Daniel,
> >> >
> >> > On Thu, Jan 13, 2022 at 10:10 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >> > >
> >> > > On Thu, Jan 06, 2022 at 07:13:40PM -0300, Leonardo Bras wrote:
> >> > > > Add property that allows zero-copy migration of memory pages,
> >> > > > and also includes a helper function migrate_use_zero_copy() to check
> >> > > > if it's enabled.
> >> > > >
> >> > > > No code is introduced to actually do the migration, but it allow
> >> > > > future implementations to enable/disable this feature.
> >> > > >
> >> > > > On non-Linux builds this parameter is compiled-out.
> >> > > >
> >> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> >> > > > ---
> >> > > >  qapi/migration.json   | 24 ++++++++++++++++++++++++
> >> > > >  migration/migration.h |  5 +++++
> >> > > >  migration/migration.c | 32 ++++++++++++++++++++++++++++++++
> >> > > >  migration/socket.c    |  5 +++++
> >> > > >  monitor/hmp-cmds.c    |  6 ++++++
> >> > > >  5 files changed, 72 insertions(+)
> >> > >
> >> > > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> >> >
> >> > Thanks!
> >>
> >
> > Ok, I see the point.
> > I will try to refactor the code changing zero-copy to zero-copy-send
> > or something like that.
>
> Hi
>
> I am late to the party, but I agree with Dan that we need two flags.
>
> The reason is that you can be the target of one migration, and later be
> the source of the next one.  If we only have one flag that means different
> things on the source and destination side, things become really
> complicated.
>
> Later, Juan.
>

Yeah, that makes sense. :)

Best regards,
Leo
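
[Editorial note: once the parameter is split per the discussion above, enabling
zero-copy on the sending side would look roughly like the sketch below. The
name "zero-copy-send" is the rename proposed in this thread, not yet merged,
so treat both spellings as provisional.]

```
# HMP, on the source QEMU monitor (parameter name hypothetical until
# the rename lands; zero-copy requires multifd per this series):
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_parameter zero-copy-send on
(qemu) migrate -d tcp:destination-host:4444

# QMP equivalent of setting the parameter:
{ "execute": "migrate-set-parameters",
  "arguments": { "zero-copy-send": true } }
```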




end of thread, other threads:[~2022-02-21 16:33 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-06 22:13 [PATCH v7 0/5] MSG_ZEROCOPY + multifd Leonardo Bras
2022-01-06 22:13 ` [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback Leonardo Bras
2022-01-13  6:28   ` Peter Xu
2022-01-18 20:45     ` Leonardo Bras Soares Passos
2022-01-19  1:58       ` Peter Xu
2022-01-19 18:29         ` Leonardo Bras Soares Passos
2022-01-13 10:52   ` Daniel P. Berrangé
2022-01-19  4:38     ` Leonardo Bras Soares Passos
2022-01-06 22:13 ` [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Leonardo Bras
2022-01-13  6:48   ` Peter Xu
2022-01-13 10:06     ` Daniel P. Berrangé
2022-01-13 10:34       ` Peter Xu
2022-01-13 10:42         ` Daniel P. Berrangé
2022-01-13 12:12           ` Peter Xu
2022-01-19 17:23             ` Leonardo Bras Soares Passos
2022-01-19 17:22           ` Leonardo Bras Soares Passos
2022-01-20  1:28             ` Peter Xu
2022-01-19 17:16         ` Leonardo Bras Soares Passos
2022-01-19 17:01     ` Leonardo Bras Soares Passos
2022-02-01  4:23     ` Leonardo Bras Soares Passos
2022-01-13 13:05   ` Daniel P. Berrangé
2022-01-19 17:24     ` Leonardo Bras Soares Passos
2022-01-06 22:13 ` [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux Leonardo Bras
2022-01-13  7:00   ` Peter Xu
2022-01-19 17:53     ` Leonardo Bras Soares Passos
2022-01-13 13:09   ` Daniel P. Berrangé
2022-01-19 18:03     ` Leonardo Bras Soares Passos
2022-01-19 18:15       ` Daniel P. Berrangé
2022-01-19 18:46         ` Leonardo Bras Soares Passos
2022-02-18 16:31           ` Juan Quintela
2022-02-21 16:23             ` Leonardo Bras Soares Passos
2022-01-06 22:13 ` [PATCH v7 4/5] migration: Add migrate_use_tls() helper Leonardo Bras
2022-01-13  7:02   ` Peter Xu
2022-01-19 18:06     ` Leonardo Bras Soares Passos
2022-01-20  1:37       ` Peter Xu
2022-01-13 13:10   ` Daniel P. Berrangé
2022-01-19 18:07     ` Leonardo Bras Soares Passos
2022-01-06 22:13 ` [PATCH v7 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Leonardo Bras
2022-01-13  7:15   ` Peter Xu
2022-01-19 18:23     ` Leonardo Bras Soares Passos
