All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/3] MSG_ZEROCOPY for multifd
@ 2021-10-09  7:56 Leonardo Bras
  2021-10-09  7:56 ` [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Leonardo Bras @ 2021-10-09  7:56 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu, Jason Wang
  Cc: Leonardo Bras, qemu-devel

This patch series intends to enable MSG_ZEROCOPY in QIOChannel, and make
use of it for multifd migration performance improvement.

Patch #1 creates new callbacks for QIOChannel, allowing the implementation
of zerocopy writing.

Patch #2 implements writev_zerocopy and flush_zerocopy on QIOChannelSocket,
making use of MSG_ZEROCOPY on Linux.

Patch #3 Makes use of QIOChannelSocket zerocopy implementation on
nocomp multifd migration.

Results:
So far, the resource usage of __sys_sendmsg() reduced 15 times, and the
overall migration took 13-22% less time, based in synthetic workload.

The objective is to reduce migration time in hosts with heavy cpu usage.

---
Changes since v3:
- QIOChannel interface names changed from io_async_{writev,flush} to
  io_{writev,flush}_zerocopy
- Instead of falling back in case zerocopy is not implemented, return
  error and abort operation.
- Flush now waits as long as needed, or return error in case anything 
  goes wrong, aborting the operation.
- Zerocopy is now conditional in multifd, being set by parameter 
  multifd-zerocopy
- Moves zerocopy_flush to multifd_send_sync_main() from multifd_save_cleanup
  so migration can abort if flush goes wrong.
- Several other small improvements

Changes since v2:
- Patch #1: One more fallback
- Patch #2: Fall back to sync if fails to lock buffer memory in MSG_ZEROCOPY send.

Changes since v1:
- Reimplemented the patchset using async_write + async_flush approach.
- Implemented a flush to be able to tell whenever all data was written.

Leonardo Bras (3):
  QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for
    CONFIG_LINUX
  multifd: Implement zerocopy write in multifd migration
    (multifd-zerocopy)

 qapi/migration.json         |  18 ++++
 include/io/channel-socket.h |   2 +
 include/io/channel.h        | 104 +++++++++++++++++----
 migration/migration.h       |   1 +
 migration/multifd.h         |   2 +-
 io/channel-socket.c         | 180 ++++++++++++++++++++++++++++++++++--
 io/channel.c                |  74 +++++++++++----
 migration/migration.c       |  20 ++++
 migration/multifd.c         |  33 ++++++-
 migration/ram.c             |  20 ++--
 monitor/hmp-cmds.c          |   4 +
 11 files changed, 399 insertions(+), 59 deletions(-)

-- 
2.33.0



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-09  7:56 [PATCH v4 0/3] MSG_ZEROCOPY for multifd Leonardo Bras
@ 2021-10-09  7:56 ` Leonardo Bras
  2021-10-11 19:17   ` Eric Blake
  2021-10-13  6:07   ` Peter Xu
  2021-10-09  7:56 ` [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
  2021-10-09  7:56 ` [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
  2 siblings, 2 replies; 35+ messages in thread
From: Leonardo Bras @ 2021-10-09  7:56 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu, Jason Wang
  Cc: Leonardo Bras, qemu-devel

Adds io_async_writev and io_async_flush as optional callback to QIOChannelClass,
allowing the implementation of asynchronous writes by subclasses.

How to use them:
- Write data using qio_channel_writev_zerocopu(),
- Wait write completion with qio_channel_flush_zerocopy().

Notes:
As some zerocopy implementations work asynchronously, it's
recommended to keep the write buffer untouched until the return of
qio_channel_flush_zerocopy(), by the risk of sending an updated buffer
instead of the one at the write.

As the new callbacks are optional, if a subclass does not implement them, then:
- io_async_writev will return -1,
- io_async_flush will return 0 without changing anything.

Also, some functions like qio_channel_writev_full_all() were adapted to
receive a flag parameter. That allows shared code between zerocopy and
non-zerocopy writev.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/io/channel.h | 103 +++++++++++++++++++++++++++++++++++--------
 io/channel.c         |  74 +++++++++++++++++++++++--------
 2 files changed, 141 insertions(+), 36 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 88988979f8..e7d4e1521f 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 #define QIO_CHANNEL_ERR_BLOCK -2
 
+#define QIO_CHANNEL_WRITE_FLAG_ZEROCOPY 0x1
+
 typedef enum QIOChannelFeature QIOChannelFeature;
 
 enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_FD_PASS,
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
+    QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY,
 };
 
 
@@ -136,6 +139,12 @@ struct QIOChannelClass {
                                   IOHandler *io_read,
                                   IOHandler *io_write,
                                   void *opaque);
+    ssize_t (*io_writev_zerocopy)(QIOChannel *ioc,
+                                  const struct iovec *iov,
+                                  size_t niov,
+                                  Error **errp);
+    int (*io_flush_zerocopy)(QIOChannel *ioc,
+                              Error **errp);
 };
 
 /* General I/O handling functions */
@@ -222,12 +231,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
 
 
 /**
- * qio_channel_writev_full:
+ * qio_channel_writev_full_flags:
  * @ioc: the channel object
  * @iov: the array of memory regions to write data from
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Write data to the IO channel, reading it from the
@@ -242,6 +252,10 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
  * guaranteed. If the channel is non-blocking and no
  * data can be sent, it will return QIO_CHANNEL_ERR_BLOCK
  *
+ * If flag QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed,
+ * function will return once each buffer was queued for
+ * sending.
+ *
  * If there are file descriptors to send, the @fds
  * array should be non-NULL and provide the handles.
  * All file descriptors will be sent if at least one
@@ -255,12 +269,15 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
  * or QIO_CHANNEL_ERR_BLOCK if no data is can be sent
  * and the channel is non-blocking
  */
-ssize_t qio_channel_writev_full(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds,
-                                size_t nfds,
-                                Error **errp);
+ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds,
+                                      size_t nfds,
+                                      int flags,
+                                      Error **errp);
+#define qio_channel_writev_full(ioc, iov, niov, fds, nfds, errp) \
+    qio_channel_writev_full_flags(ioc, iov, niov, fds, nfds, 0, errp)
 
 /**
  * qio_channel_readv_all_eof:
@@ -321,10 +338,11 @@ int qio_channel_readv_all(QIOChannel *ioc,
 
 
 /**
- * qio_channel_writev_all:
+ * qio_channel_writev_all_flags:
  * @ioc: the channel object
  * @iov: the array of memory regions to write data from
  * @niov: the length of the @iov array
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Write data to the IO channel, reading it from the
@@ -339,10 +357,13 @@ int qio_channel_readv_all(QIOChannel *ioc,
  *
  * Returns: 0 if all bytes were written, or -1 on error
  */
-int qio_channel_writev_all(QIOChannel *ioc,
-                           const struct iovec *iov,
-                           size_t niov,
-                           Error **erp);
+int qio_channel_writev_all_flags(QIOChannel *ioc,
+                                 const struct iovec *iov,
+                                 size_t niov,
+                                 int flags,
+                                 Error **errp);
+#define qio_channel_writev_all(ioc, iov, niov, errp) \
+    qio_channel_writev_all_flags(ioc, iov, niov, 0, errp)
 
 /**
  * qio_channel_readv:
@@ -831,12 +852,13 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
                                Error **errp);
 
 /**
- * qio_channel_writev_full_all:
+ * qio_channel_writev_full_all_flags:
  * @ioc: the channel object
  * @iov: the array of memory regions to write data from
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  *
@@ -846,13 +868,58 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * to be written, yielding from the current coroutine
  * if required.
  *
+ * If QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed in flags,
+ * instead of waiting for all requested data to be written,
+ * this function will wait until it's all queued for writing.
+ *
  * Returns: 0 if all bytes were written, or -1 on error
  */
 
-int qio_channel_writev_full_all(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds, size_t nfds,
-                                Error **errp);
+int qio_channel_writev_full_all_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds, size_t nfds,
+                                      int flags, Error **errp);
+#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
+    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
+
+/**
+ * qio_channel_writev_zerocopy:
+ * @ioc: the channel object
+ * @iov: the array of memory regions to write data from
+ * @niov: the length of the @iov array
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Behaves like qio_channel_writev_full_all_flags, but will write
+ * data asynchronously while avoiding unnecessary data copy.
+ * This function may return before any data is actually written,
+ * but should queue every buffer for writting.
+ *
+ * If at some point it's necessary wait for all data to be
+ * written, use qio_channel_flush_zerocopy().
+ *
+ * If zerocopy is not available, returns -1 and set errp.
+ */
+
+ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
+                                    const struct iovec *iov,
+                                    size_t niov,
+                                    Error **errp);
+
+/**
+ * qio_channel_flush_zerocopy:
+ * @ioc: the channel object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Will lock until every packet queued with
+ * qio_channel_writev_zerocopy() is sent, or return
+ * in case of any error.
+ *
+ * Returns -1 if any error is found, 0 otherwise.
+ * If not implemented, returns 0 without changing anything.
+ */
+
+int qio_channel_flush_zerocopy(QIOChannel *ioc,
+                               Error **errp);
 
 #endif /* QIO_CHANNEL_H */
diff --git a/io/channel.c b/io/channel.c
index e8b019dc36..811c93ae23 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -67,15 +67,27 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
 }
 
 
-ssize_t qio_channel_writev_full(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds,
-                                size_t nfds,
-                                Error **errp)
+ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds,
+                                      size_t nfds,
+                                      int flags,
+                                      Error **errp)
 {
     QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
 
+    if (flags & QIO_CHANNEL_WRITE_FLAG_ZEROCOPY) {
+        if (!klass->io_writev_zerocopy ||
+            !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
+            error_setg_errno(errp, EINVAL,
+                             "Channel does not support zerocopy writev");
+            return -1;
+        }
+
+        return klass->io_writev_zerocopy(ioc, iov, niov, errp);
+    }
+
     if ((fds || nfds) &&
         !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
         error_setg_errno(errp, EINVAL,
@@ -212,19 +224,20 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
     return ret;
 }
 
-int qio_channel_writev_all(QIOChannel *ioc,
-                           const struct iovec *iov,
-                           size_t niov,
-                           Error **errp)
+int qio_channel_writev_all_flags(QIOChannel *ioc,
+                                 const struct iovec *iov,
+                                 size_t niov,
+                                 int flags,
+                                 Error **errp)
 {
-    return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
+    return qio_channel_writev_full_all_flags(ioc, iov, niov, NULL, 0, flags, errp);
 }
 
-int qio_channel_writev_full_all(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds, size_t nfds,
-                                Error **errp)
+int qio_channel_writev_full_all_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds, size_t nfds,
+                                      int flags, Error **errp)
 {
     int ret = -1;
     struct iovec *local_iov = g_new(struct iovec, niov);
@@ -237,8 +250,8 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
 
     while (nlocal_iov > 0) {
         ssize_t len;
-        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
-                                      errp);
+        len = qio_channel_writev_full_flags(ioc, local_iov, nlocal_iov, fds, nfds,
+                                          flags, errp);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_OUT);
@@ -474,6 +487,31 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
 }
 
 
+ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
+                                    const struct iovec *iov,
+                                    size_t niov,
+                                    Error **errp)
+{
+    return qio_channel_writev_full_flags(ioc, iov, niov, NULL, 0,
+                                         QIO_CHANNEL_WRITE_FLAG_ZEROCOPY,
+                                         errp);
+}
+
+
+int qio_channel_flush_zerocopy(QIOChannel *ioc,
+                               Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_flush_zerocopy ||
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
+        return 0;
+    }
+
+    return klass->io_flush_zerocopy(ioc, errp);
+}
+
+
 static void qio_channel_restart_read(void *opaque)
 {
     QIOChannel *ioc = opaque;
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-10-09  7:56 [PATCH v4 0/3] MSG_ZEROCOPY for multifd Leonardo Bras
  2021-10-09  7:56 ` [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
@ 2021-10-09  7:56 ` Leonardo Bras
  2021-10-11 19:27   ` Eric Blake
                     ` (2 more replies)
  2021-10-09  7:56 ` [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
  2 siblings, 3 replies; 35+ messages in thread
From: Leonardo Bras @ 2021-10-09  7:56 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu, Jason Wang
  Cc: Leonardo Bras, qemu-devel

For CONFIG_LINUX, implement the new optional callbacks io_write_zerocopy and
io_flush_zerocopy on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
feature is available in the host kernel, which is checked on
qio_channel_socket_connect_sync()

qio_channel_socket_writev() contents were moved to a helper function
qio_channel_socket_writev_flags() which accepts an extra argument for flags.
(This argument is passed directly to sendmsg().

The above helper function is used to implement qio_channel_socket_writev(),
with flags = 0, keeping it's behavior unchanged, and
qio_channel_socket_writev_zerocopy() with flags = MSG_ZEROCOPY.

qio_channel_socket_flush_zerocopy() was implemented by counting how many times
sendmsg(...,MSG_ZEROCOPY) was sucessfully called, and then reading the
socket's error queue, in order to find how many of them finished sending.
Flush will loop until those counters are the same, or until some error ocurs.

A new function qio_channel_socket_poll() was also created in order to avoid
busy-looping recvmsg() in qio_channel_socket_flush_zerocopy() while waiting for
updates in socket's error queue.

Notes on using writev_zerocopy():
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
some caution is necessary to avoid overwriting any buffer before it's sent.
If something like this happen, a newer version of the buffer may be sent instead.
- If this is a problem, it's recommended to call flush_zerocopy() before freeing
or re-using the buffer.

2: Locked memory
- When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may require
a larger amount of locked memory than usually available to non-root user.
- If the required amount of locked memory is not available, writev_zerocopy
will return an error, which can abort an operation like migration,
- Because of this, when an user code wants to add zerocopy as a feature, it
requires a mechanism to disable it, so it can still be acessible to less
privileged users.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/io/channel-socket.h |   2 +
 include/io/channel.h        |   1 +
 io/channel-socket.c         | 180 ++++++++++++++++++++++++++++++++++--
 3 files changed, 173 insertions(+), 10 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e63514..81d04baa4c 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -47,6 +47,8 @@ struct QIOChannelSocket {
     socklen_t localAddrLen;
     struct sockaddr_storage remoteAddr;
     socklen_t remoteAddrLen;
+    ssize_t zerocopy_queued;
+    ssize_t zerocopy_sent;
 };
 
 
diff --git a/include/io/channel.h b/include/io/channel.h
index e7d4e1521f..9d74629226 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 
 #define QIO_CHANNEL_ERR_BLOCK -2
+#define QIO_CHANNEL_ERR_NOBUFS -3
 
 #define QIO_CHANNEL_WRITE_FLAG_ZEROCOPY 0x1
 
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 606ec97cf7..6cc42057b2 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -26,6 +26,10 @@
 #include "io/channel-watch.h"
 #include "trace.h"
 #include "qapi/clone-visitor.h"
+#ifdef CONFIG_LINUX
+#include <linux/errqueue.h>
+#include <poll.h>
+#endif
 
 #define SOCKET_MAX_FDS 16
 
@@ -55,6 +59,8 @@ qio_channel_socket_new(void)
 
     sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
     sioc->fd = -1;
+    sioc->zerocopy_queued = 0;
+    sioc->zerocopy_sent = 0;
 
     ioc = QIO_CHANNEL(sioc);
     qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -140,6 +146,7 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
                                     Error **errp)
 {
     int fd;
+    int ret, v = 1;
 
     trace_qio_channel_socket_connect_sync(ioc, addr);
     fd = socket_connect(addr, errp);
@@ -154,6 +161,17 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
         return -1;
     }
 
+#ifdef CONFIG_LINUX
+    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+    if (ret < 0) {
+        /* Zerocopy not available on host */
+        return 0;
+    }
+
+    qio_channel_set_feature(QIO_CHANNEL(ioc),
+                            QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY);
+#endif
+
     return 0;
 }
 
@@ -520,12 +538,13 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
     return ret;
 }
 
-static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
-                                         const struct iovec *iov,
-                                         size_t niov,
-                                         int *fds,
-                                         size_t nfds,
-                                         Error **errp)
+static ssize_t qio_channel_socket_writev_flags(QIOChannel *ioc,
+                                               const struct iovec *iov,
+                                               size_t niov,
+                                               int *fds,
+                                               size_t nfds,
+                                               int flags,
+                                               Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
     ssize_t ret;
@@ -558,20 +577,34 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
     }
 
  retry:
-    ret = sendmsg(sioc->fd, &msg, 0);
+    ret = sendmsg(sioc->fd, &msg, flags);
     if (ret <= 0) {
-        if (errno == EAGAIN) {
+        switch (errno) {
+        case EAGAIN:
             return QIO_CHANNEL_ERR_BLOCK;
-        }
-        if (errno == EINTR) {
+        case EINTR:
             goto retry;
+        case ENOBUFS:
+            return QIO_CHANNEL_ERR_NOBUFS;
         }
+
         error_setg_errno(errp, errno,
                          "Unable to write to socket");
         return -1;
     }
     return ret;
 }
+
+static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
+                                         const struct iovec *iov,
+                                         size_t niov,
+                                         int *fds,
+                                         size_t nfds,
+                                         Error **errp)
+{
+    return qio_channel_socket_writev_flags(ioc, iov, niov, fds, nfds, 0, errp);
+}
+
 #else /* WIN32 */
 static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
                                         const struct iovec *iov,
@@ -658,6 +691,129 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 }
 #endif /* WIN32 */
 
+
+#ifdef CONFIG_LINUX
+
+static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
+                                   Error **errp)
+{
+    struct pollfd pfd;
+    int ret;
+
+    pfd.fd = sioc->fd;
+    pfd.events = 0;
+
+ retry:
+    ret = poll(&pfd, 1, -1);
+    if (ret < 0) {
+        switch (errno) {
+        case EAGAIN:
+        case EINTR:
+            goto retry;
+        default:
+            error_setg_errno(errp, errno,
+                             "Poll error");
+            return ret;
+        }
+    }
+
+    if (pfd.revents & (POLLHUP | POLLNVAL)) {
+        error_setg(errp, "Poll error: Invalid or disconnected fd");
+        return -1;
+    }
+
+    if (!zerocopy && (pfd.revents & POLLERR)) {
+        error_setg(errp, "Poll error: Errors present in errqueue");
+        return -1;
+    }
+
+    return ret;
+}
+
+static ssize_t qio_channel_socket_writev_zerocopy(QIOChannel *ioc,
+                                                  const struct iovec *iov,
+                                                  size_t niov,
+                                                  Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    ssize_t ret;
+
+    ret = qio_channel_socket_writev_flags(ioc, iov, niov, NULL, 0,
+                                          MSG_ZEROCOPY, errp);
+    if (ret == QIO_CHANNEL_ERR_NOBUFS) {
+        if (errp && *errp) {
+            error_setg_errno(errp, errno,
+                             "Process can't lock enough memory for using MSG_ZEROCOPY");
+        }
+        return -1;
+    }
+
+    sioc->zerocopy_queued++;
+    return ret;
+}
+
+static int qio_channel_socket_flush_zerocopy(QIOChannel *ioc,
+                                             Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    struct msghdr msg = {};
+    struct sock_extended_err *serr;
+    struct cmsghdr *cm;
+    char control[CMSG_SPACE(sizeof(*serr))];
+    int ret;
+
+    msg.msg_control = control;
+    msg.msg_controllen = sizeof(control);
+    memset(control, 0, sizeof(control));
+
+    while (sioc->zerocopy_sent < sioc->zerocopy_queued) {
+        ret = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
+        if (ret < 0) {
+            switch (errno) {
+            case EAGAIN:
+                /* Nothing on errqueue, wait until something is available*/
+                ret = qio_channel_socket_poll(sioc, true, errp);
+                if (ret < 0) {
+                    return -1;
+                }
+                continue;
+            case EINTR:
+                continue;
+            default:
+                error_setg_errno(errp, errno,
+                                 "Unable to read errqueue");
+                return -1;
+            }
+        }
+
+        cm = CMSG_FIRSTHDR(&msg);
+        if (cm->cmsg_level != SOL_IP &&
+            cm->cmsg_type != IP_RECVERR) {
+            error_setg_errno(errp, EPROTOTYPE,
+                             "Wrong cmsg in errqueue");
+            return -1;
+        }
+
+        serr = (void *) CMSG_DATA(cm);
+        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
+            error_setg_errno(errp, serr->ee_errno,
+                             "Error on socket");
+            return -1;
+        }
+        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+            error_setg_errno(errp, serr->ee_origin,
+                             "Error not from zerocopy");
+            return -1;
+        }
+
+        /* No errors, count sucessfully finished sendmsg()*/
+        sioc->zerocopy_sent += serr->ee_data - serr->ee_info + 1;
+    }
+    return 0;
+}
+
+#endif /* CONFIG_LINUX */
+
 static int
 qio_channel_socket_set_blocking(QIOChannel *ioc,
                                 bool enabled,
@@ -787,6 +943,10 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
     ioc_klass->io_set_delay = qio_channel_socket_set_delay;
     ioc_klass->io_create_watch = qio_channel_socket_create_watch;
     ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
+#ifdef CONFIG_LINUX
+    ioc_klass->io_writev_zerocopy = qio_channel_socket_writev_zerocopy;
+    ioc_klass->io_flush_zerocopy = qio_channel_socket_flush_zerocopy;
+#endif
 }
 
 static const TypeInfo qio_channel_socket_info = {
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-09  7:56 [PATCH v4 0/3] MSG_ZEROCOPY for multifd Leonardo Bras
  2021-10-09  7:56 ` [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
  2021-10-09  7:56 ` [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
@ 2021-10-09  7:56 ` Leonardo Bras
  2021-10-11 19:31   ` Eric Blake
                     ` (3 more replies)
  2 siblings, 4 replies; 35+ messages in thread
From: Leonardo Bras @ 2021-10-09  7:56 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu, Jason Wang
  Cc: Leonardo Bras, qemu-devel

Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
zerocopy interface.

Change multifd_send_sync_main() so it can distinguish the last sync from
the setup and per-iteration ones, so a flush_zerocopy() can be called
at the last sync in order to make sure all RAM is sent before finishing
the migration.

Also make it return -1 if flush_zerocopy() fails, in order to cancel
the migration process, and avoid resuming the guest in the target host
without receiving all current RAM.

This will work fine on RAM migration because the RAM pages are not usually freed,
and there is no problem on changing the pages content between async_send() and
the actual sending of the buffer, because this change will dirty the page and
cause it to be re-sent on a next iteration anyway.

Given a lot of locked memory may be needed in order to use multid migration
with zerocopy enabled, make it optional by creating a new parameter
multifd-zerocopy on qapi, so low-privileged users can still perform multifd
migrations.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 qapi/migration.json   | 18 ++++++++++++++++++
 migration/migration.h |  1 +
 migration/multifd.h   |  2 +-
 migration/migration.c | 20 ++++++++++++++++++++
 migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
 migration/ram.c       | 20 +++++++++++++-------
 monitor/hmp-cmds.c    |  4 ++++
 7 files changed, 85 insertions(+), 13 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 88f07baedd..c4890cbb54 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -724,6 +724,11 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
+#                    When true, enables a zerocopy mechanism for sending memory
+#                    pages, if host does support it.
+#                    Defaults to false. (Since 6.2)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -758,6 +763,7 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
+	   'multifd-zerocopy',
            'block-bitmap-mapping' ] }
 
 ##
@@ -884,6 +890,11 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
+#                    When true, enables a zerocopy mechanism for sending memory
+#                    pages, if host does support it.
+#                    Defaults to false. (Since 6.2)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -934,6 +945,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+	    '*multifd-zerocopy': 'bool',
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
@@ -1080,6 +1092,11 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
+#                    When true, enables a zerocopy mechanism for sending memory
+#                    pages, if host does support it.
+#                    Defaults to false. (Since 6.2)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -1128,6 +1145,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+	    '*multifd-zerocopy': 'bool',
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
diff --git a/migration/migration.h b/migration/migration.h
index 7a5aa8c2fd..860d83cc41 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -338,6 +338,7 @@ int migrate_multifd_channels(void);
 MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
 int migrate_multifd_zstd_level(void);
+int migrate_multifd_zerocopy(void);
 
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
diff --git a/migration/multifd.h b/migration/multifd.h
index 8d6751f5ed..8f5c5a6953 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -20,7 +20,7 @@ int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
 bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-void multifd_send_sync_main(QEMUFile *f);
+int multifd_send_sync_main(QEMUFile *f, bool last_sync);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
diff --git a/migration/migration.c b/migration/migration.c
index 6ac807ef3d..326f7c515f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -879,6 +879,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->multifd_zlib_level = s->parameters.multifd_zlib_level;
     params->has_multifd_zstd_level = true;
     params->multifd_zstd_level = s->parameters.multifd_zstd_level;
+    params->has_multifd_zerocopy = true;
+    params->multifd_zerocopy = s->parameters.multifd_zerocopy;
     params->has_xbzrle_cache_size = true;
     params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
     params->has_max_postcopy_bandwidth = true;
@@ -1523,6 +1525,9 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_multifd_compression) {
         dest->multifd_compression = params->multifd_compression;
     }
+    if (params->has_multifd_zerocopy) {
+        dest->multifd_zerocopy = params->multifd_zerocopy;
+    }
     if (params->has_xbzrle_cache_size) {
         dest->xbzrle_cache_size = params->xbzrle_cache_size;
     }
@@ -1635,6 +1640,9 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_multifd_compression) {
         s->parameters.multifd_compression = params->multifd_compression;
     }
+    if (params->has_multifd_zerocopy) {
+        s->parameters.multifd_zerocopy = params->multifd_zerocopy;
+    }
     if (params->has_xbzrle_cache_size) {
         s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
         xbzrle_cache_resize(params->xbzrle_cache_size, errp);
@@ -2516,6 +2524,15 @@ int migrate_multifd_zstd_level(void)
     return s->parameters.multifd_zstd_level;
 }
 
+int migrate_multifd_zerocopy(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.multifd_zerocopy;
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
@@ -4164,6 +4181,8 @@ static Property migration_properties[] = {
     DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState,
                       parameters.multifd_zstd_level,
                       DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL),
+    DEFINE_PROP_BOOL("multifd-zerocopy", MigrationState,
+                      parameters.multifd_zerocopy, false),
     DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState,
                       parameters.xbzrle_cache_size,
                       DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE),
@@ -4261,6 +4280,7 @@ static void migration_instance_init(Object *obj)
     params->has_multifd_compression = true;
     params->has_multifd_zlib_level = true;
     params->has_multifd_zstd_level = true;
+    params->has_multifd_zerocopy = true;
     params->has_xbzrle_cache_size = true;
     params->has_max_postcopy_bandwidth = true;
     params->has_max_cpu_throttle = true;
diff --git a/migration/multifd.c b/migration/multifd.c
index 377da78f5b..17a7d90de3 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -105,7 +105,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
  */
 static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
+    int flags = 0;
+
+    if (migrate_multifd_zerocopy()) {
+        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
+    }
+
+    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);
 }
 
 /**
@@ -575,19 +581,23 @@ void multifd_save_cleanup(void)
     multifd_send_state = NULL;
 }
 
-void multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(QEMUFile *f, bool last_sync)
 {
     int i;
+    bool flush_zerocopy;
 
     if (!migrate_use_multifd()) {
-        return;
+        return 0;
     }
     if (multifd_send_state->pages->used) {
         if (multifd_send_pages(f) < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
-            return;
+            return 0;
         }
     }
+
+    flush_zerocopy = last_sync && migrate_multifd_zerocopy();
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -598,7 +608,7 @@ void multifd_send_sync_main(QEMUFile *f)
         if (p->quit) {
             error_report("%s: channel %d has already quit", __func__, i);
             qemu_mutex_unlock(&p->mutex);
-            return;
+            return 0;
         }
 
         p->packet_num = multifd_send_state->packet_num++;
@@ -609,6 +619,17 @@ void multifd_send_sync_main(QEMUFile *f)
         ram_counters.transferred += p->packet_len;
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
+
+        if (flush_zerocopy) {
+            int ret;
+            Error *err = NULL;
+
+            ret = qio_channel_flush_zerocopy(p->c, &err);
+            if (ret < 0) {
+                error_report_err(err);
+                return -1;
+            }
+        }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -617,6 +638,8 @@ void multifd_send_sync_main(QEMUFile *f)
         qemu_sem_wait(&p->sem_sync);
     }
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
+
+    return 0;
 }
 
 static void *multifd_send_thread(void *opaque)
diff --git a/migration/ram.c b/migration/ram.c
index 7a43bfd7af..ada57846a5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2839,7 +2839,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-    multifd_send_sync_main(f);
+    multifd_send_sync_main(f, false);
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     qemu_fflush(f);
 
@@ -2948,7 +2948,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
-        multifd_send_sync_main(rs->f);
+        multifd_send_sync_main(rs->f, false);
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
         qemu_fflush(f);
         ram_counters.transferred += 8;
@@ -3006,13 +3006,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     }
 
-    if (ret >= 0) {
-        multifd_send_sync_main(rs->f);
-        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-        qemu_fflush(f);
+    if (ret < 0) {
+        return ret;
     }
 
-    return ret;
+    ret = multifd_send_sync_main(rs->f, true);
+    if (ret < 0) {
+        return -1;
+    }
+
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_fflush(f);
+
+    return 0;
 }
 
 static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index bcaa41350e..b04f14ec1e 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1364,6 +1364,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_multifd_zstd_level = true;
         visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
         break;
+    case MIGRATION_PARAMETER_MULTIFD_ZEROCOPY:
+        p->has_multifd_zerocopy = true;
+        visit_type_bool(v, param, &p->multifd_zerocopy, &err);
+        break;
     case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
         p->has_xbzrle_cache_size = true;
         if (!visit_type_size(v, param, &cache_size, &err)) {
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-09  7:56 ` [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
@ 2021-10-11 19:17   ` Eric Blake
  2021-10-11 19:38     ` Leonardo Bras Soares Passos
  2021-10-13  6:07   ` Peter Xu
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Blake @ 2021-10-11 19:17 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Markus Armbruster,
	Peter Xu, Dr. David Alan Gilbert

On Sat, Oct 09, 2021 at 04:56:11AM -0300, Leonardo Bras wrote:
> Adds io_async_writev and io_async_flush as optional callback to QIOChannelClass,

Are these names accurate?

> allowing the implementation of asynchronous writes by subclasses.
> 
> How to use them:
> - Write data using qio_channel_writev_zerocopu(),

s/copu/copy/

> - Wait write completion with qio_channel_flush_zerocopy().
> 
> Notes:
> As some zerocopy implementations work asynchronously, it's
> recommended to keep the write buffer untouched until the return of
> qio_channel_flush_zerocopy(), by the risk of sending an updated buffer

s/by/to avoid/

> instead of the one at the write.
> 
> As the new callbacks are optional, if a subclass does not implement them, then:
> - io_async_writev will return -1,
> - io_async_flush will return 0 without changing anything.

Are these names accurate?

> 
> Also, some functions like qio_channel_writev_full_all() were adapted to
> receive a flag parameter. That allows shared code between zerocopy and
> non-zerocopy writev.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  include/io/channel.h | 103 +++++++++++++++++++++++++++++++++++--------
>  io/channel.c         |  74 +++++++++++++++++++++++--------
>  2 files changed, 141 insertions(+), 36 deletions(-)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 88988979f8..e7d4e1521f 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
>  
>  #define QIO_CHANNEL_ERR_BLOCK -2
>  
> +#define QIO_CHANNEL_WRITE_FLAG_ZEROCOPY 0x1
> +
>  typedef enum QIOChannelFeature QIOChannelFeature;
>  
>  enum QIOChannelFeature {
>      QIO_CHANNEL_FEATURE_FD_PASS,
>      QIO_CHANNEL_FEATURE_SHUTDOWN,
>      QIO_CHANNEL_FEATURE_LISTEN,
> +    QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY,
>  };
>  
>  
> @@ -136,6 +139,12 @@ struct QIOChannelClass {
>                                    IOHandler *io_read,
>                                    IOHandler *io_write,
>                                    void *opaque);
> +    ssize_t (*io_writev_zerocopy)(QIOChannel *ioc,
> +                                  const struct iovec *iov,
> +                                  size_t niov,
> +                                  Error **errp);
> +    int (*io_flush_zerocopy)(QIOChannel *ioc,
> +                              Error **errp);

Indentation is off by one.

>  };
>  
>  /* General I/O handling functions */
> @@ -222,12 +231,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
>  
>  
>  /**
> - * qio_channel_writev_full:
> + * qio_channel_writev_full_flags:
>   * @ioc: the channel object
>   * @iov: the array of memory regions to write data from
>   * @niov: the length of the @iov array
>   * @fds: an array of file handles to send
>   * @nfds: number of file handles in @fds
> + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
>   * @errp: pointer to a NULL-initialized error object
>   *
>   * Write data to the IO channel, reading it from the
> @@ -242,6 +252,10 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
>   * guaranteed. If the channel is non-blocking and no
>   * data can be sent, it will return QIO_CHANNEL_ERR_BLOCK
>   *
> + * If flag QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed,
> + * function will return once each buffer was queued for
> + * sending.

This would be a good place to document the requirement to keep the
buffer unchanged until the zerocopy sequence completes.

>                                 Error **errp);
>  
>  /**
> - * qio_channel_writev_full_all:
> + * qio_channel_writev_full_all_flags:
>   * @ioc: the channel object
>   * @iov: the array of memory regions to write data from
>   * @niov: the length of the @iov array
>   * @fds: an array of file handles to send
>   * @nfds: number of file handles in @fds
> + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
>   * @errp: pointer to a NULL-initialized error object
>   *
>   *
> @@ -846,13 +868,58 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
>   * to be written, yielding from the current coroutine
>   * if required.
>   *
> + * If QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed in flags,
> + * instead of waiting for all requested data to be written,
> + * this function will wait until it's all queued for writing.

Another good place to document restrictions on buffer stability.

> + *
>   * Returns: 0 if all bytes were written, or -1 on error
>   */
>  
> -int qio_channel_writev_full_all(QIOChannel *ioc,
> -                                const struct iovec *iov,
> -                                size_t niov,
> -                                int *fds, size_t nfds,
> -                                Error **errp);
> +int qio_channel_writev_full_all_flags(QIOChannel *ioc,
> +                                      const struct iovec *iov,
> +                                      size_t niov,
> +                                      int *fds, size_t nfds,
> +                                      int flags, Error **errp);
> +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
> +    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
> +
> +/**
> + * qio_channel_writev_zerocopy:
> + * @ioc: the channel object
> + * @iov: the array of memory regions to write data from
> + * @niov: the length of the @iov array
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Behaves like qio_channel_writev_full_all_flags, but will write
> + * data asynchronously while avoiding unnecessary data copy.
> + * This function may return before any data is actually written,
> + * but should queue every buffer for writting.

writing

Another place to document buffer stability considerations.

> + *
> + * If at some point it's necessary wait for all data to be

s/wait/to wait/

> + * written, use qio_channel_flush_zerocopy().
> + *
> + * If zerocopy is not available, returns -1 and set errp.
> + */
> +
> +ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
> +                                    const struct iovec *iov,
> +                                    size_t niov,
> +                                    Error **errp);
> +
> +/**
> + * qio_channel_flush_zerocopy:
> + * @ioc: the channel object
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Will lock until every packet queued with

s/lock/block/

> + * qio_channel_writev_zerocopy() is sent, or return
> + * in case of any error.
> + *
> + * Returns -1 if any error is found, 0 otherwise.
> + * If not implemented, returns 0 without changing anything.
> + */
> +
> +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> +                               Error **errp);
>  
>  #endif /* QIO_CHANNEL_H */
> diff --git a/io/channel.c b/io/channel.c
> index e8b019dc36..811c93ae23 100644
> --- a/io/channel.c
> +++ b/io/channel.c

> +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> +                               Error **errp)
> +{
> +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> +
> +    if (!klass->io_flush_zerocopy ||
> +        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
> +        return 0;

Matches your documentation, but an ideal app should not be trying to
flush if the write failed in the first place.  So wouldn't it be
better to return -1 or even abort() on a coding error?

> +    }
> +
> +    return klass->io_flush_zerocopy(ioc, errp);
> +}
> +
> +
>  static void qio_channel_restart_read(void *opaque)
>  {
>      QIOChannel *ioc = opaque;
> -- 
> 2.33.0
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-10-09  7:56 ` [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
@ 2021-10-11 19:27   ` Eric Blake
  2021-10-11 19:44     ` Leonardo Bras Soares Passos
  2021-10-13  6:18   ` Peter Xu
  2021-11-02 13:13   ` Juan Quintela
  2 siblings, 1 reply; 35+ messages in thread
From: Eric Blake @ 2021-10-11 19:27 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Markus Armbruster,
	Peter Xu, Dr. David Alan Gilbert

On Sat, Oct 09, 2021 at 04:56:12AM -0300, Leonardo Bras wrote:
> For CONFIG_LINUX, implement the new optional callbacks io_write_zerocopy and
> io_flush_zerocopy on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> feature is available in the host kernel, which is checked on
> qio_channel_socket_connect_sync()
> 
> qio_channel_socket_writev() contents were moved to a helper function
> qio_channel_socket_writev_flags() which accepts an extra argument for flags.
> (This argument is passed directly to sendmsg().
> 
> The above helper function is used to implement qio_channel_socket_writev(),
> with flags = 0, keeping it's behavior unchanged, and

its (remember, "it's" is shorthand for "it is", which does not fit here)

> qio_channel_socket_writev_zerocopy() with flags = MSG_ZEROCOPY.
> 
> qio_channel_socket_flush_zerocopy() was implemented by counting how many times
> sendmsg(...,MSG_ZEROCOPY) was sucessfully called, and then reading the
> socket's error queue, in order to find how many of them finished sending.
> Flush will loop until those counters are the same, or until some error ocurs.

occurs

> 
> A new function qio_channel_socket_poll() was also created in order to avoid
> busy-looping recvmsg() in qio_channel_socket_flush_zerocopy() while waiting for
> updates in socket's error queue.
> 
> Notes on using writev_zerocopy():
> 1: Buffer
> - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> some caution is necessary to avoid overwriting any buffer before it's sent.
> If something like this happen, a newer version of the buffer may be sent instead.
> - If this is a problem, it's recommended to call flush_zerocopy() before freeing
> or re-using the buffer.
> 
> 2: Locked memory
> - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
> unlocked after it's sent.
> - Depending on the size of each buffer, and how often it's sent, it may require
> a larger amount of locked memory than usually available to non-root user.
> - If the required amount of locked memory is not available, writev_zerocopy
> will return an error, which can abort an operation like migration,
> - Because of this, when an user code wants to add zerocopy as a feature, it
> requires a mechanism to disable it, so it can still be acessible to less
> privileged users.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  include/io/channel-socket.h |   2 +
>  include/io/channel.h        |   1 +
>  io/channel-socket.c         | 180 ++++++++++++++++++++++++++++++++++--
>  3 files changed, 173 insertions(+), 10 deletions(-)
> 
> +static int qio_channel_socket_flush_zerocopy(QIOChannel *ioc,
> +                                             Error **errp)
> +{

> +
> +        /* No errors, count sucessfully finished sendmsg()*/

Space before */

> +        sioc->zerocopy_sent += serr->ee_data - serr->ee_info + 1;
> +    }
> +    return 0;
> +}
> +
> +#endif /* CONFIG_LINUX */
> +
>  static int
>  qio_channel_socket_set_blocking(QIOChannel *ioc,
>                                  bool enabled,
> @@ -787,6 +943,10 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
>      ioc_klass->io_set_delay = qio_channel_socket_set_delay;
>      ioc_klass->io_create_watch = qio_channel_socket_create_watch;
>      ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
> +#ifdef CONFIG_LINUX
> +    ioc_klass->io_writev_zerocopy = qio_channel_socket_writev_zerocopy;
> +    ioc_klass->io_flush_zerocopy = qio_channel_socket_flush_zerocopy;
> +#endif
>  }

I did a high-level look at the code, rather than an in-depth review of
whether zero-copy was being used correctly.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-09  7:56 ` [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
@ 2021-10-11 19:31   ` Eric Blake
  2021-10-11 19:56     ` Leonardo Bras Soares Passos
  2021-10-13  6:23   ` Peter Xu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Eric Blake @ 2021-10-11 19:31 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Markus Armbruster,
	Peter Xu, Dr. David Alan Gilbert

On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> zerocopy interface.
> 
> Change multifd_send_sync_main() so it can distinguish the last sync from
> the setup and per-iteration ones, so a flush_zerocopy() can be called
> at the last sync in order to make sure all RAM is sent before finishing
> the migration.
> 
> Also make it return -1 if flush_zerocopy() fails, in order to cancel
> the migration process, and avoid resuming the guest in the target host
> without receiving all current RAM.
> 
> This will work fine on RAM migration because the RAM pages are not usually freed,
> and there is no problem on changing the pages content between async_send() and
> the actual sending of the buffer, because this change will dirty the page and
> cause it to be re-sent on a next iteration anyway.
> 
> Given a lot of locked memory may be needed in order to use multid migration
> with zerocopy enabled, make it optional by creating a new parameter
> multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> migrations.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  qapi/migration.json   | 18 ++++++++++++++++++
>  migration/migration.h |  1 +
>  migration/multifd.h   |  2 +-
>  migration/migration.c | 20 ++++++++++++++++++++
>  migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
>  migration/ram.c       | 20 +++++++++++++-------
>  monitor/hmp-cmds.c    |  4 ++++
>  7 files changed, 85 insertions(+), 13 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 88f07baedd..c4890cbb54 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -724,6 +724,11 @@
>  #                      will consume more CPU.
>  #                      Defaults to 1. (Since 5.0)
>  #
> +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> +#                    When true, enables a zerocopy mechanism for sending memory
> +#                    pages, if host does support it.

s/does support/supports/ (several instances this patch)

> +#                    Defaults to false. (Since 6.2)
> +#
>  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
>  #                        aliases for the purpose of dirty bitmap migration.  Such
>  #                        aliases may for example be the corresponding names on the
> @@ -758,6 +763,7 @@
>             'xbzrle-cache-size', 'max-postcopy-bandwidth',
>             'max-cpu-throttle', 'multifd-compression',
>             'multifd-zlib-level' ,'multifd-zstd-level',
> +	   'multifd-zerocopy',
>             'block-bitmap-mapping' ] }

Should this feature be guarded with 'if':'CONFIG_LINUX', since that's
the only platform where you even compile the code (even then, it can
still fail if the kernel doesn't support it).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-11 19:17   ` Eric Blake
@ 2021-10-11 19:38     ` Leonardo Bras Soares Passos
  2021-10-11 20:45       ` Eric Blake
  0 siblings, 1 reply; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-11 19:38 UTC (permalink / raw)
  To: Eric Blake
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Markus Armbruster,
	Peter Xu, Dr. David Alan Gilbert

Hello Eric, thank you for the feedback!

On Mon, Oct 11, 2021 at 4:17 PM Eric Blake <eblake@redhat.com> wrote:
>
> On Sat, Oct 09, 2021 at 04:56:11AM -0300, Leonardo Bras wrote:
> > Adds io_async_writev and io_async_flush as optional callback to QIOChannelClass,
>
> Are these names accurate?

I am sorry, I think I missed some renaming before generating the patchset.
As you suggested those names are incorrect (as they reflect a previous
naming used in v3).
I will replace them for io_{writev,flush}_zerocopy in v5.

>
> > allowing the implementation of asynchronous writes by subclasses.
> >
> > How to use them:
> > - Write data using qio_channel_writev_zerocopu(),
>
> s/copu/copy/

Thanks, I will fix that.

>
> > - Wait write completion with qio_channel_flush_zerocopy().
> >
> > Notes:
> > As some zerocopy implementations work asynchronously, it's
> > recommended to keep the write buffer untouched until the return of
> > qio_channel_flush_zerocopy(), by the risk of sending an updated buffer
>
> s/by/to avoid/
>
> > instead of the one at the write.
> >
> > As the new callbacks are optional, if a subclass does not implement them, then:
> > - io_async_writev will return -1,
> > - io_async_flush will return 0 without changing anything.
>
> Are these names accurate?

They are not, I will replace them for io_{writev,flush}_zerocopy in v5.

>
> >
> > Also, some functions like qio_channel_writev_full_all() were adapted to
> > receive a flag parameter. That allows shared code between zerocopy and
> > non-zerocopy writev.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  include/io/channel.h | 103 +++++++++++++++++++++++++++++++++++--------
> >  io/channel.c         |  74 +++++++++++++++++++++++--------
> >  2 files changed, 141 insertions(+), 36 deletions(-)
> >
> > diff --git a/include/io/channel.h b/include/io/channel.h
> > index 88988979f8..e7d4e1521f 100644
> > --- a/include/io/channel.h
> > +++ b/include/io/channel.h
> > @@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
> >
> >  #define QIO_CHANNEL_ERR_BLOCK -2
> >
> > +#define QIO_CHANNEL_WRITE_FLAG_ZEROCOPY 0x1
> > +
> >  typedef enum QIOChannelFeature QIOChannelFeature;
> >
> >  enum QIOChannelFeature {
> >      QIO_CHANNEL_FEATURE_FD_PASS,
> >      QIO_CHANNEL_FEATURE_SHUTDOWN,
> >      QIO_CHANNEL_FEATURE_LISTEN,
> > +    QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY,
> >  };
> >
> >
> > @@ -136,6 +139,12 @@ struct QIOChannelClass {
> >                                    IOHandler *io_read,
> >                                    IOHandler *io_write,
> >                                    void *opaque);
> > +    ssize_t (*io_writev_zerocopy)(QIOChannel *ioc,
> > +                                  const struct iovec *iov,
> > +                                  size_t niov,
> > +                                  Error **errp);
> > +    int (*io_flush_zerocopy)(QIOChannel *ioc,
> > +                              Error **errp);
>
> Indentation is off by one.

Thanks, I will fix that for v5.

>
> >  };
> >
> >  /* General I/O handling functions */
> > @@ -222,12 +231,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> >
> >
> >  /**
> > - * qio_channel_writev_full:
> > + * qio_channel_writev_full_flags:
> >   * @ioc: the channel object
> >   * @iov: the array of memory regions to write data from
> >   * @niov: the length of the @iov array
> >   * @fds: an array of file handles to send
> >   * @nfds: number of file handles in @fds
> > + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
> >   * @errp: pointer to a NULL-initialized error object
> >   *
> >   * Write data to the IO channel, reading it from the
> > @@ -242,6 +252,10 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> >   * guaranteed. If the channel is non-blocking and no
> >   * data can be sent, it will return QIO_CHANNEL_ERR_BLOCK
> >   *
> > + * If flag QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed,
> > + * function will return once each buffer was queued for
> > + * sending.
>
> This would be a good place to document the requirement to keep the
> buffer unchanged until the zerocopy sequence completes.

That makes sense, even though that may be true for just some implementations,
it makes sense to document it here.

>
> >                                 Error **errp);
> >
> >  /**
> > - * qio_channel_writev_full_all:
> > + * qio_channel_writev_full_all_flags:
> >   * @ioc: the channel object
> >   * @iov: the array of memory regions to write data from
> >   * @niov: the length of the @iov array
> >   * @fds: an array of file handles to send
> >   * @nfds: number of file handles in @fds
> > + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
> >   * @errp: pointer to a NULL-initialized error object
> >   *
> >   *
> > @@ -846,13 +868,58 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
> >   * to be written, yielding from the current coroutine
> >   * if required.
> >   *
> > + * If QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed in flags,
> > + * instead of waiting for all requested data to be written,
> > + * this function will wait until it's all queued for writing.
>
> Another good place to document restrictions on buffer stability.

Ok :)

>
> > + *
> >   * Returns: 0 if all bytes were written, or -1 on error
> >   */
> >
> > -int qio_channel_writev_full_all(QIOChannel *ioc,
> > -                                const struct iovec *iov,
> > -                                size_t niov,
> > -                                int *fds, size_t nfds,
> > -                                Error **errp);
> > +int qio_channel_writev_full_all_flags(QIOChannel *ioc,
> > +                                      const struct iovec *iov,
> > +                                      size_t niov,
> > +                                      int *fds, size_t nfds,
> > +                                      int flags, Error **errp);
> > +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
> > +    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
> > +
> > +/**
> > + * qio_channel_writev_zerocopy:
> > + * @ioc: the channel object
> > + * @iov: the array of memory regions to write data from
> > + * @niov: the length of the @iov array
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Behaves like qio_channel_writev_full_all_flags, but will write
> > + * data asynchronously while avoiding unnecessary data copy.
> > + * This function may return before any data is actually written,
> > + * but should queue every buffer for writting.
>
> writing
>
> Another place to document buffer stability considerations.

Ok,
Is it enough to document it in a single one of the places suggested, or
would you recommend documenting it in all suggested places?

>
> > + *
> > + * If at some point it's necessary wait for all data to be
>
> s/wait/to wait/

I will fix that for v5, thanks!

>
> > + * written, use qio_channel_flush_zerocopy().
> > + *
> > + * If zerocopy is not available, returns -1 and set errp.
> > + */
> > +
> > +ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
> > +                                    const struct iovec *iov,
> > +                                    size_t niov,
> > +                                    Error **errp);
> > +
> > +/**
> > + * qio_channel_flush_zerocopy:
> > + * @ioc: the channel object
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Will lock until every packet queued with
>
> s/lock/block/

Yeah, I should have fixed it in v4.
Thanks for pointing this out.

>
> > + * qio_channel_writev_zerocopy() is sent, or return
> > + * in case of any error.
> > + *
> > + * Returns -1 if any error is found, 0 otherwise.
> > + * If not implemented, returns 0 without changing anything.
> > + */
> > +
> > +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> > +                               Error **errp);
> >
> >  #endif /* QIO_CHANNEL_H */
> > diff --git a/io/channel.c b/io/channel.c
> > index e8b019dc36..811c93ae23 100644
> > --- a/io/channel.c
> > +++ b/io/channel.c
>
> > +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> > +                               Error **errp)
> > +{
> > +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > +
> > +    if (!klass->io_flush_zerocopy ||
> > +        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
> > +        return 0;
>
> Matches your documentation, but an ideal app should not be trying to
> flush if the write failed in the first place.  So wouldn't it be
> better to return -1 or even abort() on a coding error?

The point here is that any valid user of zrocopy_flush would have
already used zerocopy_writev
at some point, and failed if not supported / enabled.

Having this not returning error can help the user keep a simpler
approach when using
a setup in which the writev can happen in both zerocopy or default behavior.

I mean, the user will not need to check if zerocopy was or was not
enabled, and just flush anyway.

But if it's not good behavior, or you guys think it's a better
approach to fail here, I can also do that.

>
> > +    }
> > +
> > +    return klass->io_flush_zerocopy(ioc, errp);
> > +}
> > +
> > +
> >  static void qio_channel_restart_read(void *opaque)
> >  {
> >      QIOChannel *ioc = opaque;
> > --
> > 2.33.0
> >
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>

Thank you for reviewing!

Best regards,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-10-11 19:27   ` Eric Blake
@ 2021-10-11 19:44     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-11 19:44 UTC (permalink / raw)
  To: Eric Blake
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Markus Armbruster,
	Peter Xu, Dr. David Alan Gilbert

Hello Eric,

On Mon, Oct 11, 2021 at 4:28 PM Eric Blake <eblake@redhat.com> wrote:
>
> On Sat, Oct 09, 2021 at 04:56:12AM -0300, Leonardo Bras wrote:
> > For CONFIG_LINUX, implement the new optional callbacks io_write_zerocopy and
> > io_flush_zerocopy on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> > feature is available in the host kernel, which is checked on
> > qio_channel_socket_connect_sync()
> >
> > qio_channel_socket_writev() contents were moved to a helper function
> > qio_channel_socket_writev_flags() which accepts an extra argument for flags.
> > (This argument is passed directly to sendmsg().
> >
> > The above helper function is used to implement qio_channel_socket_writev(),
> > with flags = 0, keeping it's behavior unchanged, and
>
> its (remember, "it's" is shorthand for "it is", which does not fit here)
>
> > qio_channel_socket_writev_zerocopy() with flags = MSG_ZEROCOPY.
> >
> > qio_channel_socket_flush_zerocopy() was implemented by counting how many times
> > sendmsg(...,MSG_ZEROCOPY) was sucessfully called, and then reading the
> > socket's error queue, in order to find how many of them finished sending.
> > Flush will loop until those counters are the same, or until some error ocurs.
>
> occurs

Thanks!
(I will check if my codespell is enabled in this setup)

>
> >
> > A new function qio_channel_socket_poll() was also created in order to avoid
> > busy-looping recvmsg() in qio_channel_socket_flush_zerocopy() while waiting for
> > updates in socket's error queue.
> >
> > Notes on using writev_zerocopy():
> > 1: Buffer
> > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> > some caution is necessary to avoid overwriting any buffer before it's sent.
> > If something like this happen, a newer version of the buffer may be sent instead.
> > - If this is a problem, it's recommended to call flush_zerocopy() before freeing
> > or re-using the buffer.
> >
> > 2: Locked memory
> > - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
> > unlocked after it's sent.
> > - Depending on the size of each buffer, and how often it's sent, it may require
> > a larger amount of locked memory than usually available to non-root user.
> > - If the required amount of locked memory is not available, writev_zerocopy
> > will return an error, which can abort an operation like migration,
> > - Because of this, when an user code wants to add zerocopy as a feature, it
> > requires a mechanism to disable it, so it can still be acessible to less
> > privileged users.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  include/io/channel-socket.h |   2 +
> >  include/io/channel.h        |   1 +
> >  io/channel-socket.c         | 180 ++++++++++++++++++++++++++++++++++--
> >  3 files changed, 173 insertions(+), 10 deletions(-)
> >
> > +static int qio_channel_socket_flush_zerocopy(QIOChannel *ioc,
> > +                                             Error **errp)
> > +{
>
> > +
> > +        /* No errors, count sucessfully finished sendmsg()*/
>
> Space before */

Thanks!
Also, typo on 'successfully'.

>
> > +        sioc->zerocopy_sent += serr->ee_data - serr->ee_info + 1;
> > +    }
> > +    return 0;
> > +}
> > +
> > +#endif /* CONFIG_LINUX */
> > +
> >  static int
> >  qio_channel_socket_set_blocking(QIOChannel *ioc,
> >                                  bool enabled,
> > @@ -787,6 +943,10 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
> >      ioc_klass->io_set_delay = qio_channel_socket_set_delay;
> >      ioc_klass->io_create_watch = qio_channel_socket_create_watch;
> >      ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
> > +#ifdef CONFIG_LINUX
> > +    ioc_klass->io_writev_zerocopy = qio_channel_socket_writev_zerocopy;
> > +    ioc_klass->io_flush_zerocopy = qio_channel_socket_flush_zerocopy;
> > +#endif
> >  }
>
> I did a high-level look at the code, rather than an in-depth review of
> whether zero-copy was being used correctly.

It's so far been very helpful. Thanks!

Best regards,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-11 19:31   ` Eric Blake
@ 2021-10-11 19:56     ` Leonardo Bras Soares Passos
  2021-10-12  5:53       ` Markus Armbruster
  0 siblings, 1 reply; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-11 19:56 UTC (permalink / raw)
  To: Eric Blake
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Markus Armbruster,
	Peter Xu, Dr. David Alan Gilbert

Hello Eric,

On Mon, Oct 11, 2021 at 4:32 PM Eric Blake <eblake@redhat.com> wrote:
>
> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish the last sync from
> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> > at the last sync in order to make sure all RAM is sent before finishing
> > the migration.
> >
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multid migration
> > with zerocopy enabled, make it optional by creating a new parameter
> > multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> > migrations.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  qapi/migration.json   | 18 ++++++++++++++++++
> >  migration/migration.h |  1 +
> >  migration/multifd.h   |  2 +-
> >  migration/migration.c | 20 ++++++++++++++++++++
> >  migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
> >  migration/ram.c       | 20 +++++++++++++-------
> >  monitor/hmp-cmds.c    |  4 ++++
> >  7 files changed, 85 insertions(+), 13 deletions(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 88f07baedd..c4890cbb54 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -724,6 +724,11 @@
> >  #                      will consume more CPU.
> >  #                      Defaults to 1. (Since 5.0)
> >  #
> > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> > +#                    When true, enables a zerocopy mechanism for sending memory
> > +#                    pages, if host does support it.
>
> s/does support/supports/ (several instances this patch)

I will make sure to fix that in v5.

>
> > +#                    Defaults to false. (Since 6.2)
> > +#
> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> >  #                        aliases for the purpose of dirty bitmap migration.  Such
> >  #                        aliases may for example be the corresponding names on the
> > @@ -758,6 +763,7 @@
> >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> >             'max-cpu-throttle', 'multifd-compression',
> >             'multifd-zlib-level' ,'multifd-zstd-level',
> > +        'multifd-zerocopy',
> >             'block-bitmap-mapping' ] }
>
> Should this feature be guarded with 'if':'CONFIG_LINUX', since that's
> the only platform where you even compile the code (even then, it can
> still fail if the kernel doesn't support it).

I think it makes sense for the feature to always be available, even
though it's not supported
outside linux > v4.14.

IMHO it makes more sense for the user to get an error when it starts
migration, due to host
not supporting zerocopy, than the error happening in the config step,
which may cause the user
to question if it was the right parameter.

The config message error here could also be ignored, and users can
think zerocopy is working, while it's not.

For automated migrations, this feature should never be enabled  for
non-linux / older linux hosts anyway.

Is there a good argument I am missing for this feature being disabled
on non-linux?

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>

Best regards,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-11 19:38     ` Leonardo Bras Soares Passos
@ 2021-10-11 20:45       ` Eric Blake
  2021-10-11 20:59         ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Blake @ 2021-10-11 20:45 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Markus Armbruster,
	Peter Xu, Dr. David Alan Gilbert, jsnow

On Mon, Oct 11, 2021 at 04:38:23PM -0300, Leonardo Bras Soares Passos wrote:
> > >  /**
> > > - * qio_channel_writev_full:
> > > + * qio_channel_writev_full_flags:
> > >   * @ioc: the channel object
> > >   * @iov: the array of memory regions to write data from
> > >   * @niov: the length of the @iov array
> > >   * @fds: an array of file handles to send
> > >   * @nfds: number of file handles in @fds
> > > + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
> > >   * @errp: pointer to a NULL-initialized error object
> > >   *
> > >   * Write data to the IO channel, reading it from the
> > > @@ -242,6 +252,10 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> > >   * guaranteed. If the channel is non-blocking and no
> > >   * data can be sent, it will return QIO_CHANNEL_ERR_BLOCK
> > >   *
> > > + * If flag QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed,
> > > + * function will return once each buffer was queued for
> > > + * sending.
> >
> > This would be a good place to document the requirement to keep the
> > buffer unchanged until the zerocopy sequence completes.
> 
> That makes sense, even though that may be true for just some implementations,
> it makes sense to document it here.
> 

> 
> Ok,
> Is it enough to document it in a single one of the places suggested, or
> would you recommend documenting it in all suggested places?

Ah, the curse of maintaining copy-and-paste.  If you can find a way to
say "see this other type for limitations" that sounds fine, it avoids
the risk of later edits touching one but not all identical copies.
But our current process for generating sphynx documentation from the
qapi generator does not have cross-referencing abilities that I'm
aware of.  Markus or John, any thoughts?

> >
> > > +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> > > +                               Error **errp)
> > > +{
> > > +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > > +
> > > +    if (!klass->io_flush_zerocopy ||
> > > +        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
> > > +        return 0;
> >
> > Matches your documentation, but an ideal app should not be trying to
> > flush if the write failed in the first place.  So wouldn't it be
> > better to return -1 or even abort() on a coding error?
> 
> The point here is that any valid user of zrocopy_flush would have
> already used zerocopy_writev
> at some point, and failed if not supported / enabled.
> 
> Having this not returning error can help the user keep a simpler
> approach when using
> a setup in which the writev can happen in both zerocopy or default behavior.
> 
> I mean, the user will not need to check if zerocopy was or was not
> enabled, and just flush anyway.
> 
> But if it's not good behavior, or you guys think it's a better
> approach to fail here, I can also do that.

Either flush is supposed to be a no-op when zerocopy fails (so
returning 0 is okay), or should not be attempted unless zerocopy
succeeded (in which case abort()ing seems like the best way to point
out the programmer's error).  But I wasn't clear from your
documentation which of the two behaviors you had in mind.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-11 20:45       ` Eric Blake
@ 2021-10-11 20:59         ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-11 20:59 UTC (permalink / raw)
  To: Eric Blake
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Markus Armbruster,
	Peter Xu, Dr. David Alan Gilbert, jsnow

On Mon, Oct 11, 2021 at 5:45 PM Eric Blake <eblake@redhat.com> wrote:
>
> On Mon, Oct 11, 2021 at 04:38:23PM -0300, Leonardo Bras Soares Passos wrote:
> > > >  /**
> > > > - * qio_channel_writev_full:
> > > > + * qio_channel_writev_full_flags:
> > > >   * @ioc: the channel object
> > > >   * @iov: the array of memory regions to write data from
> > > >   * @niov: the length of the @iov array
> > > >   * @fds: an array of file handles to send
> > > >   * @nfds: number of file handles in @fds
> > > > + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
> > > >   * @errp: pointer to a NULL-initialized error object
> > > >   *
> > > >   * Write data to the IO channel, reading it from the
> > > > @@ -242,6 +252,10 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
> > > >   * guaranteed. If the channel is non-blocking and no
> > > >   * data can be sent, it will return QIO_CHANNEL_ERR_BLOCK
> > > >   *
> > > > + * If flag QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed,
> > > > + * function will return once each buffer was queued for
> > > > + * sending.
> > >
> > > This would be a good place to document the requirement to keep the
> > > buffer unchanged until the zerocopy sequence completes.
> >
> > That makes sense, even though that may be true for just some implementations,
> > it makes sense to document it here.
> >
>
> >
> > Ok,
> > Is it enough to document it in a single one of the places suggested, or
> > would you recommend documenting it in all suggested places?
>
> Ah, the curse of maintaining copy-and-paste.  If you can find a way to
> say "see this other type for limitations" that sounds fine, it avoids
> the risk of later edits touching one but not all identical copies.
> But our current process for generating sphynx documentation from the
> qapi generator does not have cross-referencing abilities that I'm
> aware of.  Markus or John, any thoughts?
>
> > >
> > > > +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> > > > +                               Error **errp)
> > > > +{
> > > > +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > > > +
> > > > +    if (!klass->io_flush_zerocopy ||
> > > > +        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
> > > > +        return 0;
> > >
> > > Matches your documentation, but an ideal app should not be trying to
> > > flush if the write failed in the first place.  So wouldn't it be
> > > better to return -1 or even abort() on a coding error?
> >
> > The point here is that any valid user of zrocopy_flush would have
> > already used zerocopy_writev
> > at some point, and failed if not supported / enabled.
> >
> > Having this not returning error can help the user keep a simpler
> > approach when using
> > a setup in which the writev can happen in both zerocopy or default behavior.
> >
> > I mean, the user will not need to check if zerocopy was or was not
> > enabled, and just flush anyway.
> >
> > But if it's not good behavior, or you guys think it's a better
> > approach to fail here, I can also do that.
>
> Either flush is supposed to be a no-op when zerocopy fails (so
> returning 0 is okay), or should not be attempted unless zerocopy
> succeeded (in which case abort()ing seems like the best way to point
> out the programmer's error).  But I wasn't clear from your
> documentation which of the two behaviors you had in mind.

Oh, sorry about that.
Yeah, I intend to use it as a no-op.
If it's fine I will update the docs for v5.


>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>

Thanks!



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-11 19:56     ` Leonardo Bras Soares Passos
@ 2021-10-12  5:53       ` Markus Armbruster
  2021-10-28  1:56         ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 35+ messages in thread
From: Markus Armbruster @ 2021-10-12  5:53 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	qemu-devel, Eric Blake

Leonardo Bras Soares Passos <leobras@redhat.com> writes:

> Hello Eric,
>
> On Mon, Oct 11, 2021 at 4:32 PM Eric Blake <eblake@redhat.com> wrote:
>>
>> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
>> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
>> > zerocopy interface.
>> >
>> > Change multifd_send_sync_main() so it can distinguish the last sync from
>> > the setup and per-iteration ones, so a flush_zerocopy() can be called
>> > at the last sync in order to make sure all RAM is sent before finishing
>> > the migration.
>> >
>> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
>> > the migration process, and avoid resuming the guest in the target host
>> > without receiving all current RAM.
>> >
>> > This will work fine on RAM migration because the RAM pages are not usually freed,
>> > and there is no problem on changing the pages content between async_send() and
>> > the actual sending of the buffer, because this change will dirty the page and
>> > cause it to be re-sent on a next iteration anyway.
>> >
>> > Given a lot of locked memory may be needed in order to use multid migration
>> > with zerocopy enabled, make it optional by creating a new parameter
>> > multifd-zerocopy on qapi, so low-privileged users can still perform multifd
>> > migrations.
>> >
>> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>> > ---
>> >  qapi/migration.json   | 18 ++++++++++++++++++
>> >  migration/migration.h |  1 +
>> >  migration/multifd.h   |  2 +-
>> >  migration/migration.c | 20 ++++++++++++++++++++
>> >  migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
>> >  migration/ram.c       | 20 +++++++++++++-------
>> >  monitor/hmp-cmds.c    |  4 ++++
>> >  7 files changed, 85 insertions(+), 13 deletions(-)
>> >
>> > diff --git a/qapi/migration.json b/qapi/migration.json
>> > index 88f07baedd..c4890cbb54 100644
>> > --- a/qapi/migration.json
>> > +++ b/qapi/migration.json
>> > @@ -724,6 +724,11 @@
>> >  #                      will consume more CPU.
>> >  #                      Defaults to 1. (Since 5.0)
>> >  #
>> > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
>> > +#                    When true, enables a zerocopy mechanism for sending memory
>> > +#                    pages, if host does support it.
>>
>> s/does support/supports/ (several instances this patch)
>
> I will make sure to fix that in v5.
>
>>
>> > +#                    Defaults to false. (Since 6.2)
>> > +#
>> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
>> >  #                        aliases for the purpose of dirty bitmap migration.  Such
>> >  #                        aliases may for example be the corresponding names on the
>> > @@ -758,6 +763,7 @@
>> >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
>> >             'max-cpu-throttle', 'multifd-compression',
>> >             'multifd-zlib-level' ,'multifd-zstd-level',
>> > +        'multifd-zerocopy',
>> >             'block-bitmap-mapping' ] }
>>
>> Should this feature be guarded with 'if':'CONFIG_LINUX', since that's
>> the only platform where you even compile the code (even then, it can
>> still fail if the kernel doesn't support it).
>
> I think it makes sense for the feature to always be available, even
> though it's not supported
> outside linux > v4.14.
>
> IMHO it makes more sense for the user to get an error when it starts
> migration, due to host
> not supporting zerocopy, than the error happening in the config step,
> which may cause the user
> to question if it was the right parameter.
>
> The config message error here could also be ignored, and users can
> think zerocopy is working, while it's not.
>
> For automated migrations, this feature should never be enabled  for
> non-linux / older linux hosts anyway.
>
> Is there a good argument I am missing for this feature being disabled
> on non-linux?

The general argument for having QAPI schema 'if' mirror the C
implementation's #if is introspection.  Let me explain why that matters.

Consider a management application that supports a range of QEMU
versions, say 5.0 to 6.2.  Say it wants to use an QMP command that is
new in QEMU 6.2.  The sane way to do that is to probe for the command
with query-qmp-schema.  Same for command arguments, and anything else
QMP.

If you doubt "sane", check out Part II of "QEMU interface introspection:
From hacks to solutions"[*].

The same technique works when a QMP command / argument / whatever is
compile-time conditional ('if' in the schema).  The code the management
application needs anyway to deal with older QEMU now also deals with
"compiled out".  Nice.

Of course, a command or argument present in QEMU can still fail, and the
management application still needs to handle failure.  Distinguishing
different failure modes can be bothersome and/or fragile.

By making the QAPI schema conditional mirror the C conditional, you
squash the failure mode "this version of QEMU supports it, but this
build of QEMU does not" into "this version of QEMU does not support
it".  Makes sense, doesn't it?

A minor additional advantage is less generated code.



[*] http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-09  7:56 ` [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
  2021-10-11 19:17   ` Eric Blake
@ 2021-10-13  6:07   ` Peter Xu
  2021-10-13  6:32     ` Peter Xu
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Xu @ 2021-10-13  6:07 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Sat, Oct 09, 2021 at 04:56:11AM -0300, Leonardo Bras wrote:
> -int qio_channel_writev_full_all(QIOChannel *ioc,
> -                                const struct iovec *iov,
> -                                size_t niov,
> -                                int *fds, size_t nfds,
> -                                Error **errp)
> +int qio_channel_writev_full_all_flags(QIOChannel *ioc,
> +                                      const struct iovec *iov,
> +                                      size_t niov,
> +                                      int *fds, size_t nfds,
> +                                      int flags, Error **errp)
>  {
>      int ret = -1;
>      struct iovec *local_iov = g_new(struct iovec, niov);
> @@ -237,8 +250,8 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
>  
>      while (nlocal_iov > 0) {
>          ssize_t len;
> -        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
> -                                      errp);
> +        len = qio_channel_writev_full_flags(ioc, local_iov, nlocal_iov, fds, nfds,
> +                                          flags, errp);

IMHO we can keep qio_channel_writev_full() untouched, as it is the wrapper for
io_writev() hook right now (and it allows taking fds).  Then here instead of
adding a new flags into it, we can introduce qio_channel_writev_zerocopy_full()
and directly call here:

           if (flags & ZEROCOPY) {
               assert(fds == NULL && nfds == 0);
               qio_channel_writev_zerocopy_full(...[no fds passing in]);
           } else {
               qio_channel_writev_full(...[with all parameters]);
           }

Then qio_channel_writev_zerocopy_full() should be simplely the wrapper for the
new io_writev_zerocopy() hook.

>          if (len == QIO_CHANNEL_ERR_BLOCK) {
>              if (qemu_in_coroutine()) {
>                  qio_channel_yield(ioc, G_IO_OUT);
> @@ -474,6 +487,31 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
>  }
>  
>  
> +ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
> +                                    const struct iovec *iov,
> +                                    size_t niov,
> +                                    Error **errp)
> +{
> +    return qio_channel_writev_full_flags(ioc, iov, niov, NULL, 0,
> +                                         QIO_CHANNEL_WRITE_FLAG_ZEROCOPY,
> +                                         errp);
> +}

This function is never used, right? qio_channel_writev_all_flags() is used in
patch 3, with proper flags passed in.  Then IMHO we can drop this one.

The rest looks good, as long as with Eric's comment addressed.

Thanks,

> +
> +
> +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> +                               Error **errp)
> +{
> +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> +
> +    if (!klass->io_flush_zerocopy ||
> +        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
> +        return 0;
> +    }
> +
> +    return klass->io_flush_zerocopy(ioc, errp);
> +}
> +
> +
>  static void qio_channel_restart_read(void *opaque)
>  {
>      QIOChannel *ioc = opaque;
> -- 
> 2.33.0
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-10-09  7:56 ` [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
  2021-10-11 19:27   ` Eric Blake
@ 2021-10-13  6:18   ` Peter Xu
  2021-10-27  6:30     ` Leonardo Bras Soares Passos
  2021-11-02 13:13   ` Juan Quintela
  2 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2021-10-13  6:18 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Sat, Oct 09, 2021 at 04:56:12AM -0300, Leonardo Bras wrote:
> @@ -154,6 +161,17 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
>          return -1;
>      }
>  
> +#ifdef CONFIG_LINUX
> +    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> +    if (ret < 0) {
> +        /* Zerocopy not available on host */
> +        return 0;
> +    }
> +
> +    qio_channel_set_feature(QIO_CHANNEL(ioc),
> +                            QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY);

This is okay I think, but looks a bit weird.  Maybe nicer to be written as:

#if LINUX
      ret = setsockopt();
      if (ret == 0) {
          qio_channel_set_feature(...);
      }
#endif
      return 0;

?

> +#endif
> +
>      return 0;
>  }

[...]

> +static ssize_t qio_channel_socket_writev_zerocopy(QIOChannel *ioc,
> +                                                  const struct iovec *iov,
> +                                                  size_t niov,
> +                                                  Error **errp)
> +{
> +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> +    ssize_t ret;
> +
> +    ret = qio_channel_socket_writev_flags(ioc, iov, niov, NULL, 0,
> +                                          MSG_ZEROCOPY, errp);
> +    if (ret == QIO_CHANNEL_ERR_NOBUFS) {
> +        if (errp && *errp) {

Hmm this seems wrong, *errp should be NULL in most cases, meanwhile I think
error_setg*() takes care of errp==NULL too, so maybe we can drop this?

> +            error_setg_errno(errp, errno,
> +                             "Process can't lock enough memory for using MSG_ZEROCOPY");
> +        }
> +        return -1;
> +    }
> +
> +    sioc->zerocopy_queued++;
> +    return ret;
> +}

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-09  7:56 ` [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
  2021-10-11 19:31   ` Eric Blake
@ 2021-10-13  6:23   ` Peter Xu
  2021-10-27  6:47     ` Leonardo Bras Soares Passos
  2021-10-13  6:26   ` Peter Xu
  2021-11-02 12:32   ` Juan Quintela
  3 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2021-10-13  6:23 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 88f07baedd..c4890cbb54 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -724,6 +724,11 @@
>  #                      will consume more CPU.
>  #                      Defaults to 1. (Since 5.0)
>  #
> +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> +#                    When true, enables a zerocopy mechanism for sending memory
> +#                    pages, if host does support it.
> +#                    Defaults to false. (Since 6.2)
> +#

Shall we keep it named "@zerocopy"?  Yes we have that dependency with multifd,
but it's fine to just fail the configuration if multifd not set. The thing is
we don't know whether that dependency will last forever, and we probably don't
want to introduce yet another feature bit when we can remove the dependency..
as we can't remove the old one to be compatible.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-09  7:56 ` [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
  2021-10-11 19:31   ` Eric Blake
  2021-10-13  6:23   ` Peter Xu
@ 2021-10-13  6:26   ` Peter Xu
  2021-10-27  6:50     ` Leonardo Bras Soares Passos
  2021-11-02 12:32   ` Juan Quintela
  3 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2021-10-13  6:26 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> @@ -105,7 +105,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
>   */
>  static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
>  {
> -    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
> +    int flags = 0;
> +
> +    if (migrate_multifd_zerocopy()) {
> +        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> +    }
> +
> +    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);
>  }

What if the user specified ZEROCOPY+MULTIFD, but the kernel doesn't support it?

IIUC then the first call to qio_channel_writev_all_flags() above will fail,
then we fail the migration.

It seems fine, but since you've introduced QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY
in the previous patch - I think the cleaner way is when start migration and
after we setup the ioc, we sanity check on the capability and the ioc to make
sure if ZEROCOPY+MULTIFD is specified, we should fail early if we're sure the
ioc doesn't have QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY bit set?

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-13  6:07   ` Peter Xu
@ 2021-10-13  6:32     ` Peter Xu
  2021-10-27  6:07       ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2021-10-13  6:32 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Wed, Oct 13, 2021 at 02:07:13PM +0800, Peter Xu wrote:
> On Sat, Oct 09, 2021 at 04:56:11AM -0300, Leonardo Bras wrote:
> > -int qio_channel_writev_full_all(QIOChannel *ioc,
> > -                                const struct iovec *iov,
> > -                                size_t niov,
> > -                                int *fds, size_t nfds,
> > -                                Error **errp)
> > +int qio_channel_writev_full_all_flags(QIOChannel *ioc,
> > +                                      const struct iovec *iov,
> > +                                      size_t niov,
> > +                                      int *fds, size_t nfds,
> > +                                      int flags, Error **errp)
> >  {
> >      int ret = -1;
> >      struct iovec *local_iov = g_new(struct iovec, niov);
> > @@ -237,8 +250,8 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
> >  
> >      while (nlocal_iov > 0) {
> >          ssize_t len;
> > -        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
> > -                                      errp);
> > +        len = qio_channel_writev_full_flags(ioc, local_iov, nlocal_iov, fds, nfds,
> > +                                          flags, errp);
> 
> IMHO we can keep qio_channel_writev_full() untouched, as it is the wrapper for
> io_writev() hook right now (and it allows taking fds).  Then here instead of
> adding a new flags into it, we can introduce qio_channel_writev_zerocopy_full()
> and directly call here:
> 
>            if (flags & ZEROCOPY) {
>                assert(fds == NULL && nfds == 0);
>                qio_channel_writev_zerocopy_full(...[no fds passing in]);
>            } else {
>                qio_channel_writev_full(...[with all parameters]);
>            }
> 
> Then qio_channel_writev_zerocopy_full() should be simplely the wrapper for the
> new io_writev_zerocopy() hook.

Sorry I think the name doesn't need to have "_full" - that should be used for
io_writev() when we need to pass in fds.  zerocopy doesn't have that, so I
think we can just call it qio_channel_writev_zerocopy() as it simply does what
writev() does.

> 
> >          if (len == QIO_CHANNEL_ERR_BLOCK) {
> >              if (qemu_in_coroutine()) {
> >                  qio_channel_yield(ioc, G_IO_OUT);
> > @@ -474,6 +487,31 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
> >  }
> >  
> >  
> > +ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
> > +                                    const struct iovec *iov,
> > +                                    size_t niov,
> > +                                    Error **errp)
> > +{
> > +    return qio_channel_writev_full_flags(ioc, iov, niov, NULL, 0,
> > +                                         QIO_CHANNEL_WRITE_FLAG_ZEROCOPY,
> > +                                         errp);
> > +}
> 
> This function is never used, right? qio_channel_writev_all_flags() is used in
> patch 3, with proper flags passed in.  Then IMHO we can drop this one.
> 
> The rest looks good, as long as with Eric's comment addressed.
> 
> Thanks,
> 
> > +
> > +
> > +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> > +                               Error **errp)
> > +{
> > +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > +
> > +    if (!klass->io_flush_zerocopy ||
> > +        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
> > +        return 0;
> > +    }
> > +
> > +    return klass->io_flush_zerocopy(ioc, errp);
> > +}
> > +
> > +
> >  static void qio_channel_restart_read(void *opaque)
> >  {
> >      QIOChannel *ioc = opaque;
> > -- 
> > 2.33.0
> > 
> 
> -- 
> Peter Xu

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-13  6:32     ` Peter Xu
@ 2021-10-27  6:07       ` Leonardo Bras Soares Passos
  2021-10-27  6:15         ` Peter Xu
  0 siblings, 1 reply; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-27  6:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

[-- Attachment #1: Type: text/plain, Size: 4090 bytes --]

Hello Peter, sorry for the delay.

On Wed, Oct 13, 2021 at 3:33 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Oct 13, 2021 at 02:07:13PM +0800, Peter Xu wrote:
> > On Sat, Oct 09, 2021 at 04:56:11AM -0300, Leonardo Bras wrote:
> > > -int qio_channel_writev_full_all(QIOChannel *ioc,
> > > -                                const struct iovec *iov,
> > > -                                size_t niov,
> > > -                                int *fds, size_t nfds,
> > > -                                Error **errp)
> > > +int qio_channel_writev_full_all_flags(QIOChannel *ioc,
> > > +                                      const struct iovec *iov,
> > > +                                      size_t niov,
> > > +                                      int *fds, size_t nfds,
> > > +                                      int flags, Error **errp)
> > >  {
> > >      int ret = -1;
> > >      struct iovec *local_iov = g_new(struct iovec, niov);
> > > @@ -237,8 +250,8 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
> > >
> > >      while (nlocal_iov > 0) {
> > >          ssize_t len;
> > > -        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov,
fds, nfds,
> > > -                                      errp);
> > > +        len = qio_channel_writev_full_flags(ioc, local_iov,
nlocal_iov, fds, nfds,
> > > +                                          flags, errp);
> >
> > IMHO we can keep qio_channel_writev_full() untouched, as it is the
wrapper for
> > io_writev() hook right now (and it allows taking fds).  Then here
instead of
> > adding a new flags into it, we can introduce
qio_channel_writev_zerocopy_full()
> > and directly call here:

Sure, it makes sense.

> >
> >            if (flags & ZEROCOPY) {
> >                assert(fds == NULL && nfds == 0);

Quick question: Why is this assert needed?

> >                qio_channel_writev_zerocopy_full(...[no fds passing in]);
> >            } else {
> >                qio_channel_writev_full(...[with all parameters]);
> >            }
> >
> > Then qio_channel_writev_zerocopy_full() should be simplely the wrapper
for the
> > new io_writev_zerocopy() hook.
>
> Sorry I think the name doesn't need to have "_full" - that should be used
for
> io_writev() when we need to pass in fds.  zerocopy doesn't have that, so I
> think we can just call it qio_channel_writev_zerocopy() as it simply does
what
> writev() does.

Ok, I will make these changes to v5.


>
> >
> > >          if (len == QIO_CHANNEL_ERR_BLOCK) {
> > >              if (qemu_in_coroutine()) {
> > >                  qio_channel_yield(ioc, G_IO_OUT);
> > > @@ -474,6 +487,31 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
> > >  }
> > >
> > >
> > > +ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
> > > +                                    const struct iovec *iov,
> > > +                                    size_t niov,
> > > +                                    Error **errp)
> > > +{
> > > +    return qio_channel_writev_full_flags(ioc, iov, niov, NULL, 0,
> > > +
QIO_CHANNEL_WRITE_FLAG_ZEROCOPY,
> > > +                                         errp);
> > > +}
> >
> > This function is never used, right? qio_channel_writev_all_flags() is
used in
> > patch 3, with proper flags passed in.  Then IMHO we can drop this one.
> >
> > The rest looks good, as long as with Eric's comment addressed.
> >
> > Thanks,

IIUC, this was addressed by your reply above, is that correct?

> >
> > > +
> > > +
> > > +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> > > +                               Error **errp)
> > > +{
> > > +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> > > +
> > > +    if (!klass->io_flush_zerocopy ||
> > > +        !qio_channel_has_feature(ioc,
QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    return klass->io_flush_zerocopy(ioc, errp);
> > > +}
> > > +
> > > +
> > >  static void qio_channel_restart_read(void *opaque)
> > >  {
> > >      QIOChannel *ioc = opaque;
> > > --
> > > 2.33.0
> > >
> >
> > --
> > Peter Xu
>
> --
> Peter Xu
>

Best regards,
Leonardo Bras

[-- Attachment #2: Type: text/html, Size: 6424 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-27  6:07       ` Leonardo Bras Soares Passos
@ 2021-10-27  6:15         ` Peter Xu
  2021-10-27  6:31           ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2021-10-27  6:15 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Wed, Oct 27, 2021 at 03:07:13AM -0300, Leonardo Bras Soares Passos wrote:
> > >
> > >            if (flags & ZEROCOPY) {
> > >                assert(fds == NULL && nfds == 0);
> 
> Quick question: Why is this assert needed?

Not required I think; just want to make sure no one passes in the fds when
using zero copy mode.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-10-13  6:18   ` Peter Xu
@ 2021-10-27  6:30     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-27  6:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Wed, Oct 13, 2021 at 3:18 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Sat, Oct 09, 2021 at 04:56:12AM -0300, Leonardo Bras wrote:
> > @@ -154,6 +161,17 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
> >          return -1;
> >      }
> >
> > +#ifdef CONFIG_LINUX
> > +    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> > +    if (ret < 0) {
> > +        /* Zerocopy not available on host */
> > +        return 0;
> > +    }
> > +
> > +    qio_channel_set_feature(QIO_CHANNEL(ioc),
> > +                            QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY);
>
> This is okay I think, but looks a bit weird.  Maybe nicer to be written as:
>
> #if LINUX
>       ret = setsockopt();
>       if (ret == 0) {
>           qio_channel_set_feature(...);
>       }
> #endif
>       return 0;
>
> ?

Yeah, I also questioned myself about this one.
At the time I ended up writing like this because the lines above used
the behavior "if error, then exit/abort", and so I thought that this
would be the better way to include this feature.
But I did not consider that this is not an error exit, but a 'maybe
feature instead'.

So, I will change that like you suggested.

>
> > +#endif
> > +
> >      return 0;
> >  }
>
> [...]
>
> > +static ssize_t qio_channel_socket_writev_zerocopy(QIOChannel *ioc,
> > +                                                  const struct iovec *iov,
> > +                                                  size_t niov,
> > +                                                  Error **errp)
> > +{
> > +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> > +    ssize_t ret;
> > +
> > +    ret = qio_channel_socket_writev_flags(ioc, iov, niov, NULL, 0,
> > +                                          MSG_ZEROCOPY, errp);
> > +    if (ret == QIO_CHANNEL_ERR_NOBUFS) {
> > +        if (errp && *errp) {
>
> Hmm this seems wrong, *errp should be NULL in most cases, meanwhile I think
> error_setg*() takes care of errp==NULL too, so maybe we can drop this?

Yeah, you are correct.
I ended up confused about how to use err, thanks for making it more clear!

>
> > +            error_setg_errno(errp, errno,
> > +                             "Process can't lock enough memory for using MSG_ZEROCOPY");
> > +        }
> > +        return -1;
> > +    }
> > +
> > +    sioc->zerocopy_queued++;
> > +    return ret;
> > +}
>
> --
> Peter Xu
>

Best regards,
Leonardo Bras



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-10-27  6:15         ` Peter Xu
@ 2021-10-27  6:31           ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-27  6:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Wed, Oct 27, 2021 at 3:15 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Oct 27, 2021 at 03:07:13AM -0300, Leonardo Bras Soares Passos wrote:
> > > >
> > > >            if (flags & ZEROCOPY) {
> > > >                assert(fds == NULL && nfds == 0);
> >
> > Quick question: Why is this assert needed?
>
> Not required I think; just want to make sure no one passes in the fds when
> using zero copy mode.

Ok, that makes sense.
I will add it then, thanks!

>
> Thanks,
>
> --
> Peter Xu
>


Best regards,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-13  6:23   ` Peter Xu
@ 2021-10-27  6:47     ` Leonardo Bras Soares Passos
  2021-10-27  7:06       ` Peter Xu
  0 siblings, 1 reply; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-27  6:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Wed, Oct 13, 2021 at 3:24 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 88f07baedd..c4890cbb54 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -724,6 +724,11 @@
> >  #                      will consume more CPU.
> >  #                      Defaults to 1. (Since 5.0)
> >  #
> > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> > +#                    When true, enables a zerocopy mechanism for sending memory
> > +#                    pages, if host does support it.
> > +#                    Defaults to false. (Since 6.2)
> > +#
>
> Shall we keep it named "@zerocopy"?  Yes we have that dependency with multifd,
> but it's fine to just fail the configuration if multifd not set. The thing is
> we don't know whether that dependency will last forever, and we probably don't
> want to introduce yet another feature bit when we can remove the dependency..
> as we can't remove the old one to be compatible.

It makes sense not wanting to create a new future bit in the future,
but if we just add a
"@zerocopy' , wouldn't we need to fail every migration setup that
don't support zerocopy?

(Thinking back, to stay as it is, it would also be necessary that I
find a way to fail other multifd setups that don't support zerocopy,
for v5)

>
> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-13  6:26   ` Peter Xu
@ 2021-10-27  6:50     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-27  6:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Wed, Oct 13, 2021 at 3:26 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> > @@ -105,7 +105,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
> >   */
> >  static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
> >  {
> > -    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
> > +    int flags = 0;
> > +
> > +    if (migrate_multifd_zerocopy()) {
> > +        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
> > +
> > +    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);
> >  }
>
> What if the user specified ZEROCOPY+MULTIFD, but the kernel doesn't support it?
>
> IIUC then the first call to qio_channel_writev_all_flags() above will fail,
> then we fail the migration.
>
> It seems fine, but since you've introduced QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY
> in the previous patch - I think the cleaner way is when start migration and
> after we setup the ioc, we sanity check on the capability and the ioc to make
> sure if ZEROCOPY+MULTIFD is specified, we should fail early if we're sure the
> ioc doesn't have QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY bit set?
>
> --
> Peter Xu
>

Failing earlier is always a good idea.
I will try to implement that.

Thanks Peter!

Best regards,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-27  6:47     ` Leonardo Bras Soares Passos
@ 2021-10-27  7:06       ` Peter Xu
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Xu @ 2021-10-27  7:06 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, qemu-devel, Jason Wang, Dr. David Alan Gilbert,
	Markus Armbruster, Eric Blake

On Wed, Oct 27, 2021 at 03:47:18AM -0300, Leonardo Bras Soares Passos wrote:
> On Wed, Oct 13, 2021 at 3:24 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 88f07baedd..c4890cbb54 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -724,6 +724,11 @@
> > >  #                      will consume more CPU.
> > >  #                      Defaults to 1. (Since 5.0)
> > >  #
> > > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> > > +#                    When true, enables a zerocopy mechanism for sending memory
> > > +#                    pages, if host does support it.
> > > +#                    Defaults to false. (Since 6.2)
> > > +#
> >
> > Shall we keep it named "@zerocopy"?  Yes we have that dependency with multifd,
> > but it's fine to just fail the configuration if multifd not set. The thing is
> > we don't know whether that dependency will last forever, and we probably don't
> > want to introduce yet another feature bit when we can remove the dependency..
> > as we can't remove the old one to be compatible.
> 
> It makes sense not wanting to create a new future bit in the future,
> but if we just add a
> "@zerocopy' , wouldn't we need to fail every migration setup that
> don't support zerocopy?
> 
> (Thinking back, to stay as it is, it would also be necessary that I
> find a way to fail other multifd setups that don't support zerocopy,
> for v5)

Yes I think so; imho we can fail either whey applying the bit, or it's okay too
to fail at the start of migration.  Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-12  5:53       ` Markus Armbruster
@ 2021-10-28  1:56         ` Leonardo Bras Soares Passos
  2021-10-28  4:30           ` Markus Armbruster
  0 siblings, 1 reply; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-28  1:56 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Daniel P. Berrangé,
	Juan Quintela, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	qemu-devel, Eric Blake

Hello Markus!
(comments at the bottom)

On Tue, Oct 12, 2021 at 2:54 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Leonardo Bras Soares Passos <leobras@redhat.com> writes:
>
> > Hello Eric,
> >
> > On Mon, Oct 11, 2021 at 4:32 PM Eric Blake <eblake@redhat.com> wrote:
> >>
> >> On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> >> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> >> > zerocopy interface.
> >> >
> >> > Change multifd_send_sync_main() so it can distinguish the last sync from
> >> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> >> > at the last sync in order to make sure all RAM is sent before finishing
> >> > the migration.
> >> >
> >> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> >> > the migration process, and avoid resuming the guest in the target host
> >> > without receiving all current RAM.
> >> >
> >> > This will work fine on RAM migration because the RAM pages are not usually freed,
> >> > and there is no problem on changing the pages content between async_send() and
> >> > the actual sending of the buffer, because this change will dirty the page and
> >> > cause it to be re-sent on a next iteration anyway.
> >> >
> >> > Given a lot of locked memory may be needed in order to use multid migration
> >> > with zerocopy enabled, make it optional by creating a new parameter
> >> > multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> >> > migrations.
> >> >
> >> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> >> > ---
> >> >  qapi/migration.json   | 18 ++++++++++++++++++
> >> >  migration/migration.h |  1 +
> >> >  migration/multifd.h   |  2 +-
> >> >  migration/migration.c | 20 ++++++++++++++++++++
> >> >  migration/multifd.c   | 33 ++++++++++++++++++++++++++++-----
> >> >  migration/ram.c       | 20 +++++++++++++-------
> >> >  monitor/hmp-cmds.c    |  4 ++++
> >> >  7 files changed, 85 insertions(+), 13 deletions(-)
> >> >
> >> > diff --git a/qapi/migration.json b/qapi/migration.json
> >> > index 88f07baedd..c4890cbb54 100644
> >> > --- a/qapi/migration.json
> >> > +++ b/qapi/migration.json
> >> > @@ -724,6 +724,11 @@
> >> >  #                      will consume more CPU.
> >> >  #                      Defaults to 1. (Since 5.0)
> >> >  #
> >> > +# @multifd-zerocopy: Controls behavior on sending memory pages on multifd migration.
> >> > +#                    When true, enables a zerocopy mechanism for sending memory
> >> > +#                    pages, if host does support it.
> >>
> >> s/does support/supports/ (several instances this patch)
> >
> > I will make sure to fix that in v5.
> >
> >>
> >> > +#                    Defaults to false. (Since 6.2)
> >> > +#
> >> >  # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
> >> >  #                        aliases for the purpose of dirty bitmap migration.  Such
> >> >  #                        aliases may for example be the corresponding names on the
> >> > @@ -758,6 +763,7 @@
> >> >             'xbzrle-cache-size', 'max-postcopy-bandwidth',
> >> >             'max-cpu-throttle', 'multifd-compression',
> >> >             'multifd-zlib-level' ,'multifd-zstd-level',
> >> > +        'multifd-zerocopy',
> >> >             'block-bitmap-mapping' ] }
> >>
> >> Should this feature be guarded with 'if':'CONFIG_LINUX', since that's
> >> the only platform where you even compile the code (even then, it can
> >> still fail if the kernel doesn't support it).
> >
> > I think it makes sense for the feature to always be available, even
> > though it's not supported
> > outside linux > v4.14.
> >
> > IMHO it makes more sense for the user to get an error when it starts
> > migration, due to host
> > not supporting zerocopy, than the error happening in the config step,
> > which may cause the user
> > to question if it was the right parameter.
> >
> > The config message error here could also be ignored, and users can
> > think zerocopy is working, while it's not.
> >
> > For automated migrations, this feature should never be enabled  for
> > non-linux / older linux hosts anyway.
> >
> > Is there a good argument I am missing for this feature being disabled
> > on non-linux?
>
> The general argument for having QAPI schema 'if' mirror the C
> implementation's #if is introspection.  Let me explain why that matters.
>
> Consider a management application that supports a range of QEMU
> versions, say 5.0 to 6.2.  Say it wants to use an QMP command that is
> new in QEMU 6.2.  The sane way to do that is to probe for the command
> with query-qmp-schema.  Same for command arguments, and anything else
> QMP.
>
> If you doubt "sane", check out Part II of "QEMU interface introspection:
> From hacks to solutions"[*].
>
> The same technique works when a QMP command / argument / whatever is
> compile-time conditional ('if' in the schema).  The code the management
> application needs anyway to deal with older QEMU now also deals with
> "compiled out".  Nice.
>
> Of course, a command or argument present in QEMU can still fail, and the
> management application still needs to handle failure.  Distinguishing
> different failure modes can be bothersome and/or fragile.
>
> By making the QAPI schema conditional mirror the C conditional, you
> squash the failure mode "this version of QEMU supports it, but this
> build of QEMU does not" into "this version of QEMU does not support
> it".  Makes sense, doesn't it?
>
> A minor additional advantage is less generated code.
>
>
>
> [*] http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
>

This was very informative, thanks!
I now understand the rationale about this choice.

TBH I am not very used to this syntax.
I did a take a peek at some other json files, and ended adding this
lines in code, which compiled just fine:

for : enum MigrationParameter
{'name': 'multifd-zerocopy', 'if' : 'CONFIG_LINUX'},

for : struct MigrateSetParameters and struct MigrationParameters:
'*multifd-zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },

Is that enough? Is there any other necessary change?

Thanks for reviewing and for helping out with this!

Best regards,
Leonardo Bras



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-28  1:56         ` Leonardo Bras Soares Passos
@ 2021-10-28  4:30           ` Markus Armbruster
  2021-10-28  4:37             ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 35+ messages in thread
From: Markus Armbruster @ 2021-10-28  4:30 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	qemu-devel, Eric Blake

Leonardo Bras Soares Passos <leobras@redhat.com> writes:

[...]

>> The general argument for having QAPI schema 'if' mirror the C
>> implementation's #if is introspection.  Let me explain why that matters.
>>
>> Consider a management application that supports a range of QEMU
>> versions, say 5.0 to 6.2.  Say it wants to use an QMP command that is
>> new in QEMU 6.2.  The sane way to do that is to probe for the command
>> with query-qmp-schema.  Same for command arguments, and anything else
>> QMP.
>>
>> If you doubt "sane", check out Part II of "QEMU interface introspection:
>> From hacks to solutions"[*].
>>
>> The same technique works when a QMP command / argument / whatever is
>> compile-time conditional ('if' in the schema).  The code the management
>> application needs anyway to deal with older QEMU now also deals with
>> "compiled out".  Nice.
>>
>> Of course, a command or argument present in QEMU can still fail, and the
>> management application still needs to handle failure.  Distinguishing
>> different failure modes can be bothersome and/or fragile.
>>
>> By making the QAPI schema conditional mirror the C conditional, you
>> squash the failure mode "this version of QEMU supports it, but this
>> build of QEMU does not" into "this version of QEMU does not support
>> it".  Makes sense, doesn't it?
>>
>> A minor additional advantage is less generated code.
>>
>>
>>
>> [*] http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
>>
>
> This was very informative, thanks!
> I now understand the rationale about this choice.
>
> TBH I am not very used to this syntax.
> I did a take a peek at some other json files, and ended adding this
> lines in code, which compiled just fine:
>
> for : enum MigrationParameter
> {'name': 'multifd-zerocopy', 'if' : 'CONFIG_LINUX'},
>
> for : struct MigrateSetParameters and struct MigrationParameters:
> '*multifd-zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
>
> Is that enough? Is there any other necessary change?

Looks good to me.

The QAPI schema language is documented in docs/devel/qapi-code-gen.rst.

If you're curious, you can diff code generated into qapi/ before and
after adding the 'if'.

> Thanks for reviewing and for helping out with this!

My pleasure!



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-28  4:30           ` Markus Armbruster
@ 2021-10-28  4:37             ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-10-28  4:37 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Daniel P. Berrangé,
	Juan Quintela, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	qemu-devel, Eric Blake

On Thu, Oct 28, 2021 at 1:30 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Leonardo Bras Soares Passos <leobras@redhat.com> writes:
>
> [...]
>
> >> The general argument for having QAPI schema 'if' mirror the C
> >> implementation's #if is introspection.  Let me explain why that matters.
> >>
> >> Consider a management application that supports a range of QEMU
> >> versions, say 5.0 to 6.2.  Say it wants to use an QMP command that is
> >> new in QEMU 6.2.  The sane way to do that is to probe for the command
> >> with query-qmp-schema.  Same for command arguments, and anything else
> >> QMP.
> >>
> >> If you doubt "sane", check out Part II of "QEMU interface introspection:
> >> From hacks to solutions"[*].
> >>
> >> The same technique works when a QMP command / argument / whatever is
> >> compile-time conditional ('if' in the schema).  The code the management
> >> application needs anyway to deal with older QEMU now also deals with
> >> "compiled out".  Nice.
> >>
> >> Of course, a command or argument present in QEMU can still fail, and the
> >> management application still needs to handle failure.  Distinguishing
> >> different failure modes can be bothersome and/or fragile.
> >>
> >> By making the QAPI schema conditional mirror the C conditional, you
> >> squash the failure mode "this version of QEMU supports it, but this
> >> build of QEMU does not" into "this version of QEMU does not support
> >> it".  Makes sense, doesn't it?
> >>
> >> A minor additional advantage is less generated code.
> >>
> >>
> >>
> >> [*] http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
> >>
> >
> > This was very informative, thanks!
> > I now understand the rationale about this choice.
> >
> > TBH I am not very used to this syntax.
> > I did a take a peek at some other json files, and ended adding this
> > lines in code, which compiled just fine:
> >
> > for : enum MigrationParameter
> > {'name': 'multifd-zerocopy', 'if' : 'CONFIG_LINUX'},
> >
> > for : struct MigrateSetParameters and struct MigrationParameters:
> > '*multifd-zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
> >
> > Is that enough? Is there any other necessary change?
>
> Looks good to me.
>
> The QAPI schema language is documented in docs/devel/qapi-code-gen.rst.

Thanks for reviewing and for pointing this docs!

>
> If you're curious, you can diff code generated into qapi/ before and
> after adding the 'if'.

Good idea!

>
> > Thanks for reviewing and for helping out with this!
>
> My pleasure!
>

:)

Best regards,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-10-09  7:56 ` [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
                     ` (2 preceding siblings ...)
  2021-10-13  6:26   ` Peter Xu
@ 2021-11-02 12:32   ` Juan Quintela
  2021-11-03 21:29     ` Leonardo Bras Soares Passos
  2022-04-14  4:00     ` Leonardo Bras Soares Passos
  3 siblings, 2 replies; 35+ messages in thread
From: Juan Quintela @ 2021-11-02 12:32 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	qemu-devel, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	Markus Armbruster, Eric Blake

Leonardo Bras <leobras@redhat.com> wrote:
> Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> zerocopy interface.
>
> Change multifd_send_sync_main() so it can distinguish the last sync from
> the setup and per-iteration ones, so a flush_zerocopy() can be called
> at the last sync in order to make sure all RAM is sent before finishing
> the migration.

You need to do this after each iteration.  Otherwise it can happen that:

channel 1:               channel 2:

   send page 11

next iteration
                         send page 11

                         this page arrives

now arrives this old copy.

After each iteration, one needs to be sure that no ram is inflight.

This means that I think you don't need the last_sync parameter at all,
as you have to do the flush() in every iteration.

> Also make it return -1 if flush_zerocopy() fails, in order to cancel
> the migration process, and avoid resuming the guest in the target host
> without receiving all current RAM.
>
> This will work fine on RAM migration because the RAM pages are not usually freed,
> and there is no problem on changing the pages content between async_send() and
> the actual sending of the buffer, because this change will dirty the page and
> cause it to be re-sent on a next iteration anyway.
>
> Given a lot of locked memory may be needed in order to use multid migration
> with zerocopy enabled, make it optional by creating a new parameter
> multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> migrations.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

I think it is better that you split this patch into two:

- Add the new parameter (it looks good to me, and can be reviewed-by)
- Implement the feature, here probably you need more changes/review


>              '*multifd-zlib-level': 'uint8',
>              '*multifd-zstd-level': 'uint8',
> +	    '*multifd-zerocopy': 'bool',
>              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }

Something weird here.

>              '*multifd-compression': 'MultiFDCompression',
>              '*multifd-zlib-level': 'uint8',
>              '*multifd-zstd-level': 'uint8',
> +	    '*multifd-zerocopy': 'bool',
>              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>  

Same here.


> @@ -105,7 +105,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
>   */
>  static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
>  {
> -    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
> +    int flags = 0;
> +
> +    if (migrate_multifd_zerocopy()) {
> +        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> +    }

You have added an if on each write, just add it during initialization.

There is already a uint32_t flags field in MultiFDRecvParams, but you
can add a

int write_flags;

one and add it during initialization.  That way you don't need any check
here, just pass it.

> +    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);


> -void multifd_send_sync_main(QEMUFile *f)
> +int multifd_send_sync_main(QEMUFile *f, bool last_sync)

As you need to check every round, you now have to check for errors on
every multifd_send_sync_main() call.  It really looked weird that you
only need to check it sometimes.

> @@ -3006,13 +3006,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          ram_control_after_iterate(f, RAM_CONTROL_FINISH);
>      }
>  
> -    if (ret >= 0) {
> -        multifd_send_sync_main(rs->f);
> -        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> -        qemu_fflush(f);
> +    if (ret < 0) {
> +        return ret;
>      }
>  
> -    return ret;
> +    ret = multifd_send_sync_main(rs->f, true);
> +    if (ret < 0) {
> +        return -1;

Why are you returning -1 instead of ret?

Callers of ram_save_complete(). We set qemu_error_file() with that
error, so it is not a good idea to reset it.


Later, Juan.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-10-09  7:56 ` [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
  2021-10-11 19:27   ` Eric Blake
  2021-10-13  6:18   ` Peter Xu
@ 2021-11-02 13:13   ` Juan Quintela
  2021-11-03 20:50     ` Leonardo Bras Soares Passos
  2 siblings, 1 reply; 35+ messages in thread
From: Juan Quintela @ 2021-11-02 13:13 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	qemu-devel, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	Markus Armbruster, Eric Blake

Leonardo Bras <leobras@redhat.com> wrote:
> For CONFIG_LINUX, implement the new optional callbacks io_write_zerocopy and
> io_flush_zerocopy on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> feature is available in the host kernel, which is checked on
> qio_channel_socket_connect_sync()
>
> qio_channel_socket_writev() contents were moved to a helper function
> qio_channel_socket_writev_flags() which accepts an extra argument for flags.
> (This argument is passed directly to sendmsg().
>
> The above helper function is used to implement qio_channel_socket_writev(),
> with flags = 0, keeping it's behavior unchanged, and
> qio_channel_socket_writev_zerocopy() with flags = MSG_ZEROCOPY.
>
> qio_channel_socket_flush_zerocopy() was implemented by counting how many times
> sendmsg(...,MSG_ZEROCOPY) was sucessfully called, and then reading the
> socket's error queue, in order to find how many of them finished sending.
> Flush will loop until those counters are the same, or until some error ocurs.
>
> A new function qio_channel_socket_poll() was also created in order to avoid
> busy-looping recvmsg() in qio_channel_socket_flush_zerocopy() while waiting for
> updates in socket's error queue.
>
> Notes on using writev_zerocopy():
> 1: Buffer
> - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> some caution is necessary to avoid overwriting any buffer before it's sent.
> If something like this happen, a newer version of the buffer may be sent instead.
> - If this is a problem, it's recommended to call flush_zerocopy() before freeing
> or re-using the buffer.
>
> 2: Locked memory
> - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
> unlocked after it's sent.
> - Depending on the size of each buffer, and how often it's sent, it may require
> a larger amount of locked memory than usually available to non-root user.
> - If the required amount of locked memory is not available, writev_zerocopy
> will return an error, which can abort an operation like migration,
> - Because of this, when an user code wants to add zerocopy as a feature, it
> requires a mechanism to disable it, so it can still be acessible to less
> privileged users.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

I think this patch would be easier to review if you split in:
- add the flags parameter left and right
- add the meat of what you do with the flags.

> +++ b/include/io/channel-socket.h
> @@ -47,6 +47,8 @@ struct QIOChannelSocket {
>      socklen_t localAddrLen;
>      struct sockaddr_storage remoteAddr;
>      socklen_t remoteAddrLen;
> +    ssize_t zerocopy_queued;
> +    ssize_t zerocopy_sent;

I am not sure if this is good/bad to put it inside

#ifdef CONFIG_LINUX

basically everything else uses it.

> +#ifdef CONFIG_LINUX
> +    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> +    if (ret < 0) {
> +        /* Zerocopy not available on host */
> +        return 0;
> +    }
> +
> +    qio_channel_set_feature(QIO_CHANNEL(ioc),
> +                            QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY);

As Peter said, you shouldn't fail if the feature is not there.

But on the other hand, on patch 3, you don't check that this feature
exist when you allow to enable multifd_zerocopy.

> +#endif
> +
>      return 0;
>  }


>          error_setg_errno(errp, errno,
>                           "Unable to write to socket");

Why do you split this in two lines?

Yes, I know that this file is not consistent either on how the do with
this, sometimes one line, otherwise more.

I don't know how ZEROCPY works at kernel level to comment on rest of the
patch.

Later, Juan.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-11-02 13:13   ` Juan Quintela
@ 2021-11-03 20:50     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-11-03 20:50 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Daniel P. Berrangé,
	qemu-devel, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	Markus Armbruster, Eric Blake

On Tue, Nov 2, 2021 at 10:13 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > For CONFIG_LINUX, implement the new optional callbacks io_write_zerocopy and
> > io_flush_zerocopy on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> > feature is available in the host kernel, which is checked on
> > qio_channel_socket_connect_sync()
> >
> > qio_channel_socket_writev() contents were moved to a helper function
> > qio_channel_socket_writev_flags() which accepts an extra argument for flags.
> > (This argument is passed directly to sendmsg().
> >
> > The above helper function is used to implement qio_channel_socket_writev(),
> > with flags = 0, keeping it's behavior unchanged, and
> > qio_channel_socket_writev_zerocopy() with flags = MSG_ZEROCOPY.
> >
> > qio_channel_socket_flush_zerocopy() was implemented by counting how many times
> > sendmsg(...,MSG_ZEROCOPY) was sucessfully called, and then reading the
> > socket's error queue, in order to find how many of them finished sending.
> > Flush will loop until those counters are the same, or until some error ocurs.
> >
> > A new function qio_channel_socket_poll() was also created in order to avoid
> > busy-looping recvmsg() in qio_channel_socket_flush_zerocopy() while waiting for
> > updates in socket's error queue.
> >
> > Notes on using writev_zerocopy():
> > 1: Buffer
> > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> > some caution is necessary to avoid overwriting any buffer before it's sent.
> > If something like this happen, a newer version of the buffer may be sent instead.
> > - If this is a problem, it's recommended to call flush_zerocopy() before freeing
> > or re-using the buffer.
> >
> > 2: Locked memory
> > - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
> > unlocked after it's sent.
> > - Depending on the size of each buffer, and how often it's sent, it may require
> > a larger amount of locked memory than usually available to non-root user.
> > - If the required amount of locked memory is not available, writev_zerocopy
> > will return an error, which can abort an operation like migration,
> > - Because of this, when an user code wants to add zerocopy as a feature, it
> > requires a mechanism to disable it, so it can still be acessible to less
> > privileged users.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>
> I think this patch would be easier to review if you split in:
> - add the flags parameter left and right
> - add the meat of what you do with the flags.

ok, I will try to split it like this.

>
> > +++ b/include/io/channel-socket.h
> > @@ -47,6 +47,8 @@ struct QIOChannelSocket {
> >      socklen_t localAddrLen;
> >      struct sockaddr_storage remoteAddr;
> >      socklen_t remoteAddrLen;
> > +    ssize_t zerocopy_queued;
> > +    ssize_t zerocopy_sent;
>
> I am not sure if this is good/bad to put it inside
>
> #ifdef CONFIG_LINUX
>
> basically everything else uses it.

I think it makes sense that zerocopy_{sent,queued} is inside
CONFIG_LINUX as no one else is using zerocopy.

>
> > +#ifdef CONFIG_LINUX
> > +    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> > +    if (ret < 0) {
> > +        /* Zerocopy not available on host */
> > +        return 0;
> > +    }
> > +
> > +    qio_channel_set_feature(QIO_CHANNEL(ioc),
> > +                            QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY);
>
> As Peter said, you shouldn't fail if the feature is not there.
>
> But on the other hand, on patch 3, you don't check that this feature
> exist when you allow to enable multifd_zerocopy.

This had a major rework on v5, but I will make sure this suggestion is
addressed before releasing it.

>
> > +#endif
> > +
> >      return 0;
> >  }
>
>
> >          error_setg_errno(errp, errno,
> >                           "Unable to write to socket");
>
> Why do you split this in two lines?
>
> Yes, I know that this file is not consistent either on how the do with
> this, sometimes one line, otherwise more.

IIUC, this lines have no '+' in them, so they are not my addition.

>
> I don't know how ZEROCPY works at kernel level to comment on rest of the
> patch.
>
> Later, Juan.

Thanks for reviewing Juan.

Best regards,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-02 12:32   ` Juan Quintela
@ 2021-11-03 21:29     ` Leonardo Bras Soares Passos
  2021-11-03 23:24       ` Juan Quintela
  2022-04-14  4:00     ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-11-03 21:29 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Daniel P. Berrangé,
	qemu-devel, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	Markus Armbruster, Eric Blake

Hello Juan,

On Tue, Nov 2, 2021 at 9:32 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish the last sync from
> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> > at the last sync in order to make sure all RAM is sent before finishing
> > the migration.
>
> You need to do this after each iteration.  Otherwise it can happen that:
>
> channel 1:               channel 2:
>
>    send page 11
>
> next iteration
>                          send page 11
>
>                          this page arrives
>
> now arrives this old copy.

Current multifd's sendmsg() will block until all data is sent, is that correct?

If that's the case, and supposing the network driver supports
multiqueue, maybe there is a small chance for this to happen.
I will add the flush at the end of each iteration, just to be sure.

>
> After each iteration, one needs to be sure that no ram is inflight.
>
> This means that I think you don't need the last_sync parameter at all,
> as you have to do the flush() in every iteration.
>
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multid migration
> > with zerocopy enabled, make it optional by creating a new parameter
> > multifd-zerocopy on qapi, so low-privileged users can still perform multifd
> > migrations.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>
> I think it is better that you split this patch into two:
>
> - Add the new parameter (it looks good to me, and can be reviewed-by)
> - Implement the feature, here probably you need more changes/review

Sure, I will try to divide the patch like this.

>
>
> >              '*multifd-zlib-level': 'uint8',
> >              '*multifd-zstd-level': 'uint8',
> > +         '*multifd-zerocopy': 'bool',
> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>
> Something weird here.
>
> >              '*multifd-compression': 'MultiFDCompression',
> >              '*multifd-zlib-level': 'uint8',
> >              '*multifd-zstd-level': 'uint8',
> > +         '*multifd-zerocopy': 'bool',
> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> >
>
> Same here.

Could you please elaborate?

>
>
> > @@ -105,7 +105,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
> >   */
> >  static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
> >  {
> > -    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
> > +    int flags = 0;
> > +
> > +    if (migrate_multifd_zerocopy()) {
> > +        flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
>
> You have added an if on each write, just add it during initialization.



>
> There is already a uint32_t flags field in MultiFDRecvParams, but you
> can add a
>
> int write_flags;
>
> one and add it during initialization.  That way you don't need any check
> here, just pass it.
>
> > +    return qio_channel_writev_all_flags(p->c, p->pages->iov, used, flags, errp);

Ok, I will try to make it work with this suggestion.

>
>
> > -void multifd_send_sync_main(QEMUFile *f)
> > +int multifd_send_sync_main(QEMUFile *f, bool last_sync)
>
> As you need to check every round, you now have to check for errors on
> every multifd_send_sync_main() call.  It really looked weird that you
> only need to check it sometimes.

Ok, I will work on that.

>
> > @@ -3006,13 +3006,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >          ram_control_after_iterate(f, RAM_CONTROL_FINISH);
> >      }
> >
> > -    if (ret >= 0) {
> > -        multifd_send_sync_main(rs->f);
> > -        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> > -        qemu_fflush(f);
> > +    if (ret < 0) {
> > +        return ret;
> >      }
> >
> > -    return ret;
> > +    ret = multifd_send_sync_main(rs->f, true);
> > +    if (ret < 0) {
> > +        return -1;
>
> Why are you returning -1 instead of ret?
>
> Callers of ram_save_complete(). We set qemu_error_file() with that
> error, so it is not a good idea to reset it.
>
>

Ok, I will take a look on that.

> Later, Juan.
>

Thanks,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-03 21:29     ` Leonardo Bras Soares Passos
@ 2021-11-03 23:24       ` Juan Quintela
  2021-11-04  3:43         ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 35+ messages in thread
From: Juan Quintela @ 2021-11-03 23:24 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	qemu-devel, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	Markus Armbruster, Eric Blake

Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> Hello Juan,

hi

> Current multifd's sendmsg() will block until all data is sent, is that correct?
>
> If that's the case, and supposing the network driver supports
> multiqueue, maybe there is a small chance for this to happen.
> I will add the flush at the end of each iteration, just to be sure.
>
>>
>> After each iteration, one needs to be sure that no ram is inflight.
>>
>> This means that I think you don't need the last_sync parameter at all,
>> as you have to do the flush() in every iteration.

It means guest memory corruption, it _needs_ to be there.
That is the whole reason why multifd code has to wait after each
iteration for all channels to finish.  Probability of failure is low,
but it exist, so it needs to be handled correctly.
>> >              '*multifd-zlib-level': 'uint8',
>> >              '*multifd-zstd-level': 'uint8',
>> > +         '*multifd-zerocopy': 'bool',
>> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>>
>> Something weird here.
>>
>> >              '*multifd-compression': 'MultiFDCompression',
>> >              '*multifd-zlib-level': 'uint8',
>> >              '*multifd-zstd-level': 'uint8',
>> > +         '*multifd-zerocopy': 'bool',
>> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>> >
>>
>> Same here.
>
> Could you please elaborate?

Indentation, 

+         '*multifd-zerocopy': 'bool',
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }

This is how it is seen here.  space/tab?

Later, Juan.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-03 23:24       ` Juan Quintela
@ 2021-11-04  3:43         ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-11-04  3:43 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Daniel P. Berrangé,
	qemu-devel, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	Markus Armbruster, Eric Blake

On Wed, Nov 3, 2021 at 8:24 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> > Hello Juan,
>
> hi
>
> > Current multifd's sendmsg() will block until all data is sent, is that correct?
> >
> > If that's the case, and supposing the network driver supports
> > multiqueue, maybe there is a small chance for this to happen.
> > I will add the flush at the end of each iteration, just to be sure.
> >
> >>
> >> After each iteration, one needs to be sure that no ram is inflight.
> >>
> >> This means that I think you don't need the last_sync parameter at all,
> >> as you have to do the flush() in every iteration.
>
> It means guest memory corruption, it _needs_ to be there.
> That is the whole reason why multifd code has to wait after each
> iteration for all channels to finish.  Probability of failure is low,
> but it exist, so it needs to be handled correctly.

Sure, I will add a flush after each iteration.

> >> >              '*multifd-zlib-level': 'uint8',
> >> >              '*multifd-zstd-level': 'uint8',
> >> > +         '*multifd-zerocopy': 'bool',
> >> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> >>
> >> Something weird here.
> >>
> >> >              '*multifd-compression': 'MultiFDCompression',
> >> >              '*multifd-zlib-level': 'uint8',
> >> >              '*multifd-zstd-level': 'uint8',
> >> > +         '*multifd-zerocopy': 'bool',
> >> >              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> >> >
> >>
> >> Same here.
> >
> > Could you please elaborate?
>
> Indentation,
>
> +         '*multifd-zerocopy': 'bool',
>              '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
>
> This is how it is seen here.  space/tab?

It was supposed to be using only spaces.
I will make sure to fix that.

>
> Later, Juan.
>
Thanks!

Best regards,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-02 12:32   ` Juan Quintela
  2021-11-03 21:29     ` Leonardo Bras Soares Passos
@ 2022-04-14  4:00     ` Leonardo Bras Soares Passos
  1 sibling, 0 replies; 35+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-04-14  4:00 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Daniel P. Berrangé,
	qemu-devel, Jason Wang, Dr. David Alan Gilbert, Peter Xu,
	Markus Armbruster, Eric Blake

Hello Juan,

Sorry to go back that early in discussion, but I was reviewing for v9
and I am not sure If I am unable to recall the reason, or I missed an
argument here.
Could you please help me with this?

On Tue, Nov 2, 2021 at 9:32 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish the last sync from
> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> > at the last sync in order to make sure all RAM is sent before finishing
> > the migration.
>
> You need to do this after each iteration.  Otherwise it can happen that:
>
> channel 1:               channel 2:
>
>    send page 11
>
> next iteration
>                          send page 11
>
>                          this page arrives
>
> now arrives this old copy.
>
> After each iteration, one needs to be sure that no ram is inflight.
>
> This means that I think you don't need the last_sync parameter at all,
> as you have to do the flush() in every iteration.

The flush command is used to guarantee every packet queued before
flush is actually sent before flush returns.
I mean, flushing every iteration will not help with the situation
above, where the pages are sent in order, but arrive at target in a
different order.

There is a chance that in the above text you meant 'send page' as
"queue page for sending", and 'page arrives' as "actually send the
queued page".
It that is correct, then syncing every iteration should not be necessary:
- On page queue, Linux saves the page address and size for sending
- On actual send, Linux will send the current data in the page and send.

So, in this example, if page 11 from iteration 'i' happens to be
'actually sent' after page 11 from iteration 'i+1', it would not be an
issue:
###
channel 1:               channel 2:
Iteration i

queue page 11 (i)

iteration i+1
                          queue page 11 (i+1)
                          actually send page 11 (i+1)

actually send page 11 (i)
###

That's because page 11 (i) will contain a newer version compared to
page 11 (i+1)

tl;dr:
- The page content always depends on the send time, instead of queue time.
- The iteration count describes the queue time.
(on non-zerocopy it's the opposite: it will depend on queue time,
because it copies the memory content during enqueue)

>
[...]

Juan, could you please help me understand if I am missing a part of
your argument up there?
Also, syncing every iteration is still necessary / recommended?

Best regards,
Leo



^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-04-14  4:04 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-09  7:56 [PATCH v4 0/3] MSG_ZEROCOPY for multifd Leonardo Bras
2021-10-09  7:56 ` [PATCH v4 1/3] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
2021-10-11 19:17   ` Eric Blake
2021-10-11 19:38     ` Leonardo Bras Soares Passos
2021-10-11 20:45       ` Eric Blake
2021-10-11 20:59         ` Leonardo Bras Soares Passos
2021-10-13  6:07   ` Peter Xu
2021-10-13  6:32     ` Peter Xu
2021-10-27  6:07       ` Leonardo Bras Soares Passos
2021-10-27  6:15         ` Peter Xu
2021-10-27  6:31           ` Leonardo Bras Soares Passos
2021-10-09  7:56 ` [PATCH v4 2/3] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
2021-10-11 19:27   ` Eric Blake
2021-10-11 19:44     ` Leonardo Bras Soares Passos
2021-10-13  6:18   ` Peter Xu
2021-10-27  6:30     ` Leonardo Bras Soares Passos
2021-11-02 13:13   ` Juan Quintela
2021-11-03 20:50     ` Leonardo Bras Soares Passos
2021-10-09  7:56 ` [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
2021-10-11 19:31   ` Eric Blake
2021-10-11 19:56     ` Leonardo Bras Soares Passos
2021-10-12  5:53       ` Markus Armbruster
2021-10-28  1:56         ` Leonardo Bras Soares Passos
2021-10-28  4:30           ` Markus Armbruster
2021-10-28  4:37             ` Leonardo Bras Soares Passos
2021-10-13  6:23   ` Peter Xu
2021-10-27  6:47     ` Leonardo Bras Soares Passos
2021-10-27  7:06       ` Peter Xu
2021-10-13  6:26   ` Peter Xu
2021-10-27  6:50     ` Leonardo Bras Soares Passos
2021-11-02 12:32   ` Juan Quintela
2021-11-03 21:29     ` Leonardo Bras Soares Passos
2021-11-03 23:24       ` Juan Quintela
2021-11-04  3:43         ` Leonardo Bras Soares Passos
2022-04-14  4:00     ` Leonardo Bras Soares Passos

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.