All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/6] MSG_ZEROCOPY + multifd
@ 2021-11-12  5:10 Leonardo Bras
  2021-11-12  5:10 ` [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
                   ` (5 more replies)
  0 siblings, 6 replies; 45+ messages in thread
From: Leonardo Bras @ 2021-11-12  5:10 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster
  Cc: Leonardo Bras, qemu-devel

This patch series intends to enable MSG_ZEROCOPY in QIOChannel, and make
use of it for multifd migration performance improvement. 

Patch #1 creates new callbacks for QIOChannel, allowing the implementation
of zerocopy writing.

Patch #2 reworks qio_channel_socket_writev() so it accepts flags for 
that are later passed to sendmsg().

Patch #3 implements writev_zerocopy and flush_zerocopy on QIOChannelSocket,
making use of MSG_ZEROCOPY on Linux.

Patch #4 adds a "zerocopy" migration property, only available with
CONFIG_LINUX, and compiled-out in any other architectures. 
This migration property has to be enabled before multifd migration starts.

Patch #5 adds a helper function that allows to see if TLS is going to be used.
This helper will be later used in patch #6.

Patch #6 Makes use of QIOChannelSocket zerocopy implementation on
nocomp multifd migration.

Results:
So far, the resource usage of __sys_sendmsg() reduced 15 times, and the
overall migration took 13-22% less time, based in synthetic cpu workload.

The objective is to reduce migration time in hosts with heavy cpu usage.

---
Changes since v4:
- 3 patches got splitted in 6
- Flush is used for syncing after each iteration, instead of only at the end
- If zerocopy is not available, fail in connect instead of failing on write
- 'multifd-zerocopy' property renamed to 'zerocopy'
- Fail migrations that don't support zerocopy, if it's enabled.
- Instead of checking for zerocopy at each write, save the flags in
  MultiFDSendParams->write_flags and use them on write
- Reorganized flag usage in QIOChannelSocket 
- A lot of typos fixed
- More doc on buffer restrictions

Changes since v3:
- QIOChannel interface names changed from io_async_{writev,flush} to
  io_{writev,flush}_zerocopy
- Instead of falling back in case zerocopy is not implemented, return
  error and abort operation.
- Flush now waits as long as needed, or return error in case anything
  goes wrong, aborting the operation.
- Zerocopy is now conditional in multifd, being set by parameter
  multifd-zerocopy
- Moves zerocopy_flush to multifd_send_sync_main() from multifd_save_cleanup
  so migration can abort if flush goes wrong.
- Several other small improvements

Changes since v2:
- Patch #1: One more fallback
- Patch #2: Fall back to sync if fails to lock buffer memory in MSG_ZEROCOPY send.

Changes since v1:
- Reimplemented the patchset using async_write + async_flush approach.
- Implemented a flush to be able to tell whenever all data was written.

Leonardo Bras (6):
  QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  QIOChannelSocket: Add flags parameter for writing
  QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for
    CONFIG_LINUX
  migration: Add zerocopy parameter for QMP/HMP for Linux
  migration: Add migrate_use_tls() helper
  multifd: Implement zerocopy write in multifd migration
    (multifd-zerocopy)

 qapi/migration.json         |  18 ++++
 include/io/channel-socket.h |   2 +
 include/io/channel.h        |  94 ++++++++++++++++---
 migration/migration.h       |   6 ++
 migration/multifd.h         |   4 +-
 io/channel-socket.c         | 176 ++++++++++++++++++++++++++++++++++--
 io/channel.c                |  63 ++++++++++---
 migration/channel.c         |   6 +-
 migration/migration.c       |  41 +++++++++
 migration/multifd.c         |  64 ++++++++++---
 migration/ram.c             |  29 ++++--
 migration/socket.c          |   5 +
 monitor/hmp-cmds.c          |   6 ++
 13 files changed, 455 insertions(+), 59 deletions(-)

-- 
2.33.1



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-11-12  5:10 [PATCH v5 0/6] MSG_ZEROCOPY + multifd Leonardo Bras
@ 2021-11-12  5:10 ` Leonardo Bras
  2021-11-12 10:13   ` Daniel P. Berrangé
  2021-11-12 10:56   ` Daniel P. Berrangé
  2021-11-12  5:10 ` [PATCH v5 2/6] QIOChannelSocket: Add flags parameter for writing Leonardo Bras
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 45+ messages in thread
From: Leonardo Bras @ 2021-11-12  5:10 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster
  Cc: Leonardo Bras, qemu-devel

Adds io_writev_zerocopy and io_flush_zerocopy as optional callback to QIOChannelClass,
allowing the implementation of zerocopy writes by subclasses.

How to use them:
- Write data using qio_channel_writev_zerocopy(),
- Wait write completion with qio_channel_flush_zerocopy().

Notes:
As some zerocopy implementations work asynchronously, it's
recommended to keep the write buffer untouched until the return of
qio_channel_flush_zerocopy(), to avoid the risk of sending an updated
buffer instead of the one at the write.

As the new callbacks are optional, if a subclass does not implement them, then:
- io_writev_zerocopy will return -1,
- io_flush_zerocopy will return 0 without changing anything.

Also, some functions like qio_channel_writev_full_all() were adapted to
receive a flag parameter. That allows shared code between zerocopy and
non-zerocopy writev.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/io/channel.h | 93 ++++++++++++++++++++++++++++++++++++++------
 io/channel.c         | 65 +++++++++++++++++++++++++------
 2 files changed, 135 insertions(+), 23 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 88988979f8..a19c09bb84 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 #define QIO_CHANNEL_ERR_BLOCK -2
 
+#define QIO_CHANNEL_WRITE_FLAG_ZEROCOPY 0x1
+
 typedef enum QIOChannelFeature QIOChannelFeature;
 
 enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_FD_PASS,
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
+    QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY,
 };
 
 
@@ -136,6 +139,12 @@ struct QIOChannelClass {
                                   IOHandler *io_read,
                                   IOHandler *io_write,
                                   void *opaque);
+    ssize_t (*io_writev_zerocopy)(QIOChannel *ioc,
+                                  const struct iovec *iov,
+                                  size_t niov,
+                                  Error **errp);
+    int (*io_flush_zerocopy)(QIOChannel *ioc,
+                             Error **errp);
 };
 
 /* General I/O handling functions */
@@ -321,10 +330,11 @@ int qio_channel_readv_all(QIOChannel *ioc,
 
 
 /**
- * qio_channel_writev_all:
+ * qio_channel_writev_all_flags:
  * @ioc: the channel object
  * @iov: the array of memory regions to write data from
  * @niov: the length of the @iov array
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Write data to the IO channel, reading it from the
@@ -337,12 +347,23 @@ int qio_channel_readv_all(QIOChannel *ioc,
  * to be written, yielding from the current coroutine
  * if required.
  *
+ * If QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed in flags,
+ * instead of waiting for all requested data to be written,
+ * this function will wait until it's all queued for writing.
+ * In this case, if the buffer gets changed between queueing and
+ * sending, the updated buffer will be sent. If this is not a
+ * desired behavior, it's suggested to call qio_channel_flush_zerocopy()
+ * before reusing the buffer.
+ *
  * Returns: 0 if all bytes were written, or -1 on error
  */
-int qio_channel_writev_all(QIOChannel *ioc,
-                           const struct iovec *iov,
-                           size_t niov,
-                           Error **erp);
+int qio_channel_writev_all_flags(QIOChannel *ioc,
+                                 const struct iovec *iov,
+                                 size_t niov,
+                                 int flags,
+                                 Error **errp);
+#define qio_channel_writev_all(ioc, iov, niov, errp) \
+    qio_channel_writev_all_flags(ioc, iov, niov, 0, errp)
 
 /**
  * qio_channel_readv:
@@ -831,12 +852,13 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
                                Error **errp);
 
 /**
- * qio_channel_writev_full_all:
+ * qio_channel_writev_full_all_flags:
  * @ioc: the channel object
  * @iov: the array of memory regions to write data from
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  *
@@ -846,13 +868,62 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * to be written, yielding from the current coroutine
  * if required.
  *
+ * If QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed in flags,
+ * instead of waiting for all requested data to be written,
+ * this function will wait until it's all queued for writing.
+ * In this case, if the buffer gets changed between queueing and
+ * sending, the updated buffer will be sent. If this is not a
+ * desired behavior, it's suggested to call qio_channel_flush_zerocopy()
+ * before reusing the buffer.
+ *
  * Returns: 0 if all bytes were written, or -1 on error
  */
 
-int qio_channel_writev_full_all(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds, size_t nfds,
-                                Error **errp);
+int qio_channel_writev_full_all_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds, size_t nfds,
+                                      int flags, Error **errp);
+#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
+    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
+
+/**
+ * qio_channel_writev_zerocopy:
+ * @ioc: the channel object
+ * @iov: the array of memory regions to write data from
+ * @niov: the length of the @iov array
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Behaves like qio_channel_writev_full_all_flags, but may write
+ * data asynchronously while avoiding unnecessary data copy.
+ * This function may return before any data is actually written,
+ * but should queue every buffer for writing.
+ *
+ * If at some point it's necessary to wait for all data to be
+ * written, use qio_channel_flush_zerocopy().
+ *
+ * If zerocopy is not available, returns -1 and set errp.
+ */
+
+ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
+                                    const struct iovec *iov,
+                                    size_t niov,
+                                    Error **errp);
+
+/**
+ * qio_channel_flush_zerocopy:
+ * @ioc: the channel object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Will block until every packet queued with
+ * qio_channel_writev_zerocopy() is sent, or return
+ * in case of any error.
+ *
+ * Returns -1 if any error is found, 0 otherwise.
+ * If not implemented, acts as a no-op, and returns 0.
+ */
+
+int qio_channel_flush_zerocopy(QIOChannel *ioc,
+                               Error **errp);
 
 #endif /* QIO_CHANNEL_H */
diff --git a/io/channel.c b/io/channel.c
index e8b019dc36..009da9b772 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -212,19 +212,21 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
     return ret;
 }
 
-int qio_channel_writev_all(QIOChannel *ioc,
-                           const struct iovec *iov,
-                           size_t niov,
-                           Error **errp)
+int qio_channel_writev_all_flags(QIOChannel *ioc,
+                                 const struct iovec *iov,
+                                 size_t niov,
+                                 int flags,
+                                 Error **errp)
 {
-    return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
+    return qio_channel_writev_full_all_flags(ioc, iov, niov, NULL, 0, flags,
+                                             errp);
 }
 
-int qio_channel_writev_full_all(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds, size_t nfds,
-                                Error **errp)
+int qio_channel_writev_full_all_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds, size_t nfds,
+                                      int flags, Error **errp)
 {
     int ret = -1;
     struct iovec *local_iov = g_new(struct iovec, niov);
@@ -237,8 +239,15 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
 
     while (nlocal_iov > 0) {
         ssize_t len;
-        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
-                                      errp);
+
+        if (flags & QIO_CHANNEL_WRITE_FLAG_ZEROCOPY) {
+            assert(fds == NULL && nfds == 0);
+            len = qio_channel_writev_zerocopy(ioc, local_iov, nlocal_iov, errp);
+        } else {
+            len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
+                                          errp);
+        }
+
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_OUT);
@@ -474,6 +483,38 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
 }
 
 
+ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
+                                    const struct iovec *iov,
+                                    size_t niov,
+                                    Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_writev_zerocopy ||
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
+        error_setg_errno(errp, EINVAL,
+                         "Channel does not support zerocopy writev");
+        return -1;
+    }
+
+    return klass->io_writev_zerocopy(ioc, iov, niov, errp);
+}
+
+
+int qio_channel_flush_zerocopy(QIOChannel *ioc,
+                               Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_flush_zerocopy ||
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
+        return 0;
+    }
+
+    return klass->io_flush_zerocopy(ioc, errp);
+}
+
+
 static void qio_channel_restart_read(void *opaque)
 {
     QIOChannel *ioc = opaque;
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 2/6] QIOChannelSocket: Add flags parameter for writing
  2021-11-12  5:10 [PATCH v5 0/6] MSG_ZEROCOPY + multifd Leonardo Bras
  2021-11-12  5:10 ` [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
@ 2021-11-12  5:10 ` Leonardo Bras
  2021-11-12 10:15   ` Daniel P. Berrangé
  2021-11-12  5:10 ` [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras @ 2021-11-12  5:10 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster
  Cc: Leonardo Bras, qemu-devel

Change qio_channel_socket_writev() in order to accept flags, so its possible
to selectively make use of sendmsg() flags.

qio_channel_socket_writev() contents were moved to a helper function
qio_channel_socket_writev_flags() which accepts an extra argument for flags.
(This argument is passed directly to sendmsg().

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 io/channel-socket.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 606ec97cf7..b57a27bf91 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -520,12 +520,13 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
     return ret;
 }
 
-static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
-                                         const struct iovec *iov,
-                                         size_t niov,
-                                         int *fds,
-                                         size_t nfds,
-                                         Error **errp)
+static ssize_t qio_channel_socket_writev_flags(QIOChannel *ioc,
+                                               const struct iovec *iov,
+                                               size_t niov,
+                                               int *fds,
+                                               size_t nfds,
+                                               int flags,
+                                               Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
     ssize_t ret;
@@ -558,7 +559,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
     }
 
  retry:
-    ret = sendmsg(sioc->fd, &msg, 0);
+    ret = sendmsg(sioc->fd, &msg, flags);
     if (ret <= 0) {
         if (errno == EAGAIN) {
             return QIO_CHANNEL_ERR_BLOCK;
@@ -572,6 +573,17 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
     }
     return ret;
 }
+
+static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
+                                         const struct iovec *iov,
+                                         size_t niov,
+                                         int *fds,
+                                         size_t nfds,
+                                         Error **errp)
+{
+    return qio_channel_socket_writev_flags(ioc, iov, niov, fds, nfds, 0, errp);
+}
+
 #else /* WIN32 */
 static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
                                         const struct iovec *iov,
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-11-12  5:10 [PATCH v5 0/6] MSG_ZEROCOPY + multifd Leonardo Bras
  2021-11-12  5:10 ` [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
  2021-11-12  5:10 ` [PATCH v5 2/6] QIOChannelSocket: Add flags parameter for writing Leonardo Bras
@ 2021-11-12  5:10 ` Leonardo Bras
  2021-11-12 10:54   ` Daniel P. Berrangé
  2021-11-12  5:10 ` [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux Leonardo Bras
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras @ 2021-11-12  5:10 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster
  Cc: Leonardo Bras, qemu-devel

For CONFIG_LINUX, implement the new optional callbacks io_write_zerocopy and
io_flush_zerocopy on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
feature is available in the host kernel, which is checked on
qio_channel_socket_connect_sync()

qio_channel_socket_flush_zerocopy() was implemented by counting how many times
sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
socket's error queue, in order to find how many of them finished sending.
Flush will loop until those counters are the same, or until some error occurs.

A new function qio_channel_socket_poll() was also created in order to avoid
busy-looping recvmsg() in qio_channel_socket_flush_zerocopy() while waiting for
updates in socket's error queue.

Notes on using writev_zerocopy():
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
some caution is necessary to avoid overwriting any buffer before it's sent.
If something like this happen, a newer version of the buffer may be sent instead.
- If this is a problem, it's recommended to call flush_zerocopy() before freeing
or re-using the buffer.

2: Locked memory
- When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may require
a larger amount of locked memory than usually available to non-root user.
- If the required amount of locked memory is not available, writev_zerocopy
will return an error, which can abort an operation like migration,
- Because of this, when an user code wants to add zerocopy as a feature, it
requires a mechanism to disable it, so it can still be accessible to less
privileged users.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/io/channel-socket.h |   2 +
 include/io/channel.h        |   1 +
 io/channel-socket.c         | 150 +++++++++++++++++++++++++++++++++++-
 3 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e63514..81d04baa4c 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -47,6 +47,8 @@ struct QIOChannelSocket {
     socklen_t localAddrLen;
     struct sockaddr_storage remoteAddr;
     socklen_t remoteAddrLen;
+    ssize_t zerocopy_queued;
+    ssize_t zerocopy_sent;
 };
 
 
diff --git a/include/io/channel.h b/include/io/channel.h
index a19c09bb84..051fff4197 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 
 #define QIO_CHANNEL_ERR_BLOCK -2
+#define QIO_CHANNEL_ERR_NOBUFS -3
 
 #define QIO_CHANNEL_WRITE_FLAG_ZEROCOPY 0x1
 
diff --git a/io/channel-socket.c b/io/channel-socket.c
index b57a27bf91..c724b849ad 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -26,6 +26,10 @@
 #include "io/channel-watch.h"
 #include "trace.h"
 #include "qapi/clone-visitor.h"
+#ifdef CONFIG_LINUX
+#include <linux/errqueue.h>
+#include <poll.h>
+#endif
 
 #define SOCKET_MAX_FDS 16
 
@@ -55,6 +59,8 @@ qio_channel_socket_new(void)
 
     sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
     sioc->fd = -1;
+    sioc->zerocopy_queued = 0;
+    sioc->zerocopy_sent = 0;
 
     ioc = QIO_CHANNEL(sioc);
     qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -140,6 +146,7 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
                                     Error **errp)
 {
     int fd;
+    int ret, v = 1;
 
     trace_qio_channel_socket_connect_sync(ioc, addr);
     fd = socket_connect(addr, errp);
@@ -154,6 +161,15 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
         return -1;
     }
 
+#ifdef CONFIG_LINUX
+    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+    if (ret == 0) {
+        /* Zerocopy available on host */
+        qio_channel_set_feature(QIO_CHANNEL(ioc),
+                                QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY);
+    }
+#endif
+
     return 0;
 }
 
@@ -561,12 +577,15 @@ static ssize_t qio_channel_socket_writev_flags(QIOChannel *ioc,
  retry:
     ret = sendmsg(sioc->fd, &msg, flags);
     if (ret <= 0) {
-        if (errno == EAGAIN) {
+        switch (errno) {
+        case EAGAIN:
             return QIO_CHANNEL_ERR_BLOCK;
-        }
-        if (errno == EINTR) {
+        case EINTR:
             goto retry;
+        case ENOBUFS:
+            return QIO_CHANNEL_ERR_NOBUFS;
         }
+
         error_setg_errno(errp, errno,
                          "Unable to write to socket");
         return -1;
@@ -670,6 +689,127 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 }
 #endif /* WIN32 */
 
+
+#ifdef CONFIG_LINUX
+
+static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
+                                   Error **errp)
+{
+    struct pollfd pfd;
+    int ret;
+
+    pfd.fd = sioc->fd;
+    pfd.events = 0;
+
+ retry:
+    ret = poll(&pfd, 1, -1);
+    if (ret < 0) {
+        switch (errno) {
+        case EAGAIN:
+        case EINTR:
+            goto retry;
+        default:
+            error_setg_errno(errp, errno,
+                             "Poll error");
+            return ret;
+        }
+    }
+
+    if (pfd.revents & (POLLHUP | POLLNVAL)) {
+        error_setg(errp, "Poll error: Invalid or disconnected fd");
+        return -1;
+    }
+
+    if (!zerocopy && (pfd.revents & POLLERR)) {
+        error_setg(errp, "Poll error: Errors present in errqueue");
+        return -1;
+    }
+
+    return ret;
+}
+
+static ssize_t qio_channel_socket_writev_zerocopy(QIOChannel *ioc,
+                                                  const struct iovec *iov,
+                                                  size_t niov,
+                                                  Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    ssize_t ret;
+
+    ret = qio_channel_socket_writev_flags(ioc, iov, niov, NULL, 0,
+                                          MSG_ZEROCOPY, errp);
+    if (ret == QIO_CHANNEL_ERR_NOBUFS) {
+        error_setg_errno(errp, errno,
+                         "Process can't lock enough memory for using MSG_ZEROCOPY");
+        return -1;
+    }
+
+    sioc->zerocopy_queued++;
+    return ret;
+}
+
+static int qio_channel_socket_flush_zerocopy(QIOChannel *ioc,
+                                             Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    struct msghdr msg = {};
+    struct sock_extended_err *serr;
+    struct cmsghdr *cm;
+    char control[CMSG_SPACE(sizeof(*serr))];
+    int ret;
+
+    msg.msg_control = control;
+    msg.msg_controllen = sizeof(control);
+    memset(control, 0, sizeof(control));
+
+    while (sioc->zerocopy_sent < sioc->zerocopy_queued) {
+        ret = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
+        if (ret < 0) {
+            switch (errno) {
+            case EAGAIN:
+                /* Nothing on errqueue, wait until something is available */
+                ret = qio_channel_socket_poll(sioc, true, errp);
+                if (ret < 0) {
+                    return -1;
+                }
+                continue;
+            case EINTR:
+                continue;
+            default:
+                error_setg_errno(errp, errno,
+                                 "Unable to read errqueue");
+                return -1;
+            }
+        }
+
+        cm = CMSG_FIRSTHDR(&msg);
+        if (cm->cmsg_level != SOL_IP &&
+            cm->cmsg_type != IP_RECVERR) {
+            error_setg_errno(errp, EPROTOTYPE,
+                             "Wrong cmsg in errqueue");
+            return -1;
+        }
+
+        serr = (void *) CMSG_DATA(cm);
+        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
+            error_setg_errno(errp, serr->ee_errno,
+                             "Error on socket");
+            return -1;
+        }
+        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+            error_setg_errno(errp, serr->ee_origin,
+                             "Error not from zerocopy");
+            return -1;
+        }
+
+        /* No errors, count successfully finished sendmsg()*/
+        sioc->zerocopy_sent += serr->ee_data - serr->ee_info + 1;
+    }
+    return 0;
+}
+
+#endif /* CONFIG_LINUX */
+
 static int
 qio_channel_socket_set_blocking(QIOChannel *ioc,
                                 bool enabled,
@@ -799,6 +939,10 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
     ioc_klass->io_set_delay = qio_channel_socket_set_delay;
     ioc_klass->io_create_watch = qio_channel_socket_create_watch;
     ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
+#ifdef CONFIG_LINUX
+    ioc_klass->io_writev_zerocopy = qio_channel_socket_writev_zerocopy;
+    ioc_klass->io_flush_zerocopy = qio_channel_socket_flush_zerocopy;
+#endif
 }
 
 static const TypeInfo qio_channel_socket_info = {
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12  5:10 [PATCH v5 0/6] MSG_ZEROCOPY + multifd Leonardo Bras
                   ` (2 preceding siblings ...)
  2021-11-12  5:10 ` [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
@ 2021-11-12  5:10 ` Leonardo Bras
  2021-11-12 11:04   ` Juan Quintela
  2021-11-12 11:05   ` Daniel P. Berrangé
  2021-11-12  5:10 ` [PATCH v5 5/6] migration: Add migrate_use_tls() helper Leonardo Bras
  2021-11-12  5:10 ` [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
  5 siblings, 2 replies; 45+ messages in thread
From: Leonardo Bras @ 2021-11-12  5:10 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster
  Cc: Leonardo Bras, qemu-devel

Add property that allows zerocopy migration of memory pages,
and also includes a helper function migrate_use_zerocopy() to check
if it's enabled.

No code is introduced to actually do the migration, but it allow
future implementations to enable/disable this feature.

On non-Linux builds this parameter is compiled-out.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 qapi/migration.json   | 18 ++++++++++++++++++
 migration/migration.h |  5 +++++
 migration/migration.c | 32 ++++++++++++++++++++++++++++++++
 migration/multifd.c   | 17 +++++++++--------
 migration/socket.c    |  5 +++++
 monitor/hmp-cmds.c    |  6 ++++++
 6 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index bbfd48cf0b..9534c299d7 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -730,6 +730,11 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zerocopy: Controls behavior on sending memory pages on migration.
+#            When true, enables a zerocopy mechanism for sending memory
+#            pages, if host supports it.
+#            Defaults to false. (Since 6.2)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -769,6 +774,7 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
+           { 'name': 'zerocopy', 'if' : 'CONFIG_LINUX'},
            'block-bitmap-mapping' ] }
 
 ##
@@ -895,6 +901,11 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zerocopy: Controls behavior on sending memory pages on migration.
+#            When true, enables a zerocopy mechanism for sending memory
+#            pages, if host supports it.
+#            Defaults to false. (Since 6.2)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -949,6 +960,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
@@ -1095,6 +1107,11 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zerocopy: Controls behavior on sending memory pages on migration.
+#            When true, enables a zerocopy mechanism for sending memory
+#            pages, if host supports it.
+#            Defaults to false. (Since 6.2)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration.  Such
 #                        aliases may for example be the corresponding names on the
@@ -1147,6 +1164,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
diff --git a/migration/migration.h b/migration/migration.h
index 8130b703eb..e61ef81f26 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -339,6 +339,11 @@ MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
 int migrate_multifd_zstd_level(void);
 
+#ifdef CONFIG_LINUX
+int migrate_use_zerocopy(void);
+#else
+#define migrate_use_zerocopy() (0)
+#endif
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/migration.c b/migration/migration.c
index abaf6f9e3d..add3dabc56 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -886,6 +886,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->multifd_zlib_level = s->parameters.multifd_zlib_level;
     params->has_multifd_zstd_level = true;
     params->multifd_zstd_level = s->parameters.multifd_zstd_level;
+#ifdef CONFIG_LINUX
+    params->has_zerocopy = true;
+    params->zerocopy = s->parameters.zerocopy;
+#endif
     params->has_xbzrle_cache_size = true;
     params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
     params->has_max_postcopy_bandwidth = true;
@@ -1538,6 +1542,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_multifd_compression) {
         dest->multifd_compression = params->multifd_compression;
     }
+#ifdef CONFIG_LINUX
+    if (params->has_zerocopy) {
+        dest->zerocopy = params->zerocopy;
+    }
+#endif
     if (params->has_xbzrle_cache_size) {
         dest->xbzrle_cache_size = params->xbzrle_cache_size;
     }
@@ -1650,6 +1659,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_multifd_compression) {
         s->parameters.multifd_compression = params->multifd_compression;
     }
+#ifdef CONFIG_LINUX
+    if (params->has_zerocopy) {
+        s->parameters.zerocopy = params->zerocopy;
+    }
+#endif
     if (params->has_xbzrle_cache_size) {
         s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
         xbzrle_cache_resize(params->xbzrle_cache_size, errp);
@@ -2540,6 +2554,17 @@ int migrate_multifd_zstd_level(void)
     return s->parameters.multifd_zstd_level;
 }
 
+#ifdef CONFIG_LINUX
+int migrate_use_zerocopy(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.zerocopy;
+}
+#endif
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
@@ -4190,6 +4215,10 @@ static Property migration_properties[] = {
     DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState,
                       parameters.multifd_zstd_level,
                       DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL),
+#ifdef CONFIG_LINUX
+    DEFINE_PROP_BOOL("zerocopy", MigrationState,
+                      parameters.zerocopy, false),
+#endif
     DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState,
                       parameters.xbzrle_cache_size,
                       DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE),
@@ -4287,6 +4316,9 @@ static void migration_instance_init(Object *obj)
     params->has_multifd_compression = true;
     params->has_multifd_zlib_level = true;
     params->has_multifd_zstd_level = true;
+#ifdef CONFIG_LINUX
+    params->has_zerocopy = true;
+#endif
     params->has_xbzrle_cache_size = true;
     params->has_max_postcopy_bandwidth = true;
     params->has_max_cpu_throttle = true;
diff --git a/migration/multifd.c b/migration/multifd.c
index 7c9deb1921..ab8f0f97be 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -854,16 +854,17 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
     trace_multifd_new_send_channel_async(p->id);
     if (qio_task_propagate_error(task, &local_err)) {
         goto cleanup;
-    } else {
-        p->c = QIO_CHANNEL(sioc);
-        qio_channel_set_delay(p->c, false);
-        p->running = true;
-        if (!multifd_channel_connect(p, sioc, local_err)) {
-            goto cleanup;
-        }
-        return;
     }
 
+    p->c = QIO_CHANNEL(sioc);
+    qio_channel_set_delay(p->c, false);
+    p->running = true;
+    if (!multifd_channel_connect(p, sioc, local_err)) {
+        goto cleanup;
+    }
+
+    return;
+
 cleanup:
     multifd_new_send_channel_cleanup(p, sioc, local_err);
 }
diff --git a/migration/socket.c b/migration/socket.c
index 05705a32d8..e26e94aa0c 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -77,6 +77,11 @@ static void socket_outgoing_migration(QIOTask *task,
     } else {
         trace_migration_socket_outgoing_connected(data->hostname);
     }
+
+    if (migrate_use_zerocopy()) {
+        error_setg(&err, "Zerocopy not available in migration");
+    }
+
     migration_channel_connect(data->s, sioc, data->hostname, err);
     object_unref(OBJECT(sioc));
 }
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 9c91bf93e9..442679dcfa 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1297,6 +1297,12 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_multifd_zstd_level = true;
         visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
         break;
+#ifdef CONFIG_LINUX
+    case MIGRATION_PARAMETER_ZEROCOPY:
+        p->has_zerocopy = true;
+        visit_type_bool(v, param, &p->zerocopy, &err);
+        break;
+#endif
     case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
         p->has_xbzrle_cache_size = true;
         if (!visit_type_size(v, param, &cache_size, &err)) {
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 5/6] migration: Add migrate_use_tls() helper
  2021-11-12  5:10 [PATCH v5 0/6] MSG_ZEROCOPY + multifd Leonardo Bras
                   ` (3 preceding siblings ...)
  2021-11-12  5:10 ` [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux Leonardo Bras
@ 2021-11-12  5:10 ` Leonardo Bras
  2021-11-12 11:04   ` Juan Quintela
  2021-11-12  5:10 ` [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
  5 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras @ 2021-11-12  5:10 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster
  Cc: Leonardo Bras, qemu-devel

A lot of places check parameters.tls_creds in order to evaluate if TLS is
in use, and sometimes call migrate_get_current() just for that test.

Add new helper function migrate_use_tls() in order to simplify testing
for TLS usage.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.h | 1 +
 migration/channel.c   | 6 +++---
 migration/migration.c | 9 +++++++++
 migration/multifd.c   | 5 +----
 4 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index e61ef81f26..9f38419312 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -344,6 +344,7 @@ int migrate_use_zerocopy(void);
 #else
 #define migrate_use_zerocopy() (0)
 #endif
+int migrate_use_tls(void);
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/channel.c b/migration/channel.c
index c4fc000a1a..1a45b75d29 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -32,16 +32,16 @@
  */
 void migration_channel_process_incoming(QIOChannel *ioc)
 {
-    MigrationState *s = migrate_get_current();
     Error *local_err = NULL;
 
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));
 
-    if (s->parameters.tls_creds &&
-        *s->parameters.tls_creds &&
+    if (migrate_use_tls() &&
         !object_dynamic_cast(OBJECT(ioc),
                              TYPE_QIO_CHANNEL_TLS)) {
+        MigrationState *s = migrate_get_current();
+
         migration_tls_channel_process_incoming(s, ioc, &local_err);
     } else {
         migration_ioc_register_yank(ioc);
diff --git a/migration/migration.c b/migration/migration.c
index add3dabc56..20ca99d726 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2565,6 +2565,15 @@ int migrate_use_zerocopy(void)
 }
 #endif
 
+int migrate_use_tls(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.tls_creds && *s->parameters.tls_creds;
+}
+
 int migrate_use_xbzrle(void)
 {
     MigrationState *s;
diff --git a/migration/multifd.c b/migration/multifd.c
index ab8f0f97be..3d9dc8cb58 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -794,14 +794,11 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
                                     QIOChannel *ioc,
                                     Error *error)
 {
-    MigrationState *s = migrate_get_current();
-
     trace_multifd_set_outgoing_channel(
         ioc, object_get_typename(OBJECT(ioc)), p->tls_hostname, error);
 
     if (!error) {
-        if (s->parameters.tls_creds &&
-            *s->parameters.tls_creds &&
+        if (migrate_use_tls() &&
             !object_dynamic_cast(OBJECT(ioc),
                                  TYPE_QIO_CHANNEL_TLS)) {
             multifd_tls_channel_connect(p, ioc, &error);
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-12  5:10 [PATCH v5 0/6] MSG_ZEROCOPY + multifd Leonardo Bras
                   ` (4 preceding siblings ...)
  2021-11-12  5:10 ` [PATCH v5 5/6] migration: Add migrate_use_tls() helper Leonardo Bras
@ 2021-11-12  5:10 ` Leonardo Bras
  2021-11-16 16:08   ` Juan Quintela
  5 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras @ 2021-11-12  5:10 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster
  Cc: Leonardo Bras, qemu-devel

Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
zerocopy interface.

Change multifd_send_sync_main() so it can distinguish each iteration sync from
the setup and the completion, so a flush_zerocopy() can be called
at the after each iteration in order to make sure all dirty pages are sent
before a new iteration is started.

Also make it return -1 if flush_zerocopy() fails, in order to cancel
the migration process, and avoid resuming the guest in the target host
without receiving all current RAM.

This will work fine on RAM migration because the RAM pages are not usually freed,
and there is no problem on changing the pages content between async_send() and
the actual sending of the buffer, because this change will dirty the page and
cause it to be re-sent on a next iteration anyway.

Given a lot of locked memory may be needed in order to use multid migration
with zerocopy enabled, make it optional by creating a new migration parameter
"zerocopy" on qapi, so low-privileged users can still perform multifd
migrations.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.h |  4 +++-
 migration/multifd.c | 37 ++++++++++++++++++++++++++++++++-----
 migration/ram.c     | 29 ++++++++++++++++++++++-------
 migration/socket.c  |  9 +++++++--
 4 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 15c50ca0b2..37941c1872 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -22,7 +22,7 @@ int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
 bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-void multifd_send_sync_main(QEMUFile *f);
+int multifd_send_sync_main(QEMUFile *f, bool sync);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
@@ -97,6 +97,8 @@ typedef struct {
     uint32_t packet_len;
     /* pointer to the packet */
     MultiFDPacket_t *packet;
+    /* multifd flags for sending ram */
+    int write_flags;
     /* multifd flags for each packet */
     uint32_t flags;
     /* size of the next packet that contains pages */
diff --git a/migration/multifd.c b/migration/multifd.c
index 3d9dc8cb58..816078df60 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -105,7 +105,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
  */
 static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-    return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
+    return qio_channel_writev_all_flags(p->c, p->pages->iov, used,
+                                        p->write_flags, errp);
 }
 
 /**
@@ -578,19 +579,27 @@ void multifd_save_cleanup(void)
     multifd_send_state = NULL;
 }
 
-void multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(QEMUFile *f, bool sync)
 {
     int i;
+    bool flush_zerocopy;
 
     if (!migrate_use_multifd()) {
-        return;
+        return 0;
     }
     if (multifd_send_state->pages->used) {
         if (multifd_send_pages(f) < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
-            return;
+            return 0;
         }
     }
+
+    /*
+     * When using zerocopy, it's necessary to flush after each iteration to make
+     * sure pages from earlier iterations don't end up replacing newer pages.
+     */
+    flush_zerocopy = sync && migrate_use_zerocopy();
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -601,7 +610,7 @@ void multifd_send_sync_main(QEMUFile *f)
         if (p->quit) {
             error_report("%s: channel %d has already quit", __func__, i);
             qemu_mutex_unlock(&p->mutex);
-            return;
+            return 0;
         }
 
         p->packet_num = multifd_send_state->packet_num++;
@@ -612,6 +621,17 @@ void multifd_send_sync_main(QEMUFile *f)
         ram_counters.transferred += p->packet_len;
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
+
+        if (flush_zerocopy) {
+            int ret;
+            Error *err = NULL;
+
+            ret = qio_channel_flush_zerocopy(p->c, &err);
+            if (ret < 0) {
+                error_report_err(err);
+                return -1;
+            }
+        }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -620,6 +640,8 @@ void multifd_send_sync_main(QEMUFile *f)
         qemu_sem_wait(&p->sem_sync);
     }
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
+
+    return 0;
 }
 
 static void *multifd_send_thread(void *opaque)
@@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
         goto cleanup;
     }
 
+    if (migrate_use_zerocopy()) {
+        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
+    }
+
     p->c = QIO_CHANNEL(sioc);
     qio_channel_set_delay(p->c, false);
     p->running = true;
@@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
         p->packet->version = cpu_to_be32(MULTIFD_VERSION);
         p->name = g_strdup_printf("multifdsend_%d", i);
         p->tls_hostname = g_strdup(s->hostname);
+        p->write_flags = 0;
         socket_send_channel_create(multifd_new_send_channel_async, p);
     }
 
diff --git a/migration/ram.c b/migration/ram.c
index 863035d235..0b3ddbffc1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2992,6 +2992,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMState **rsp = opaque;
     RAMBlock *block;
+    int ret;
 
     if (compress_threads_save_setup()) {
         return -1;
@@ -3026,7 +3027,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-    multifd_send_sync_main(f);
+    ret =  multifd_send_sync_main(f, false);
+    if (ret < 0) {
+        return ret;
+    }
+
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     qemu_fflush(f);
 
@@ -3135,7 +3140,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
-        multifd_send_sync_main(rs->f);
+        ret = multifd_send_sync_main(rs->f, true);
+        if (ret < 0) {
+            return ret;
+        }
+
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
         qemu_fflush(f);
         ram_counters.transferred += 8;
@@ -3193,13 +3202,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     }
 
-    if (ret >= 0) {
-        multifd_send_sync_main(rs->f);
-        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-        qemu_fflush(f);
+    if (ret < 0) {
+        return ret;
     }
 
-    return ret;
+    ret = multifd_send_sync_main(rs->f, false);
+    if (ret < 0) {
+        return ret;
+    }
+
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_fflush(f);
+
+    return 0;
 }
 
 static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
diff --git a/migration/socket.c b/migration/socket.c
index e26e94aa0c..8e40e0a3fd 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
         trace_migration_socket_outgoing_connected(data->hostname);
     }
 
-    if (migrate_use_zerocopy()) {
-        error_setg(&err, "Zerocopy not available in migration");
+    if (migrate_use_zerocopy() &&
+        (!migrate_use_multifd() ||
+         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
+          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
+          migrate_use_tls())) {
+        error_setg(&err,
+                   "Zerocopy only available for non-compressed non-TLS multifd migration");
     }
 
     migration_channel_connect(data->s, sioc, data->hostname, err);
-- 
2.33.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-11-12  5:10 ` [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
@ 2021-11-12 10:13   ` Daniel P. Berrangé
  2021-11-12 10:26     ` Daniel P. Berrangé
  2021-11-22 23:18     ` Leonardo Bras Soares Passos
  2021-11-12 10:56   ` Daniel P. Berrangé
  1 sibling, 2 replies; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-12 10:13 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Fri, Nov 12, 2021 at 02:10:36AM -0300, Leonardo Bras wrote:
> Adds io_writev_zerocopy and io_flush_zerocopy as optional callback to QIOChannelClass,
> allowing the implementation of zerocopy writes by subclasses.
> 
> How to use them:
> - Write data using qio_channel_writev_zerocopy(),
> - Wait write completion with qio_channel_flush_zerocopy().
> 
> Notes:
> As some zerocopy implementations work asynchronously, it's
> recommended to keep the write buffer untouched until the return of
> qio_channel_flush_zerocopy(), to avoid the risk of sending an updated
> buffer instead of the one at the write.
> 
> As the new callbacks are optional, if a subclass does not implement them, then:
> - io_writev_zerocopy will return -1,
> - io_flush_zerocopy will return 0 without changing anything.
> 
> Also, some functions like qio_channel_writev_full_all() were adapted to
> receive a flag parameter. That allows shared code between zerocopy and
> non-zerocopy writev.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  include/io/channel.h | 93 ++++++++++++++++++++++++++++++++++++++------
>  io/channel.c         | 65 +++++++++++++++++++++++++------
>  2 files changed, 135 insertions(+), 23 deletions(-)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 88988979f8..a19c09bb84 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
>  
>  #define QIO_CHANNEL_ERR_BLOCK -2
>  
> +#define QIO_CHANNEL_WRITE_FLAG_ZEROCOPY 0x1
> +
>  typedef enum QIOChannelFeature QIOChannelFeature;
>  
>  enum QIOChannelFeature {
>      QIO_CHANNEL_FEATURE_FD_PASS,
>      QIO_CHANNEL_FEATURE_SHUTDOWN,
>      QIO_CHANNEL_FEATURE_LISTEN,
> +    QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY,
>  };
>  
>  
> @@ -136,6 +139,12 @@ struct QIOChannelClass {
>                                    IOHandler *io_read,
>                                    IOHandler *io_write,
>                                    void *opaque);
> +    ssize_t (*io_writev_zerocopy)(QIOChannel *ioc,
> +                                  const struct iovec *iov,
> +                                  size_t niov,
> +                                  Error **errp);
> +    int (*io_flush_zerocopy)(QIOChannel *ioc,
> +                             Error **errp);
>  };
>  
>  /* General I/O handling functions */
> @@ -321,10 +330,11 @@ int qio_channel_readv_all(QIOChannel *ioc,
>  
>  
>  /**
> - * qio_channel_writev_all:
> + * qio_channel_writev_all_flags:
>   * @ioc: the channel object
>   * @iov: the array of memory regions to write data from
>   * @niov: the length of the @iov array
> + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
>   * @errp: pointer to a NULL-initialized error object
>   *
>   * Write data to the IO channel, reading it from the
> @@ -337,12 +347,23 @@ int qio_channel_readv_all(QIOChannel *ioc,
>   * to be written, yielding from the current coroutine
>   * if required.
>   *
> + * If QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed in flags,
> + * instead of waiting for all requested data to be written,
> + * this function will wait until it's all queued for writing.
> + * In this case, if the buffer gets changed between queueing and
> + * sending, the updated buffer will be sent. If this is not a
> + * desired behavior, it's suggested to call qio_channel_flush_zerocopy()
> + * before reusing the buffer.
> + *
>   * Returns: 0 if all bytes were written, or -1 on error
>   */
> -int qio_channel_writev_all(QIOChannel *ioc,
> -                           const struct iovec *iov,
> -                           size_t niov,
> -                           Error **erp);
> +int qio_channel_writev_all_flags(QIOChannel *ioc,
> +                                 const struct iovec *iov,
> +                                 size_t niov,
> +                                 int flags,
> +                                 Error **errp);
> +#define qio_channel_writev_all(ioc, iov, niov, errp) \
> +    qio_channel_writev_all_flags(ioc, iov, niov, 0, errp)

We already have separate methods for zerocopy, instead of adding
flags, so we shouldn't add flags to this either.

Add a qio_channel_writev_zerocopy_all method instead.

Internally, we can still make both qio_channel_writev_zerocopy_all
and qio_channel_writev_all use the same helper method, just don't
expose flags in the public API. Even internally we don't really
need flags, just a bool

>  
>  /**
>   * qio_channel_readv:
> @@ -831,12 +852,13 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
>                                 Error **errp);
>  
>  /**
> - * qio_channel_writev_full_all:
> + * qio_channel_writev_full_all_flags:
>   * @ioc: the channel object
>   * @iov: the array of memory regions to write data from
>   * @niov: the length of the @iov array
>   * @fds: an array of file handles to send
>   * @nfds: number of file handles in @fds
> + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
>   * @errp: pointer to a NULL-initialized error object
>   *
>   *
> @@ -846,13 +868,62 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
>   * to be written, yielding from the current coroutine
>   * if required.
>   *
> + * If QIO_CHANNEL_WRITE_FLAG_ZEROCOPY is passed in flags,
> + * instead of waiting for all requested data to be written,
> + * this function will wait until it's all queued for writing.
> + * In this case, if the buffer gets changed between queueing and
> + * sending, the updated buffer will be sent. If this is not a
> + * desired behavior, it's suggested to call qio_channel_flush_zerocopy()
> + * before reusing the buffer.
> + *
>   * Returns: 0 if all bytes were written, or -1 on error
>   */
>  
> -int qio_channel_writev_full_all(QIOChannel *ioc,
> -                                const struct iovec *iov,
> -                                size_t niov,
> -                                int *fds, size_t nfds,
> -                                Error **errp);
> +int qio_channel_writev_full_all_flags(QIOChannel *ioc,
> +                                      const struct iovec *iov,
> +                                      size_t niov,
> +                                      int *fds, size_t nfds,
> +                                      int flags, Error **errp);
> +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
> +    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)

There's no need for this at all. Since fd passing is not supported
with zerocopy, there's no reason to ever use this method.

> +/**
> + * qio_channel_writev_zerocopy:
> + * @ioc: the channel object
> + * @iov: the array of memory regions to write data from
> + * @niov: the length of the @iov array
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Behaves like qio_channel_writev_full_all_flags, but may write

qio_channel_writev

> + * data asynchronously while avoiding unnecessary data copy.
> + * This function may return before any data is actually written,
> + * but should queue every buffer for writing.

Callers mustn't rely on "should" docs - they must rely on the
return value indicating how many bytes were accepted.

> + *
> + * If at some point it's necessary to wait for all data to be
> + * written, use qio_channel_flush_zerocopy().
> + *
> + * If zerocopy is not available, returns -1 and set errp.
> + */
> +
> +ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
> +                                    const struct iovec *iov,
> +                                    size_t niov,
> +                                    Error **errp);
> +
> +/**
> + * qio_channel_flush_zerocopy:
> + * @ioc: the channel object
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Will block until every packet queued with
> + * qio_channel_writev_zerocopy() is sent, or return
> + * in case of any error.
> + *
> + * Returns -1 if any error is found, 0 otherwise.
> + * If not implemented, acts as a no-op, and returns 0.
> + */
> +
> +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> +                               Error **errp);
>  
>  #endif /* QIO_CHANNEL_H */
> diff --git a/io/channel.c b/io/channel.c
> index e8b019dc36..009da9b772 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -212,19 +212,21 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
>      return ret;
>  }
>  
> -int qio_channel_writev_all(QIOChannel *ioc,
> -                           const struct iovec *iov,
> -                           size_t niov,
> -                           Error **errp)
> +int qio_channel_writev_all_flags(QIOChannel *ioc,
> +                                 const struct iovec *iov,
> +                                 size_t niov,
> +                                 int flags,
> +                                 Error **errp)
>  {
> -    return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
> +    return qio_channel_writev_full_all_flags(ioc, iov, niov, NULL, 0, flags,
> +                                             errp);
>  }
>  
> -int qio_channel_writev_full_all(QIOChannel *ioc,
> -                                const struct iovec *iov,
> -                                size_t niov,
> -                                int *fds, size_t nfds,
> -                                Error **errp)
> +int qio_channel_writev_full_all_flags(QIOChannel *ioc,
> +                                      const struct iovec *iov,
> +                                      size_t niov,
> +                                      int *fds, size_t nfds,
> +                                      int flags, Error **errp)
>  {
>      int ret = -1;
>      struct iovec *local_iov = g_new(struct iovec, niov);
> @@ -237,8 +239,15 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
>  
>      while (nlocal_iov > 0) {
>          ssize_t len;
> -        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
> -                                      errp);
> +
> +        if (flags & QIO_CHANNEL_WRITE_FLAG_ZEROCOPY) {
> +            assert(fds == NULL && nfds == 0);
> +            len = qio_channel_writev_zerocopy(ioc, local_iov, nlocal_iov, errp);
> +        } else {
> +            len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
> +                                          errp);
> +        }
> +
>          if (len == QIO_CHANNEL_ERR_BLOCK) {
>              if (qemu_in_coroutine()) {
>                  qio_channel_yield(ioc, G_IO_OUT);
> @@ -474,6 +483,38 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
>  }
>  
>  
> +ssize_t qio_channel_writev_zerocopy(QIOChannel *ioc,
> +                                    const struct iovec *iov,
> +                                    size_t niov,
> +                                    Error **errp)
> +{
> +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> +
> +    if (!klass->io_writev_zerocopy ||
> +        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
> +        error_setg_errno(errp, EINVAL,
> +                         "Channel does not support zerocopy writev");
> +        return -1;
> +    }
> +
> +    return klass->io_writev_zerocopy(ioc, iov, niov, errp);
> +}
> +
> +
> +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> +                               Error **errp)
> +{
> +    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
> +
> +    if (!klass->io_flush_zerocopy ||
> +        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY)) {
> +        return 0;
> +    }
> +
> +    return klass->io_flush_zerocopy(ioc, errp);
> +}
> +
> +
>  static void qio_channel_restart_read(void *opaque)
>  {
>      QIOChannel *ioc = opaque;
> -- 
> 2.33.1
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 2/6] QIOChannelSocket: Add flags parameter for writing
  2021-11-12  5:10 ` [PATCH v5 2/6] QIOChannelSocket: Add flags parameter for writing Leonardo Bras
@ 2021-11-12 10:15   ` Daniel P. Berrangé
  2021-11-23  5:33     ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-12 10:15 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Fri, Nov 12, 2021 at 02:10:37AM -0300, Leonardo Bras wrote:
> Change qio_channel_socket_writev() in order to accept flags, so its possible
> to selectively make use of sendmsg() flags.
> 
> qio_channel_socket_writev() contents were moved to a helper function
> qio_channel_socket_writev_flags() which accepts an extra argument for flags.
> (This argument is passed directly to sendmsg().
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  io/channel-socket.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-11-12 10:13   ` Daniel P. Berrangé
@ 2021-11-12 10:26     ` Daniel P. Berrangé
  2021-11-22 23:18     ` Leonardo Bras Soares Passos
  1 sibling, 0 replies; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-12 10:26 UTC (permalink / raw)
  To: Leonardo Bras, qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Fri, Nov 12, 2021 at 10:13:01AM +0000, Daniel P. Berrangé wrote:
> On Fri, Nov 12, 2021 at 02:10:36AM -0300, Leonardo Bras wrote:
> > Adds io_writev_zerocopy and io_flush_zerocopy as optional callback to QIOChannelClass,
> > allowing the implementation of zerocopy writes by subclasses.
> > 
> > How to use them:
> > - Write data using qio_channel_writev_zerocopy(),
> > - Wait write completion with qio_channel_flush_zerocopy().
> > 
> > Notes:
> > As some zerocopy implementations work asynchronously, it's
> > recommended to keep the write buffer untouched until the return of
> > qio_channel_flush_zerocopy(), to avoid the risk of sending an updated
> > buffer instead of the one at the write.
> > 
> > As the new callbacks are optional, if a subclass does not implement them, then:
> > - io_writev_zerocopy will return -1,
> > - io_flush_zerocopy will return 0 without changing anything.
> > 
> > Also, some functions like qio_channel_writev_full_all() were adapted to
> > receive a flag parameter. That allows shared code between zerocopy and
> > non-zerocopy writev.
> > 
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  include/io/channel.h | 93 ++++++++++++++++++++++++++++++++++++++------
> >  io/channel.c         | 65 +++++++++++++++++++++++++------
> >  2 files changed, 135 insertions(+), 23 deletions(-)
> > 
> > diff --git a/include/io/channel.h b/include/io/channel.h
> > index 88988979f8..a19c09bb84 100644
> > --- a/include/io/channel.h
> > +++ b/include/io/channel.h

> > +/**
> > + * qio_channel_writev_zerocopy:
> > + * @ioc: the channel object
> > + * @iov: the array of memory regions to write data from
> > + * @niov: the length of the @iov array
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Behaves like qio_channel_writev_full_all_flags, but may write
> 
> qio_channel_writev
> 
> > + * data asynchronously while avoiding unnecessary data copy.
> > + * This function may return before any data is actually written,
> > + * but should queue every buffer for writing.
> 
> Callers mustn't rely on "should" docs - they must rely on the
> return value indicating how many bytes were accepted.

Also mention that this requires locked memory and can/will fail if
insufficient locked memory is available.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-11-12  5:10 ` [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
@ 2021-11-12 10:54   ` Daniel P. Berrangé
  2021-11-23  4:46     ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-12 10:54 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Fri, Nov 12, 2021 at 02:10:38AM -0300, Leonardo Bras wrote:
> For CONFIG_LINUX, implement the new optional callbacks io_write_zerocopy and
> io_flush_zerocopy on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> feature is available in the host kernel, which is checked on
> qio_channel_socket_connect_sync()
> 
> qio_channel_socket_flush_zerocopy() was implemented by counting how many times
> sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> socket's error queue, in order to find how many of them finished sending.
> Flush will loop until those counters are the same, or until some error occurs.
> 
> A new function qio_channel_socket_poll() was also created in order to avoid
> busy-looping recvmsg() in qio_channel_socket_flush_zerocopy() while waiting for
> updates in socket's error queue.
> 
> Notes on using writev_zerocopy():
> 1: Buffer
> - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> some caution is necessary to avoid overwriting any buffer before it's sent.
> If something like this happen, a newer version of the buffer may be sent instead.
> - If this is a problem, it's recommended to call flush_zerocopy() before freeing
> or re-using the buffer.
> 
> 2: Locked memory
> - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
> unlocked after it's sent.
> - Depending on the size of each buffer, and how often it's sent, it may require
> a larger amount of locked memory than usually available to non-root user.
> - If the required amount of locked memory is not available, writev_zerocopy
> will return an error, which can abort an operation like migration,
> - Because of this, when an user code wants to add zerocopy as a feature, it
> requires a mechanism to disable it, so it can still be accessible to less
> privileged users.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  include/io/channel-socket.h |   2 +
>  include/io/channel.h        |   1 +
>  io/channel-socket.c         | 150 +++++++++++++++++++++++++++++++++++-
>  3 files changed, 150 insertions(+), 3 deletions(-)
> 
> diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> index e747e63514..81d04baa4c 100644
> --- a/include/io/channel-socket.h
> +++ b/include/io/channel-socket.h
> @@ -47,6 +47,8 @@ struct QIOChannelSocket {
>      socklen_t localAddrLen;
>      struct sockaddr_storage remoteAddr;
>      socklen_t remoteAddrLen;
> +    ssize_t zerocopy_queued;
> +    ssize_t zerocopy_sent;
>  };
>  
>  
> diff --git a/include/io/channel.h b/include/io/channel.h
> index a19c09bb84..051fff4197 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
>  
>  
>  #define QIO_CHANNEL_ERR_BLOCK -2
> +#define QIO_CHANNEL_ERR_NOBUFS -3
>  
>  #define QIO_CHANNEL_WRITE_FLAG_ZEROCOPY 0x1
>  
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index b57a27bf91..c724b849ad 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -26,6 +26,10 @@
>  #include "io/channel-watch.h"
>  #include "trace.h"
>  #include "qapi/clone-visitor.h"
> +#ifdef CONFIG_LINUX
> +#include <linux/errqueue.h>
> +#include <poll.h>
> +#endif
>  
>  #define SOCKET_MAX_FDS 16
>  
> @@ -55,6 +59,8 @@ qio_channel_socket_new(void)
>  
>      sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
>      sioc->fd = -1;
> +    sioc->zerocopy_queued = 0;
> +    sioc->zerocopy_sent = 0;
>  
>      ioc = QIO_CHANNEL(sioc);
>      qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
> @@ -140,6 +146,7 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
>                                      Error **errp)
>  {
>      int fd;
> +    int ret, v = 1;
>  
>      trace_qio_channel_socket_connect_sync(ioc, addr);
>      fd = socket_connect(addr, errp);
> @@ -154,6 +161,15 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
>          return -1;
>      }
>  
> +#ifdef CONFIG_LINUX
> +    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> +    if (ret == 0) {
> +        /* Zerocopy available on host */
> +        qio_channel_set_feature(QIO_CHANNEL(ioc),
> +                                QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY);
> +    }
> +#endif
> +
>      return 0;
>  }
>  
> @@ -561,12 +577,15 @@ static ssize_t qio_channel_socket_writev_flags(QIOChannel *ioc,
>   retry:
>      ret = sendmsg(sioc->fd, &msg, flags);
>      if (ret <= 0) {
> -        if (errno == EAGAIN) {
> +        switch (errno) {
> +        case EAGAIN:
>              return QIO_CHANNEL_ERR_BLOCK;
> -        }
> -        if (errno == EINTR) {
> +        case EINTR:
>              goto retry;
> +        case ENOBUFS:
> +            return QIO_CHANNEL_ERR_NOBUFS;

Why does ENOBUFS need handling separately instead of letting
the error_setg_errno below handle it ?

The caller immediately invokes error_setg_errno() again,
just with different error message.

No code in this series ever looks at QIO_CHANNEL_ERR_NOBUFS
either, so we don't even need that special error return code
added AFAICT ?

>          }
> +
>          error_setg_errno(errp, errno,
>                           "Unable to write to socket");
>          return -1;
> @@ -670,6 +689,127 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
>  }
>  #endif /* WIN32 */
>  
> +
> +#ifdef CONFIG_LINUX
> +
> +static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
> +                                   Error **errp)

There's only one caller and it always passes zerocopy=true,
so this parmeter looks pointless.

> +{
> +    struct pollfd pfd;
> +    int ret;
> +
> +    pfd.fd = sioc->fd;
> +    pfd.events = 0;
> +
> + retry:
> +    ret = poll(&pfd, 1, -1);
> +    if (ret < 0) {
> +        switch (errno) {
> +        case EAGAIN:
> +        case EINTR:
> +            goto retry;
> +        default:
> +            error_setg_errno(errp, errno,
> +                             "Poll error");
> +            return ret;

       return -1;

> +        }
> +    }
> +
> +    if (pfd.revents & (POLLHUP | POLLNVAL)) {
> +        error_setg(errp, "Poll error: Invalid or disconnected fd");
> +        return -1;
> +    }
> +
> +    if (!zerocopy && (pfd.revents & POLLERR)) {
> +        error_setg(errp, "Poll error: Errors present in errqueue");
> +        return -1;
> +    }

> +
> +    return ret;

  return 0;

> +}
> +
> +static ssize_t qio_channel_socket_writev_zerocopy(QIOChannel *ioc,
> +                                                  const struct iovec *iov,
> +                                                  size_t niov,
> +                                                  Error **errp)
> +{
> +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> +    ssize_t ret;
> +
> +    ret = qio_channel_socket_writev_flags(ioc, iov, niov, NULL, 0,
> +                                          MSG_ZEROCOPY, errp);
> +    if (ret == QIO_CHANNEL_ERR_NOBUFS) {
> +        error_setg_errno(errp, errno,
> +                         "Process can't lock enough memory for using MSG_ZEROCOPY");

This should not be touching errno - the method should be setting the
errp directly, not leaving it to the caller.

> +        return -1;
> +    }
> +
> +    sioc->zerocopy_queued++;

 if (ret > 0)
    sio->zerocopy_queued++

since the kernel doesn't increase the counter if the data sent
was zero length. A caller shouldn't be passing us a zero length
iov data element, but it is wise to be cautious

> +    return ret;
> +}
> +
> +static int qio_channel_socket_flush_zerocopy(QIOChannel *ioc,
> +                                             Error **errp)
> +{
> +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> +    struct msghdr msg = {};
> +    struct sock_extended_err *serr;
> +    struct cmsghdr *cm;
> +    char control[CMSG_SPACE(sizeof(*serr))];
> +    int ret;

Add

   int rv = 0;

> +
> +    msg.msg_control = control;
> +    msg.msg_controllen = sizeof(control);
> +    memset(control, 0, sizeof(control));
> +
> +    while (sioc->zerocopy_sent < sioc->zerocopy_queued) {
> +        ret = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> +        if (ret < 0) {
> +            switch (errno) {
> +            case EAGAIN:
> +                /* Nothing on errqueue, wait until something is available */
> +                ret = qio_channel_socket_poll(sioc, true, errp);
> +                if (ret < 0) {
> +                    return -1;
> +                }
> +                continue;
> +            case EINTR:
> +                continue;
> +            default:
> +                error_setg_errno(errp, errno,
> +                                 "Unable to read errqueue");
> +                return -1;
> +            }
> +        }
> +
> +        cm = CMSG_FIRSTHDR(&msg);
> +        if (cm->cmsg_level != SOL_IP &&
> +            cm->cmsg_type != IP_RECVERR) {
> +            error_setg_errno(errp, EPROTOTYPE,
> +                             "Wrong cmsg in errqueue");
> +            return -1;
> +        }
> +
> +        serr = (void *) CMSG_DATA(cm);
> +        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
> +            error_setg_errno(errp, serr->ee_errno,
> +                             "Error on socket");
> +            return -1;
> +        }
> +        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
> +            error_setg_errno(errp, serr->ee_origin,
> +                             "Error not from zerocopy");
> +            return -1;
> +        }
> +
> +        /* No errors, count successfully finished sendmsg()*/
> +        sioc->zerocopy_sent += serr->ee_data - serr->ee_info + 1;

Here add


     if (ee_code ==  SO_EE_CODE_ZEROCOPY_COPIED)
        rv = 1;

> +    }
> +    return 0;

return rv;

> +}



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-11-12  5:10 ` [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
  2021-11-12 10:13   ` Daniel P. Berrangé
@ 2021-11-12 10:56   ` Daniel P. Berrangé
  1 sibling, 0 replies; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-12 10:56 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Fri, Nov 12, 2021 at 02:10:36AM -0300, Leonardo Bras wrote:
> Adds io_writev_zerocopy and io_flush_zerocopy as optional callback to QIOChannelClass,
> allowing the implementation of zerocopy writes by subclasses.
> 
> How to use them:
> - Write data using qio_channel_writev_zerocopy(),
> - Wait write completion with qio_channel_flush_zerocopy().
> 
> Notes:
> As some zerocopy implementations work asynchronously, it's
> recommended to keep the write buffer untouched until the return of
> qio_channel_flush_zerocopy(), to avoid the risk of sending an updated
> buffer instead of the one at the write.
> 
> As the new callbacks are optional, if a subclass does not implement them, then:
> - io_writev_zerocopy will return -1,
> - io_flush_zerocopy will return 0 without changing anything.
> 
> Also, some functions like qio_channel_writev_full_all() were adapted to
> receive a flag parameter. That allows shared code between zerocopy and
> non-zerocopy writev.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  include/io/channel.h | 93 ++++++++++++++++++++++++++++++++++++++------
>  io/channel.c         | 65 +++++++++++++++++++++++++------
>  2 files changed, 135 insertions(+), 23 deletions(-)
> 

> +/**
> + * qio_channel_flush_zerocopy:
> + * @ioc: the channel object
> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Will block until every packet queued with
> + * qio_channel_writev_zerocopy() is sent, or return
> + * in case of any error.
> + *
> + * Returns -1 if any error is found, 0 otherwise.

  Returns -1 if any error is found, 0 if all data was sent,
           or 1 if all data was sent but at least some was copied.


> + * If not implemented, acts as a no-op, and returns 0.
> + */
> +
> +int qio_channel_flush_zerocopy(QIOChannel *ioc,
> +                               Error **errp);

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 5/6] migration: Add migrate_use_tls() helper
  2021-11-12  5:10 ` [PATCH v5 5/6] migration: Add migrate_use_tls() helper Leonardo Bras
@ 2021-11-12 11:04   ` Juan Quintela
  2021-11-30 19:00     ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 45+ messages in thread
From: Juan Quintela @ 2021-11-12 11:04 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Markus Armbruster

Leonardo Bras <leobras@redhat.com> wrote:
> A lot of places check parameters.tls_creds in order to evaluate if TLS is
> in use, and sometimes call migrate_get_current() just for that test.
>
> Add new helper function migrate_use_tls() in order to simplify testing
> for TLS usage.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12  5:10 ` [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux Leonardo Bras
@ 2021-11-12 11:04   ` Juan Quintela
  2021-11-12 11:08     ` Daniel P. Berrangé
                       ` (2 more replies)
  2021-11-12 11:05   ` Daniel P. Berrangé
  1 sibling, 3 replies; 45+ messages in thread
From: Juan Quintela @ 2021-11-12 11:04 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Markus Armbruster

Leonardo Bras <leobras@redhat.com> wrote:
> Add property that allows zerocopy migration of memory pages,
> and also includes a helper function migrate_use_zerocopy() to check
> if it's enabled.
>
> No code is introduced to actually do the migration, but it allow
> future implementations to enable/disable this feature.
>
> On non-Linux builds this parameter is compiled-out.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

Hi

> +# @zerocopy: Controls behavior on sending memory pages on migration.
> +#            When true, enables a zerocopy mechanism for sending memory
> +#            pages, if host supports it.
> +#            Defaults to false. (Since 6.2)
> +#

This needs to be changed to next release, but not big deal.


> +#ifdef CONFIG_LINUX
> +int migrate_use_zerocopy(void);

Please, return bool

> +#else
> +#define migrate_use_zerocopy() (0)
> +#endif

and false here.

I know, I know.  We are not consistent here, but the preffered way is
the other way.

>  int migrate_use_xbzrle(void);
>  uint64_t migrate_xbzrle_cache_size(void);
>  bool migrate_colo_enabled(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index abaf6f9e3d..add3dabc56 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -886,6 +886,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>      params->multifd_zlib_level = s->parameters.multifd_zlib_level;
>      params->has_multifd_zstd_level = true;
>      params->multifd_zstd_level = s->parameters.multifd_zstd_level;
> +#ifdef CONFIG_LINUX
> +    params->has_zerocopy = true;
> +    params->zerocopy = s->parameters.zerocopy;
> +#endif
>      params->has_xbzrle_cache_size = true;
>      params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
>      params->has_max_postcopy_bandwidth = true;
> @@ -1538,6 +1542,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>      if (params->has_multifd_compression) {
>          dest->multifd_compression = params->multifd_compression;
>      }
> +#ifdef CONFIG_LINUX
> +    if (params->has_zerocopy) {
> +        dest->zerocopy = params->zerocopy;
> +    }
> +#endif
>      if (params->has_xbzrle_cache_size) {
>          dest->xbzrle_cache_size = params->xbzrle_cache_size;
>      }
> @@ -1650,6 +1659,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>      if (params->has_multifd_compression) {
>          s->parameters.multifd_compression = params->multifd_compression;
>      }
> +#ifdef CONFIG_LINUX
> +    if (params->has_zerocopy) {
> +        s->parameters.zerocopy = params->zerocopy;
> +    }
> +#endif

After seing all this CONFIG_LINUX mess, I am not sure that it is a good
idea to add the parameter only for LINUX.  It appears that it is better
to add it for all OS's and just not allow to set it to true there.

But If QAPI/QOM people preffer that way, I am not going to get into the middle.

> diff --git a/migration/multifd.c b/migration/multifd.c
> index 7c9deb1921..ab8f0f97be 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -854,16 +854,17 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>      trace_multifd_new_send_channel_async(p->id);
>      if (qio_task_propagate_error(task, &local_err)) {
>          goto cleanup;
> -    } else {
> -        p->c = QIO_CHANNEL(sioc);
> -        qio_channel_set_delay(p->c, false);
> -        p->running = true;
> -        if (!multifd_channel_connect(p, sioc, local_err)) {
> -            goto cleanup;
> -        }
> -        return;
>      }
>  
> +    p->c = QIO_CHANNEL(sioc);
> +    qio_channel_set_delay(p->c, false);
> +    p->running = true;
> +    if (!multifd_channel_connect(p, sioc, local_err)) {
> +        goto cleanup;
> +    }
> +
> +    return;
> +
>  cleanup:
>      multifd_new_send_channel_cleanup(p, sioc, local_err);
>  }

As far as I can see, this chunk is a NOP, and it don't belong to this patch.

Later, Juan.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12  5:10 ` [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux Leonardo Bras
  2021-11-12 11:04   ` Juan Quintela
@ 2021-11-12 11:05   ` Daniel P. Berrangé
  2021-12-01 19:05     ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-12 11:05 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Fri, Nov 12, 2021 at 02:10:39AM -0300, Leonardo Bras wrote:
> Add property that allows zerocopy migration of memory pages,
> and also includes a helper function migrate_use_zerocopy() to check
> if it's enabled.
> 
> No code is introduced to actually do the migration, but it allow
> future implementations to enable/disable this feature.
> 
> On non-Linux builds this parameter is compiled-out.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  qapi/migration.json   | 18 ++++++++++++++++++
>  migration/migration.h |  5 +++++
>  migration/migration.c | 32 ++++++++++++++++++++++++++++++++
>  migration/multifd.c   | 17 +++++++++--------
>  migration/socket.c    |  5 +++++
>  monitor/hmp-cmds.c    |  6 ++++++
>  6 files changed, 75 insertions(+), 8 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index bbfd48cf0b..9534c299d7 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -730,6 +730,11 @@
>  #                      will consume more CPU.
>  #                      Defaults to 1. (Since 5.0)
>  #
> +# @zerocopy: Controls behavior on sending memory pages on migration.
> +#            When true, enables a zerocopy mechanism for sending memory
> +#            pages, if host supports it.
> +#            Defaults to false. (Since 6.2)

Add

   Requires that QEMU be permitted to use locked memory for guest
   RAM pages.

Also 7.0 since this has missed the 6.2 deadline.


Both these notes apply to later in this file too



> diff --git a/migration/multifd.c b/migration/multifd.c
> index 7c9deb1921..ab8f0f97be 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -854,16 +854,17 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>      trace_multifd_new_send_channel_async(p->id);
>      if (qio_task_propagate_error(task, &local_err)) {
>          goto cleanup;
> -    } else {
> -        p->c = QIO_CHANNEL(sioc);
> -        qio_channel_set_delay(p->c, false);
> -        p->running = true;
> -        if (!multifd_channel_connect(p, sioc, local_err)) {
> -            goto cleanup;
> -        }
> -        return;
>      }
>  
> +    p->c = QIO_CHANNEL(sioc);
> +    qio_channel_set_delay(p->c, false);
> +    p->running = true;
> +    if (!multifd_channel_connect(p, sioc, local_err)) {
> +        goto cleanup;
> +    }
> +
> +    return;
> +
>  cleanup:
>      multifd_new_send_channel_cleanup(p, sioc, local_err);
>  }

This change is just a code style alteration with no relation to
zerocopy. Either remove it, or do this change in its own patch
seprate from zerocopy.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12 11:04   ` Juan Quintela
@ 2021-11-12 11:08     ` Daniel P. Berrangé
  2021-11-12 11:59       ` Markus Armbruster
  2021-11-12 12:01     ` Markus Armbruster
  2021-12-01 18:51     ` Leonardo Bras Soares Passos
  2 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-12 11:08 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Leonardo Bras, Eric Blake, Dr. David Alan Gilbert,
	Markus Armbruster

On Fri, Nov 12, 2021 at 12:04:33PM +0100, Juan Quintela wrote:
> Leonardo Bras <leobras@redhat.com> wrote:
> > Add property that allows zerocopy migration of memory pages,
> > and also includes a helper function migrate_use_zerocopy() to check
> > if it's enabled.
> >
> > No code is introduced to actually do the migration, but it allow
> > future implementations to enable/disable this feature.
> >
> > On non-Linux builds this parameter is compiled-out.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> 
> Hi
> 
> > +# @zerocopy: Controls behavior on sending memory pages on migration.
> > +#            When true, enables a zerocopy mechanism for sending memory
> > +#            pages, if host supports it.
> > +#            Defaults to false. (Since 6.2)
> > +#
> 
> This needs to be changed to next release, but not big deal.
> 
> 
> > +#ifdef CONFIG_LINUX
> > +int migrate_use_zerocopy(void);
> 
> Please, return bool
> 
> > +#else
> > +#define migrate_use_zerocopy() (0)
> > +#endif
> 
> and false here.
> 
> I know, I know.  We are not consistent here, but the preffered way is
> the other way.
> 
> >  int migrate_use_xbzrle(void);
> >  uint64_t migrate_xbzrle_cache_size(void);
> >  bool migrate_colo_enabled(void);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index abaf6f9e3d..add3dabc56 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -886,6 +886,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> >      params->multifd_zlib_level = s->parameters.multifd_zlib_level;
> >      params->has_multifd_zstd_level = true;
> >      params->multifd_zstd_level = s->parameters.multifd_zstd_level;
> > +#ifdef CONFIG_LINUX
> > +    params->has_zerocopy = true;
> > +    params->zerocopy = s->parameters.zerocopy;
> > +#endif
> >      params->has_xbzrle_cache_size = true;
> >      params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
> >      params->has_max_postcopy_bandwidth = true;
> > @@ -1538,6 +1542,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
> >      if (params->has_multifd_compression) {
> >          dest->multifd_compression = params->multifd_compression;
> >      }
> > +#ifdef CONFIG_LINUX
> > +    if (params->has_zerocopy) {
> > +        dest->zerocopy = params->zerocopy;
> > +    }
> > +#endif
> >      if (params->has_xbzrle_cache_size) {
> >          dest->xbzrle_cache_size = params->xbzrle_cache_size;
> >      }
> > @@ -1650,6 +1659,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> >      if (params->has_multifd_compression) {
> >          s->parameters.multifd_compression = params->multifd_compression;
> >      }
> > +#ifdef CONFIG_LINUX
> > +    if (params->has_zerocopy) {
> > +        s->parameters.zerocopy = params->zerocopy;
> > +    }
> > +#endif
> 
> After seing all this CONFIG_LINUX mess, I am not sure that it is a good
> idea to add the parameter only for LINUX.  It appears that it is better
> to add it for all OS's and just not allow to set it to true there.
> 
> But If QAPI/QOM people preffer that way, I am not going to get into the middle.

I don't like all the conditionals either, but QAPI design wants the
conditionals, as that allows mgmt apps to query whether the feature
is supported in a build or not.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12 11:08     ` Daniel P. Berrangé
@ 2021-11-12 11:59       ` Markus Armbruster
  2021-12-01 19:07         ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 45+ messages in thread
From: Markus Armbruster @ 2021-11-12 11:59 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Leonardo Bras, Eric Blake, Dr. David Alan Gilbert,
	Juan Quintela

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Nov 12, 2021 at 12:04:33PM +0100, Juan Quintela wrote:
>> Leonardo Bras <leobras@redhat.com> wrote:

[...]

>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index abaf6f9e3d..add3dabc56 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -886,6 +886,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>> >      params->multifd_zlib_level = s->parameters.multifd_zlib_level;
>> >      params->has_multifd_zstd_level = true;
>> >      params->multifd_zstd_level = s->parameters.multifd_zstd_level;
>> > +#ifdef CONFIG_LINUX
>> > +    params->has_zerocopy = true;
>> > +    params->zerocopy = s->parameters.zerocopy;
>> > +#endif
>> >      params->has_xbzrle_cache_size = true;
>> >      params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
>> >      params->has_max_postcopy_bandwidth = true;
>> > @@ -1538,6 +1542,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>> >      if (params->has_multifd_compression) {
>> >          dest->multifd_compression = params->multifd_compression;
>> >      }
>> > +#ifdef CONFIG_LINUX
>> > +    if (params->has_zerocopy) {
>> > +        dest->zerocopy = params->zerocopy;
>> > +    }
>> > +#endif
>> >      if (params->has_xbzrle_cache_size) {
>> >          dest->xbzrle_cache_size = params->xbzrle_cache_size;
>> >      }
>> > @@ -1650,6 +1659,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>> >      if (params->has_multifd_compression) {
>> >          s->parameters.multifd_compression = params->multifd_compression;
>> >      }
>> > +#ifdef CONFIG_LINUX
>> > +    if (params->has_zerocopy) {
>> > +        s->parameters.zerocopy = params->zerocopy;
>> > +    }
>> > +#endif
>> 
>> After seing all this CONFIG_LINUX mess, I am not sure that it is a good
>> idea to add the parameter only for LINUX.  It appears that it is better
>> to add it for all OS's and just not allow to set it to true there.
>> 
>> But If QAPI/QOM people preffer that way, I am not going to get into the middle.
>
> I don't like all the conditionals either, but QAPI design wants the
> conditionals, as that allows mgmt apps to query whether the feature
> is supported in a build or not.

Specifically, the conditionals keep @zerocopy out of query-qmp-schema
(a.k.a. schema introspection) when it's not actually supported.

This lets management applications recognize zero-copy support.

Without conditionals, the only way to probe for it is trying to switch
it on.  This is inconvenient and error-prone.

Immature ideas to avoid conditionals:

1. Make *values* conditional, i.e. unconditional false, but true only if
CONFIG_LINUX.  The QAPI schema language lets you do this for
enumerations today, but not for bool.

2. A new kind of conditional that only applies to schema introspection,
so you can eat your introspection cake and keep the #ifdef-less code
cake (and the slight binary bloat that comes with it).



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12 11:04   ` Juan Quintela
  2021-11-12 11:08     ` Daniel P. Berrangé
@ 2021-11-12 12:01     ` Markus Armbruster
  2021-12-02  4:31       ` Leonardo Bras Soares Passos
  2021-12-01 18:51     ` Leonardo Bras Soares Passos
  2 siblings, 1 reply; 45+ messages in thread
From: Markus Armbruster @ 2021-11-12 12:01 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Eric Blake, Leonardo Bras, Daniel P. Berrangé,
	Dr. David Alan Gilbert, qemu-devel

Juan Quintela <quintela@redhat.com> writes:

> Leonardo Bras <leobras@redhat.com> wrote:
>> Add property that allows zerocopy migration of memory pages,
>> and also includes a helper function migrate_use_zerocopy() to check
>> if it's enabled.
>>
>> No code is introduced to actually do the migration, but it allow
>> future implementations to enable/disable this feature.
>>
>> On non-Linux builds this parameter is compiled-out.
>>
>> Signed-off-by: Leonardo Bras <leobras@redhat.com>
>
> Hi
>
>> +# @zerocopy: Controls behavior on sending memory pages on migration.
>> +#            When true, enables a zerocopy mechanism for sending memory
>> +#            pages, if host supports it.
>> +#            Defaults to false. (Since 6.2)
>> +#
>
> This needs to be changed to next release, but not big deal.

Rename to zero-copy while there.  QAPI/QMP strongly prefer separating
words with dashes.  "zerocopy" is not a word, "zero" and "copy" are.

[...]



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-12  5:10 ` [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
@ 2021-11-16 16:08   ` Juan Quintela
  2021-11-16 16:17     ` Daniel P. Berrangé
                       ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Juan Quintela @ 2021-11-16 16:08 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Markus Armbruster

Leonardo Bras <leobras@redhat.com> wrote:
> Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> zerocopy interface.
>
> Change multifd_send_sync_main() so it can distinguish each iteration sync from
> the setup and the completion, so a flush_zerocopy() can be called
> at the after each iteration in order to make sure all dirty pages are sent
> before a new iteration is started.
>
> Also make it return -1 if flush_zerocopy() fails, in order to cancel
> the migration process, and avoid resuming the guest in the target host
> without receiving all current RAM.
>
> This will work fine on RAM migration because the RAM pages are not usually freed,
> and there is no problem on changing the pages content between async_send() and
> the actual sending of the buffer, because this change will dirty the page and
> cause it to be re-sent on a next iteration anyway.
>
> Given a lot of locked memory may be needed in order to use multid migration
> with zerocopy enabled, make it optional by creating a new migration parameter
> "zerocopy" on qapi, so low-privileged users can still perform multifd
> migrations.

How much memory can a non-root program use by default?


>  static void *multifd_send_thread(void *opaque)
> @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>          goto cleanup;
>      }
>  
> +    if (migrate_use_zerocopy()) {
> +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> +    }

This belongs


>      p->c = QIO_CHANNEL(sioc);
>      qio_channel_set_delay(p->c, false);
>      p->running = true;
> @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
>          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
>          p->name = g_strdup_printf("multifdsend_%d", i);
>          p->tls_hostname = g_strdup(s->hostname);
> +        p->write_flags = 0;

here?

>          socket_send_channel_create(multifd_new_send_channel_async, p);
>      }
> diff --git a/migration/socket.c b/migration/socket.c
> index e26e94aa0c..8e40e0a3fd 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
>          trace_migration_socket_outgoing_connected(data->hostname);
>      }
>  
> -    if (migrate_use_zerocopy()) {
> -        error_setg(&err, "Zerocopy not available in migration");
> +    if (migrate_use_zerocopy() &&
> +        (!migrate_use_multifd() ||
> +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> +          migrate_use_tls())) {
> +        error_setg(&err,
> +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
>      }
>  
>      migration_channel_connect(data->s, sioc, data->hostname, err);

Do we really want to do this check here?  I think this is really too
late.

You are not patching migrate_params_check().

I think that the proper way of doing this is something like:

    if (params->zerocopy &&
        (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
         migrate_use_tls())) {
           error_setg(&err,
                     "Zerocopy only available for non-compressed non-TLS multifd migration");
        return false;
    }

You have to do the equivalent of multifd_compression and tls enablement,
to see that zerocopy is not enabled, of course.

I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
I can't see a way of doing that without a qio.

Later, Juan.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-16 16:08   ` Juan Quintela
@ 2021-11-16 16:17     ` Daniel P. Berrangé
  2021-11-16 16:34       ` Juan Quintela
  2021-11-16 16:34       ` Daniel P. Berrangé
  2021-12-02  6:47     ` Leonardo Bras Soares Passos
  2021-12-09  8:51     ` Leonardo Bras Soares Passos
  2 siblings, 2 replies; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-16 16:17 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Leonardo Bras, Eric Blake, Dr. David Alan Gilbert,
	Markus Armbruster

On Tue, Nov 16, 2021 at 05:08:06PM +0100, Juan Quintela wrote:
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > the setup and the completion, so a flush_zerocopy() can be called
> > at the after each iteration in order to make sure all dirty pages are sent
> > before a new iteration is started.
> >
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multid migration
> > with zerocopy enabled, make it optional by creating a new migration parameter
> > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > migrations.
> 
> How much memory can a non-root program use by default?
> 
> 
> >  static void *multifd_send_thread(void *opaque)
> > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> >          goto cleanup;
> >      }
> >  
> > +    if (migrate_use_zerocopy()) {
> > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
> 
> This belongs
> 
> 
> >      p->c = QIO_CHANNEL(sioc);
> >      qio_channel_set_delay(p->c, false);
> >      p->running = true;
> > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> >          p->name = g_strdup_printf("multifdsend_%d", i);
> >          p->tls_hostname = g_strdup(s->hostname);
> > +        p->write_flags = 0;
> 
> here?
> 
> >          socket_send_channel_create(multifd_new_send_channel_async, p);
> >      }
> > diff --git a/migration/socket.c b/migration/socket.c
> > index e26e94aa0c..8e40e0a3fd 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> >          trace_migration_socket_outgoing_connected(data->hostname);
> >      }
> >  
> > -    if (migrate_use_zerocopy()) {
> > -        error_setg(&err, "Zerocopy not available in migration");
> > +    if (migrate_use_zerocopy() &&
> > +        (!migrate_use_multifd() ||
> > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > +          migrate_use_tls())) {
> > +        error_setg(&err,
> > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> >      }
> >  
> >      migration_channel_connect(data->s, sioc, data->hostname, err);
> 
> Do we really want to do this check here?  I think this is really too
> late.
> 
> You are not patching migrate_params_check().
> 
> I think that the proper way of doing this is something like:
> 
>     if (params->zerocopy &&
>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
>          migrate_use_tls())) {
>            error_setg(&err,
>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
>         return false;
>     }
> 
> You have to do the equivalent of multifd_compression and tls enablement,
> to see that zerocopy is not enabled, of course.
> 
> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> I can't see a way of doing that without a qio.

I don't think you need to check that feature flag.

The combination of zerocopy and compression is simply illogical
and can be rejected unconditionally.

The combination of zerocopy and TLS is somewhat questionable.
You're always going to have a copy between the plain text and
cipher text. Avoiding an extra copy for the cipher text will
require kerenl work which has no ETA. If it does ever arise,
QEMU will need more work again to actually support it. So
today we can just reject it unconditonally again.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-16 16:17     ` Daniel P. Berrangé
@ 2021-11-16 16:34       ` Juan Quintela
  2021-11-16 16:39         ` Daniel P. Berrangé
  2021-11-16 16:34       ` Daniel P. Berrangé
  1 sibling, 1 reply; 45+ messages in thread
From: Juan Quintela @ 2021-11-16 16:34 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Leonardo Bras, Eric Blake, Dr. David Alan Gilbert,
	Markus Armbruster

Daniel P. Berrangé <berrange@redhat.com> wrote:

>> 
>>     if (params->zerocopy &&
>>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
>>          migrate_use_tls())) {
>>            error_setg(&err,
>>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
>>         return false;
>>     }
>> 
>> You have to do the equivalent of multifd_compression and tls enablement,
>> to see that zerocopy is not enabled, of course.
>> 
>> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
>> I can't see a way of doing that without a qio.
>
> I don't think you need to check that feature flag.

Oh, I mean other thing.

When you set "zerocopy" capability, you don't know if the kernel support
it.  My understanding is that the only way to check if it supported is
here.

I agree with the rest of the arguments.

Later, Juan.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-16 16:17     ` Daniel P. Berrangé
  2021-11-16 16:34       ` Juan Quintela
@ 2021-11-16 16:34       ` Daniel P. Berrangé
  2021-12-02  6:54         ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-16 16:34 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel, Leonardo Bras, Eric Blake,
	Dr. David Alan Gilbert, Markus Armbruster

On Tue, Nov 16, 2021 at 04:17:47PM +0000, Daniel P. Berrangé wrote:
> On Tue, Nov 16, 2021 at 05:08:06PM +0100, Juan Quintela wrote:
> > Leonardo Bras <leobras@redhat.com> wrote:
> > > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > > zerocopy interface.
> > >
> > > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > > the setup and the completion, so a flush_zerocopy() can be called
> > > at the after each iteration in order to make sure all dirty pages are sent
> > > before a new iteration is started.
> > >
> > > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > > the migration process, and avoid resuming the guest in the target host
> > > without receiving all current RAM.
> > >
> > > This will work fine on RAM migration because the RAM pages are not usually freed,
> > > and there is no problem on changing the pages content between async_send() and
> > > the actual sending of the buffer, because this change will dirty the page and
> > > cause it to be re-sent on a next iteration anyway.
> > >
> > > Given a lot of locked memory may be needed in order to use multid migration
> > > with zerocopy enabled, make it optional by creating a new migration parameter
> > > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > > migrations.
> > 
> > How much memory can a non-root program use by default?
> > 
> > 
> > >  static void *multifd_send_thread(void *opaque)
> > > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > >          goto cleanup;
> > >      }
> > >  
> > > +    if (migrate_use_zerocopy()) {
> > > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > > +    }
> > 
> > This belongs
> > 
> > 
> > >      p->c = QIO_CHANNEL(sioc);
> > >      qio_channel_set_delay(p->c, false);
> > >      p->running = true;
> > > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> > >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> > >          p->name = g_strdup_printf("multifdsend_%d", i);
> > >          p->tls_hostname = g_strdup(s->hostname);
> > > +        p->write_flags = 0;
> > 
> > here?
> > 
> > >          socket_send_channel_create(multifd_new_send_channel_async, p);
> > >      }
> > > diff --git a/migration/socket.c b/migration/socket.c
> > > index e26e94aa0c..8e40e0a3fd 100644
> > > --- a/migration/socket.c
> > > +++ b/migration/socket.c
> > > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> > >          trace_migration_socket_outgoing_connected(data->hostname);
> > >      }
> > >  
> > > -    if (migrate_use_zerocopy()) {
> > > -        error_setg(&err, "Zerocopy not available in migration");
> > > +    if (migrate_use_zerocopy() &&
> > > +        (!migrate_use_multifd() ||
> > > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > > +          migrate_use_tls())) {
> > > +        error_setg(&err,
> > > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> > >      }
> > >  
> > >      migration_channel_connect(data->s, sioc, data->hostname, err);
> > 
> > Do we really want to do this check here?  I think this is really too
> > late.
> > 
> > You are not patching migrate_params_check().
> > 
> > I think that the proper way of doing this is something like:
> > 
> >     if (params->zerocopy &&
> >         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> >          migrate_use_tls())) {
> >            error_setg(&err,
> >                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> >         return false;
> >     }
> > 
> > You have to do the equivalent of multifd_compression and tls enablement,
> > to see that zerocopy is not enabled, of course.
> > 
> > I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> > I can't see a way of doing that without a qio.
> 
> I don't think you need to check that feature flag.
> 
> The combination of zerocopy and compression is simply illogical
> and can be rejected unconditionally.

Or we could think of "zerocopy"  in a more targetted way.
It is only "zerocopy" in terms the final I/O operation.
Earlier parts of the process may involve copies. IOW, we
can copy as part of the compress operation, but skip the
when then sending the compressed page.

In practice though this is still unlikely to be something
we can practically do, as we would need to keep compressed
pages around for an entire migration iteration until we can
call flush to ensure the data has been sent. This would be
significant memory burden.

> The combination of zerocopy and TLS is somewhat questionable.
> You're always going to have a copy between the plain text and
> cipher text. Avoiding an extra copy for the cipher text will
> require kerenl work which has no ETA. If it does ever arise,
> QEMU will need more work again to actually support it. So
> today we can just reject it unconditonally again.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-16 16:34       ` Juan Quintela
@ 2021-11-16 16:39         ` Daniel P. Berrangé
  2021-12-02  6:56           ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-16 16:39 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Leonardo Bras, Eric Blake, Dr. David Alan Gilbert,
	Markus Armbruster

On Tue, Nov 16, 2021 at 05:34:50PM +0100, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> >> 
> >>     if (params->zerocopy &&
> >>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> >>          migrate_use_tls())) {
> >>            error_setg(&err,
> >>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> >>         return false;
> >>     }
> >> 
> >> You have to do the equivalent of multifd_compression and tls enablement,
> >> to see that zerocopy is not enabled, of course.
> >> 
> >> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> >> I can't see a way of doing that without a qio.
> >
> > I don't think you need to check that feature flag.
> 
> Oh, I mean other thing.
> 
> When you set "zerocopy" capability, you don't know if the kernel support
> it.  My understanding is that the only way to check if it supported is
> here.

If you reqest it and it isn't supported you'll get an error back from
qio_channel_writev_zerocopy(). That's a bit too late though.

Ideally we should report an error straight after the migration code
creates the I/O channel, by querying for the feature.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-11-12 10:13   ` Daniel P. Berrangé
  2021-11-12 10:26     ` Daniel P. Berrangé
@ 2021-11-22 23:18     ` Leonardo Bras Soares Passos
  2021-11-23  9:45       ` Daniel P. Berrangé
  1 sibling, 1 reply; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-11-22 23:18 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

Hello Daniel,
Thanks for the feedback!

On Fri, Nov 12, 2021 at 7:13 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Fri, Nov 12, 2021 at 02:10:36AM -0300, Leonardo Bras wrote:
> > -int qio_channel_writev_all(QIOChannel *ioc,
> > -                           const struct iovec *iov,
> > -                           size_t niov,
> > -                           Error **erp);
> > +int qio_channel_writev_all_flags(QIOChannel *ioc,
> > +                                 const struct iovec *iov,
> > +                                 size_t niov,
> > +                                 int flags,
> > +                                 Error **errp);
> > +#define qio_channel_writev_all(ioc, iov, niov, errp) \
> > +    qio_channel_writev_all_flags(ioc, iov, niov, 0, errp)
>
> We already have separate methods for zerocopy, instead of adding
> flags, so we shouldn't add flags to this either.
>
> Add a qio_channel_writev_zerocopy_all method instead.
>
> Internally, we can still make both qio_channel_writev_zerocopy_all
> and qio_channel_writev_all use the same helper method, just don't
> expose flags in the public API. Even internally we don't really
> need flags, just a bool

I see.
The idea of having a flag was to make it easier to expand the
interface in the future.
I got some feedback on v1 that would suggest it would be desired:
http://patchwork.ozlabs.org/project/qemu-devel/patch/20210831110238.299458-2-leobras@redhat.com/


>
[...]
> > +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
> > +    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
>
> There's no need for this at all. Since fd passing is not supported
> with zerocopy, there's no reason to ever use this method.
>
> > +/**
> > + * qio_channel_writev_zerocopy:
> > + * @ioc: the channel object
> > + * @iov: the array of memory regions to write data from
> > + * @niov: the length of the @iov array
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Behaves like qio_channel_writev_full_all_flags, but may write
>
> qio_channel_writev
>
> > + * data asynchronously while avoiding unnecessary data copy.
> > + * This function may return before any data is actually written,
> > + * but should queue every buffer for writing.
>
> Callers mustn't rely on "should" docs - they must rely on the
> return value indicating how many bytes were accepted.
>
> Also mention that this requires locked memory and can/will fail if
> insufficient locked memory is available.
>

Sure, I will update that.

> > +/**
> > + * qio_channel_flush_zerocopy:
> > + * @ioc: the channel object
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Will block until every packet queued with
> > + * qio_channel_writev_zerocopy() is sent, or return
> > + * in case of any error.
> > + *
> > + * Returns -1 if any error is found, 0 otherwise.
>
>   Returns -1 if any error is found, 0 if all data was sent,
>            or 1 if all data was sent but at least some was copied.
>

I don't really get the return 1 part, I mean, per description it will
'block until every queued packet was sent, so "at least some was
copied" doesn't seem to fit here.
Could you elaborate?

Best regards,
Leo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-11-12 10:54   ` Daniel P. Berrangé
@ 2021-11-23  4:46     ` Leonardo Bras Soares Passos
  2021-11-23  9:55       ` Daniel P. Berrangé
  0 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-11-23  4:46 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

Hello Daniel,

On Fri, Nov 12, 2021 at 7:54 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
[...]
> > @@ -561,12 +577,15 @@ static ssize_t qio_channel_socket_writev_flags(QIOChannel *ioc,
> >   retry:
> >      ret = sendmsg(sioc->fd, &msg, flags);
> >      if (ret <= 0) {
> > -        if (errno == EAGAIN) {
> > +        switch (errno) {
> > +        case EAGAIN:
> >              return QIO_CHANNEL_ERR_BLOCK;
> > -        }
> > -        if (errno == EINTR) {
> > +        case EINTR:
> >              goto retry;
> > +        case ENOBUFS:
> > +            return QIO_CHANNEL_ERR_NOBUFS;
>
> Why does ENOBUFS need handling separately instead of letting
> the error_setg_errno below handle it ?
>
> The caller immediately invokes error_setg_errno() again,
> just with different error message.
>
> No code in this series ever looks at QIO_CHANNEL_ERR_NOBUFS
> either, so we don't even need that special error return code
> added AFAICT ?
>

The idea was to add a custom message for ENOBUFS return when sending
with MSG_ZEROCOPY.
I mean, having this message is important for the user to understand
why the migration is failing, but it would
not make any sense to have this message while a non-zerocopy sendmsg()
returns with ENOBUFS.

ENOBUFS : The output queue for a network interface was full.  This
generally indicates that the interface has stopped sending, but may be
caused by transient congestion.

As an alternative, I could add this message inside the switch, inside
an if (flags & MSG_ZEROCOPY) on qio_channel_socket_writev_flags()
instead of in it's caller.
But for me it looks bloated, I mean, dealing with an error for
ZEROCOPY only in the general function.

OTOH, if you think that it's a better idea to deal with every error in
qio_channel_socket_writev_flags() instead of in the caller, I will
change it for v6. Please let me know.

> >          }
> > +
> >          error_setg_errno(errp, errno,
> >                           "Unable to write to socket");
> >          return -1;
> > @@ -670,6 +689,127 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> >  }
> >  #endif /* WIN32 */
> >
> > +
> > +#ifdef CONFIG_LINUX
> > +
> > +static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
> > +                                   Error **errp)
>
> There's only one caller and it always passes zerocopy=true,
> so this parmeter looks pointless.

I did that for possible reuse of this function in the future:
- As of today, this is certainly compiled out, but if at some point
someone wants to use poll for something other
than the reading of an zerocopy errqueue, it could be reused.

But sure, if that's not desirable, I can remove the parameter (and the
if clause for !zerocopy).

>
> > +{
> > +    struct pollfd pfd;
> > +    int ret;
> > +
> > +    pfd.fd = sioc->fd;
> > +    pfd.events = 0;
> > +
> > + retry:
> > +    ret = poll(&pfd, 1, -1);
> > +    if (ret < 0) {
> > +        switch (errno) {
> > +        case EAGAIN:
> > +        case EINTR:
> > +            goto retry;
> > +        default:
> > +            error_setg_errno(errp, errno,
> > +                             "Poll error");
> > +            return ret;
>
>        return -1;
>
> > +        }
> > +    }
> > +
> > +    if (pfd.revents & (POLLHUP | POLLNVAL)) {
> > +        error_setg(errp, "Poll error: Invalid or disconnected fd");
> > +        return -1;
> > +    }
> > +
> > +    if (!zerocopy && (pfd.revents & POLLERR)) {
> > +        error_setg(errp, "Poll error: Errors present in errqueue");
> > +        return -1;
> > +    }
>
> > +
> > +    return ret;
>
>   return 0;

In the idea of future reuse I spoke above, returning zero here would
make this function always look like the poll timed out. Some future
users may want to repeat the waiting if poll() timed out, or if
(return > 0) stop polling.
(That was an earlier approach of this series)

>
> > +}
> > +
> > +static ssize_t qio_channel_socket_writev_zerocopy(QIOChannel *ioc,
> > +                                                  const struct iovec *iov,
> > +                                                  size_t niov,
> > +                                                  Error **errp)
> > +{
> > +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> > +    ssize_t ret;
> > +
> > +    ret = qio_channel_socket_writev_flags(ioc, iov, niov, NULL, 0,
> > +                                          MSG_ZEROCOPY, errp);
> > +    if (ret == QIO_CHANNEL_ERR_NOBUFS) {
> > +        error_setg_errno(errp, errno,
> > +                         "Process can't lock enough memory for using MSG_ZEROCOPY");
>
> This should not be touching errno - the method should be setting the
> errp directly, not leaving it to the caller.
>

Discussed above.
If you think that's a better approach I can change for v6.


> > +        return -1;
> > +    }
> > +
> > +    sioc->zerocopy_queued++;
>
>  if (ret > 0)
>     sio->zerocopy_queued++
>

Nice catch!

> since the kernel doesn't increase the counter if the data sent
> was zero length. A caller shouldn't be passing us a zero length
> iov data element, but it is wise to be cautious

Seems ok to me.

>
> > +    return ret;
> > +}
> > +
> > +static int qio_channel_socket_flush_zerocopy(QIOChannel *ioc,
> > +                                             Error **errp)
> > +{
> > +    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
> > +    struct msghdr msg = {};
> > +    struct sock_extended_err *serr;
> > +    struct cmsghdr *cm;
> > +    char control[CMSG_SPACE(sizeof(*serr))];
> > +    int ret;
>
> Add
>
>    int rv = 0;
>
> > +
> > +    msg.msg_control = control;
> > +    msg.msg_controllen = sizeof(control);
> > +    memset(control, 0, sizeof(control));
> > +
> > +    while (sioc->zerocopy_sent < sioc->zerocopy_queued) {
> > +        ret = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> > +        if (ret < 0) {
> > +            switch (errno) {
> > +            case EAGAIN:
> > +                /* Nothing on errqueue, wait until something is available */
> > +                ret = qio_channel_socket_poll(sioc, true, errp);
> > +                if (ret < 0) {
> > +                    return -1;
> > +                }
> > +                continue;
> > +            case EINTR:
> > +                continue;
> > +            default:
> > +                error_setg_errno(errp, errno,
> > +                                 "Unable to read errqueue");
> > +                return -1;
> > +            }
> > +        }
> > +
> > +        cm = CMSG_FIRSTHDR(&msg);
> > +        if (cm->cmsg_level != SOL_IP &&
> > +            cm->cmsg_type != IP_RECVERR) {
> > +            error_setg_errno(errp, EPROTOTYPE,
> > +                             "Wrong cmsg in errqueue");
> > +            return -1;
> > +        }
> > +
> > +        serr = (void *) CMSG_DATA(cm);
> > +        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
> > +            error_setg_errno(errp, serr->ee_errno,
> > +                             "Error on socket");
> > +            return -1;
> > +        }
> > +        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
> > +            error_setg_errno(errp, serr->ee_origin,
> > +                             "Error not from zerocopy");
> > +            return -1;
> > +        }
> > +
> > +        /* No errors, count successfully finished sendmsg()*/
> > +        sioc->zerocopy_sent += serr->ee_data - serr->ee_info + 1;
>
> Here add
>
>
>      if (ee_code ==  SO_EE_CODE_ZEROCOPY_COPIED)
>         rv = 1;
>
> > +    }
> > +    return 0;
>
> return rv;
>
> > +}
>
>

I understand the idea of testing SO_EE_CODE_ZEROCOPY_COPIED to be able
to tell whenever zerocopy fell back to copying for some reason, but I
don't see how this can be helpful here.

Other than that I would do rv++ instead of rv=1 here, if I want to
keep track of how many buffers were sent with zerocopy and how many
ended up being copied.

>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Thanks for this feedback Daniel,

Best regards,
Leo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 2/6] QIOChannelSocket: Add flags parameter for writing
  2021-11-12 10:15   ` Daniel P. Berrangé
@ 2021-11-23  5:33     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-11-23  5:33 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

Hello Daniel,

Thanks for reviewing!
Best regards,
Leo

On Fri, Nov 12, 2021 at 7:15 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Fri, Nov 12, 2021 at 02:10:37AM -0300, Leonardo Bras wrote:
> > Change qio_channel_socket_writev() in order to accept flags, so its possible
> > to selectively make use of sendmsg() flags.
> >
> > qio_channel_socket_writev() contents were moved to a helper function
> > qio_channel_socket_writev_flags() which accepts an extra argument for flags.
> > (This argument is passed directly to sendmsg().
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  io/channel-socket.c | 26 +++++++++++++++++++-------
> >  1 file changed, 19 insertions(+), 7 deletions(-)
>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-11-22 23:18     ` Leonardo Bras Soares Passos
@ 2021-11-23  9:45       ` Daniel P. Berrangé
  2021-12-03  5:24         ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-23  9:45 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Mon, Nov 22, 2021 at 08:18:09PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
> Thanks for the feedback!
> 
> On Fri, Nov 12, 2021 at 7:13 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Fri, Nov 12, 2021 at 02:10:36AM -0300, Leonardo Bras wrote:
> > > -int qio_channel_writev_all(QIOChannel *ioc,
> > > -                           const struct iovec *iov,
> > > -                           size_t niov,
> > > -                           Error **erp);
> > > +int qio_channel_writev_all_flags(QIOChannel *ioc,
> > > +                                 const struct iovec *iov,
> > > +                                 size_t niov,
> > > +                                 int flags,
> > > +                                 Error **errp);
> > > +#define qio_channel_writev_all(ioc, iov, niov, errp) \
> > > +    qio_channel_writev_all_flags(ioc, iov, niov, 0, errp)
> >
> > We already have separate methods for zerocopy, instead of adding
> > flags, so we shouldn't add flags to this either.
> >
> > Add a qio_channel_writev_zerocopy_all method instead.
> >
> > Internally, we can still make both qio_channel_writev_zerocopy_all
> > and qio_channel_writev_all use the same helper method, just don't
> > expose flags in the public API. Even internally we don't really
> > need flags, just a bool
> 
> I see.
> The idea of having a flag was to make it easier to expand the
> interface in the future.
> I got some feedback on v1 that would suggest it would be desired:
> http://patchwork.ozlabs.org/project/qemu-devel/patch/20210831110238.299458-2-leobras@redhat.com/
> 
> 
> >
> [...]
> > > +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
> > > +    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
> >
> > There's no need for this at all. Since fd passing is not supported
> > with zerocopy, there's no reason to ever use this method.
> >
> > > +/**
> > > + * qio_channel_writev_zerocopy:
> > > + * @ioc: the channel object
> > > + * @iov: the array of memory regions to write data from
> > > + * @niov: the length of the @iov array
> > > + * @errp: pointer to a NULL-initialized error object
> > > + *
> > > + * Behaves like qio_channel_writev_full_all_flags, but may write
> >
> > qio_channel_writev
> >
> > > + * data asynchronously while avoiding unnecessary data copy.
> > > + * This function may return before any data is actually written,
> > > + * but should queue every buffer for writing.
> >
> > Callers mustn't rely on "should" docs - they must rely on the
> > return value indicating how many bytes were accepted.
> >
> > Also mention that this requires locked memory and can/will fail if
> > insufficient locked memory is available.
> >
> 
> Sure, I will update that.
> 
> > > +/**
> > > + * qio_channel_flush_zerocopy:
> > > + * @ioc: the channel object
> > > + * @errp: pointer to a NULL-initialized error object
> > > + *
> > > + * Will block until every packet queued with
> > > + * qio_channel_writev_zerocopy() is sent, or return
> > > + * in case of any error.
> > > + *
> > > + * Returns -1 if any error is found, 0 otherwise.
> >
> >   Returns -1 if any error is found, 0 if all data was sent,
> >            or 1 if all data was sent but at least some was copied.
> >
> 
> I don't really get the return 1 part, I mean, per description it will
> 'block until every queued packet was sent, so "at least some was
> copied" doesn't seem to fit here.
> Could you elaborate?

Passing the ZEROCOPY flag to the kernel does not guarantee
that the copy is avoided, it is merely a hint to the kernel

When getting the notification, the ee_code  field in the
notification struct will have the flag
SO_EE_CODE_ZEROCOPY_COPIED  set if the kernel could not
avoid the copy.

In this case, it is better for the application to stop
using the ZEROCOPY flag and just do normal writes, so
it avoids the overhead of the the notifications.

This is described in "Deferred copies" section of the
Documentation/networking/msg_zerocopy.rst in linux.git

So I'm suggesting that the return value of this method
be '0' if SO_EE_CODE_ZEROCOPY_COPIED was *NOT* set in
/all/ notifications, or '1' if SO_EE_CODE_ZEROCOPY_COPIED
was set in at least one notification.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-11-23  4:46     ` Leonardo Bras Soares Passos
@ 2021-11-23  9:55       ` Daniel P. Berrangé
  2021-12-03  5:42         ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-11-23  9:55 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Tue, Nov 23, 2021 at 01:46:44AM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
> 
> On Fri, Nov 12, 2021 at 7:54 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> [...]
> > > @@ -561,12 +577,15 @@ static ssize_t qio_channel_socket_writev_flags(QIOChannel *ioc,
> > >   retry:
> > >      ret = sendmsg(sioc->fd, &msg, flags);
> > >      if (ret <= 0) {
> > > -        if (errno == EAGAIN) {
> > > +        switch (errno) {
> > > +        case EAGAIN:
> > >              return QIO_CHANNEL_ERR_BLOCK;
> > > -        }
> > > -        if (errno == EINTR) {
> > > +        case EINTR:
> > >              goto retry;
> > > +        case ENOBUFS:
> > > +            return QIO_CHANNEL_ERR_NOBUFS;
> >
> > Why does ENOBUFS need handling separately instead of letting
> > the error_setg_errno below handle it ?
> >
> > The caller immediately invokes error_setg_errno() again,
> > just with different error message.
> >
> > No code in this series ever looks at QIO_CHANNEL_ERR_NOBUFS
> > either, so we don't even need that special error return code
> > added AFAICT ?
> >
> 
> The idea was to add a custom message for ENOBUFS return when sending
> with MSG_ZEROCOPY.
> I mean, having this message is important for the user to understand
> why the migration is failing, but it would
> not make any sense to have this message while a non-zerocopy sendmsg()
> returns with ENOBUFS.
> 
> ENOBUFS : The output queue for a network interface was full.  This
> generally indicates that the interface has stopped sending, but may be
> caused by transient congestion.
> 
> As an alternative, I could add this message inside the switch, inside
> an if (flags & MSG_ZEROCOPY) on qio_channel_socket_writev_flags()
> instead of in it's caller.
> But for me it looks bloated, I mean, dealing with an error for
> ZEROCOPY only in the general function.

It is perfectly reasonable to check flags in this method.

> OTOH, if you think that it's a better idea to deal with every error in
> qio_channel_socket_writev_flags() instead of in the caller, I will
> change it for v6. Please let me know.

Yes, this method is already taking an ERror **errp parameter and
reporting a user facing error. If we need to report different
message text for ENOBUFS, it should be done in this method too.

The reason QIO_CHANNEL_ERR_BLOCK is special is because we are
explicitly not treating it as an error scenario at all.  That's
different to the ENOBUFS case.


> 
> > >          }
> > > +
> > >          error_setg_errno(errp, errno,
> > >                           "Unable to write to socket");
> > >          return -1;
> > > @@ -670,6 +689,127 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > >  }
> > >  #endif /* WIN32 */
> > >
> > > +
> > > +#ifdef CONFIG_LINUX
> > > +
> > > +static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
> > > +                                   Error **errp)
> >
> > There's only one caller and it always passes zerocopy=true,
> > so this parmeter looks pointless.
> 
> I did that for possible reuse of this function in the future:
> - As of today, this is certainly compiled out, but if at some point
> someone wants to use poll for something other
> than the reading of an zerocopy errqueue, it could be reused.
> 
> But sure, if that's not desirable, I can remove the parameter (and the
> if clause for !zerocopy).
> 
> >
> > > +{
> > > +    struct pollfd pfd;
> > > +    int ret;
> > > +
> > > +    pfd.fd = sioc->fd;
> > > +    pfd.events = 0;
> > > +
> > > + retry:
> > > +    ret = poll(&pfd, 1, -1);
> > > +    if (ret < 0) {
> > > +        switch (errno) {
> > > +        case EAGAIN:
> > > +        case EINTR:
> > > +            goto retry;
> > > +        default:
> > > +            error_setg_errno(errp, errno,
> > > +                             "Poll error");
> > > +            return ret;
> >
> >        return -1;
> >
> > > +        }
> > > +    }
> > > +
> > > +    if (pfd.revents & (POLLHUP | POLLNVAL)) {
> > > +        error_setg(errp, "Poll error: Invalid or disconnected fd");
> > > +        return -1;
> > > +    }
> > > +
> > > +    if (!zerocopy && (pfd.revents & POLLERR)) {
> > > +        error_setg(errp, "Poll error: Errors present in errqueue");
> > > +        return -1;
> > > +    }
> >
> > > +
> > > +    return ret;
> >
> >   return 0;
> 
> In the idea of future reuse I spoke above, returning zero here would
> make this function always look like the poll timed out. Some future
> users may want to repeat the waiting if poll() timed out, or if
> (return > 0) stop polling.

Now that I'm looking again, we should not really use poll() at all,
as GLib provides us higher level APIs. We in fact already have the
qio_channel_wait() method as a general purpose helper for waiting
for an I/O condition to occur.;


> I understand the idea of testing SO_EE_CODE_ZEROCOPY_COPIED to be able
> to tell whenever zerocopy fell back to copying for some reason, but I
> don't see how this can be helpful here.
> 
> Other than that I would do rv++ instead of rv=1 here, if I want to
> keep track of how many buffers were sent with zerocopy and how many
> ended up being copied.

Sure, we could do   "ret > 0 == number of buffers that were copied"
as the API contract, rather than just treating it as a boolean.



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 5/6] migration: Add migrate_use_tls() helper
  2021-11-12 11:04   ` Juan Quintela
@ 2021-11-30 19:00     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-11-30 19:00 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 539 bytes --]

Hello Juan,

Thanks for reviewing!

Best regards,
Leo

On Fri, Nov 12, 2021 at 8:04 AM Juan Quintela <quintela@redhat.com> wrote:

> Leonardo Bras <leobras@redhat.com> wrote:
> > A lot of places check parameters.tls_creds in order to evaluate if TLS is
> > in use, and sometimes call migrate_get_current() just for that test.
> >
> > Add new helper function migrate_use_tls() in order to simplify testing
> > for TLS usage.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
>

[-- Attachment #2: Type: text/html, Size: 1582 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12 11:04   ` Juan Quintela
  2021-11-12 11:08     ` Daniel P. Berrangé
  2021-11-12 12:01     ` Markus Armbruster
@ 2021-12-01 18:51     ` Leonardo Bras Soares Passos
  2 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-01 18:51 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 4420 bytes --]

Hello Juan,

On Fri, Nov 12, 2021 at 8:04 AM Juan Quintela <quintela@redhat.com> wrote:

> Leonardo Bras <leobras@redhat.com> wrote:
> > Add property that allows zerocopy migration of memory pages,
> > and also includes a helper function migrate_use_zerocopy() to check
> > if it's enabled.
> >
> > No code is introduced to actually do the migration, but it allow
> > future implementations to enable/disable this feature.
> >
> > On non-Linux builds this parameter is compiled-out.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
>
> Hi
>
> > +# @zerocopy: Controls behavior on sending memory pages on migration.
> > +#            When true, enables a zerocopy mechanism for sending memory
> > +#            pages, if host supports it.
> > +#            Defaults to false. (Since 6.2)
> > +#
>
> This needs to be changed to next release, but not big deal.
>
>
> > +#ifdef CONFIG_LINUX
> > +int migrate_use_zerocopy(void);
>
> Please, return bool
>
> > +#else
> > +#define migrate_use_zerocopy() (0)
> > +#endif
>
> and false here.
>
> I know, I know.  We are not consistent here, but the preffered way is
> the other way.
>
>
Ok, changed for v6



> >  int migrate_use_xbzrle(void);
> >  uint64_t migrate_xbzrle_cache_size(void);
> >  bool migrate_colo_enabled(void);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index abaf6f9e3d..add3dabc56 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -886,6 +886,10 @@ MigrationParameters
> *qmp_query_migrate_parameters(Error **errp)
> >      params->multifd_zlib_level = s->parameters.multifd_zlib_level;
> >      params->has_multifd_zstd_level = true;
> >      params->multifd_zstd_level = s->parameters.multifd_zstd_level;
> > +#ifdef CONFIG_LINUX
> > +    params->has_zerocopy = true;
> > +    params->zerocopy = s->parameters.zerocopy;
> > +#endif
> >      params->has_xbzrle_cache_size = true;
> >      params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
> >      params->has_max_postcopy_bandwidth = true;
> > @@ -1538,6 +1542,11 @@ static void
> migrate_params_test_apply(MigrateSetParameters *params,
> >      if (params->has_multifd_compression) {
> >          dest->multifd_compression = params->multifd_compression;
> >      }
> > +#ifdef CONFIG_LINUX
> > +    if (params->has_zerocopy) {
> > +        dest->zerocopy = params->zerocopy;
> > +    }
> > +#endif
> >      if (params->has_xbzrle_cache_size) {
> >          dest->xbzrle_cache_size = params->xbzrle_cache_size;
> >      }
> > @@ -1650,6 +1659,11 @@ static void
> migrate_params_apply(MigrateSetParameters *params, Error **errp)
> >      if (params->has_multifd_compression) {
> >          s->parameters.multifd_compression = params->multifd_compression;
> >      }
> > +#ifdef CONFIG_LINUX
> > +    if (params->has_zerocopy) {
> > +        s->parameters.zerocopy = params->zerocopy;
> > +    }
> > +#endif
>
> After seing all this CONFIG_LINUX mess, I am not sure that it is a good
> idea to add the parameter only for LINUX.  It appears that it is better
> to add it for all OS's and just not allow to set it to true there.
>
> But If QAPI/QOM people preffer that way, I am not going to get into the
> middle.
>
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 7c9deb1921..ab8f0f97be 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -854,16 +854,17 @@ static void multifd_new_send_channel_async(QIOTask
> *task, gpointer opaque)
> >      trace_multifd_new_send_channel_async(p->id);
> >      if (qio_task_propagate_error(task, &local_err)) {
> >          goto cleanup;
> > -    } else {
> > -        p->c = QIO_CHANNEL(sioc);
> > -        qio_channel_set_delay(p->c, false);
> > -        p->running = true;
> > -        if (!multifd_channel_connect(p, sioc, local_err)) {
> > -            goto cleanup;
> > -        }
> > -        return;
> >      }
> >
> > +    p->c = QIO_CHANNEL(sioc);
> > +    qio_channel_set_delay(p->c, false);
> > +    p->running = true;
> > +    if (!multifd_channel_connect(p, sioc, local_err)) {
> > +        goto cleanup;
> > +    }
> > +
> > +    return;
> > +
> >  cleanup:
> >      multifd_new_send_channel_cleanup(p, sioc, local_err);
> >  }
>
> As far as I can see, this chunk is a NOP, and it don't belong to this
> patch.
>
>
Yeah, it made sense in a previous version, but now it doesn't matter
anymore.
Removed for v6.



> Later, Juan.
>
>
Thanks,
Leo

[-- Attachment #2: Type: text/html, Size: 6711 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12 11:05   ` Daniel P. Berrangé
@ 2021-12-01 19:05     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-01 19:05 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

Hello Daniel,

On Fri, Nov 12, 2021 at 8:05 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Fri, Nov 12, 2021 at 02:10:39AM -0300, Leonardo Bras wrote:
> > Add property that allows zerocopy migration of memory pages,
> > and also includes a helper function migrate_use_zerocopy() to check
> > if it's enabled.
> >
> > No code is introduced to actually do the migration, but it allow
> > future implementations to enable/disable this feature.
> >
> > On non-Linux builds this parameter is compiled-out.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  qapi/migration.json   | 18 ++++++++++++++++++
> >  migration/migration.h |  5 +++++
> >  migration/migration.c | 32 ++++++++++++++++++++++++++++++++
> >  migration/multifd.c   | 17 +++++++++--------
> >  migration/socket.c    |  5 +++++
> >  monitor/hmp-cmds.c    |  6 ++++++
> >  6 files changed, 75 insertions(+), 8 deletions(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index bbfd48cf0b..9534c299d7 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -730,6 +730,11 @@
> >  #                      will consume more CPU.
> >  #                      Defaults to 1. (Since 5.0)
> >  #
> > +# @zerocopy: Controls behavior on sending memory pages on migration.
> > +#            When true, enables a zerocopy mechanism for sending memory
> > +#            pages, if host supports it.
> > +#            Defaults to false. (Since 6.2)
>
> Add
>
>    Requires that QEMU be permitted to use locked memory for guest
>    RAM pages.
>

Done

>
> Also 7.0 since this has missed the 6.2 deadline.
>

Done

>
>
> Both these notes apply to later in this file too
>

Replaced thrice in this file.

>
>
>
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 7c9deb1921..ab8f0f97be 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -854,16 +854,17 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> >      trace_multifd_new_send_channel_async(p->id);
> >      if (qio_task_propagate_error(task, &local_err)) {
> >          goto cleanup;
> > -    } else {
> > -        p->c = QIO_CHANNEL(sioc);
> > -        qio_channel_set_delay(p->c, false);
> > -        p->running = true;
> > -        if (!multifd_channel_connect(p, sioc, local_err)) {
> > -            goto cleanup;
> > -        }
> > -        return;
> >      }
> >
> > +    p->c = QIO_CHANNEL(sioc);
> > +    qio_channel_set_delay(p->c, false);
> > +    p->running = true;
> > +    if (!multifd_channel_connect(p, sioc, local_err)) {
> > +        goto cleanup;
> > +    }
> > +
> > +    return;
> > +
> >  cleanup:
> >      multifd_new_send_channel_cleanup(p, sioc, local_err);
> >  }
>
> This change is just a code style alteration with no relation to
> zerocopy. Either remove it, or do this change in its own patch
> seprate from zerocopy.
>

Removed.

>
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Thanks for reviewing.

Best regards,
Leo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12 11:59       ` Markus Armbruster
@ 2021-12-01 19:07         ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-01 19:07 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Juan Quintela

Hello Markus,
Thanks for sharing this info!

Best regards,
Leo

On Fri, Nov 12, 2021 at 8:59 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > On Fri, Nov 12, 2021 at 12:04:33PM +0100, Juan Quintela wrote:
> >> Leonardo Bras <leobras@redhat.com> wrote:
>
> [...]
>
> >> > diff --git a/migration/migration.c b/migration/migration.c
> >> > index abaf6f9e3d..add3dabc56 100644
> >> > --- a/migration/migration.c
> >> > +++ b/migration/migration.c
> >> > @@ -886,6 +886,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> >> >      params->multifd_zlib_level = s->parameters.multifd_zlib_level;
> >> >      params->has_multifd_zstd_level = true;
> >> >      params->multifd_zstd_level = s->parameters.multifd_zstd_level;
> >> > +#ifdef CONFIG_LINUX
> >> > +    params->has_zerocopy = true;
> >> > +    params->zerocopy = s->parameters.zerocopy;
> >> > +#endif
> >> >      params->has_xbzrle_cache_size = true;
> >> >      params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
> >> >      params->has_max_postcopy_bandwidth = true;
> >> > @@ -1538,6 +1542,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
> >> >      if (params->has_multifd_compression) {
> >> >          dest->multifd_compression = params->multifd_compression;
> >> >      }
> >> > +#ifdef CONFIG_LINUX
> >> > +    if (params->has_zerocopy) {
> >> > +        dest->zerocopy = params->zerocopy;
> >> > +    }
> >> > +#endif
> >> >      if (params->has_xbzrle_cache_size) {
> >> >          dest->xbzrle_cache_size = params->xbzrle_cache_size;
> >> >      }
> >> > @@ -1650,6 +1659,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> >> >      if (params->has_multifd_compression) {
> >> >          s->parameters.multifd_compression = params->multifd_compression;
> >> >      }
> >> > +#ifdef CONFIG_LINUX
> >> > +    if (params->has_zerocopy) {
> >> > +        s->parameters.zerocopy = params->zerocopy;
> >> > +    }
> >> > +#endif
> >>
> >> After seing all this CONFIG_LINUX mess, I am not sure that it is a good
> >> idea to add the parameter only for LINUX.  It appears that it is better
> >> to add it for all OS's and just not allow to set it to true there.
> >>
> >> But If QAPI/QOM people preffer that way, I am not going to get into the middle.
> >
> > I don't like all the conditionals either, but QAPI design wants the
> > conditionals, as that allows mgmt apps to query whether the feature
> > is supported in a build or not.
>
> Specifically, the conditionals keep @zerocopy out of query-qmp-schema
> (a.k.a. schema introspection) when it's not actually supported.
>
> This lets management applications recognize zero-copy support.
>
> Without conditionals, the only way to probe for it is trying to switch
> it on.  This is inconvenient and error-prone.
>
> Immature ideas to avoid conditionals:
>
> 1. Make *values* conditional, i.e. unconditional false, but true only if
> CONFIG_LINUX.  The QAPI schema language lets you do this for
> enumerations today, but not for bool.
>
> 2. A new kind of conditional that only applies to schema introspection,
> so you can eat your introspection cake and keep the #ifdef-less code
> cake (and the slight binary bloat that comes with it).
>



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux
  2021-11-12 12:01     ` Markus Armbruster
@ 2021-12-02  4:31       ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-02  4:31 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Juan Quintela

Hello Markus,

On Fri, Nov 12, 2021 at 9:01 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Juan Quintela <quintela@redhat.com> writes:
>
> > Leonardo Bras <leobras@redhat.com> wrote:
> >> Add property that allows zerocopy migration of memory pages,
> >> and also includes a helper function migrate_use_zerocopy() to check
> >> if it's enabled.
> >>
> >> No code is introduced to actually do the migration, but it allow
> >> future implementations to enable/disable this feature.
> >>
> >> On non-Linux builds this parameter is compiled-out.
> >>
> >> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> >
> > Hi
> >
> >> +# @zerocopy: Controls behavior on sending memory pages on migration.
> >> +#            When true, enables a zerocopy mechanism for sending memory
> >> +#            pages, if host supports it.
> >> +#            Defaults to false. (Since 6.2)
> >> +#
> >
> > This needs to be changed to next release, but not big deal.
>
> Rename to zero-copy while there.  QAPI/QMP strongly prefer separating
> words with dashes.  "zerocopy" is not a word, "zero" and "copy" are.
>
> [...]
>

Fine then.
To make sure it does not look strange, I will change the naming for
all the code (zerocopy becomes zero-copy or zero_copy according to the
context).

Thanks for reviewing!

Best regards,
Leo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-16 16:08   ` Juan Quintela
  2021-11-16 16:17     ` Daniel P. Berrangé
@ 2021-12-02  6:47     ` Leonardo Bras Soares Passos
  2021-12-02 12:10       ` Juan Quintela
  2021-12-09  8:51     ` Leonardo Bras Soares Passos
  2 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-02  6:47 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Markus Armbruster

On Tue, Nov 16, 2021 at 1:08 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > the setup and the completion, so a flush_zerocopy() can be called
> > at the after each iteration in order to make sure all dirty pages are sent
> > before a new iteration is started.
> >
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multid migration
> > with zerocopy enabled, make it optional by creating a new migration parameter
> > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > migrations.
>
> How much memory can a non-root program use by default?

On RHEL 8, a standard user is created allowing 64kB max locked memory.
(memory seems 'unlimited', though)


>
>
> >  static void *multifd_send_thread(void *opaque)
> > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> >          goto cleanup;
> >      }
> >
> > +    if (migrate_use_zerocopy()) {
> > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
>
> This belongs
>
>
> >      p->c = QIO_CHANNEL(sioc);
> >      qio_channel_set_delay(p->c, false);
> >      p->running = true;
> > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> >          p->name = g_strdup_printf("multifdsend_%d", i);
> >          p->tls_hostname = g_strdup(s->hostname);
> > +        p->write_flags = 0;
>
> here?

yeah, makes sense.
Moving on v6.

>
> >          socket_send_channel_create(multifd_new_send_channel_async, p);
> >      }
> > diff --git a/migration/socket.c b/migration/socket.c
> > index e26e94aa0c..8e40e0a3fd 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> >          trace_migration_socket_outgoing_connected(data->hostname);
> >      }
> >
> > -    if (migrate_use_zerocopy()) {
> > -        error_setg(&err, "Zerocopy not available in migration");
> > +    if (migrate_use_zerocopy() &&
> > +        (!migrate_use_multifd() ||
> > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > +          migrate_use_tls())) {
> > +        error_setg(&err,
> > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> >      }
> >
> >      migration_channel_connect(data->s, sioc, data->hostname, err);
>
> Do we really want to do this check here?  I think this is really too
> late.
>
> You are not patching migrate_params_check().
>
> I think that the proper way of doing this is something like:
>
>     if (params->zerocopy &&
>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
>          migrate_use_tls())) {
>            error_setg(&err,
>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
>         return false;
>     }
>
> You have to do the equivalent of multifd_compression and tls enablement,
> to see that zerocopy is not enabled, of course.

IIUC, by following your suggestion and changing this in
migrate_params_check() instead would allow the misconfiguration to be
known before migration is attempted.
That seems the best thing to do here.


>
> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> I can't see a way of doing that without a qio.

Yeah, I think I should leave the feature testing in here, and move the
parameter testing to migrate_params_check() as commented before.

What do you think?

>
> Later, Juan.
>

Best regards,
Leo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-16 16:34       ` Daniel P. Berrangé
@ 2021-12-02  6:54         ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-02  6:54 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Markus Armbruster, Eric Blake, qemu-devel,
	Dr. David Alan Gilbert, Juan Quintela

On Tue, Nov 16, 2021 at 1:35 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Nov 16, 2021 at 04:17:47PM +0000, Daniel P. Berrangé wrote:
> > On Tue, Nov 16, 2021 at 05:08:06PM +0100, Juan Quintela wrote:
> > > Leonardo Bras <leobras@redhat.com> wrote:
> > > > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > > > zerocopy interface.
> > > >
> > > > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > > > the setup and the completion, so a flush_zerocopy() can be called
> > > > at the after each iteration in order to make sure all dirty pages are sent
> > > > before a new iteration is started.
> > > >
> > > > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > > > the migration process, and avoid resuming the guest in the target host
> > > > without receiving all current RAM.
> > > >
> > > > This will work fine on RAM migration because the RAM pages are not usually freed,
> > > > and there is no problem on changing the pages content between async_send() and
> > > > the actual sending of the buffer, because this change will dirty the page and
> > > > cause it to be re-sent on a next iteration anyway.
> > > >
> > > > Given a lot of locked memory may be needed in order to use multid migration
> > > > with zerocopy enabled, make it optional by creating a new migration parameter
> > > > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > > > migrations.
> > >
> > > How much memory can a non-root program use by default?
> > >
> > >
> > > >  static void *multifd_send_thread(void *opaque)
> > > > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > > >          goto cleanup;
> > > >      }
> > > >
> > > > +    if (migrate_use_zerocopy()) {
> > > > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > > > +    }
> > >
> > > This belongs
> > >
> > >
> > > >      p->c = QIO_CHANNEL(sioc);
> > > >      qio_channel_set_delay(p->c, false);
> > > >      p->running = true;
> > > > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> > > >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> > > >          p->name = g_strdup_printf("multifdsend_%d", i);
> > > >          p->tls_hostname = g_strdup(s->hostname);
> > > > +        p->write_flags = 0;
> > >
> > > here?
> > >
> > > >          socket_send_channel_create(multifd_new_send_channel_async, p);
> > > >      }
> > > > diff --git a/migration/socket.c b/migration/socket.c
> > > > index e26e94aa0c..8e40e0a3fd 100644
> > > > --- a/migration/socket.c
> > > > +++ b/migration/socket.c
> > > > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> > > >          trace_migration_socket_outgoing_connected(data->hostname);
> > > >      }
> > > >
> > > > -    if (migrate_use_zerocopy()) {
> > > > -        error_setg(&err, "Zerocopy not available in migration");
> > > > +    if (migrate_use_zerocopy() &&
> > > > +        (!migrate_use_multifd() ||
> > > > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > > > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > > > +          migrate_use_tls())) {
> > > > +        error_setg(&err,
> > > > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> > > >      }
> > > >
> > > >      migration_channel_connect(data->s, sioc, data->hostname, err);
> > >
> > > Do we really want to do this check here?  I think this is really too
> > > late.
> > >
> > > You are not patching migrate_params_check().
> > >
> > > I think that the proper way of doing this is something like:
> > >
> > >     if (params->zerocopy &&
> > >         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > >          migrate_use_tls())) {
> > >            error_setg(&err,
> > >                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> > >         return false;
> > >     }
> > >
> > > You have to do the equivalent of multifd_compression and tls enablement,
> > > to see that zerocopy is not enabled, of course.
> > >
> > > I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> > > I can't see a way of doing that without a qio.
> >
> > I don't think you need to check that feature flag.
> >
> > The combination of zerocopy and compression is simply illogical
> > and can be rejected unconditionally.
>
> Or we could think of "zerocopy"  in a more targetted way.
> It is only "zerocopy" in terms the final I/O operation.
> Earlier parts of the process may involve copies. IOW, we
> can copy as part of the compress operation, but skip the
> when then sending the compressed page.
>
> In practice though this is still unlikely to be something
> we can practically do, as we would need to keep compressed
> pages around for an entire migration iteration until we can
> call flush to ensure the data has been sent. This would be
> significant memory burden.
>
> > The combination of zerocopy and TLS is somewhat questionable.
> > You're always going to have a copy between the plain text and
> > cipher text. Avoiding an extra copy for the cipher text will
> > require kerenl work which has no ETA. If it does ever arise,
> > QEMU will need more work again to actually support it. So
> > today we can just reject it unconditonally again.
>

My idea on failing here in the combination of zerocopy & (!multifd ||
TLS || compaction) is just about how it is today.
Meaning if someone wants to use zerocopy + TLS or zerocopy +
compaction in the future, and can provide a good implementation, it
should be ok to change it here (or at migrate_params_check() where
this should be).

>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Best regards,
Leo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-16 16:39         ` Daniel P. Berrangé
@ 2021-12-02  6:56           ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-02  6:56 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Tue, Nov 16, 2021 at 1:40 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Nov 16, 2021 at 05:34:50PM +0100, Juan Quintela wrote:
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > >>
> > >>     if (params->zerocopy &&
> > >>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > >>          migrate_use_tls())) {
> > >>            error_setg(&err,
> > >>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> > >>         return false;
> > >>     }
> > >>
> > >> You have to do the equivalent of multifd_compression and tls enablement,
> > >> to see that zerocopy is not enabled, of course.
> > >>
> > >> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> > >> I can't see a way of doing that without a qio.
> > >
> > > I don't think you need to check that feature flag.
> >
> > Oh, I mean other thing.
> >
> > When you set "zerocopy" capability, you don't know if the kernel support
> > it.  My understanding is that the only way to check if it supported is
> > here.
>
> If you reqest it and it isn't supported you'll get an error back from
> qio_channel_writev_zerocopy(). That's a bit too late though.
>
> Ideally we should report an error straight after the migration code
> creates the I/O channel, by querying for the feature.
>
>

Agree.
I suggested checking the feature presence where the test is happening
in v5, and the other combinations of migration parameters at
migrate_params_check() as Juan suggested.

What do you think?

> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Best regards,
Leo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-12-02  6:47     ` Leonardo Bras Soares Passos
@ 2021-12-02 12:10       ` Juan Quintela
  0 siblings, 0 replies; 45+ messages in thread
From: Juan Quintela @ 2021-12-02 12:10 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Markus Armbruster

Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> On Tue, Nov 16, 2021 at 1:08 PM Juan Quintela <quintela@redhat.com> wrote:
>>
>> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
>> I can't see a way of doing that without a qio.
>
> Yeah, I think I should leave the feature testing in here, and move the
> parameter testing to migrate_params_check() as commented before.
>
> What do you think?

The best that we can do.

Thanks, Juan.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-11-23  9:45       ` Daniel P. Berrangé
@ 2021-12-03  5:24         ` Leonardo Bras Soares Passos
  2021-12-03  9:15           ` Daniel P. Berrangé
  0 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-03  5:24 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

Hello Daniel,

On Tue, Nov 23, 2021 at 6:45 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, Nov 22, 2021 at 08:18:09PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Daniel,
> > Thanks for the feedback!
> >
> > On Fri, Nov 12, 2021 at 7:13 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > On Fri, Nov 12, 2021 at 02:10:36AM -0300, Leonardo Bras wrote:
> > > > -int qio_channel_writev_all(QIOChannel *ioc,
> > > > -                           const struct iovec *iov,
> > > > -                           size_t niov,
> > > > -                           Error **erp);
> > > > +int qio_channel_writev_all_flags(QIOChannel *ioc,
> > > > +                                 const struct iovec *iov,
> > > > +                                 size_t niov,
> > > > +                                 int flags,
> > > > +                                 Error **errp);
> > > > +#define qio_channel_writev_all(ioc, iov, niov, errp) \
> > > > +    qio_channel_writev_all_flags(ioc, iov, niov, 0, errp)
> > >
> > > We already have separate methods for zerocopy, instead of adding
> > > flags, so we shouldn't add flags to this either.
> > >
> > > Add a qio_channel_writev_zerocopy_all method instead.
> > >
> > > Internally, we can still make both qio_channel_writev_zerocopy_all
> > > and qio_channel_writev_all use the same helper method, just don't
> > > expose flags in the public API. Even internally we don't really
> > > need flags, just a bool
> >
> > I see.
> > The idea of having a flag was to make it easier to expand the
> > interface in the future.
> > I got some feedback on v1 that would suggest it would be desired:
> > http://patchwork.ozlabs.org/project/qemu-devel/patch/20210831110238.299458-2-leobras@redhat.com/
> >
> >
> > >
> > [...]
> > > > +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
> > > > +    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
> > >
> > > There's no need for this at all. Since fd passing is not supported
> > > with zerocopy, there's no reason to ever use this method.
> > >
> > > > +/**
> > > > + * qio_channel_writev_zerocopy:
> > > > + * @ioc: the channel object
> > > > + * @iov: the array of memory regions to write data from
> > > > + * @niov: the length of the @iov array
> > > > + * @errp: pointer to a NULL-initialized error object
> > > > + *
> > > > + * Behaves like qio_channel_writev_full_all_flags, but may write
> > >
> > > qio_channel_writev
> > >
> > > > + * data asynchronously while avoiding unnecessary data copy.
> > > > + * This function may return before any data is actually written,
> > > > + * but should queue every buffer for writing.
> > >
> > > Callers mustn't rely on "should" docs - they must rely on the
> > > return value indicating how many bytes were accepted.
> > >
> > > Also mention that this requires locked memory and can/will fail if
> > > insufficient locked memory is available.
> > >
> >
> > Sure, I will update that.
> >
> > > > +/**
> > > > + * qio_channel_flush_zerocopy:
> > > > + * @ioc: the channel object
> > > > + * @errp: pointer to a NULL-initialized error object
> > > > + *
> > > > + * Will block until every packet queued with
> > > > + * qio_channel_writev_zerocopy() is sent, or return
> > > > + * in case of any error.
> > > > + *
> > > > + * Returns -1 if any error is found, 0 otherwise.
> > >
> > >   Returns -1 if any error is found, 0 if all data was sent,
> > >            or 1 if all data was sent but at least some was copied.
> > >
> >
> > I don't really get the return 1 part, I mean, per description it will
> > 'block until every queued packet was sent, so "at least some was
> > copied" doesn't seem to fit here.
> > Could you elaborate?
>
> Passing the ZEROCOPY flag to the kernel does not guarantee
> that the copy is avoided, it is merely a hint to the kernel
>
> When getting the notification, the ee_code  field in the
> notification struct will have the flag
> SO_EE_CODE_ZEROCOPY_COPIED  set if the kernel could not
> avoid the copy.
>

Correct,

> In this case, it is better for the application to stop
> using the ZEROCOPY flag and just do normal writes, so
> it avoids the overhead of the the notifications.
>
> This is described in "Deferred copies" section of the
> Documentation/networking/msg_zerocopy.rst in linux.git
>
> So I'm suggesting that the return value of this method
> be '0' if SO_EE_CODE_ZEROCOPY_COPIED was *NOT* set in
> /all/ notifications, or '1' if SO_EE_CODE_ZEROCOPY_COPIED
> was set in at least one notification.

So, here you say that once a message has SO_EE_CODE_ZEROCOPY_COPIED we
should return 1.
Is the idea here to understand if zerocopy is working at all, so we
can disable zerocopy and avoid overhead ?

If so, we should somehow check if every write was sent with
SO_EE_CODE_ZEROCOPY_COPIED instead.
I mean, we should not disable Zerocopy if a single write got
SO_EE_CODE_ZEROCOPY_COPIED ?



>
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Best regards,
Leo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-11-23  9:55       ` Daniel P. Berrangé
@ 2021-12-03  5:42         ` Leonardo Bras Soares Passos
  2021-12-03  9:17           ` Daniel P. Berrangé
  0 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-03  5:42 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

Hello Daniel,

On Tue, Nov 23, 2021 at 6:56 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Nov 23, 2021 at 01:46:44AM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Daniel,
> >
> > On Fri, Nov 12, 2021 at 7:54 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > [...]
> > > > @@ -561,12 +577,15 @@ static ssize_t qio_channel_socket_writev_flags(QIOChannel *ioc,
> > > >   retry:
> > > >      ret = sendmsg(sioc->fd, &msg, flags);
> > > >      if (ret <= 0) {
> > > > -        if (errno == EAGAIN) {
> > > > +        switch (errno) {
> > > > +        case EAGAIN:
> > > >              return QIO_CHANNEL_ERR_BLOCK;
> > > > -        }
> > > > -        if (errno == EINTR) {
> > > > +        case EINTR:
> > > >              goto retry;
> > > > +        case ENOBUFS:
> > > > +            return QIO_CHANNEL_ERR_NOBUFS;
> > >
> > > Why does ENOBUFS need handling separately instead of letting
> > > the error_setg_errno below handle it ?
> > >
> > > The caller immediately invokes error_setg_errno() again,
> > > just with different error message.
> > >
> > > No code in this series ever looks at QIO_CHANNEL_ERR_NOBUFS
> > > either, so we don't even need that special error return code
> > > added AFAICT ?
> > >
> >
> > The idea was to add a custom message for ENOBUFS return when sending
> > with MSG_ZEROCOPY.
> > I mean, having this message is important for the user to understand
> > why the migration is failing, but it would
> > not make any sense to have this message while a non-zerocopy sendmsg()
> > returns with ENOBUFS.
> >
> > ENOBUFS : The output queue for a network interface was full.  This
> > generally indicates that the interface has stopped sending, but may be
> > caused by transient congestion.
> >
> > As an alternative, I could add this message inside the switch, inside
> > an if (flags & MSG_ZEROCOPY) on qio_channel_socket_writev_flags()
> > instead of in it's caller.
> > But for me it looks bloated, I mean, dealing with an error for
> > ZEROCOPY only in the general function.
>
> It is perfectly reasonable to check flags in this method.
>
> > OTOH, if you think that it's a better idea to deal with every error in
> > qio_channel_socket_writev_flags() instead of in the caller, I will
> > change it for v6. Please let me know.
>
> Yes, this method is already taking an ERror **errp parameter and
> reporting a user facing error. If we need to report different
> message text for ENOBUFS, it should be done in this method too.
>
> The reason QIO_CHANNEL_ERR_BLOCK is special is because we are
> explicitly not treating it as an error scenario at all.  That's
> different to the ENOBUFS case.
>

Ok, I will change it for v6.

>
> >
> > > >          }
> > > > +
> > > >          error_setg_errno(errp, errno,
> > > >                           "Unable to write to socket");
> > > >          return -1;
> > > > @@ -670,6 +689,127 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > >  }
> > > >  #endif /* WIN32 */
> > > >
> > > > +
> > > > +#ifdef CONFIG_LINUX
> > > > +
> > > > +static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
> > > > +                                   Error **errp)
> > >
> > > There's only one caller and it always passes zerocopy=true,
> > > so this parmeter looks pointless.
> >
> > I did that for possible reuse of this function in the future:
> > - As of today, this is certainly compiled out, but if at some point
> > someone wants to use poll for something other
> > than the reading of an zerocopy errqueue, it could be reused.
> >
> > But sure, if that's not desirable, I can remove the parameter (and the
> > if clause for !zerocopy).
> >
> > >
> > > > +{
> > > > +    struct pollfd pfd;
> > > > +    int ret;
> > > > +
> > > > +    pfd.fd = sioc->fd;
> > > > +    pfd.events = 0;
> > > > +
> > > > + retry:
> > > > +    ret = poll(&pfd, 1, -1);
> > > > +    if (ret < 0) {
> > > > +        switch (errno) {
> > > > +        case EAGAIN:
> > > > +        case EINTR:
> > > > +            goto retry;
> > > > +        default:
> > > > +            error_setg_errno(errp, errno,
> > > > +                             "Poll error");
> > > > +            return ret;
> > >
> > >        return -1;
> > >
> > > > +        }
> > > > +    }
> > > > +
> > > > +    if (pfd.revents & (POLLHUP | POLLNVAL)) {
> > > > +        error_setg(errp, "Poll error: Invalid or disconnected fd");
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    if (!zerocopy && (pfd.revents & POLLERR)) {
> > > > +        error_setg(errp, "Poll error: Errors present in errqueue");
> > > > +        return -1;
> > > > +    }
> > >
> > > > +
> > > > +    return ret;
> > >
> > >   return 0;
> >
> > In the idea of future reuse I spoke above, returning zero here would
> > make this function always look like the poll timed out. Some future
> > users may want to repeat the waiting if poll() timed out, or if
> > (return > 0) stop polling.
>
> Now that I'm looking again, we should not really use poll() at all,
> as GLib provides us higher level APIs. We in fact already have the
> qio_channel_wait() method as a general purpose helper for waiting
> for an I/O condition to occur.;
>

So you suggest using
qio_channel_wait(sioc, G_IO_IN);
instead of creating the new qio_channel_socket_poll().

Is the above correct? I mean, is it as simple as that?

>
> > I understand the idea of testing SO_EE_CODE_ZEROCOPY_COPIED to be able
> > to tell whenever zerocopy fell back to copying for some reason, but I
> > don't see how this can be helpful here.
> >
> > Other than that I would do rv++ instead of rv=1 here, if I want to
> > keep track of how many buffers were sent with zerocopy and how many
> > ended up being copied.
>
> Sure, we could do   "ret > 0 == number of buffers that were copied"
> as the API contract, rather than just treating it as a boolean.

Ok, then you suggest the responsibility of checking the number of
writes with SO_EE_CODE_ZEROCOPY_COPIED, comparing with the total
number of writes,  and deciding whether to disable or not zerocopy
should be on the caller.

(Disabling zerocopy on a single SO_EE_CODE_ZEROCOPY_COPIED doesn't
seem a good idea also)

>
>
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Best regards,
Leo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks
  2021-12-03  5:24         ` Leonardo Bras Soares Passos
@ 2021-12-03  9:15           ` Daniel P. Berrangé
  0 siblings, 0 replies; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-12-03  9:15 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Fri, Dec 03, 2021 at 02:24:52AM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
> 
> On Tue, Nov 23, 2021 at 6:45 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 08:18:09PM -0300, Leonardo Bras Soares Passos wrote:
> > > Hello Daniel,
> > > Thanks for the feedback!
> > >
> > > On Fri, Nov 12, 2021 at 7:13 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > >
> > > > On Fri, Nov 12, 2021 at 02:10:36AM -0300, Leonardo Bras wrote:
> > > > > -int qio_channel_writev_all(QIOChannel *ioc,
> > > > > -                           const struct iovec *iov,
> > > > > -                           size_t niov,
> > > > > -                           Error **erp);
> > > > > +int qio_channel_writev_all_flags(QIOChannel *ioc,
> > > > > +                                 const struct iovec *iov,
> > > > > +                                 size_t niov,
> > > > > +                                 int flags,
> > > > > +                                 Error **errp);
> > > > > +#define qio_channel_writev_all(ioc, iov, niov, errp) \
> > > > > +    qio_channel_writev_all_flags(ioc, iov, niov, 0, errp)
> > > >
> > > > We already have separate methods for zerocopy, instead of adding
> > > > flags, so we shouldn't add flags to this either.
> > > >
> > > > Add a qio_channel_writev_zerocopy_all method instead.
> > > >
> > > > Internally, we can still make both qio_channel_writev_zerocopy_all
> > > > and qio_channel_writev_all use the same helper method, just don't
> > > > expose flags in the public API. Even internally we don't really
> > > > need flags, just a bool
> > >
> > > I see.
> > > The idea of having a flag was to make it easier to expand the
> > > interface in the future.
> > > I got some feedback on v1 that would suggest it would be desired:
> > > http://patchwork.ozlabs.org/project/qemu-devel/patch/20210831110238.299458-2-leobras@redhat.com/
> > >
> > >
> > > >
> > > [...]
> > > > > +#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
> > > > > +    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
> > > >
> > > > There's no need for this at all. Since fd passing is not supported
> > > > with zerocopy, there's no reason to ever use this method.
> > > >
> > > > > +/**
> > > > > + * qio_channel_writev_zerocopy:
> > > > > + * @ioc: the channel object
> > > > > + * @iov: the array of memory regions to write data from
> > > > > + * @niov: the length of the @iov array
> > > > > + * @errp: pointer to a NULL-initialized error object
> > > > > + *
> > > > > + * Behaves like qio_channel_writev_full_all_flags, but may write
> > > >
> > > > qio_channel_writev
> > > >
> > > > > + * data asynchronously while avoiding unnecessary data copy.
> > > > > + * This function may return before any data is actually written,
> > > > > + * but should queue every buffer for writing.
> > > >
> > > > Callers mustn't rely on "should" docs - they must rely on the
> > > > return value indicating how many bytes were accepted.
> > > >
> > > > Also mention that this requires locked memory and can/will fail if
> > > > insufficient locked memory is available.
> > > >
> > >
> > > Sure, I will update that.
> > >
> > > > > +/**
> > > > > + * qio_channel_flush_zerocopy:
> > > > > + * @ioc: the channel object
> > > > > + * @errp: pointer to a NULL-initialized error object
> > > > > + *
> > > > > + * Will block until every packet queued with
> > > > > + * qio_channel_writev_zerocopy() is sent, or return
> > > > > + * in case of any error.
> > > > > + *
> > > > > + * Returns -1 if any error is found, 0 otherwise.
> > > >
> > > >   Returns -1 if any error is found, 0 if all data was sent,
> > > >            or 1 if all data was sent but at least some was copied.
> > > >
> > >
> > > I don't really get the return 1 part, I mean, per description it will
> > > 'block until every queued packet was sent, so "at least some was
> > > copied" doesn't seem to fit here.
> > > Could you elaborate?
> >
> > Passing the ZEROCOPY flag to the kernel does not guarantee
> > that the copy is avoided, it is merely a hint to the kernel
> >
> > When getting the notification, the ee_code  field in the
> > notification struct will have the flag
> > SO_EE_CODE_ZEROCOPY_COPIED  set if the kernel could not
> > avoid the copy.
> >
> 
> Correct,
> 
> > In this case, it is better for the application to stop
> > using the ZEROCOPY flag and just do normal writes, so
> > it avoids the overhead of the the notifications.
> >
> > This is described in "Deferred copies" section of the
> > Documentation/networking/msg_zerocopy.rst in linux.git
> >
> > So I'm suggesting that the return value of this method
> > be '0' if SO_EE_CODE_ZEROCOPY_COPIED was *NOT* set in
> > /all/ notifications, or '1' if SO_EE_CODE_ZEROCOPY_COPIED
> > was set in at least one notification.
> 
> So, here you say that once a message has SO_EE_CODE_ZEROCOPY_COPIED we
> should return 1.
> Is the idea here to understand if zerocopy is working at all, so we
> can disable zerocopy and avoid overhead ?
> 
> If so, we should somehow check if every write was sent with
> SO_EE_CODE_ZEROCOPY_COPIED instead.
> I mean, we should not disable Zerocopy if a single write got
> SO_EE_CODE_ZEROCOPY_COPIED ?

Honestly I'm not sure whether there are scenarios where you'll get
one write fail and others succeed. If we want to be paranoid though,
we could check for all writes, rather than jhust one write, so we
don't disable zerocopy on transient issues. 


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-12-03  5:42         ` Leonardo Bras Soares Passos
@ 2021-12-03  9:17           ` Daniel P. Berrangé
  2021-12-09  8:38             ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrangé @ 2021-12-03  9:17 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Fri, Dec 03, 2021 at 02:42:19AM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
> 
> On Tue, Nov 23, 2021 at 6:56 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Tue, Nov 23, 2021 at 01:46:44AM -0300, Leonardo Bras Soares Passos wrote:
> > > Hello Daniel,
> > >
> > > On Fri, Nov 12, 2021 at 7:54 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > +
> > > > > +#ifdef CONFIG_LINUX
> > > > > +
> > > > > +static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
> > > > > +                                   Error **errp)
> > > >
> > > > There's only one caller and it always passes zerocopy=true,
> > > > so this parmeter looks pointless.
> > >
> > > I did that for possible reuse of this function in the future:
> > > - As of today, this is certainly compiled out, but if at some point
> > > someone wants to use poll for something other
> > > than the reading of an zerocopy errqueue, it could be reused.
> > >
> > > But sure, if that's not desirable, I can remove the parameter (and the
> > > if clause for !zerocopy).
> > >
> > > >
> > > > > +{
> > > > > +    struct pollfd pfd;
> > > > > +    int ret;
> > > > > +
> > > > > +    pfd.fd = sioc->fd;
> > > > > +    pfd.events = 0;
> > > > > +
> > > > > + retry:
> > > > > +    ret = poll(&pfd, 1, -1);
> > > > > +    if (ret < 0) {
> > > > > +        switch (errno) {
> > > > > +        case EAGAIN:
> > > > > +        case EINTR:
> > > > > +            goto retry;
> > > > > +        default:
> > > > > +            error_setg_errno(errp, errno,
> > > > > +                             "Poll error");
> > > > > +            return ret;
> > > >
> > > >        return -1;
> > > >
> > > > > +        }
> > > > > +    }
> > > > > +
> > > > > +    if (pfd.revents & (POLLHUP | POLLNVAL)) {
> > > > > +        error_setg(errp, "Poll error: Invalid or disconnected fd");
> > > > > +        return -1;
> > > > > +    }
> > > > > +
> > > > > +    if (!zerocopy && (pfd.revents & POLLERR)) {
> > > > > +        error_setg(errp, "Poll error: Errors present in errqueue");
> > > > > +        return -1;
> > > > > +    }
> > > >
> > > > > +
> > > > > +    return ret;
> > > >
> > > >   return 0;
> > >
> > > In the idea of future reuse I spoke above, returning zero here would
> > > make this function always look like the poll timed out. Some future
> > > users may want to repeat the waiting if poll() timed out, or if
> > > (return > 0) stop polling.
> >
> > Now that I'm looking again, we should not really use poll() at all,
> > as GLib provides us higher level APIs. We in fact already have the
> > qio_channel_wait() method as a general purpose helper for waiting
> > for an I/O condition to occur.;
> >
> 
> So you suggest using
> qio_channel_wait(sioc, G_IO_IN);
> instead of creating the new qio_channel_socket_poll().
> 
> Is the above correct? I mean, is it as simple as that?

Yes, hopefully it is that simple.

> > > I understand the idea of testing SO_EE_CODE_ZEROCOPY_COPIED to be able
> > > to tell whenever zerocopy fell back to copying for some reason, but I
> > > don't see how this can be helpful here.
> > >
> > > Other than that I would do rv++ instead of rv=1 here, if I want to
> > > keep track of how many buffers were sent with zerocopy and how many
> > > ended up being copied.
> >
> > Sure, we could do   "ret > 0 == number of buffers that were copied"
> > as the API contract, rather than just treating it as a boolean.
> 
> Ok, then you suggest the responsibility of checking the number of
> writes with SO_EE_CODE_ZEROCOPY_COPIED, comparing with the total
> number of writes,  and deciding whether to disable or not zerocopy
> should be on the caller.

Yep, its a usage policy so nicer to allow caller to decide the
policy.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-12-03  9:17           ` Daniel P. Berrangé
@ 2021-12-09  8:38             ` Leonardo Bras Soares Passos
  2021-12-09  8:49               ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-09  8:38 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

Hello Daniel,

On Fri, Dec 3, 2021 at 6:18 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Fri, Dec 03, 2021 at 02:42:19AM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Daniel,
> >
> > On Tue, Nov 23, 2021 at 6:56 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > On Tue, Nov 23, 2021 at 01:46:44AM -0300, Leonardo Bras Soares Passos wrote:
> > > > Hello Daniel,
> > > >
> > > > On Fri, Nov 12, 2021 at 7:54 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > > +
> > > > > > +#ifdef CONFIG_LINUX
> > > > > > +
> > > > > > +static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
> > > > > > +                                   Error **errp)
> > > > >
> > > > > There's only one caller and it always passes zerocopy=true,
> > > > > so this parmeter looks pointless.
> > > >
> > > > I did that for possible reuse of this function in the future:
> > > > - As of today, this is certainly compiled out, but if at some point
> > > > someone wants to use poll for something other
> > > > than the reading of an zerocopy errqueue, it could be reused.
> > > >
> > > > But sure, if that's not desirable, I can remove the parameter (and the
> > > > if clause for !zerocopy).
> > > >
> > > > >
> > > > > > +{
> > > > > > +    struct pollfd pfd;
> > > > > > +    int ret;
> > > > > > +
> > > > > > +    pfd.fd = sioc->fd;
> > > > > > +    pfd.events = 0;
> > > > > > +
> > > > > > + retry:
> > > > > > +    ret = poll(&pfd, 1, -1);
> > > > > > +    if (ret < 0) {
> > > > > > +        switch (errno) {
> > > > > > +        case EAGAIN:
> > > > > > +        case EINTR:
> > > > > > +            goto retry;
> > > > > > +        default:
> > > > > > +            error_setg_errno(errp, errno,
> > > > > > +                             "Poll error");
> > > > > > +            return ret;
> > > > >
> > > > >        return -1;
> > > > >
> > > > > > +        }
> > > > > > +    }
> > > > > > +
> > > > > > +    if (pfd.revents & (POLLHUP | POLLNVAL)) {
> > > > > > +        error_setg(errp, "Poll error: Invalid or disconnected fd");
> > > > > > +        return -1;
> > > > > > +    }
> > > > > > +
> > > > > > +    if (!zerocopy && (pfd.revents & POLLERR)) {
> > > > > > +        error_setg(errp, "Poll error: Errors present in errqueue");
> > > > > > +        return -1;
> > > > > > +    }
> > > > >
> > > > > > +
> > > > > > +    return ret;
> > > > >
> > > > >   return 0;
> > > >
> > > > In the idea of future reuse I spoke above, returning zero here would
> > > > make this function always look like the poll timed out. Some future
> > > > users may want to repeat the waiting if poll() timed out, or if
> > > > (return > 0) stop polling.
> > >
> > > Now that I'm looking again, we should not really use poll() at all,
> > > as GLib provides us higher level APIs. We in fact already have the
> > > qio_channel_wait() method as a general purpose helper for waiting
> > > for an I/O condition to occur.;
> > >
> >
> > So you suggest using
> > qio_channel_wait(sioc, G_IO_IN);
> > instead of creating the new qio_channel_socket_poll().
> >
> > Is the above correct? I mean, is it as simple as that?
>
> Yes, hopefully it is that simple.

It seems not to be the case.
After some testing, I found out using this stalls the migration.

This happens when multifd_send_sync_main() calls flush_zerocopy(), but
the migration threads are
in multifd_send_thread() calling qemu_sem_wait(&p->sem);

I don't really understand enough of GLib to know how much this is
different from a poll(), but seems to make a difference.

>
> > > > I understand the idea of testing SO_EE_CODE_ZEROCOPY_COPIED to be able
> > > > to tell whenever zerocopy fell back to copying for some reason, but I
> > > > don't see how this can be helpful here.
> > > >
> > > > Other than that I would do rv++ instead of rv=1 here, if I want to
> > > > keep track of how many buffers were sent with zerocopy and how many
> > > > ended up being copied.
> > >
> > > Sure, we could do   "ret > 0 == number of buffers that were copied"
> > > as the API contract, rather than just treating it as a boolean.
> >
> > Ok, then you suggest the responsibility of checking the number of
> > writes with SO_EE_CODE_ZEROCOPY_COPIED, comparing with the total
> > number of writes,  and deciding whether to disable or not zerocopy
> > should be on the caller.
>
> Yep, its a usage policy so nicer to allow caller to decide the
> policy.
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
  2021-12-09  8:38             ` Leonardo Bras Soares Passos
@ 2021-12-09  8:49               ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-09  8:49 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Markus Armbruster, Eric Blake,
	Dr. David Alan Gilbert, Juan Quintela

On Thu, Dec 9, 2021 at 5:38 AM Leonardo Bras Soares Passos
<leobras@redhat.com> wrote:
>
> Hello Daniel,
>
> On Fri, Dec 3, 2021 at 6:18 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Fri, Dec 03, 2021 at 02:42:19AM -0300, Leonardo Bras Soares Passos wrote:
> > > Hello Daniel,
> > >
> > > On Tue, Nov 23, 2021 at 6:56 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > >
> > > > On Tue, Nov 23, 2021 at 01:46:44AM -0300, Leonardo Bras Soares Passos wrote:
> > > > > Hello Daniel,
> > > > >
> > > > > On Fri, Nov 12, 2021 at 7:54 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > > > +
> > > > > > > +#ifdef CONFIG_LINUX
> > > > > > > +
> > > > > > > +static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
> > > > > > > +                                   Error **errp)
> > > > > >
> > > > > > There's only one caller and it always passes zerocopy=true,
> > > > > > so this parmeter looks pointless.
> > > > >
> > > > > I did that for possible reuse of this function in the future:
> > > > > - As of today, this is certainly compiled out, but if at some point
> > > > > someone wants to use poll for something other
> > > > > than the reading of an zerocopy errqueue, it could be reused.
> > > > >
> > > > > But sure, if that's not desirable, I can remove the parameter (and the
> > > > > if clause for !zerocopy).
> > > > >
> > > > > >
> > > > > > > +{
> > > > > > > +    struct pollfd pfd;
> > > > > > > +    int ret;
> > > > > > > +
> > > > > > > +    pfd.fd = sioc->fd;
> > > > > > > +    pfd.events = 0;
> > > > > > > +
> > > > > > > + retry:
> > > > > > > +    ret = poll(&pfd, 1, -1);
> > > > > > > +    if (ret < 0) {
> > > > > > > +        switch (errno) {
> > > > > > > +        case EAGAIN:
> > > > > > > +        case EINTR:
> > > > > > > +            goto retry;
> > > > > > > +        default:
> > > > > > > +            error_setg_errno(errp, errno,
> > > > > > > +                             "Poll error");
> > > > > > > +            return ret;
> > > > > >
> > > > > >        return -1;
> > > > > >
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    if (pfd.revents & (POLLHUP | POLLNVAL)) {
> > > > > > > +        error_setg(errp, "Poll error: Invalid or disconnected fd");
> > > > > > > +        return -1;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    if (!zerocopy && (pfd.revents & POLLERR)) {
> > > > > > > +        error_setg(errp, "Poll error: Errors present in errqueue");
> > > > > > > +        return -1;
> > > > > > > +    }
> > > > > >
> > > > > > > +
> > > > > > > +    return ret;
> > > > > >
> > > > > >   return 0;
> > > > >
> > > > > In the idea of future reuse I spoke above, returning zero here would
> > > > > make this function always look like the poll timed out. Some future
> > > > > users may want to repeat the waiting if poll() timed out, or if
> > > > > (return > 0) stop polling.
> > > >
> > > > Now that I'm looking again, we should not really use poll() at all,
> > > > as GLib provides us higher level APIs. We in fact already have the
> > > > qio_channel_wait() method as a general purpose helper for waiting
> > > > for an I/O condition to occur.;
> > > >
> > >
> > > So you suggest using
> > > qio_channel_wait(sioc, G_IO_IN);
> > > instead of creating the new qio_channel_socket_poll().
> > >
> > > Is the above correct? I mean, is it as simple as that?
> >
> > Yes, hopefully it is that simple.
>
> It seems not to be the case.
> After some testing, I found out using this stalls the migration.
>
> This happens when multifd_send_sync_main() calls flush_zerocopy(), but
> the migration threads are
> in multifd_send_thread() calling qemu_sem_wait(&p->sem);
>
> I don't really understand enough of GLib to know how much this is
> different from a poll(), but seems to make a difference.

Oh, nevermind.
A few minutes reading GLib docs was enough for me to understand my mistake.
We will need to use G_IO_ERR instead of G_IO_IN, because we are
waiting for messages
in the ERRQUEUE.

>
> >
> > > > > I understand the idea of testing SO_EE_CODE_ZEROCOPY_COPIED to be able
> > > > > to tell whenever zerocopy fell back to copying for some reason, but I
> > > > > don't see how this can be helpful here.
> > > > >
> > > > > Other than that I would do rv++ instead of rv=1 here, if I want to
> > > > > keep track of how many buffers were sent with zerocopy and how many
> > > > > ended up being copied.
> > > >
> > > > Sure, we could do   "ret > 0 == number of buffers that were copied"
> > > > as the API contract, rather than just treating it as a boolean.
> > >
> > > Ok, then you suggest the responsibility of checking the number of
> > > writes with SO_EE_CODE_ZEROCOPY_COPIED, comparing with the total
> > > number of writes,  and deciding whether to disable or not zerocopy
> > > should be on the caller.
> >
> > Yep, its a usage policy so nicer to allow caller to decide the
> > policy.
> >
> > Regards,
> > Daniel
> > --
> > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> >



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-11-16 16:08   ` Juan Quintela
  2021-11-16 16:17     ` Daniel P. Berrangé
  2021-12-02  6:47     ` Leonardo Bras Soares Passos
@ 2021-12-09  8:51     ` Leonardo Bras Soares Passos
  2021-12-09  9:42       ` Leonardo Bras Soares Passos
  2 siblings, 1 reply; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-09  8:51 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Markus Armbruster

Hello Juan,

On Tue, Nov 16, 2021 at 1:08 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > the setup and the completion, so a flush_zerocopy() can be called
> > at the after each iteration in order to make sure all dirty pages are sent
> > before a new iteration is started.
> >
> > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually freed,
> > and there is no problem on changing the pages content between async_send() and
> > the actual sending of the buffer, because this change will dirty the page and
> > cause it to be re-sent on a next iteration anyway.
> >
> > Given a lot of locked memory may be needed in order to use multid migration
> > with zerocopy enabled, make it optional by creating a new migration parameter
> > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > migrations.
>
> How much memory can a non-root program use by default?
>
>
> >  static void *multifd_send_thread(void *opaque)
> > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> >          goto cleanup;
> >      }
> >
> > +    if (migrate_use_zerocopy()) {
> > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > +    }
>
> This belongs
>
>
> >      p->c = QIO_CHANNEL(sioc);
> >      qio_channel_set_delay(p->c, false);
> >      p->running = true;
> > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> >          p->name = g_strdup_printf("multifdsend_%d", i);
> >          p->tls_hostname = g_strdup(s->hostname);
> > +        p->write_flags = 0;
>
> here?
>
> >          socket_send_channel_create(multifd_new_send_channel_async, p);
> >      }
> > diff --git a/migration/socket.c b/migration/socket.c
> > index e26e94aa0c..8e40e0a3fd 100644
> > --- a/migration/socket.c
> > +++ b/migration/socket.c
> > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> >          trace_migration_socket_outgoing_connected(data->hostname);
> >      }
> >
> > -    if (migrate_use_zerocopy()) {
> > -        error_setg(&err, "Zerocopy not available in migration");
> > +    if (migrate_use_zerocopy() &&
> > +        (!migrate_use_multifd() ||
> > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > +          migrate_use_tls())) {
> > +        error_setg(&err,
> > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> >      }
> >
> >      migration_channel_connect(data->s, sioc, data->hostname, err);
>
> Do we really want to do this check here?  I think this is really too
> late.
>
> You are not patching migrate_params_check().
>
> I think that the proper way of doing this is something like:
>
>     if (params->zerocopy &&
>         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
>          migrate_use_tls())) {
>            error_setg(&err,
>                      "Zerocopy only available for non-compressed non-TLS multifd migration");
>         return false;
>     }

Don't we also need a check for multifd enabled here?
We could have zerocopy, multifd_compression=none, tls=disabled but it
will not fail if multifd=disabled.

Is this correct?


>
> You have to do the equivalent of multifd_compression and tls enablement,
> to see that zerocopy is not enabled, of course.
>
> I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> I can't see a way of doing that without a qio.
>
> Later, Juan.
>



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)
  2021-12-09  8:51     ` Leonardo Bras Soares Passos
@ 2021-12-09  9:42       ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 45+ messages in thread
From: Leonardo Bras Soares Passos @ 2021-12-09  9:42 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Eric Blake, Daniel P. Berrangé,
	Dr. David Alan Gilbert, Markus Armbruster

On Thu, Dec 9, 2021 at 5:51 AM Leonardo Bras Soares Passos
<leobras@redhat.com> wrote:
>
> Hello Juan,
>
> On Tue, Nov 16, 2021 at 1:08 PM Juan Quintela <quintela@redhat.com> wrote:
> >
> > Leonardo Bras <leobras@redhat.com> wrote:
> > > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > > zerocopy interface.
> > >
> > > Change multifd_send_sync_main() so it can distinguish each iteration sync from
> > > the setup and the completion, so a flush_zerocopy() can be called
> > > at the after each iteration in order to make sure all dirty pages are sent
> > > before a new iteration is started.
> > >
> > > Also make it return -1 if flush_zerocopy() fails, in order to cancel
> > > the migration process, and avoid resuming the guest in the target host
> > > without receiving all current RAM.
> > >
> > > This will work fine on RAM migration because the RAM pages are not usually freed,
> > > and there is no problem on changing the pages content between async_send() and
> > > the actual sending of the buffer, because this change will dirty the page and
> > > cause it to be re-sent on a next iteration anyway.
> > >
> > > Given a lot of locked memory may be needed in order to use multid migration
> > > with zerocopy enabled, make it optional by creating a new migration parameter
> > > "zerocopy" on qapi, so low-privileged users can still perform multifd
> > > migrations.
> >
> > How much memory can a non-root program use by default?
> >
> >
> > >  static void *multifd_send_thread(void *opaque)
> > > @@ -853,6 +875,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > >          goto cleanup;
> > >      }
> > >
> > > +    if (migrate_use_zerocopy()) {
> > > +        p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZEROCOPY;
> > > +    }
> >
> > This belongs
> >
> >
> > >      p->c = QIO_CHANNEL(sioc);
> > >      qio_channel_set_delay(p->c, false);
> > >      p->running = true;
> > > @@ -918,6 +944,7 @@ int multifd_save_setup(Error **errp)
> > >          p->packet->version = cpu_to_be32(MULTIFD_VERSION);
> > >          p->name = g_strdup_printf("multifdsend_%d", i);
> > >          p->tls_hostname = g_strdup(s->hostname);
> > > +        p->write_flags = 0;
> >
> > here?
> >
> > >          socket_send_channel_create(multifd_new_send_channel_async, p);
> > >      }
> > > diff --git a/migration/socket.c b/migration/socket.c
> > > index e26e94aa0c..8e40e0a3fd 100644
> > > --- a/migration/socket.c
> > > +++ b/migration/socket.c
> > > @@ -78,8 +78,13 @@ static void socket_outgoing_migration(QIOTask *task,
> > >          trace_migration_socket_outgoing_connected(data->hostname);
> > >      }
> > >
> > > -    if (migrate_use_zerocopy()) {
> > > -        error_setg(&err, "Zerocopy not available in migration");
> > > +    if (migrate_use_zerocopy() &&
> > > +        (!migrate_use_multifd() ||
> > > +         !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZEROCOPY) ||
> > > +          migrate_multifd_compression() != MULTIFD_COMPRESSION_NONE ||
> > > +          migrate_use_tls())) {
> > > +        error_setg(&err,
> > > +                   "Zerocopy only available for non-compressed non-TLS multifd migration");
> > >      }
> > >
> > >      migration_channel_connect(data->s, sioc, data->hostname, err);
> >
> > Do we really want to do this check here?  I think this is really too
> > late.
> >
> > You are not patching migrate_params_check().
> >
> > I think that the proper way of doing this is something like:
> >
> >     if (params->zerocopy &&
> >         (params->parameters.multifd_compression != MULTIFD_COMPRESSION_NONE ||
> >          migrate_use_tls())) {
> >            error_setg(&err,
> >                      "Zerocopy only available for non-compressed non-TLS multifd migration");
> >         return false;
> >     }
>
> Don't we also need a check for multifd enabled here?
> We could have zerocopy, multifd_compression=none, tls=disabled but it
> will not fail if multifd=disabled.
>
> Is this correct?
>

I did some tests and this case actually seems to not fail, even though
it should.
So IIUC we really need to check for multifd here.

Sending v6.

>
> >
> > You have to do the equivalent of multifd_compression and tls enablement,
> > to see that zerocopy is not enabled, of course.
> >
> > I would prefer to check for QIO_CHANNEL_FEATUR_WRITE_ZEROCPY there, but
> > I can't see a way of doing that without a qio.
> >
> > Later, Juan.
> >



^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2021-12-09  9:48 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-12  5:10 [PATCH v5 0/6] MSG_ZEROCOPY + multifd Leonardo Bras
2021-11-12  5:10 ` [PATCH v5 1/6] QIOChannel: Add io_writev_zerocopy & io_flush_zerocopy callbacks Leonardo Bras
2021-11-12 10:13   ` Daniel P. Berrangé
2021-11-12 10:26     ` Daniel P. Berrangé
2021-11-22 23:18     ` Leonardo Bras Soares Passos
2021-11-23  9:45       ` Daniel P. Berrangé
2021-12-03  5:24         ` Leonardo Bras Soares Passos
2021-12-03  9:15           ` Daniel P. Berrangé
2021-11-12 10:56   ` Daniel P. Berrangé
2021-11-12  5:10 ` [PATCH v5 2/6] QIOChannelSocket: Add flags parameter for writing Leonardo Bras
2021-11-12 10:15   ` Daniel P. Berrangé
2021-11-23  5:33     ` Leonardo Bras Soares Passos
2021-11-12  5:10 ` [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX Leonardo Bras
2021-11-12 10:54   ` Daniel P. Berrangé
2021-11-23  4:46     ` Leonardo Bras Soares Passos
2021-11-23  9:55       ` Daniel P. Berrangé
2021-12-03  5:42         ` Leonardo Bras Soares Passos
2021-12-03  9:17           ` Daniel P. Berrangé
2021-12-09  8:38             ` Leonardo Bras Soares Passos
2021-12-09  8:49               ` Leonardo Bras Soares Passos
2021-11-12  5:10 ` [PATCH v5 4/6] migration: Add zerocopy parameter for QMP/HMP for Linux Leonardo Bras
2021-11-12 11:04   ` Juan Quintela
2021-11-12 11:08     ` Daniel P. Berrangé
2021-11-12 11:59       ` Markus Armbruster
2021-12-01 19:07         ` Leonardo Bras Soares Passos
2021-11-12 12:01     ` Markus Armbruster
2021-12-02  4:31       ` Leonardo Bras Soares Passos
2021-12-01 18:51     ` Leonardo Bras Soares Passos
2021-11-12 11:05   ` Daniel P. Berrangé
2021-12-01 19:05     ` Leonardo Bras Soares Passos
2021-11-12  5:10 ` [PATCH v5 5/6] migration: Add migrate_use_tls() helper Leonardo Bras
2021-11-12 11:04   ` Juan Quintela
2021-11-30 19:00     ` Leonardo Bras Soares Passos
2021-11-12  5:10 ` [PATCH v5 6/6] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy) Leonardo Bras
2021-11-16 16:08   ` Juan Quintela
2021-11-16 16:17     ` Daniel P. Berrangé
2021-11-16 16:34       ` Juan Quintela
2021-11-16 16:39         ` Daniel P. Berrangé
2021-12-02  6:56           ` Leonardo Bras Soares Passos
2021-11-16 16:34       ` Daniel P. Berrangé
2021-12-02  6:54         ` Leonardo Bras Soares Passos
2021-12-02  6:47     ` Leonardo Bras Soares Passos
2021-12-02 12:10       ` Juan Quintela
2021-12-09  8:51     ` Leonardo Bras Soares Passos
2021-12-09  9:42       ` Leonardo Bras Soares Passos

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.