* [PATCH v3 0/3] Zero copy improvements (QIOChannel + multifd)
@ 2022-07-04 20:23 Leonardo Bras
  2022-07-04 20:23 ` [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent Leonardo Bras
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Leonardo Bras @ 2022-07-04 20:23 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

The first patch avoids spuriously returning 1 [*] when a zero-copy flush is
attempted before any buffer has been sent with MSG_ZEROCOPY.

[*] zero-copy not being used, even though it's enabled and supported by
the kernel

The second patch introduces a new migration stat
(dirty-sync-missed-zero-copy) that will be used to keep track of [*]. 

The third patch counts how many zero-copy flushes returned 1 [*].
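As a reference for what return code 1 means across the three patches, the
flush contract can be modeled as a tiny self-contained sketch. This is a
hypothetical helper (the function name and the `queued`/`copied` parameters
are illustrative stand-ins for the channel's zero-copy counters, not QEMU
code):

```c
#include <assert.h>

/*
 * Hypothetical model of qio_channel_socket_flush()'s return value:
 * 0 when zero copy worked (or when nothing was queued at all),
 * 1 when every queued sendmsg(...,MSG_ZEROCOPY) fell back to copying.
 */
int flush_result(unsigned queued, unsigned copied)
{
    if (queued == 0) {
        /* patch 1: no zero-copy send was attempted, nothing to report */
        return 0;
    }
    /* patch 3 treats 1 as "all sends in this round missed zero-copy" */
    return copied == queued ? 1 : 0;
}
```

With these semantics, patch 3's change to multifd_send_sync_main() only
bumps the new counter when the flush returns exactly 1.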

Changes since v2:
- Documentation release number changed from 7.2 to 7.1
- migration stat renamed from zero-copy-copied to 
  dirty-sync-missed-zero-copy
- Updated documentation to make it more user-friendly

Changes since v1:
- Idea of using a warning replaced by using a migration stat counter

Leonardo Bras (3):
  QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing
    sent
  Add dirty-sync-missed-zero-copy migration stat
  migration/multifd: Warn user when zerocopy not working

 qapi/migration.json   | 7 ++++++-
 migration/ram.h       | 2 ++
 io/channel-socket.c   | 8 +++++++-
 migration/migration.c | 2 ++
 migration/multifd.c   | 2 ++
 migration/ram.c       | 5 +++++
 monitor/hmp-cmds.c    | 4 ++++
 7 files changed, 28 insertions(+), 2 deletions(-)

-- 
2.36.1



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  2022-07-04 20:23 [PATCH v3 0/3] Zero copy improvements (QIOChannel + multifd) Leonardo Bras
@ 2022-07-04 20:23 ` Leonardo Bras
  2022-07-05  8:04   ` Daniel P. Berrangé
  2022-07-07 17:46   ` Peter Xu
  2022-07-04 20:23 ` [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat Leonardo Bras
  2022-07-04 20:23 ` [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working Leonardo Bras
  2 siblings, 2 replies; 23+ messages in thread
From: Leonardo Bras @ 2022-07-04 20:23 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
returns 1. This return code should be used only when Linux fails to use
MSG_ZEROCOPY on all of the previous sendmsg() calls.

Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
was attempted.

Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 io/channel-socket.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 4466bb1cd4..698c086b70 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
     struct cmsghdr *cm;
     char control[CMSG_SPACE(sizeof(*serr))];
     int received;
-    int ret = 1;
+    int ret;
+
+    if (!sioc->zero_copy_queued) {
+        return 0;
+    }
 
     msg.msg_control = control;
     msg.msg_controllen = sizeof(control);
     memset(control, 0, sizeof(control));
 
+    ret = 1;
+
     while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
         received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
         if (received < 0) {
-- 
2.36.1

* [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat
  2022-07-04 20:23 [PATCH v3 0/3] Zero copy improvements (QIOChannel + multifd) Leonardo Bras
  2022-07-04 20:23 ` [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent Leonardo Bras
@ 2022-07-04 20:23 ` Leonardo Bras
  2022-07-05  4:14   ` Markus Armbruster
                     ` (2 more replies)
  2022-07-04 20:23 ` [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working Leonardo Bras
  2 siblings, 3 replies; 23+ messages in thread
From: Leonardo Bras @ 2022-07-04 20:23 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 qapi/migration.json   | 7 ++++++-
 migration/migration.c | 2 ++
 monitor/hmp-cmds.c    | 4 ++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 7102e474a6..fed08b9b88 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -55,6 +55,10 @@
 # @postcopy-bytes: The number of bytes sent during the post-copy phase
 #                  (since 7.0).
 #
+# @dirty-sync-missed-zero-copy: Number of times dirty RAM synchronization could
+#                               not avoid copying zero pages.  This is between 0
+#                               and @dirty-sync-count * @multifd-channels.
+#                               (since 7.1)
 # Since: 0.14
 ##
 { 'struct': 'MigrationStats',
@@ -65,7 +69,8 @@
            'postcopy-requests' : 'int', 'page-size' : 'int',
            'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64',
            'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64',
-           'postcopy-bytes' : 'uint64' } }
+           'postcopy-bytes' : 'uint64',
+           'dirty-sync-missed-zero-copy' : 'uint64' } }
 
 ##
 # @XBZRLECacheStats:
diff --git a/migration/migration.c b/migration/migration.c
index 78f5057373..048f7f8bdb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1027,6 +1027,8 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     info->ram->normal_bytes = ram_counters.normal * page_size;
     info->ram->mbps = s->mbps;
     info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
+    info->ram->dirty_sync_missed_zero_copy =
+            ram_counters.dirty_sync_missed_zero_copy;
     info->ram->postcopy_requests = ram_counters.postcopy_requests;
     info->ram->page_size = page_size;
     info->ram->multifd_bytes = ram_counters.multifd_bytes;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ca98df0495..5f3be9e405 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -307,6 +307,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
             monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
                            info->ram->postcopy_bytes >> 10);
         }
+        if (info->ram->dirty_sync_missed_zero_copy) {
+            monitor_printf(mon, "missed zero-copy on: %" PRIu64 " iterations\n",
+                           info->ram->dirty_sync_missed_zero_copy);
+        }
     }
 
     if (info->has_disk) {
-- 
2.36.1

* [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working
  2022-07-04 20:23 [PATCH v3 0/3] Zero copy improvements (QIOChannel + multifd) Leonardo Bras
  2022-07-04 20:23 ` [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent Leonardo Bras
  2022-07-04 20:23 ` [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat Leonardo Bras
@ 2022-07-04 20:23 ` Leonardo Bras
  2022-07-05  8:05   ` Daniel P. Berrangé
  2022-07-07 17:56   ` Peter Xu
  2 siblings, 2 replies; 23+ messages in thread
From: Leonardo Bras @ 2022-07-04 20:23 UTC (permalink / raw)
  To: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu
  Cc: Leonardo Bras, qemu-devel

Some errors, like the lack of Scatter-Gather support (NETIF_F_SG) in the
network interface, may cause sendmsg(...,MSG_ZEROCOPY) to fail to use
zero-copy, which makes it fall back to the default copying mechanism.

After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks for errors in each of the previous calls to
sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
increment the dirty_sync_missed_zero_copy migration stat to let the user
know about it.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/ram.h     | 2 ++
 migration/multifd.c | 2 ++
 migration/ram.c     | 5 +++++
 3 files changed, 9 insertions(+)

diff --git a/migration/ram.h b/migration/ram.h
index ded0a3a086..d3c7eb96f5 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
+void dirty_sync_missed_zero_copy(void);
+
 #endif
diff --git a/migration/multifd.c b/migration/multifd.c
index 684c014c86..3909b34967 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
             if (ret < 0) {
                 error_report_err(err);
                 return -1;
+            } else if (ret == 1) {
+                dirty_sync_missed_zero_copy();
             }
         }
     }
diff --git a/migration/ram.c b/migration/ram.c
index 01f9cc1d72..db948c4787 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -407,6 +407,11 @@ static void ram_transferred_add(uint64_t bytes)
     ram_counters.transferred += bytes;
 }
 
+void dirty_sync_missed_zero_copy(void)
+{
+    ram_counters.dirty_sync_missed_zero_copy++;
+}
+
 /* used by the search for pages to send */
 struct PageSearchStatus {
     /* Current block being searched */
-- 
2.36.1

* Re: [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat
  2022-07-04 20:23 ` [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat Leonardo Bras
@ 2022-07-05  4:14   ` Markus Armbruster
  2022-07-05  8:05   ` Daniel P. Berrangé
  2022-07-07 17:54   ` Peter Xu
  2 siblings, 0 replies; 23+ messages in thread
From: Markus Armbruster @ 2022-07-05  4:14 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake, Peter Xu,
	qemu-devel

Leonardo Bras <leobras@redhat.com> writes:

> Signed-off-by: Leonardo Bras <leobras@redhat.com>

Acked-by: Markus Armbruster <armbru@redhat.com>

* Re: [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  2022-07-04 20:23 ` [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent Leonardo Bras
@ 2022-07-05  8:04   ` Daniel P. Berrangé
  2022-07-05 15:15     ` Juan Quintela
  2022-07-07 17:46   ` Peter Xu
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel P. Berrangé @ 2022-07-05  8:04 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu, qemu-devel

On Mon, Jul 04, 2022 at 05:23:13PM -0300, Leonardo Bras wrote:
> If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
> returns 1. This return code should be used only when Linux fails to use
> MSG_ZEROCOPY on a lot of sendmsg().
> 
> Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
> was attempted.
> 
> Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  io/channel-socket.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

And if this merges via migration maintainers' tree

Acked-by: Daniel P. Berrangé <berrange@redhat.com>

> 
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 4466bb1cd4..698c086b70 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
>      struct cmsghdr *cm;
>      char control[CMSG_SPACE(sizeof(*serr))];
>      int received;
> -    int ret = 1;
> +    int ret;
> +
> +    if (!sioc->zero_copy_queued) {
> +        return 0;
> +    }
>  
>      msg.msg_control = control;
>      msg.msg_controllen = sizeof(control);
>      memset(control, 0, sizeof(control));
>  
> +    ret = 1;
> +
>      while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
>          received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
>          if (received < 0) {
> -- 
> 2.36.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat
  2022-07-04 20:23 ` [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat Leonardo Bras
  2022-07-05  4:14   ` Markus Armbruster
@ 2022-07-05  8:05   ` Daniel P. Berrangé
  2022-07-07 17:54   ` Peter Xu
  2 siblings, 0 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2022-07-05  8:05 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu, qemu-devel

On Mon, Jul 04, 2022 at 05:23:14PM -0300, Leonardo Bras wrote:
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  qapi/migration.json   | 7 ++++++-
>  migration/migration.c | 2 ++
>  monitor/hmp-cmds.c    | 4 ++++
>  3 files changed, 12 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working
  2022-07-04 20:23 ` [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working Leonardo Bras
@ 2022-07-05  8:05   ` Daniel P. Berrangé
  2022-07-07 17:56   ` Peter Xu
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2022-07-05  8:05 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu, qemu-devel

On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> Some errors, like the lack of Scatter-Gather support by the network
> interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
> zero-copy, which causes it to fall back to the default copying mechanism.
> 
> After each full dirty-bitmap scan there should be a zero-copy flush
> happening, which checks for errors each of the previous calls to
> sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> increment dirty_sync_missed_zero_copy migration stat to let the user know
> about it.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/ram.h     | 2 ++
>  migration/multifd.c | 2 ++
>  migration/ram.c     | 5 +++++
>  3 files changed, 9 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  2022-07-05  8:04   ` Daniel P. Berrangé
@ 2022-07-05 15:15     ` Juan Quintela
  0 siblings, 0 replies; 23+ messages in thread
From: Juan Quintela @ 2022-07-05 15:15 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Leonardo Bras, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, Peter Xu, qemu-devel

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Mon, Jul 04, 2022 at 05:23:13PM -0300, Leonardo Bras wrote:
>> If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
>> returns 1. This return code should be used only when Linux fails to use
>> MSG_ZEROCOPY on a lot of sendmsg().
>> 
>> Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
>> was attempted.
>> 
>> Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
>> Signed-off-by: Leonardo Bras <leobras@redhat.com>
>> ---
>>  io/channel-socket.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>
> And if this merges via migration maintainers' tree
>
> Acked-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I will add this to the next PULL request.

Thanks.


>> 
>> diff --git a/io/channel-socket.c b/io/channel-socket.c
>> index 4466bb1cd4..698c086b70 100644
>> --- a/io/channel-socket.c
>> +++ b/io/channel-socket.c
>> @@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
>>      struct cmsghdr *cm;
>>      char control[CMSG_SPACE(sizeof(*serr))];
>>      int received;
>> -    int ret = 1;
>> +    int ret;
>> +
>> +    if (!sioc->zero_copy_queued) {
>> +        return 0;
>> +    }
>>  
>>      msg.msg_control = control;
>>      msg.msg_controllen = sizeof(control);
>>      memset(control, 0, sizeof(control));
>>  
>> +    ret = 1;
>> +
>>      while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
>>          received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
>>          if (received < 0) {
>> -- 
>> 2.36.1
>> 
>
> With regards,
> Daniel

* Re: [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  2022-07-04 20:23 ` [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent Leonardo Bras
  2022-07-05  8:04   ` Daniel P. Berrangé
@ 2022-07-07 17:46   ` Peter Xu
  2022-07-07 19:44     ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Xu @ 2022-07-07 17:46 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

Hi, Leo,

On Mon, Jul 04, 2022 at 05:23:13PM -0300, Leonardo Bras wrote:
> If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
> returns 1. This return code should be used only when Linux fails to use
> MSG_ZEROCOPY on a lot of sendmsg().
> 
> Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
> was attempted.
> 
> Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  io/channel-socket.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 4466bb1cd4..698c086b70 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
>      struct cmsghdr *cm;
>      char control[CMSG_SPACE(sizeof(*serr))];
>      int received;
> -    int ret = 1;
> +    int ret;
> +
> +    if (!sioc->zero_copy_queued) {

I think I asked this in the downstream review but didn't get a
response... shouldn't this check be "queued == sent"?

> +        return 0;
> +    }
>  
>      msg.msg_control = control;
>      msg.msg_controllen = sizeof(control);
>      memset(control, 0, sizeof(control));
>  
> +    ret = 1;
> +
>      while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
>          received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
>          if (received < 0) {
> -- 
> 2.36.1
> 

-- 
Peter Xu

* Re: [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat
  2022-07-04 20:23 ` [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat Leonardo Bras
  2022-07-05  4:14   ` Markus Armbruster
  2022-07-05  8:05   ` Daniel P. Berrangé
@ 2022-07-07 17:54   ` Peter Xu
  2022-07-07 19:50     ` Leonardo Bras Soares Passos
  2 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2022-07-07 17:54 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Mon, Jul 04, 2022 at 05:23:14PM -0300, Leonardo Bras wrote:
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  qapi/migration.json   | 7 ++++++-
>  migration/migration.c | 2 ++
>  monitor/hmp-cmds.c    | 4 ++++
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 7102e474a6..fed08b9b88 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -55,6 +55,10 @@
>  # @postcopy-bytes: The number of bytes sent during the post-copy phase
>  #                  (since 7.0).
>  #
> +# @dirty-sync-missed-zero-copy: Number of times dirty RAM synchronization could
> +#                               not avoid copying zero pages.  This is between 0

Avoid copying zero pages?  Isn't this for counting MSG_ZEROCOPY fallbacks?

> +#                               and @dirty-sync-count * @multifd-channels.

I'd not name it as "dirty-sync-*" because fundamentally the accounting is
not doing like that (more in latter patch).  I also think we should squash
patch 2/3 as patch 3 only started to provide meaningful values.

> +#                               (since 7.1)
>  # Since: 0.14
>  ##
>  { 'struct': 'MigrationStats',
> @@ -65,7 +69,8 @@
>             'postcopy-requests' : 'int', 'page-size' : 'int',
>             'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64',
>             'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64',
> -           'postcopy-bytes' : 'uint64' } }
> +           'postcopy-bytes' : 'uint64',
> +           'dirty-sync-missed-zero-copy' : 'uint64' } }
>  
>  ##
>  # @XBZRLECacheStats:
> diff --git a/migration/migration.c b/migration/migration.c
> index 78f5057373..048f7f8bdb 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1027,6 +1027,8 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>      info->ram->normal_bytes = ram_counters.normal * page_size;
>      info->ram->mbps = s->mbps;
>      info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
> +    info->ram->dirty_sync_missed_zero_copy =
> +            ram_counters.dirty_sync_missed_zero_copy;
>      info->ram->postcopy_requests = ram_counters.postcopy_requests;
>      info->ram->page_size = page_size;
>      info->ram->multifd_bytes = ram_counters.multifd_bytes;
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index ca98df0495..5f3be9e405 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -307,6 +307,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
>              monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
>                             info->ram->postcopy_bytes >> 10);
>          }
> +        if (info->ram->dirty_sync_missed_zero_copy) {
> +            monitor_printf(mon, "missed zero-copy on: %" PRIu64 " iterations\n",
> +                           info->ram->dirty_sync_missed_zero_copy);

I suggest we don't call it "iterations" because it's not the generic mean
of iterations.

> +        }
>      }
>  
>      if (info->has_disk) {
> -- 
> 2.36.1
> 

-- 
Peter Xu

* Re: [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working
  2022-07-04 20:23 ` [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working Leonardo Bras
  2022-07-05  8:05   ` Daniel P. Berrangé
@ 2022-07-07 17:56   ` Peter Xu
  2022-07-07 19:59     ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Xu @ 2022-07-07 17:56 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> Some errors, like the lack of Scatter-Gather support by the network
> interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
> zero-copy, which causes it to fall back to the default copying mechanism.
> 
> After each full dirty-bitmap scan there should be a zero-copy flush
> happening, which checks for errors each of the previous calls to
> sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> increment dirty_sync_missed_zero_copy migration stat to let the user know
> about it.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/ram.h     | 2 ++
>  migration/multifd.c | 2 ++
>  migration/ram.c     | 5 +++++
>  3 files changed, 9 insertions(+)
> 
> diff --git a/migration/ram.h b/migration/ram.h
> index ded0a3a086..d3c7eb96f5 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void);
>  int ram_write_tracking_start(void);
>  void ram_write_tracking_stop(void);
>  
> +void dirty_sync_missed_zero_copy(void);
> +
>  #endif
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 684c014c86..3909b34967 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
>              if (ret < 0) {
>                  error_report_err(err);
>                  return -1;
> +            } else if (ret == 1) {
> +                dirty_sync_missed_zero_copy();
>              }
>          }
>      }

I know that Juan is working on some patch to only do
multifd_send_sync_main() for each dirty sync, but that's not landed, right?

Can we name it without "dirty-sync" at all (so it'll work before/after
Juan's patch will be applied)?  Something like "zero-copy-send-fallbacks"?

The other thing is that the subject may need to be touched up: right now,
with the new field, we no longer warn the user on zero-copy-send fallbacks.

Thanks,

> diff --git a/migration/ram.c b/migration/ram.c
> index 01f9cc1d72..db948c4787 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -407,6 +407,11 @@ static void ram_transferred_add(uint64_t bytes)
>      ram_counters.transferred += bytes;
>  }
>  
> +void dirty_sync_missed_zero_copy(void)
> +{
> +    ram_counters.dirty_sync_missed_zero_copy++;
> +}
> +
>  /* used by the search for pages to send */
>  struct PageSearchStatus {
>      /* Current block being searched */
> -- 
> 2.36.1
> 

-- 
Peter Xu

* Re: [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  2022-07-07 17:46   ` Peter Xu
@ 2022-07-07 19:44     ` Leonardo Bras Soares Passos
  2022-07-07 19:52       ` Peter Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-07-07 19:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

Hello Peter,

On Thu, Jul 7, 2022 at 2:47 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Leo,
>
> On Mon, Jul 04, 2022 at 05:23:13PM -0300, Leonardo Bras wrote:
> > If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
> > returns 1. This return code should be used only when Linux fails to use
> > MSG_ZEROCOPY on a lot of sendmsg().
> >
> > Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
> > was attempted.
> >
> > Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  io/channel-socket.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > index 4466bb1cd4..698c086b70 100644
> > --- a/io/channel-socket.c
> > +++ b/io/channel-socket.c
> > @@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
> >      struct cmsghdr *cm;
> >      char control[CMSG_SPACE(sizeof(*serr))];
> >      int received;
> > -    int ret = 1;
> > +    int ret;
> > +
> > +    if (!sioc->zero_copy_queued) {
>
> I think I asked this in the downstream review but didn't get a
> response.. shouldn't this check be "queued == sent"?

This is just supposed to skip the flush if nothing was queued for sending.
queued == sent is tested below, in the while loop.

Without this, the function could return 1 if nothing was sent with zero-copy,
and it would be confusing, because the QIOChannel API says 1 should be
returned only if all zero-copy sends fell back to copying.

Best regards,
Leo

>
> > +        return 0;
> > +    }
> >
> >      msg.msg_control = control;
> >      msg.msg_controllen = sizeof(control);
> >      memset(control, 0, sizeof(control));
> >
> > +    ret = 1;
> > +
> >      while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
> >          received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> >          if (received < 0) {
> > --
> > 2.36.1
> >
>
> --
> Peter Xu
> 

* Re: [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat
  2022-07-07 17:54   ` Peter Xu
@ 2022-07-07 19:50     ` Leonardo Bras Soares Passos
  2022-07-07 19:56       ` Peter Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-07-07 19:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

Hello Peter,

On Thu, Jul 7, 2022 at 2:54 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jul 04, 2022 at 05:23:14PM -0300, Leonardo Bras wrote:
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  qapi/migration.json   | 7 ++++++-
> >  migration/migration.c | 2 ++
> >  monitor/hmp-cmds.c    | 4 ++++
> >  3 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 7102e474a6..fed08b9b88 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -55,6 +55,10 @@
> >  # @postcopy-bytes: The number of bytes sent during the post-copy phase
> >  #                  (since 7.0).
> >  #
> > +# @dirty-sync-missed-zero-copy: Number of times dirty RAM synchronization could
> > +#                               not avoid copying zero pages.  This is between 0
>
> Avoid copying zero pages?  Isn't this for counting MSG_ZEROCOPY fallbacks?

Yes, sorry, I think I got confused at some point while cutting & pasting.
It should be "not avoid copying dirty pages". I will fix that in a v4.


>
> > +#                               and @dirty-sync-count * @multifd-channels.
>
> I'd not name it as "dirty-sync-*" because fundamentally the accounting is
> not doing like that (more in latter patch).

Ok, I will take a look & answer there.

> I also think we should squash
> patch 2/3 as patch 3 only started to provide meaningful values.

IIRC, in the previous zero-copy-send implementation I was asked to keep
the property/capability in a separate patch in order to make it easier to
review, so I thought it would be helpful now too.

>
> > +#                               (since 7.1)
> >  # Since: 0.14
> >  ##
> >  { 'struct': 'MigrationStats',
> > @@ -65,7 +69,8 @@
> >             'postcopy-requests' : 'int', 'page-size' : 'int',
> >             'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64',
> >             'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64',
> > -           'postcopy-bytes' : 'uint64' } }
> > +           'postcopy-bytes' : 'uint64',
> > +           'dirty-sync-missed-zero-copy' : 'uint64' } }
> >
> >  ##
> >  # @XBZRLECacheStats:
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 78f5057373..048f7f8bdb 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1027,6 +1027,8 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> >      info->ram->normal_bytes = ram_counters.normal * page_size;
> >      info->ram->mbps = s->mbps;
> >      info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
> > +    info->ram->dirty_sync_missed_zero_copy =
> > +            ram_counters.dirty_sync_missed_zero_copy;
> >      info->ram->postcopy_requests = ram_counters.postcopy_requests;
> >      info->ram->page_size = page_size;
> >      info->ram->multifd_bytes = ram_counters.multifd_bytes;
> > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> > index ca98df0495..5f3be9e405 100644
> > --- a/monitor/hmp-cmds.c
> > +++ b/monitor/hmp-cmds.c
> > @@ -307,6 +307,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
> >              monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
> >                             info->ram->postcopy_bytes >> 10);
> >          }
> > +        if (info->ram->dirty_sync_missed_zero_copy) {
> > +            monitor_printf(mon, "missed zero-copy on: %" PRIu64 " iterations\n",
> > +                           info->ram->dirty_sync_missed_zero_copy);
>
> I suggest we don't call it "iterations" because it's not the generic meaning
> of iterations.

Yeah, I thought that too, but I could not think of anything better.
What do you suggest instead?

Best regards,
Leo

>
> > +        }
> >      }
> >
> >      if (info->has_disk) {
> > --
> > 2.36.1
> >
>
> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  2022-07-07 19:44     ` Leonardo Bras Soares Passos
@ 2022-07-07 19:52       ` Peter Xu
  2022-07-07 21:14         ` Leonardo Brás
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2022-07-07 19:52 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Thu, Jul 07, 2022 at 04:44:21PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
> 
> On Thu, Jul 7, 2022 at 2:47 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Leo,
> >
> > On Mon, Jul 04, 2022 at 05:23:13PM -0300, Leonardo Bras wrote:
> > > If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
> > > returns 1. This return code should be used only when Linux fails to use
> > > MSG_ZEROCOPY on a lot of sendmsg().
> > >
> > > Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
> > > was attempted.
> > >
> > > Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > ---
> > >  io/channel-socket.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > index 4466bb1cd4..698c086b70 100644
> > > --- a/io/channel-socket.c
> > > +++ b/io/channel-socket.c
> > > @@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
> > >      struct cmsghdr *cm;
> > >      char control[CMSG_SPACE(sizeof(*serr))];
> > >      int received;
> > > -    int ret = 1;
> > > +    int ret;
> > > +
> > > +    if (!sioc->zero_copy_queued) {
> >
> > I think I asked this in the downstream review but didn't get a
> > response.. shouldn't this check be "queued == sent"?
> 
> This is just supposed to skip the flush if nothing was queued for sending.
> queued == sent is tested below, in the while part.
> 
> Without this, the function could return 1 if nothing was sent with zero-copy,
> and it would be confusing, because the QIOChannel API says 1 should be
> returned only if all zero-copy sends fell back to copying.

I know it won't happen in practice, but.. what if we call flush() without
sending anything zero-copy-wise at all since the previous flush (so
zero_copy_sent > 0, zero_copy_queued > 0, and zero_copy_sent ==
zero_copy_queued)?  Then IIUC we'll return 1 even if we didn't do any
fallback, or am I wrong?

> 
> Best regards,
> Leo
> 
> >
> > > +        return 0;
> > > +    }
> > >
> > >      msg.msg_control = control;
> > >      msg.msg_controllen = sizeof(control);
> > >      memset(control, 0, sizeof(control));
> > >
> > > +    ret = 1;
> > > +
> > >      while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
> > >          received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> > >          if (received < 0) {
> > > --
> > > 2.36.1
> > >
> >
> > --
> > Peter Xu
> >
> 

-- 
Peter Xu




* Re: [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat
  2022-07-07 19:50     ` Leonardo Bras Soares Passos
@ 2022-07-07 19:56       ` Peter Xu
  2022-07-07 21:16         ` Leonardo Brás
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2022-07-07 19:56 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Thu, Jul 07, 2022 at 04:50:47PM -0300, Leonardo Bras Soares Passos wrote:
> > I also think we should squash
> > patch 2/3 as patch 3 only started to provide meaningful values.
> 
> IIRC Previously in zero-copy-send implementation, I was asked to keep the
> property/capability in a separated patch in order to make it easier to review.
> So I thought it would be helpful now.

Ah, that's fine then.

> > > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> > > index ca98df0495..5f3be9e405 100644
> > > --- a/monitor/hmp-cmds.c
> > > +++ b/monitor/hmp-cmds.c
> > > @@ -307,6 +307,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
> > >              monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
> > >                             info->ram->postcopy_bytes >> 10);
> > >          }
> > > +        if (info->ram->dirty_sync_missed_zero_copy) {
> > > +            monitor_printf(mon, "missed zero-copy on: %" PRIu64 " iterations\n",
> > > +                           info->ram->dirty_sync_missed_zero_copy);
> >
> > I suggest we don't call it "iterations" because it's not the generic mean
> > of iterations.
> 
> Yeah, I thought that too, but I could not think on anything better.
> What do you suggest instead?

"Zero-copy-send fallbacks happened: xxx times\n"?

-- 
Peter Xu




* Re: [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working
  2022-07-07 17:56   ` Peter Xu
@ 2022-07-07 19:59     ` Leonardo Bras Soares Passos
  2022-07-07 20:06       ` Peter Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-07-07 19:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

Hello Peter,

On Thu, Jul 7, 2022 at 2:56 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> > Some errors, like the lack of Scatter-Gather support by the network
> > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
> > zero-copy, which causes it to fall back to the default copying mechanism.
> >
> > After each full dirty-bitmap scan there should be a zero-copy flush
> > happening, which checks for errors each of the previous calls to
> > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> > increment dirty_sync_missed_zero_copy migration stat to let the user know
> > about it.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  migration/ram.h     | 2 ++
> >  migration/multifd.c | 2 ++
> >  migration/ram.c     | 5 +++++
> >  3 files changed, 9 insertions(+)
> >
> > diff --git a/migration/ram.h b/migration/ram.h
> > index ded0a3a086..d3c7eb96f5 100644
> > --- a/migration/ram.h
> > +++ b/migration/ram.h
> > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void);
> >  int ram_write_tracking_start(void);
> >  void ram_write_tracking_stop(void);
> >
> > +void dirty_sync_missed_zero_copy(void);
> > +
> >  #endif
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 684c014c86..3909b34967 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
> >              if (ret < 0) {
> >                  error_report_err(err);
> >                  return -1;
> > +            } else if (ret == 1) {
> > +                dirty_sync_missed_zero_copy();
> >              }
> >          }
> >      }
>
> I know that Juan is working on some patch to only do
> multifd_send_sync_main() for each dirty sync, but that's not landed, right?

That's correct, but I am hoping it will land before the release, so
the numbers will match.


>
> Can we name it without "dirty-sync" at all (so it'll work before/after
> Juan's patch will be applied)?  Something like "zero-copy-send-fallbacks"?

It initially was something like that, but on the v2 thread there was some
discussion on the topic, and it was suggested that the number would not mean
much to the user unless it was connected to something else.

Markus suggested the connection to @dirty-sync-count right in the
name, and Daniel suggested the above name, which sounds fine to me.

>
> The other thing is the subject may need to be touched up as right now with
> the field we don't warn the user anymore on zero-copy-send fallbacks.

OK, "warn" sounds misleading here.
What do you think about 'report' instead?

Best regards,
Leo

>
> Thanks,
>
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 01f9cc1d72..db948c4787 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -407,6 +407,11 @@ static void ram_transferred_add(uint64_t bytes)
> >      ram_counters.transferred += bytes;
> >  }
> >
> > +void dirty_sync_missed_zero_copy(void)
> > +{
> > +    ram_counters.dirty_sync_missed_zero_copy++;
> > +}
> > +
> >  /* used by the search for pages to send */
> >  struct PageSearchStatus {
> >      /* Current block being searched */
> > --
> > 2.36.1
> >
>
> --
> Peter Xu
>




* Re: [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working
  2022-07-07 19:59     ` Leonardo Bras Soares Passos
@ 2022-07-07 20:06       ` Peter Xu
  2022-07-07 21:18         ` Leonardo Brás
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2022-07-07 20:06 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Thu, Jul 07, 2022 at 04:59:22PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
> 
> On Thu, Jul 7, 2022 at 2:56 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> > > Some errors, like the lack of Scatter-Gather support by the network
> > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
> > > zero-copy, which causes it to fall back to the default copying mechanism.
> > >
> > > After each full dirty-bitmap scan there should be a zero-copy flush
> > > happening, which checks for errors each of the previous calls to
> > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> > > increment dirty_sync_missed_zero_copy migration stat to let the user know
> > > about it.
> > >
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > ---
> > >  migration/ram.h     | 2 ++
> > >  migration/multifd.c | 2 ++
> > >  migration/ram.c     | 5 +++++
> > >  3 files changed, 9 insertions(+)
> > >
> > > diff --git a/migration/ram.h b/migration/ram.h
> > > index ded0a3a086..d3c7eb96f5 100644
> > > --- a/migration/ram.h
> > > +++ b/migration/ram.h
> > > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void);
> > >  int ram_write_tracking_start(void);
> > >  void ram_write_tracking_stop(void);
> > >
> > > +void dirty_sync_missed_zero_copy(void);
> > > +
> > >  #endif
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index 684c014c86..3909b34967 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
> > >              if (ret < 0) {
> > >                  error_report_err(err);
> > >                  return -1;
> > > +            } else if (ret == 1) {
> > > +                dirty_sync_missed_zero_copy();
> > >              }
> > >          }
> > >      }
> >
> > I know that Juan is working on some patch to only do
> > multifd_send_sync_main() for each dirty sync, but that's not landed, right?
> 
> That's correct, but I am hoping it should land before the release, so
> the numbers will match.
> 
> 
> >
> > Can we name it without "dirty-sync" at all (so it'll work before/after
> > Juan's patch will be applied)?  Something like "zero-copy-send-fallbacks"?
> 
> It initially was something like that, but on the v2 thread there was
> some discussion on
> the topic, and it was suggested the number would not mean much to the
> user, unless
> it was connected to something else.
> 
> Markus suggested the connection to @dirty-sync-count right in the
> name, and Daniel suggested the above name, which sounds fine to me.

Ah okay.

But then I suggest we make sure it lands only after Juan's.. or it won't
really match.  Also, when Juan's patch is ready, we'd double-check that it
will be called exactly once per iteration, or we can get confusing numbers.
I assume Juan will take care of that then.

> 
> >
> > The other thing is the subject may need to be touched up as right now with
> > the field we don't warn the user anymore on zero-copy-send fallbacks.
> 
> Ok, Warning sounds misleading here.
> What do you think about 'report' instead?

Looks good.  Thanks,

-- 
Peter Xu




* Re: [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  2022-07-07 19:52       ` Peter Xu
@ 2022-07-07 21:14         ` Leonardo Brás
  2022-07-07 22:18           ` Peter Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Leonardo Brás @ 2022-07-07 21:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Thu, 2022-07-07 at 15:52 -0400, Peter Xu wrote:
> On Thu, Jul 07, 2022 at 04:44:21PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Peter,
> > 
> > On Thu, Jul 7, 2022 at 2:47 PM Peter Xu <peterx@redhat.com> wrote:
> > > 
> > > Hi, Leo,
> > > 
> > > On Mon, Jul 04, 2022 at 05:23:13PM -0300, Leonardo Bras wrote:
> > > > If flush is called when no buffer was sent with MSG_ZEROCOPY, it
> > > > currently
> > > > returns 1. This return code should be used only when Linux fails to use
> > > > MSG_ZEROCOPY on a lot of sendmsg().
> > > > 
> > > > Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
> > > > was attempted.
> > > > 
> > > > Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy
> > > > flag & io_flush for CONFIG_LINUX")
> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > ---
> > > >  io/channel-socket.c | 8 +++++++-
> > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > index 4466bb1cd4..698c086b70 100644
> > > > --- a/io/channel-socket.c
> > > > +++ b/io/channel-socket.c
> > > > @@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel
> > > > *ioc,
> > > >      struct cmsghdr *cm;
> > > >      char control[CMSG_SPACE(sizeof(*serr))];
> > > >      int received;
> > > > -    int ret = 1;
> > > > +    int ret;
> > > > +
> > > > +    if (!sioc->zero_copy_queued) {
> > > 
> > > I think I asked this in the downstream review but didn't get a
> > > response.. shouldn't this check be "queued == sent"?
> > 
> > This is just supposed to skip flush if nothing was queued for sending.
> > queued == sent is tested bellow in the while part.
> > 
> > Without this, the function could return 1 if nothing was sent with zero-
> > copy,
> > and it would be confusing, because the QIOChannel API says 1 should be
> > returned only if all zero-copy sends fell back to copying.
> 
> I know it won't happen in practise, but.. what if we call flush() without
> sending anything zero-copy-wise at all (so zero_copy_sent > 0,
> zero_copy_queued > 0, 

In a no-bug scenario:
- If we don't send anything zero-copy-wise, zero_copy_queued will never get
incremented.
- If we don't get inside the while loop in flush, zero_copy_sent will never
get incremented.

But sure, suppose it does get incremented by bug / mistake:

- zero_copy_queued incremented, zero_copy_sent untouched:
 On 'if (queued == 0)' it will go past the 'if' and get stuck 'forever' in the
while loop, since sent will 'never' get incremented.
 On 'if (queued == sent)', same.
 On 'if (queued <= sent)', same.

- zero_copy_queued untouched, zero_copy_sent incremented:
 On 'if (queued == 0)' it will not go past the 'if'.
 On 'if (queued == sent)' it will go past the 'if', but will exit on the
'while', falsely returning 1.
 On 'if (queued <= sent)' it will not go past the 'if'.

- zero_copy_queued incremented, zero_copy_sent incremented:
 On 'if (queued == 0)', infinite loop as above.
 On 'if (queued == sent)':
	if sent < queued: infinite loop,
	if sent == queued: won't go past the 'if',
	if sent > queued: will go past the 'if', but will exit on the 'while',
falsely returning 1.
 On 'if (queued <= sent)', infinite loop if sent < queued; won't go past the
'if' otherwise.

BTW, I consider it an 'infinite loop', but it could also end up returning -1
on any error.

> meanwhile zero_copy_sent == zero_copy_queued)?  Then
> IIUC we'll return 1 even if we didn't do any fallback, or am I wrong?

Having 'if (queued == sent)' will cause us to falsely return 1 in two buggy
cases, while 'if (queued == 0)' will either skip early or go into an
'infinite' loop.
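As a sanity check of the no-bug path, here is a heavily simplified,
self-contained model of the v3 flush (the MSG_ERRQUEUE completion handling is
reduced to a flag array, and the names only loosely follow io/channel-socket.c,
so treat it as a sketch rather than the real code):

```c
#include <assert.h>
#include <stdbool.h>

struct mock_sioc {
    long zero_copy_queued;
    long zero_copy_sent;
};

/*
 * v3 guard: skip the flush entirely if nothing was ever queued with
 * MSG_ZEROCOPY.  copied[i] == true means the i-th completion reported a
 * fallback to copying (SO_EE_CODE_ZEROCOPY_COPIED in the real code).
 */
int flush_v3(struct mock_sioc *sioc, const bool *copied)
{
    int ret;

    if (!sioc->zero_copy_queued) {
        return 0;
    }

    ret = 1;
    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
        /* stands in for recvmsg(fd, &msg, MSG_ERRQUEUE) + cmsg parsing */
        if (!copied[sioc->zero_copy_sent]) {
            ret = 0;    /* at least one send really used zero-copy */
        }
        sioc->zero_copy_sent++;
    }
    return ret;
}
```

With this model, flushing before anything was queued returns 0 instead of 1,
and a flush where at least one completion really used zero-copy also returns
0, matching the behavior described above.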

Best regards,
Leo

> 
> > 
> > Best regards,
> > Leo
> > 
> > > 
> > > > +        return 0;
> > > > +    }
> > > > 
> > > >      msg.msg_control = control;
> > > >      msg.msg_controllen = sizeof(control);
> > > >      memset(control, 0, sizeof(control));
> > > > 
> > > > +    ret = 1;
> > > > +
> > > >      while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
> > > >          received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
> > > >          if (received < 0) {
> > > > --
> > > > 2.36.1
> > > > 
> > > 
> > > --
> > > Peter Xu
> > > 
> > 
> 




* Re: [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat
  2022-07-07 19:56       ` Peter Xu
@ 2022-07-07 21:16         ` Leonardo Brás
  0 siblings, 0 replies; 23+ messages in thread
From: Leonardo Brás @ 2022-07-07 21:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Thu, 2022-07-07 at 15:56 -0400, Peter Xu wrote:
> On Thu, Jul 07, 2022 at 04:50:47PM -0300, Leonardo Bras Soares Passos wrote:
> > > I also think we should squash
> > > patch 2/3 as patch 3 only started to provide meaningful values.
> > 
> > IIRC Previously in zero-copy-send implementation, I was asked to keep the
> > property/capability in a separated patch in order to make it easier to
> > review.
> > So I thought it would be helpful now.
> 
> Ah, that's fine then.
> 
> > > > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> > > > index ca98df0495..5f3be9e405 100644
> > > > --- a/monitor/hmp-cmds.c
> > > > +++ b/monitor/hmp-cmds.c
> > > > @@ -307,6 +307,10 @@ void hmp_info_migrate(Monitor *mon, const QDict
> > > > *qdict)
> > > >              monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
> > > >                             info->ram->postcopy_bytes >> 10);
> > > >          }
> > > > +        if (info->ram->dirty_sync_missed_zero_copy) {
> > > > +            monitor_printf(mon, "missed zero-copy on: %" PRIu64 "
> > > > iterations\n",
> > > > +                           info->ram->dirty_sync_missed_zero_copy);
> > > 
> > > I suggest we don't call it "iterations" because it's not the generic mean
> > > of iterations.
> > 
> > Yeah, I thought that too, but I could not think on anything better.
> > What do you suggest instead?
> 
> "Zero-copy-send fallbacks happened: xxx times\n"?

Oh, yeah, that will work.
I was trying to keep the pattern and ended up wondering what the correct
unit was. But this is much simpler and works better.

Best regards,
Leo

> 




* Re: [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working
  2022-07-07 20:06       ` Peter Xu
@ 2022-07-07 21:18         ` Leonardo Brás
  0 siblings, 0 replies; 23+ messages in thread
From: Leonardo Brás @ 2022-07-07 21:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Thu, 2022-07-07 at 16:06 -0400, Peter Xu wrote:
> On Thu, Jul 07, 2022 at 04:59:22PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Peter,
> > 
> > On Thu, Jul 7, 2022 at 2:56 PM Peter Xu <peterx@redhat.com> wrote:
> > > 
> > > On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> > > > Some errors, like the lack of Scatter-Gather support by the network
> > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on
> > > > using
> > > > zero-copy, which causes it to fall back to the default copying
> > > > mechanism.
> > > > 
> > > > After each full dirty-bitmap scan there should be a zero-copy flush
> > > > happening, which checks for errors each of the previous calls to
> > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> > > > increment dirty_sync_missed_zero_copy migration stat to let the user
> > > > know
> > > > about it.
> > > > 
> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > ---
> > > >  migration/ram.h     | 2 ++
> > > >  migration/multifd.c | 2 ++
> > > >  migration/ram.c     | 5 +++++
> > > >  3 files changed, 9 insertions(+)
> > > > 
> > > > diff --git a/migration/ram.h b/migration/ram.h
> > > > index ded0a3a086..d3c7eb96f5 100644
> > > > --- a/migration/ram.h
> > > > +++ b/migration/ram.h
> > > > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void);
> > > >  int ram_write_tracking_start(void);
> > > >  void ram_write_tracking_stop(void);
> > > > 
> > > > +void dirty_sync_missed_zero_copy(void);
> > > > +
> > > >  #endif
> > > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > > index 684c014c86..3909b34967 100644
> > > > --- a/migration/multifd.c
> > > > +++ b/migration/multifd.c
> > > > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
> > > >              if (ret < 0) {
> > > >                  error_report_err(err);
> > > >                  return -1;
> > > > +            } else if (ret == 1) {
> > > > +                dirty_sync_missed_zero_copy();
> > > >              }
> > > >          }
> > > >      }
> > > 
> > > I know that Juan is working on some patch to only do
> > > multifd_send_sync_main() for each dirty sync, but that's not landed,
> > > right?
> > 
> > That's correct, but I am hoping it should land before the release, so
> > the numbers will match.
> > 
> > 
> > > 
> > > Can we name it without "dirty-sync" at all (so it'll work before/after
> > > Juan's patch will be applied)?  Something like "zero-copy-send-fallbacks"?
> > 
> > It initially was something like that, but on the v2 thread there was
> > some discussion on
> > the topic, and it was suggested the number would not mean much to the
> > user, unless
> > it was connected to something else.
> > 
> > Markus suggested the connection to @dirty-sync-count right in the
> > name, and Daniel suggested the above name, which sounds fine to me.
> 
> Ah okay.
> 
> But then I suggest we make sure it lands only after Juan's.. or it won't
> really match.  Also when Juan's patch ready, we'd also double check it will
> be exactly called once per iteration, or we can get confusing numbers.  I
> assume Juan will take care of that then.
> 
> > 
> > > 
> > > The other thing is the subject may need to be touched up as right now with
> > > the field we don't warn the user anymore on zero-copy-send fallbacks.
> > 
> > Ok, Warning sounds misleading here.
> > What do you think about 'report' instead?
> 
> Looks good.  Thanks,

Thank you for reviewing, Peter!

Leo

> 




* Re: [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  2022-07-07 21:14         ` Leonardo Brás
@ 2022-07-07 22:18           ` Peter Xu
  2022-07-11 19:29             ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2022-07-07 22:18 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Thu, Jul 07, 2022 at 06:14:17PM -0300, Leonardo Brás wrote:
> Having 'if(queued == sent)' will cause us to falsely return '1' in two buggy
> cases, while 'if queued == 0) will either skip early or go into 'infinite' loop.

I'm not sure I strictly follow here..

Imagine the case where we do flush() twice without sending anything in
between: in the 1st flush we'll see queued > sent, and we'll finish the
flush() once queued == sent.  Then in the 2nd (continuous) flush() we'll see
queued == sent immediately.

IIUC with the current patch we'll return 1, which I think is wrong because no
fallback happened, and with the change to "if (queued == sent) return 0" it'll
be fixed?
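To illustrate, here is a toy model of the suggested check (heavily
simplified; the names are only loosely based on io/channel-socket.c, and
copied[i] marks a completion that reported SO_EE_CODE_ZEROCOPY_COPIED):

```c
#include <assert.h>
#include <stdbool.h>

struct mock_sioc {
    long zero_copy_queued;
    long zero_copy_sent;
};

/*
 * Suggested guard: skip the flush whenever no completions are pending,
 * whether or not anything was ever queued.
 */
int flush_fixed(struct mock_sioc *sioc, const bool *copied)
{
    int ret;

    if (sioc->zero_copy_queued == sioc->zero_copy_sent) {
        return 0;   /* nothing pending: no fallback can be reported */
    }

    ret = 1;
    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
        /* stands in for draining the MSG_ERRQUEUE notifications */
        if (!copied[sioc->zero_copy_sent]) {
            ret = 0;    /* at least one send truly used zero-copy */
        }
        sioc->zero_copy_sent++;
    }
    return ret;
}
```

Here a first flush where every send fell back returns 1, while an immediate
second flush returns 0 rather than a false 1.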

-- 
Peter Xu




* Re: [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  2022-07-07 22:18           ` Peter Xu
@ 2022-07-11 19:29             ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 23+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-07-11 19:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
	Markus Armbruster, qemu-devel

On Thu, Jul 7, 2022 at 7:18 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jul 07, 2022 at 06:14:17PM -0300, Leonardo Brás wrote:
> > Having 'if(queued == sent)' will cause us to falsely return '1' in two buggy
> > cases, while 'if queued == 0) will either skip early or go into 'infinite' loop.
>
> I'm not sure I strictly follow here..
>

Sorry, I was thinking of a different scenario.

> Imagine the case we do flush() twice without sending anything, then in the
> 1st flush we'll see queued>sent, we'll finish flush() until queued==sent.
> Then in the 2nd (continuous) flush() we'll see queued==sent immediately.
>
> IIUC with the current patch we'll return 1 which I think is wrong because
> fallback didn't happen, and if with the change to "if (queued==sent) return
> 0" it'll fix it?

Yes, you are correct.
It's possible for one flush to happen just after another without any sending
in between.

I will fix it as suggested.

Best regards,
Leo

>
> --
> Peter Xu
>




end of thread, other threads:[~2022-07-11 19:33 UTC | newest]

Thread overview: 23+ messages
2022-07-04 20:23 [PATCH v3 0/3] Zero copy improvements (QIOChannel + multifd) Leonardo Bras
2022-07-04 20:23 ` [PATCH v3 1/3] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent Leonardo Bras
2022-07-05  8:04   ` Daniel P. Berrangé
2022-07-05 15:15     ` Juan Quintela
2022-07-07 17:46   ` Peter Xu
2022-07-07 19:44     ` Leonardo Bras Soares Passos
2022-07-07 19:52       ` Peter Xu
2022-07-07 21:14         ` Leonardo Brás
2022-07-07 22:18           ` Peter Xu
2022-07-11 19:29             ` Leonardo Bras Soares Passos
2022-07-04 20:23 ` [PATCH v3 2/3] Add dirty-sync-missed-zero-copy migration stat Leonardo Bras
2022-07-05  4:14   ` Markus Armbruster
2022-07-05  8:05   ` Daniel P. Berrangé
2022-07-07 17:54   ` Peter Xu
2022-07-07 19:50     ` Leonardo Bras Soares Passos
2022-07-07 19:56       ` Peter Xu
2022-07-07 21:16         ` Leonardo Brás
2022-07-04 20:23 ` [PATCH v3 3/3] migration/multifd: Warn user when zerocopy not working Leonardo Bras
2022-07-05  8:05   ` Daniel P. Berrangé
2022-07-07 17:56   ` Peter Xu
2022-07-07 19:59     ` Leonardo Bras Soares Passos
2022-07-07 20:06       ` Peter Xu
2022-07-07 21:18         ` Leonardo Brás
