qemu-devel.nongnu.org archive mirror
* [PATCH 0/4] multifd: various fixes
@ 2023-09-22  6:56 Elena Ufimtseva
  2023-09-22  6:56 ` [PATCH 1/4] multifd: wait for channels_ready before sending sync Elena Ufimtseva
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Elena Ufimtseva @ 2023-09-22  6:56 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

Hello

While working on and testing various live migration scenarios,
I found a few issues.

These are my first patches in live migration and I would
appreciate suggestions from the community if these patches
could be done differently.

[PATCH 1/4] multifd: wait for channels_ready before sending sync
I am not certain about this change, since it seems that the sync
flag could be part of a packet that also carries the pages currently
being sent out. But the traces show this is not always the case:
multifd_send 230.873 pid=55477 id=0x0 packet_num=0x6f4 normal=0x40 flags=0x1 next_packet_size=0x40000
multifd_send 14.718 pid=55477 id=0x1 packet_num=0x6f5 normal=0x0 flags=0x1 next_packet_size=0x80000
If the sync packet can indeed be a standalone one, then waiting for
channels_ready beforehand seems appropriate, although it wastes an
iteration on a sync-only packet.
[PATCH 4/4] is also related to 1/4, but fixes the over-accounting in
the case of a sync-only packet.
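
For readability, here is a rough sketch of what distinguishes the two
packets in the patch 1/4 trace above; this is not the actual QEMU
code, and the field names are paraphrased from MultiFDPacket_t in
migration/multifd.h:

    /* first trace line (id=0x0): the sync flag piggy-backed on a
     * packet that also carries 0x40 normal pages */
    packet->flags        = cpu_to_be32(MULTIFD_FLAG_SYNC);  /* flags=0x1   */
    packet->normal_pages = cpu_to_be32(0x40);               /* normal=0x40 */

    /* second trace line (id=0x1): a standalone sync with no pages;
     * next_packet_size still shows a stale value left over from an
     * earlier data packet, which is what patch 4/4 resets */
    packet->flags        = cpu_to_be32(MULTIFD_FLAG_SYNC);  /* flags=0x1   */
    packet->normal_pages = cpu_to_be32(0);                  /* normal=0x0  */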


Thank you in advance; I am looking forward to your feedback.

Elena

Elena Ufimtseva (4):
  multifd: wait for channels_ready before sending sync
  migration: check for rate_limit_max for RATE_LIMIT_DISABLED
  multifd: fix counters in multifd_send_thread
  multifd: reset next_packet_len after sending pages

 migration/migration-stats.c |  8 ++++----
 migration/multifd.c         | 11 ++++++-----
 2 files changed, 10 insertions(+), 9 deletions(-)

-- 
2.34.1




* [PATCH 1/4] multifd: wait for channels_ready before sending sync
  2023-09-22  6:56 [PATCH 0/4] multifd: various fixes Elena Ufimtseva
@ 2023-09-22  6:56 ` Elena Ufimtseva
  2023-09-22 16:06   ` Fabiano Rosas
  2023-09-22  6:56 ` [PATCH 2/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED Elena Ufimtseva
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Elena Ufimtseva @ 2023-09-22  6:56 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

In multifd_send_sync_main() we need to wait for channels_ready
before submitting the sync packet, as the threads may still be
sending their previous pages.
There is also no need to wait for channels_ready in the loop before
the wait for sem_sync: the next iteration of sending pages, or
another sync, will start by waiting on the channels_ready semaphore.
This changes commit 90b3cec351996dd8ef4eb847ad38607812c5e7f5
("multifd: Fix the number of channels ready").

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 migration/multifd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 0f6b203877..e61e458151 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -595,6 +595,7 @@ int multifd_send_sync_main(QEMUFile *f)
         }
     }
 
+    qemu_sem_wait(&multifd_send_state->channels_ready);
     /*
      * When using zero-copy, it's necessary to flush the pages before any of
      * the pages can be sent again, so we'll make sure the new version of the
@@ -630,7 +631,6 @@ int multifd_send_sync_main(QEMUFile *f)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
-        qemu_sem_wait(&multifd_send_state->channels_ready);
         trace_multifd_send_sync_main_wait(p->id);
         qemu_sem_wait(&p->sem_sync);
 
-- 
2.34.1
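
For context, a simplified sketch of the semaphore handshake involved
here, paraphrased from migration/multifd.c around this series (not the
exact code):

    /* each sender thread, multifd_send_thread(), per loop iteration */
    qemu_sem_post(&multifd_send_state->channels_ready);  /* "I can take work" */
    qemu_sem_wait(&p->sem);                              /* wait for a job    */
    /* ... write out the queued pages and/or the sync packet ... */

    /* producer side: multifd_send_pages(), and with this patch the
     * start of multifd_send_sync_main(), wait here before queueing
     * new work */
    qemu_sem_wait(&multifd_send_state->channels_ready);
    /* ... fill in the job and qemu_sem_post(&p->sem) ... */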




* [PATCH 2/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED
  2023-09-22  6:56 [PATCH 0/4] multifd: various fixes Elena Ufimtseva
  2023-09-22  6:56 ` [PATCH 1/4] multifd: wait for channels_ready before sending sync Elena Ufimtseva
@ 2023-09-22  6:56 ` Elena Ufimtseva
  2023-09-22 17:38   ` Fabiano Rosas
  2023-10-10 20:17   ` Peter Xu
  2023-09-22  6:56 ` [PATCH 3/4] multifd: fix counters in multifd_send_thread Elena Ufimtseva
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 13+ messages in thread
From: Elena Ufimtseva @ 2023-09-22  6:56 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

In migration rate limiting, atomic operations are used to read the
rate limit variables and the transferred byte count, and they are
expensive. Check first whether rate_limit_max equals
RATE_LIMIT_DISABLED and return false immediately if so.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 migration/migration-stats.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index 095d6d75bb..abc31483d5 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -24,14 +24,14 @@ bool migration_rate_exceeded(QEMUFile *f)
         return true;
     }
 
-    uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
-    uint64_t rate_limit_current = migration_transferred_bytes(f);
-    uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
     uint64_t rate_limit_max = stat64_get(&mig_stats.rate_limit_max);
-
     if (rate_limit_max == RATE_LIMIT_DISABLED) {
         return false;
     }
+    uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
+    uint64_t rate_limit_current = migration_transferred_bytes(f);
+    uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
+
     if (rate_limit_max > 0 && rate_limit_used > rate_limit_max) {
         return true;
     }
-- 
2.34.1




* [PATCH 3/4] multifd: fix counters in multifd_send_thread
  2023-09-22  6:56 [PATCH 0/4] multifd: various fixes Elena Ufimtseva
  2023-09-22  6:56 ` [PATCH 1/4] multifd: wait for channels_ready before sending sync Elena Ufimtseva
  2023-09-22  6:56 ` [PATCH 2/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED Elena Ufimtseva
@ 2023-09-22  6:56 ` Elena Ufimtseva
  2023-09-22 18:13   ` Fabiano Rosas
  2023-09-22  6:56 ` [PATCH 4/4] multifd: reset next_packet_len after sending pages Elena Ufimtseva
  2023-09-22 14:18 ` [PATCH 0/4] multifd: various fixes Fabiano Rosas
  4 siblings, 1 reply; 13+ messages in thread
From: Elena Ufimtseva @ 2023-09-22  6:56 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

Previous commit cbec7eb76879d419e7dbf531ee2506ec0722e825
("migration/multifd: Compute transferred bytes correctly")
removed accounting for packet_len in the non-RDMA case, but
next_packet_size only accounts for the pages
(normal_pages * PAGE_SIZE), not for the packet header that is sent
as iov[0]. The packet_len part should be added back to account for
the size of MultiFDPacket and the array of offsets.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 migration/multifd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index e61e458151..3281397b18 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -714,8 +714,6 @@ static void *multifd_send_thread(void *opaque)
                 if (ret != 0) {
                     break;
                 }
-                stat64_add(&mig_stats.multifd_bytes, p->packet_len);
-                stat64_add(&mig_stats.transferred, p->packet_len);
             } else {
                 /* Send header using the same writev call */
                 p->iov[0].iov_len = p->packet_len;
@@ -728,8 +726,10 @@ static void *multifd_send_thread(void *opaque)
                 break;
             }
 
-            stat64_add(&mig_stats.multifd_bytes, p->next_packet_size);
-            stat64_add(&mig_stats.transferred, p->next_packet_size);
+            stat64_add(&mig_stats.multifd_bytes,
+                       p->next_packet_size + p->packet_len);
+            stat64_add(&mig_stats.transferred,
+                       p->next_packet_size + p->packet_len);
             qemu_mutex_lock(&p->mutex);
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
-- 
2.34.1
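
As a worked example of the accounting, using the numbers from the
cover-letter trace and assuming 4 KiB target pages (names and sizes
as I recall them from migration/multifd.c, so illustrative rather
than exact):

    next_packet_size = normal_pages * page_size
                     = 0x40 * 0x1000 = 0x40000 bytes    (the pages, iov[1..])
    packet_len       = sizeof(MultiFDPacket_t)
                       + page_count * sizeof(uint64_t)  (header + offsets, iov[0])
    counted per send = packet_len + next_packet_size    (what this patch restores)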




* [PATCH 4/4] multifd: reset next_packet_len after sending pages
  2023-09-22  6:56 [PATCH 0/4] multifd: various fixes Elena Ufimtseva
                   ` (2 preceding siblings ...)
  2023-09-22  6:56 ` [PATCH 3/4] multifd: fix counters in multifd_send_thread Elena Ufimtseva
@ 2023-09-22  6:56 ` Elena Ufimtseva
  2023-09-22 18:32   ` Fabiano Rosas
  2023-09-22 18:44   ` Fabiano Rosas
  2023-09-22 14:18 ` [PATCH 0/4] multifd: various fixes Fabiano Rosas
  4 siblings, 2 replies; 13+ messages in thread
From: Elena Ufimtseva @ 2023-09-22  6:56 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

Sometimes multifd sends just a sync packet with no pages
(normal_num is 0). In this case the old value of next_packet_size
is preserved and accounted for, while only packet_len is actually
transferred. Reset it to 0 after it has been sent and accounted for.

TODO: fix the duplicated packet ids in the stream.
With this patch there is still an issue with duplicated packet ids
being sent (with a different number of pages/flags).
See the multifd_send trace below (before this change):
multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 next_packet_size=0x57000
multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 next_packet_size=0x57000

With this commit there are still duplicated packets, but since no
pages are sent with the sync flag set, next_packet_size is 0:
multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 next_packet_size=0x7b000
multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 flags=0x0 next_packet_size=0x0
If there is a suggestion on how to fix this properly, I will be
glad to use it.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 migration/multifd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 3281397b18..8b4e26051b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -730,6 +730,7 @@ static void *multifd_send_thread(void *opaque)
                        p->next_packet_size + p->packet_len);
             stat64_add(&mig_stats.transferred,
                        p->next_packet_size + p->packet_len);
+            p->next_packet_size = 0;
             qemu_mutex_lock(&p->mutex);
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
-- 
2.34.1
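
Using the numbers from the trace above, the over-accounting without
the reset looks roughly like this (4 KiB pages assumed, so an
illustration rather than exact figures):

    data packet:       normal = 0x7b  ->  next_packet_size = 0x7b * 0x1000 = 0x7b000
    sync-only packet:  bytes actually written      = packet_len
                       counted without this patch  = packet_len + 0x7b000 (stale)
                       counted with this patch     = packet_len + 0       (reset)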




* Re: [PATCH 0/4] multifd: various fixes
  2023-09-22  6:56 [PATCH 0/4] multifd: various fixes Elena Ufimtseva
                   ` (3 preceding siblings ...)
  2023-09-22  6:56 ` [PATCH 4/4] multifd: reset next_packet_len after sending pages Elena Ufimtseva
@ 2023-09-22 14:18 ` Fabiano Rosas
  4 siblings, 0 replies; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-22 14:18 UTC (permalink / raw)
  To: Elena Ufimtseva, quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> Hello
>
> While working on and testing various live migration scenarios,
> I found a few issues.
>
> These are my first patches in live migration and I would
> appreciate suggestions from the community if these patches
> could be done differently.
>
> [PATCH 1/4] multifd: wait for channels_ready before sending sync
> I am not certain about this change, since it seems that the sync
> flag could be part of a packet that also carries the pages currently
> being sent out. But the traces show this is not always the case:
> multifd_send 230.873 pid=55477 id=0x0 packet_num=0x6f4 normal=0x40 flags=0x1 next_packet_size=0x40000
> multifd_send 14.718 pid=55477 id=0x1 packet_num=0x6f5 normal=0x0 flags=0x1 next_packet_size=0x80000
> If the sync packet can indeed be a standalone one, then waiting for
> channels_ready beforehand seems appropriate, although it wastes an
> iteration on a sync-only packet.

I haven't looked at this code for a while, so there's some context
switching to be made, but you're definitely on the right track here. I
actually have an unsent patch doing almost the same as your patch
1/4. I'll comment more there.

About the sync being standalone, I would expect that to always be the
case since we're incrementing packet_num at that point.

> [PATCH 4/4] is also related to 1/4, but fixes the over-accounting in
> the case of a sync-only packet.
>
>
> Thank you in advance and looking forward for your feedback.
>
> Elena
>
> Elena Ufimtseva (4):
>   multifd: wait for channels_ready before sending sync
>   migration: check for rate_limit_max for RATE_LIMIT_DISABLED
>   multifd: fix counters in multifd_send_thread
>   multifd: reset next_packet_len after sending pages
>
>  migration/migration-stats.c |  8 ++++----
>  migration/multifd.c         | 11 ++++++-----
>  2 files changed, 10 insertions(+), 9 deletions(-)



* Re: [PATCH 1/4] multifd: wait for channels_ready before sending sync
  2023-09-22  6:56 ` [PATCH 1/4] multifd: wait for channels_ready before sending sync Elena Ufimtseva
@ 2023-09-22 16:06   ` Fabiano Rosas
  2023-09-22 23:18     ` Elena Ufimtseva
  0 siblings, 1 reply; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-22 16:06 UTC (permalink / raw)
  To: Elena Ufimtseva, quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> In multifd_send_sync_main() we need to wait for channels_ready
> before submitting the sync packet, as the threads may still be
> sending their previous pages.
> There is also no need to wait for channels_ready in the loop before
> the wait for sem_sync: the next iteration of sending pages, or
> another sync, will start by waiting on the channels_ready semaphore.
> This changes commit 90b3cec351996dd8ef4eb847ad38607812c5e7f5
> ("multifd: Fix the number of channels ready").
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> ---
>  migration/multifd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 0f6b203877..e61e458151 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -595,6 +595,7 @@ int multifd_send_sync_main(QEMUFile *f)
>          }
>      }
>  
> +    qemu_sem_wait(&multifd_send_state->channels_ready);
>      /*
>       * When using zero-copy, it's necessary to flush the pages before any of
>       * the pages can be sent again, so we'll make sure the new version of the
> @@ -630,7 +631,6 @@ int multifd_send_sync_main(QEMUFile *f)
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
>  
> -        qemu_sem_wait(&multifd_send_state->channels_ready);
>          trace_multifd_send_sync_main_wait(p->id);
>          qemu_sem_wait(&p->sem_sync);

Please take a look at the series I just sent. Basically, I think we
should wait on 'sem' for the number of existing channels and not just
once per sync. Otherwise I think we'd hit the same issue this patch is
trying to fix when we loop into the n+1 channels. I think the
assert(!p->pending_job) in patch 3 helps prove that's more appropriate.
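
Roughly, something like the following (a sketch only, not the exact
code from my series; by 'sem' I mean the channels_ready semaphore
here):

    /* in multifd_send_sync_main(): consume one channels_ready post
     * per channel before queueing the sync packets, instead of only
     * once per sync */
    for (i = 0; i < migrate_multifd_channels(); i++) {
        qemu_sem_wait(&multifd_send_state->channels_ready);
    }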

Let me know what you think.

Thanks



* Re: [PATCH 2/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED
  2023-09-22  6:56 ` [PATCH 2/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED Elena Ufimtseva
@ 2023-09-22 17:38   ` Fabiano Rosas
  2023-10-10 20:17   ` Peter Xu
  1 sibling, 0 replies; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-22 17:38 UTC (permalink / raw)
  To: Elena Ufimtseva, quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> In migration rate limiting, atomic operations are used to read the
> rate limit variables and the transferred byte count, and they are
> expensive. Check first whether rate_limit_max equals
> RATE_LIMIT_DISABLED and return false immediately if so.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> ---
>  migration/migration-stats.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration-stats.c b/migration/migration-stats.c
> index 095d6d75bb..abc31483d5 100644
> --- a/migration/migration-stats.c
> +++ b/migration/migration-stats.c
> @@ -24,14 +24,14 @@ bool migration_rate_exceeded(QEMUFile *f)
>          return true;
>      }
>  
> -    uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
> -    uint64_t rate_limit_current = migration_transferred_bytes(f);

There's a qemu_fflush() hiding inside migration_transferred_bytes(). It
currently always flushes if we haven't detected an error in the
file. After this patch we will stop flushing at this point if
ratelimiting is disabled.

You might want to add that information to the commit message to make it
easier to track if this ends up causing a regression.
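
For reference, the path being described is roughly the following
(paraphrased from memory of the tree around this series, so the exact
code may differ):

    uint64_t migration_transferred_bytes(QEMUFile *f)
    {
        /* qemu_file_transferred() flushes f before reading its counter;
         * that is the qemu_fflush() that will no longer run from here
         * when rate limiting is disabled */
        return qemu_file_transferred(f) + stat64_get(&mig_stats.multifd_bytes);
    }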

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH 3/4] multifd: fix counters in multifd_send_thread
  2023-09-22  6:56 ` [PATCH 3/4] multifd: fix counters in multifd_send_thread Elena Ufimtseva
@ 2023-09-22 18:13   ` Fabiano Rosas
  0 siblings, 0 replies; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-22 18:13 UTC (permalink / raw)
  To: Elena Ufimtseva, quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> Previous commit cbec7eb76879d419e7dbf531ee2506ec0722e825
> ("migration/multifd: Compute transferred bytes correctly")
> removed accounting for packet_len in the non-RDMA case, but
> next_packet_size only accounts for the pages
> (normal_pages * PAGE_SIZE), not for the packet header that is sent
> as iov[0]. The packet_len part should be added back to account for
> the size of MultiFDPacket and the array of offsets.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

I don't really understand the purpose of next_packet_size, but the
accounting and explanation seem correct.

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH 4/4] multifd: reset next_packet_len after sending pages
  2023-09-22  6:56 ` [PATCH 4/4] multifd: reset next_packet_len after sending pages Elena Ufimtseva
@ 2023-09-22 18:32   ` Fabiano Rosas
  2023-09-22 18:44   ` Fabiano Rosas
  1 sibling, 0 replies; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-22 18:32 UTC (permalink / raw)
  To: Elena Ufimtseva, quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> Sometimes multifd sends just a sync packet with no pages
> (normal_num is 0). In this case the old value of next_packet_size
> is preserved and accounted for, while only packet_len is actually
> transferred. Reset it to 0 after it has been sent and accounted for.
>
> TODO: fix the duplicated packet ids in the stream.
> With this patch there is still an issue with duplicated packet ids
> being sent (with a different number of pages/flags).
> See the multifd_send trace below (before this change):
> multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 next_packet_size=0x57000
> multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 next_packet_size=0x57000
>
> With this commit there are still duplicated packets, but since no
> pages are sent with the sync flag set, next_packet_size is 0:
> multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 next_packet_size=0x7b000
> multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 flags=0x0 next_packet_size=0x0
> If there is a suggestion on how to fix this properly, I will be
> glad to use it.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> ---
>  migration/multifd.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 3281397b18..8b4e26051b 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -730,6 +730,7 @@ static void *multifd_send_thread(void *opaque)
>                         p->next_packet_size + p->packet_len);
>              stat64_add(&mig_stats.transferred,
>                         p->next_packet_size + p->packet_len);
> +            p->next_packet_size = 0;
>              qemu_mutex_lock(&p->mutex);
>              p->pending_job--;
>              qemu_mutex_unlock(&p->mutex);

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH 4/4] multifd: reset next_packet_len after sending pages
  2023-09-22  6:56 ` [PATCH 4/4] multifd: reset next_packet_len after sending pages Elena Ufimtseva
  2023-09-22 18:32   ` Fabiano Rosas
@ 2023-09-22 18:44   ` Fabiano Rosas
  1 sibling, 0 replies; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-22 18:44 UTC (permalink / raw)
  To: Elena Ufimtseva, quintela, peterx, leobras; +Cc: elena.ufimtseva, qemu-devel

Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> Sometimes multifd sends just a sync packet with no pages
> (normal_num is 0). In this case the old value of next_packet_size
> is preserved and accounted for, while only packet_len is actually
> transferred. Reset it to 0 after it has been sent and accounted for.
>

Usually you'd finish your commit message here with your Signed-off-by
tag, and the comments below would be included after the following ---
marker. That way we can merge this patch without bringing the
unrelated discussion along.

> TODO: fix the duplicated packet ids in the stream.
> With this patch there is still an issue with duplicated packet ids
> being sent (with a different number of pages/flags).
> See the multifd_send trace below (before this change):
> multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 next_packet_size=0x57000
> multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 next_packet_size=0x57000
>
> With this commit there are still duplicated packets, but since no
> pages are sent with the sync flag set, next_packet_size is 0:
> multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 next_packet_size=0x7b000
> multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 flags=0x0 next_packet_size=0x0
> If there is a suggestion on how to fix this properly, I will be
> glad to use it.

What is going on here? Is it that we're incrementing the
multifd_send_state->packet_num under different locks? Because p->mutex
is per-channel? You could try turning that into an atomic operation
instead.
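
Concretely, something like this (a sketch only, using
qatomic_fetch_inc() from qemu/atomic.h, under the assumption that the
duplicate packet_num comes from the non-atomic increment in
multifd_send_pages(); untested):

    /* instead of:
     *     p->packet_num = multifd_send_state->packet_num++;
     * make the shared counter increment atomic, so two channels can
     * never hand out the same packet_num */
    p->packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);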



* Re: [PATCH 1/4] multifd: wait for channels_ready before sending sync
  2023-09-22 16:06   ` Fabiano Rosas
@ 2023-09-22 23:18     ` Elena Ufimtseva
  0 siblings, 0 replies; 13+ messages in thread
From: Elena Ufimtseva @ 2023-09-22 23:18 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: quintela, peterx, leobras, qemu-devel

On Fri, Sep 22, 2023 at 01:06:53PM -0300, Fabiano Rosas wrote:
> Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:
> 
> > In multifd_send_sync_main() we need to wait for channels_ready
> > before submitting the sync packet, as the threads may still be
> > sending their previous pages.
> > There is also no need to wait for channels_ready in the loop before
> > the wait for sem_sync: the next iteration of sending pages, or
> > another sync, will start by waiting on the channels_ready semaphore.
> > This changes commit 90b3cec351996dd8ef4eb847ad38607812c5e7f5
> > ("multifd: Fix the number of channels ready").
> >
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > ---
> >  migration/multifd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 0f6b203877..e61e458151 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -595,6 +595,7 @@ int multifd_send_sync_main(QEMUFile *f)
> >          }
> >      }
> >  
> > +    qemu_sem_wait(&multifd_send_state->channels_ready);
> >      /*
> >       * When using zero-copy, it's necessary to flush the pages before any of
> >       * the pages can be sent again, so we'll make sure the new version of the
> > @@ -630,7 +631,6 @@ int multifd_send_sync_main(QEMUFile *f)
> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> >          MultiFDSendParams *p = &multifd_send_state->params[i];
> >  
> > -        qemu_sem_wait(&multifd_send_state->channels_ready);
> >          trace_multifd_send_sync_main_wait(p->id);
> >          qemu_sem_wait(&p->sem_sync);
> 
> Please take a look at the series I just sent. Basically, I think we
> should wait on 'sem' for the number of existing channels and not just
> once per sync. Otherwise I think we'd hit the same issue this patch is
> trying to fix when we loop into the n+1 channels. I think the
> assert(!p->pending_job) in patch 3 helps prove that's more appropriate.

Thank you!

These patches make sense to me.
I agree on the redundant sem_sync. Let's see what others think.

I will also run some tests with your patches and spend
more time looking at patch [2/3].

Elena U.

> 
> Let me know what you think.
> 
> Thanks



* Re: [PATCH 2/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED
  2023-09-22  6:56 ` [PATCH 2/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED Elena Ufimtseva
  2023-09-22 17:38   ` Fabiano Rosas
@ 2023-10-10 20:17   ` Peter Xu
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Xu @ 2023-10-10 20:17 UTC (permalink / raw)
  To: Elena Ufimtseva; +Cc: quintela, leobras, qemu-devel

On Thu, Sep 21, 2023 at 11:56:23PM -0700, Elena Ufimtseva wrote:
> In migration rate limiting, atomic operations are used to read the
> rate limit variables and the transferred byte count, and they are
> expensive. Check first whether rate_limit_max equals
> RATE_LIMIT_DISABLED and return false immediately if so.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

One trivial comment:

> ---
>  migration/migration-stats.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration-stats.c b/migration/migration-stats.c
> index 095d6d75bb..abc31483d5 100644
> --- a/migration/migration-stats.c
> +++ b/migration/migration-stats.c
> @@ -24,14 +24,14 @@ bool migration_rate_exceeded(QEMUFile *f)
>          return true;
>      }
>  
> -    uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
> -    uint64_t rate_limit_current = migration_transferred_bytes(f);
> -    uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
>      uint64_t rate_limit_max = stat64_get(&mig_stats.rate_limit_max);

Side note: since we have a helper, this can be migration_rate_get() too.

> -
>      if (rate_limit_max == RATE_LIMIT_DISABLED) {
>          return false;
>      }

An empty line would be nice here.

> +    uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
> +    uint64_t rate_limit_current = migration_transferred_bytes(f);
> +    uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
> +
>      if (rate_limit_max > 0 && rate_limit_used > rate_limit_max) {
>          return true;
>      }
> -- 
> 2.34.1
> 
> 

-- 
Peter Xu
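
The side note above would amount to something like this (a sketch
only):

    uint64_t rate_limit_max = migration_rate_get();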




Thread overview: 13+ messages
2023-09-22  6:56 [PATCH 0/4] multifd: various fixes Elena Ufimtseva
2023-09-22  6:56 ` [PATCH 1/4] multifd: wait for channels_ready before sending sync Elena Ufimtseva
2023-09-22 16:06   ` Fabiano Rosas
2023-09-22 23:18     ` Elena Ufimtseva
2023-09-22  6:56 ` [PATCH 2/4] migration: check for rate_limit_max for RATE_LIMIT_DISABLED Elena Ufimtseva
2023-09-22 17:38   ` Fabiano Rosas
2023-10-10 20:17   ` Peter Xu
2023-09-22  6:56 ` [PATCH 3/4] multifd: fix counters in multifd_send_thread Elena Ufimtseva
2023-09-22 18:13   ` Fabiano Rosas
2023-09-22  6:56 ` [PATCH 4/4] multifd: reset next_packet_len after sending pages Elena Ufimtseva
2023-09-22 18:32   ` Fabiano Rosas
2023-09-22 18:44   ` Fabiano Rosas
2023-09-22 14:18 ` [PATCH 0/4] multifd: various fixes Fabiano Rosas
