* [RFC PATCH 0/3] migration/multifd: SYNC packet changes
@ 2023-09-22 14:53 Fabiano Rosas
  2023-09-22 14:53 ` [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore Fabiano Rosas
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Fabiano Rosas @ 2023-09-22 14:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva

I'm putting this RFC out early so we can discuss the issues around the
SYNC packet of the multifd protocol. There's a related series posted
by Elena Ufimtseva:

https://lore.kernel.org/r/20230922065625.21848-1-elena.ufimtseva@oracle.com

My interest in this (aside from correctness) is that when doing the
(upcoming) fixed-ram[1] migration, I would like to have multifd ignore
the concept of packets altogether, since the file: migration is not
synchronous.

The main problem I hit is that multifd (ab)uses the knowledge that a
sync packet is sent after a batch of pages and relies (perhaps
inadvertently) on the last post to sem_sync to finish the
migration. This means that without the sync, the main thread just
rushes ahead and does cleanup while packets are still in flight.
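
To make the failure mode concrete, below is a minimal POSIX-threads
analog of the handshake multifd relies on. It is illustrative only, not
QEMU code: the semaphore names mirror the ones in multifd.c, everything
else (the job counter, the sleep standing in for the wire write) is
made up. Without the completion wait, "cleanup" can run while the
worker is still mid-send:

    /* Illustrative analog only, not QEMU code.  Build: gcc -pthread */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static sem_t channels_ready; /* worker: "I can take more work"    */
    static sem_t sem;            /* main:   "a job is queued for you" */
    static sem_t sem_done;       /* worker: "that job is fully sent"  */
    static bool quit;

    static void *send_thread(void *opaque)
    {
        (void)opaque;
        for (;;) {
            sem_post(&channels_ready);  /* advertise availability      */
            sem_wait(&sem);             /* wait for work               */
            if (quit) {
                break;
            }
            usleep(10 * 1000);          /* stand-in for the wire write */
            sem_post(&sem_done);        /* this job is on the wire     */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t thread;
        int njobs = 3;

        sem_init(&channels_ready, 0, 0);
        sem_init(&sem, 0, 0);
        sem_init(&sem_done, 0, 0);
        pthread_create(&thread, NULL, send_thread, NULL);

        for (int i = 0; i < njobs; i++) {
            sem_wait(&channels_ready);  /* some worker is free         */
            /* <enqueue work> */
            sem_post(&sem);             /* kick it off                 */
        }

        /*
         * Without this loop, the cleanup below could run while the
         * worker is still inside its simulated write.  Waiting for
         * the completion signal is what orders cleanup after the last
         * in-flight job.
         */
        for (int i = 0; i < njobs; i++) {
            sem_wait(&sem_done);
        }

        quit = true;
        sem_post(&sem);
        pthread_join(thread, NULL);
        printf("cleanup only after all jobs are done\n");
        return 0;
    }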

I have another patch for this series that introduces a
multifd-nopackets option (placeholder name), but it's probably too
early to discuss that, so I'm leaving it out.

1- https://lore.kernel.org/r/20230330180336.2791-1-farosas@suse.de

Fabiano Rosas (3):
  migration/multifd: Move channels_ready semaphore
  migration/multifd: Decouple control flow from the SYNC packet
  migration/multifd: Extract sem_done waiting into a function

 migration/multifd.c    | 97 +++++++++++++++++++++++++-----------------
 migration/multifd.h    |  8 ++--
 migration/trace-events |  2 +-
 3 files changed, 63 insertions(+), 44 deletions(-)

-- 
2.35.3




* [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore
  2023-09-22 14:53 [RFC PATCH 0/3] migration/multifd: SYNC packet changes Fabiano Rosas
@ 2023-09-22 14:53 ` Fabiano Rosas
  2023-09-22 22:33   ` Elena Ufimtseva
  2023-10-10 21:00   ` Peter Xu
  2023-09-22 14:53 ` [RFC PATCH 2/3] migration/multifd: Decouple control flow from the SYNC packet Fabiano Rosas
  2023-09-22 14:53 ` [RFC PATCH 3/3] migration/multifd: Extract sem_done waiting into a function Fabiano Rosas
  2 siblings, 2 replies; 10+ messages in thread
From: Fabiano Rosas @ 2023-09-22 14:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva

Commit d2026ee117 ("multifd: Fix the number of channels ready") moved
the "post" of channels_ready to the start of the multifd_send_thread()
loop and added a missing "wait" at multifd_send_sync_main(). While it
does work, the placement of the wait goes against what the rest of the
code does.

The sequence at multifd_send_thread() is:

    qemu_sem_post(&multifd_send_state->channels_ready);
    qemu_sem_wait(&p->sem);
    <work>
    if (flags & MULTIFD_FLAG_SYNC) {
        qemu_sem_post(&p->sem_sync);
    }

Which means that the sending thread makes itself available
(channels_ready) and waits for more work (sem). So the sequence in the
migration thread should be to check if any channel is available
(channels_ready), give it some work and set it off (sem):

    qemu_sem_wait(&multifd_send_state->channels_ready);
    <enqueue work>
    qemu_sem_post(&p->sem);
    if (flags & MULTIFD_FLAG_SYNC) {
        qemu_sem_wait(&p->sem_sync);
    }

The reason there's no deadlock today is that the migration thread
enqueues the SYNC packet right before the wait on channels_ready and
we end up taking advantage of the out-of-order post to sem:

        ...
        qemu_sem_post(&p->sem);
    }
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];

        qemu_sem_wait(&multifd_send_state->channels_ready);
        trace_multifd_send_sync_main_wait(p->id);
        qemu_sem_wait(&p->sem_sync);
	...

Move the channels_ready wait before the sem post to keep the sequence
consistent. Also fix the error path to post to channels_ready and
sem_sync in the correct order.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index a7c7a947e3..d626740f2f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -618,6 +618,7 @@ int multifd_send_sync_main(QEMUFile *f)
 
         trace_multifd_send_sync_main_signal(p->id);
 
+        qemu_sem_wait(&multifd_send_state->channels_ready);
         qemu_mutex_lock(&p->mutex);
 
         if (p->quit) {
@@ -635,7 +636,6 @@ int multifd_send_sync_main(QEMUFile *f)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
-        qemu_sem_wait(&multifd_send_state->channels_ready);
         trace_multifd_send_sync_main_wait(p->id);
         qemu_sem_wait(&p->sem_sync);
 
@@ -763,8 +763,8 @@ out:
      * who pay attention to me.
      */
     if (ret != 0) {
-        qemu_sem_post(&p->sem_sync);
         qemu_sem_post(&multifd_send_state->channels_ready);
+        qemu_sem_post(&p->sem_sync);
     }
 
     qemu_mutex_lock(&p->mutex);
-- 
2.35.3




* [RFC PATCH 2/3] migration/multifd: Decouple control flow from the SYNC packet
  2023-09-22 14:53 [RFC PATCH 0/3] migration/multifd: SYNC packet changes Fabiano Rosas
  2023-09-22 14:53 ` [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore Fabiano Rosas
@ 2023-09-22 14:53 ` Fabiano Rosas
  2023-09-22 14:53 ` [RFC PATCH 3/3] migration/multifd: Extract sem_done waiting into a function Fabiano Rosas
  2 siblings, 0 replies; 10+ messages in thread
From: Fabiano Rosas @ 2023-09-22 14:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva

We currently have the sem_sync semaphore that is used:

1) on the sending side, to know when the multifd_send_thread has
   finished sending the MULTIFD_FLAG_SYNC packet;

  This is unnecessary. Multifd sends packets (not pages) one by one
  and completion is already bound by both the channels_ready and sem
  semaphores. The SYNC packet has nothing special that would require
  it to have a separate semaphore on the sending side.

2) on the receiving side, to know when the multifd_recv_thread has
   finished receiving the MULTIFD_FLAG_SYNC packet;

  This is unnecessary because the multifd_recv_state->sem_sync
  semaphore already does the same thing. We care that the SYNC arrived
  from the source; knowing that the SYNC has been received by the recv
  thread doesn't add anything.

3) on both sending and receiving sides, to wait for the multifd threads
   to finish before cleaning up;

   This happens because multifd_send_sync_main() blocks
   ram_save_complete() from finishing until the semaphore is
   posted. This is surprising and not documented.

Clarify the above situation by renaming 'sem_sync' to 'sem_done' and
making the #3 usage the main one. Stop tracking the SYNC packet on
source (#1) and leave multifd_recv_state->sem_sync untouched on the
destination (#2).

Due to the 'channels_ready' and 'sem' semaphores, we always send
packets in lockstep with switching MultiFDSendParams, so
p->pending_job is always either 1 or 0. The thread has no knowledge of
whether it will have more to send once it posts to channels_ready. Run
it through an extra loop iteration so it sees no pending_job and
releases the semaphore.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c    | 89 ++++++++++++++++++++++++++++++++----------
 migration/multifd.h    |  8 ++--
 migration/trace-events |  2 +-
 3 files changed, 73 insertions(+), 26 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index d626740f2f..3d4a631915 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -541,7 +541,7 @@ void multifd_save_cleanup(void)
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
         qemu_sem_destroy(&p->sem);
-        qemu_sem_destroy(&p->sem_sync);
+        qemu_sem_destroy(&p->sem_done);
         g_free(p->name);
         p->name = NULL;
         multifd_pages_clear(p->pages);
@@ -592,7 +592,7 @@ int multifd_send_sync_main(QEMUFile *f)
 
     if (!migrate_multifd()) {
         return 0;
-    }
+
     if (multifd_send_state->pages->num) {
         if (multifd_send_pages(f) < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
@@ -600,6 +600,12 @@ int multifd_send_sync_main(QEMUFile *f)
         }
     }
 
+    /* wait for all channels to be idle */
+    for (i = 0; i < migrate_multifd_channels(); i++) {
+        trace_multifd_send_sync_main_wait(p->id);
+        qemu_sem_wait(&multifd_send_state->channels_ready);
+    }
+
     /*
      * When using zero-copy, it's necessary to flush the pages before any of
      * the pages can be sent again, so we'll make sure the new version of the
@@ -610,9 +616,46 @@ int multifd_send_sync_main(QEMUFile *f)
      * to be less frequent, e.g. only after we finished one whole scanning of
      * all the dirty bitmaps.
      */
-
     flush_zero_copy = migrate_zero_copy_send();
 
+    for (i = 0; i < migrate_multifd_channels(); i++) {
+        MultiFDSendParams *p = &multifd_send_state->params[i];
+
+        qemu_mutex_lock(&p->mutex);
+        assert(!p->pending_job);
+        qemu_mutex_unlock(&p->mutex);
+
+        qemu_sem_post(&p->sem);
+        qemu_sem_wait(&p->sem_done);
+
+        if (flush_zero_copy && p->c && (multifd_zero_copy_flush(p->c) < 0)) {
+            return -1;
+        }
+    }
+
+    /*
+     * All channels went idle and have no more jobs. Unless we send
+     * them more work, we're good to allow any cleanup code to run at
+     * this point.
+     */
+
+    return 0;
+}
+
+int multifd_send_sync_main(QEMUFile *f)
+{
+    int i, ret;
+
+    if (!migrate_multifd()) {
+        return 0;
+    }
+    if (multifd_send_state->pages->num) {
+        if (multifd_send_pages(f) < 0) {
+            error_report("%s: multifd_send_pages fail", __func__);
+            return -1;
+        }
+    }
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -633,11 +676,21 @@ int multifd_send_sync_main(QEMUFile *f)
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
     }
+
+    for (i = 0; i < migrate_multifd_channels(); i++) {
+        trace_multifd_send_wait(migrate_multifd_channels() - i);
+        qemu_sem_wait(&multifd_send_state->channels_ready);
+    }
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
-        trace_multifd_send_sync_main_wait(p->id);
-        qemu_sem_wait(&p->sem_sync);
+        qemu_mutex_lock(&p->mutex);
+        assert(!p->pending_job);
+        qemu_mutex_unlock(&p->mutex);
+
+        qemu_sem_post(&p->sem);
+        qemu_sem_wait(&p->sem_done);
 
         if (flush_zero_copy && p->c && (multifd_zero_copy_flush(p->c) < 0)) {
             return -1;
@@ -739,15 +792,9 @@ static void *multifd_send_thread(void *opaque)
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
 
-            if (flags & MULTIFD_FLAG_SYNC) {
-                qemu_sem_post(&p->sem_sync);
-            }
-        } else if (p->quit) {
-            qemu_mutex_unlock(&p->mutex);
-            break;
         } else {
+            qemu_sem_post(&p->sem_done);
             qemu_mutex_unlock(&p->mutex);
-            /* sometimes there are spurious wakeups */
         }
     }
 
@@ -764,7 +811,7 @@ out:
      */
     if (ret != 0) {
         qemu_sem_post(&multifd_send_state->channels_ready);
-        qemu_sem_post(&p->sem_sync);
+        qemu_sem_post(&p->sem_done);
     }
 
     qemu_mutex_lock(&p->mutex);
@@ -802,7 +849,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
          */
         p->quit = true;
         qemu_sem_post(&multifd_send_state->channels_ready);
-        qemu_sem_post(&p->sem_sync);
+        qemu_sem_post(&p->sem_done);
     }
 }
 
@@ -880,7 +927,7 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
      migrate_set_error(migrate_get_current(), err);
      /* Error happen, we need to tell who pay attention to me */
      qemu_sem_post(&multifd_send_state->channels_ready);
-     qemu_sem_post(&p->sem_sync);
+     qemu_sem_post(&p->sem_done);
      /*
       * Although multifd_send_thread is not created, but main migration
       * thread need to judge whether it is running, so we need to mark
@@ -938,7 +985,7 @@ int multifd_save_setup(Error **errp)
 
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem, 0);
-        qemu_sem_init(&p->sem_sync, 0);
+        qemu_sem_init(&p->sem_done, 0);
         p->quit = false;
         p->pending_job = 0;
         p->id = i;
@@ -1047,7 +1094,7 @@ void multifd_load_cleanup(void)
              * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
              * however try to wakeup it without harm in cleanup phase.
              */
-            qemu_sem_post(&p->sem_sync);
+            qemu_sem_post(&p->sem_done);
         }
 
         qemu_thread_join(&p->thread);
@@ -1059,7 +1106,7 @@ void multifd_load_cleanup(void)
         object_unref(OBJECT(p->c));
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
-        qemu_sem_destroy(&p->sem_sync);
+        qemu_sem_destroy(&p->sem_done);
         g_free(p->name);
         p->name = NULL;
         p->packet_len = 0;
@@ -1100,7 +1147,7 @@ void multifd_recv_sync_main(void)
             }
         }
         trace_multifd_recv_sync_main_signal(p->id);
-        qemu_sem_post(&p->sem_sync);
+        qemu_sem_post(&p->sem_done);
     }
     trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
 }
@@ -1152,7 +1199,7 @@ static void *multifd_recv_thread(void *opaque)
 
         if (flags & MULTIFD_FLAG_SYNC) {
             qemu_sem_post(&multifd_recv_state->sem_sync);
-            qemu_sem_wait(&p->sem_sync);
+            qemu_sem_wait(&p->sem_done);
         }
     }
 
@@ -1195,7 +1242,7 @@ int multifd_load_setup(Error **errp)
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
         qemu_mutex_init(&p->mutex);
-        qemu_sem_init(&p->sem_sync, 0);
+        qemu_sem_init(&p->sem_done, 0);
         p->quit = false;
         p->id = i;
         p->packet_len = sizeof(MultiFDPacket_t)
diff --git a/migration/multifd.h b/migration/multifd.h
index a835643b48..2d53f91da3 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -90,8 +90,8 @@ typedef struct {
 
     /* sem where to wait for more work */
     QemuSemaphore sem;
-    /* syncs main thread and channels */
-    QemuSemaphore sem_sync;
+    /* channel is done transmitting until more pages are queued */
+    QemuSemaphore sem_done;
 
     /* this mutex protects the following parameters */
     QemuMutex mutex;
@@ -153,8 +153,8 @@ typedef struct {
     /* number of pages in a full packet */
     uint32_t page_count;
 
-    /* syncs main thread and channels */
-    QemuSemaphore sem_sync;
+    /* channel is done transmitting until more pages are queued */
+    QemuSemaphore sem_done;
 
     /* this mutex protects the following parameters */
     QemuMutex mutex;
diff --git a/migration/trace-events b/migration/trace-events
index 4666f19325..4367a1a22b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -136,7 +136,7 @@ multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t flags, u
 multifd_send_error(uint8_t id) "channel %u"
 multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %u"
-multifd_send_sync_main_wait(uint8_t id) "channel %u"
+multifd_send_wait(uint8_t n) "waiting for %u channels to finish sending"
 multifd_send_terminate_threads(bool error) "error %d"
 multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64
 multifd_send_thread_start(uint8_t id) "%u"
-- 
2.35.3




* [RFC PATCH 3/3] migration/multifd: Extract sem_done waiting into a function
  2023-09-22 14:53 [RFC PATCH 0/3] migration/multifd: SYNC packet changes Fabiano Rosas
  2023-09-22 14:53 ` [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore Fabiano Rosas
  2023-09-22 14:53 ` [RFC PATCH 2/3] migration/multifd: Decouple control flow from the SYNC packet Fabiano Rosas
@ 2023-09-22 14:53 ` Fabiano Rosas
  2 siblings, 0 replies; 10+ messages in thread
From: Fabiano Rosas @ 2023-09-22 14:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva

This helps document the intent of the loop via the function name, and
we can reuse it in the future.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 38 +++++---------------------------------
 1 file changed, 5 insertions(+), 33 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 3d4a631915..159225530d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -585,24 +585,14 @@ static int multifd_zero_copy_flush(QIOChannel *c)
     return ret;
 }
 
-int multifd_send_sync_main(QEMUFile *f)
+static int multifd_send_wait(void)
 {
-    int i;
     bool flush_zero_copy;
-
-    if (!migrate_multifd()) {
-        return 0;
-
-    if (multifd_send_state->pages->num) {
-        if (multifd_send_pages(f) < 0) {
-            error_report("%s: multifd_send_pages fail", __func__);
-            return -1;
-        }
-    }
+    int i;
 
     /* wait for all channels to be idle */
     for (i = 0; i < migrate_multifd_channels(); i++) {
-        trace_multifd_send_sync_main_wait(p->id);
+        trace_multifd_send_wait(migrate_multifd_channels() - i);
         qemu_sem_wait(&multifd_send_state->channels_ready);
     }
 
@@ -677,28 +667,10 @@ int multifd_send_sync_main(QEMUFile *f)
         qemu_sem_post(&p->sem);
     }
 
-    for (i = 0; i < migrate_multifd_channels(); i++) {
-        trace_multifd_send_wait(migrate_multifd_channels() - i);
-        qemu_sem_wait(&multifd_send_state->channels_ready);
-    }
-
-    for (i = 0; i < migrate_multifd_channels(); i++) {
-        MultiFDSendParams *p = &multifd_send_state->params[i];
-
-        qemu_mutex_lock(&p->mutex);
-        assert(!p->pending_job);
-        qemu_mutex_unlock(&p->mutex);
-
-        qemu_sem_post(&p->sem);
-        qemu_sem_wait(&p->sem_done);
-
-        if (flush_zero_copy && p->c && (multifd_zero_copy_flush(p->c) < 0)) {
-            return -1;
-        }
-    }
+    ret = multifd_send_wait();
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
 
-    return 0;
+    return ret;
 }
 
 static void *multifd_send_thread(void *opaque)
-- 
2.35.3




* Re: [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore
  2023-09-22 14:53 ` [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore Fabiano Rosas
@ 2023-09-22 22:33   ` Elena Ufimtseva
  2023-09-29 14:41     ` Fabiano Rosas
  2023-10-10 21:00   ` Peter Xu
  1 sibling, 1 reply; 10+ messages in thread
From: Elena Ufimtseva @ 2023-09-22 22:33 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Peter Xu, Leonardo Bras

On Fri, Sep 22, 2023 at 11:53:17AM -0300, Fabiano Rosas wrote:
> Commit d2026ee117 ("multifd: Fix the number of channels ready") moved
> the "post" of channels_ready to the start of the multifd_send_thread()
> loop and added a missing "wait" at multifd_send_sync_main(). While it
> does work, the placement of the wait goes against what the rest of the
> code does.
> 
> The sequence at multifd_send_thread() is:
> 
>     qemu_sem_post(&multifd_send_state->channels_ready);
>     qemu_sem_wait(&p->sem);
>     <work>
>     if (flags & MULTIFD_FLAG_SYNC) {
>         qemu_sem_post(&p->sem_sync);
>     }
> 
> Which means that the sending thread makes itself available
> (channels_ready) and waits for more work (sem). So the sequence in the
> migration thread should be to check if any channel is available
> (channels_ready), give it some work and set it off (sem):
> 
>     qemu_sem_wait(&multifd_send_state->channels_ready);
>     <enqueue work>
>     qemu_sem_post(&p->sem);
>     if (flags & MULTIFD_FLAG_SYNC) {
>         qemu_sem_wait(&p->sem_sync);
>     }
> 
> The reason there's no deadlock today is that the migration thread
> enqueues the SYNC packet right before the wait on channels_ready and
> we end up taking advantage of the out-of-order post to sem:
> 
>         ...
>         qemu_sem_post(&p->sem);
>     }
>     for (i = 0; i < migrate_multifd_channels(); i++) {
>         MultiFDSendParams *p = &multifd_send_state->params[i];
> 
>         qemu_sem_wait(&multifd_send_state->channels_ready);
>         trace_multifd_send_sync_main_wait(p->id);
>         qemu_sem_wait(&p->sem_sync);
> 	...
> 
> Move the channels_ready wait before the sem post to keep the sequence
> consistent. Also fix the error path to post to channels_ready and
> sem_sync in the correct order.
>

Thank you Fabiano,

Your solution is more complete. I also had in mind getting rid of
sem_sync.

With your second patch, could this one be merged with it?

> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/multifd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index a7c7a947e3..d626740f2f 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -618,6 +618,7 @@ int multifd_send_sync_main(QEMUFile *f)
>  
>          trace_multifd_send_sync_main_signal(p->id);
>  
> +        qemu_sem_wait(&multifd_send_state->channels_ready);
>          qemu_mutex_lock(&p->mutex);
>  
>          if (p->quit) {
> @@ -635,7 +636,6 @@ int multifd_send_sync_main(QEMUFile *f)
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
>  
> -        qemu_sem_wait(&multifd_send_state->channels_ready);
>          trace_multifd_send_sync_main_wait(p->id);
>          qemu_sem_wait(&p->sem_sync);
>  
> @@ -763,8 +763,8 @@ out:
>       * who pay attention to me.
>       */
>      if (ret != 0) {
> -        qemu_sem_post(&p->sem_sync);
>          qemu_sem_post(&multifd_send_state->channels_ready);
> +        qemu_sem_post(&p->sem_sync);

Can this thread in this error case be woken up again between
these two qemu_sem_posts?
I see that in other places p->quit is set to true before it.
Or maybe there should be one more patch to make these consistent
as well.

Elena U.
>      }
>  
>      qemu_mutex_lock(&p->mutex);
> -- 
> 2.35.3
> 



* Re: [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore
  2023-09-22 22:33   ` Elena Ufimtseva
@ 2023-09-29 14:41     ` Fabiano Rosas
  0 siblings, 0 replies; 10+ messages in thread
From: Fabiano Rosas @ 2023-09-29 14:41 UTC (permalink / raw)
  To: Elena Ufimtseva; +Cc: qemu-devel, Juan Quintela, Peter Xu, Leonardo Bras

Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> On Fri, Sep 22, 2023 at 11:53:17AM -0300, Fabiano Rosas wrote:
>> Commit d2026ee117 ("multifd: Fix the number of channels ready") moved
>> the "post" of channels_ready to the start of the multifd_send_thread()
>> loop and added a missing "wait" at multifd_send_sync_main(). While it
>> does work, the placement of the wait goes against what the rest of the
>> code does.
>> 
>> The sequence at multifd_send_thread() is:
>> 
>>     qemu_sem_post(&multifd_send_state->channels_ready);
>>     qemu_sem_wait(&p->sem);
>>     <work>
>>     if (flags & MULTIFD_FLAG_SYNC) {
>>         qemu_sem_post(&p->sem_sync);
>>     }
>> 
>> Which means that the sending thread makes itself available
>> (channels_ready) and waits for more work (sem). So the sequence in the
>> migration thread should be to check if any channel is available
>> (channels_ready), give it some work and set it off (sem):
>> 
>>     qemu_sem_wait(&multifd_send_state->channels_ready);
>>     <enqueue work>
>>     qemu_sem_post(&p->sem);
>>     if (flags & MULTIFD_FLAG_SYNC) {
>>         qemu_sem_wait(&p->sem_sync);
>>     }
>> 
>> The reason there's no deadlock today is that the migration thread
>> enqueues the SYNC packet right before the wait on channels_ready and
>> we end up taking advantage of the out-of-order post to sem:
>> 
>>         ...
>>         qemu_sem_post(&p->sem);
>>     }
>>     for (i = 0; i < migrate_multifd_channels(); i++) {
>>         MultiFDSendParams *p = &multifd_send_state->params[i];
>> 
>>         qemu_sem_wait(&multifd_send_state->channels_ready);
>>         trace_multifd_send_sync_main_wait(p->id);
>>         qemu_sem_wait(&p->sem_sync);
>> 	...
>> 
>> Move the channels_ready wait before the sem post to keep the sequence
>> consistent. Also fix the error path to post to channels_ready and
>> sem_sync in the correct order.
>>
>
> Thank you Fabiano,
>
> Your solution is more complete. I also had in mind getting rid of
> sem_sync.
>
> With your second patch, this one could be merged with it?
>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/multifd.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index a7c7a947e3..d626740f2f 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -618,6 +618,7 @@ int multifd_send_sync_main(QEMUFile *f)
>>  
>>          trace_multifd_send_sync_main_signal(p->id);
>>  
>> +        qemu_sem_wait(&multifd_send_state->channels_ready);
>>          qemu_mutex_lock(&p->mutex);
>>  
>>          if (p->quit) {
>> @@ -635,7 +636,6 @@ int multifd_send_sync_main(QEMUFile *f)
>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>>          MultiFDSendParams *p = &multifd_send_state->params[i];
>>  
>> -        qemu_sem_wait(&multifd_send_state->channels_ready);
>>          trace_multifd_send_sync_main_wait(p->id);
>>          qemu_sem_wait(&p->sem_sync);
>>  
>> @@ -763,8 +763,8 @@ out:
>>       * who pay attention to me.
>>       */
>>      if (ret != 0) {
>> -        qemu_sem_post(&p->sem_sync);
>>          qemu_sem_post(&multifd_send_state->channels_ready);
>> +        qemu_sem_post(&p->sem_sync);
>
> Can this thread in this error case be woken up again between
> these two qemu_sem_posts?
> I see in other places p->quit is set to true before it.
> Or maybe it should one more patch to make these consistent 
> as well.

That's a good point. There's clearly something going on here if we need
a 'running', a 'quit' and an 'exiting' flag. The TLS code uses quit as a
signal in one direction while the regular multifd path uses it in
another.
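
Just to sketch the direction (purely illustrative, not something I'm
proposing as-is, and none of these names exist in the tree), the three
booleans could in principle collapse into a single lifecycle state:

    /* Hypothetical consolidation of running/quit/exiting -- sketch only */
    #include <stdatomic.h>
    #include <stdbool.h>

    typedef enum {
        CHANNEL_CREATED,   /* thread not started yet           */
        CHANNEL_RUNNING,   /* thread processing jobs           */
        CHANNEL_QUITTING,  /* someone asked the thread to stop */
        CHANNEL_EXITED,    /* thread is gone, safe to join     */
    } ChannelState;

    typedef struct {
        _Atomic ChannelState state;
        /* ... rest of the per-channel parameters ... */
    } Channel;

    static inline bool channel_should_quit(Channel *c)
    {
        return atomic_load(&c->state) >= CHANNEL_QUITTING;
    }

    static inline void channel_request_quit(Channel *c)
    {
        ChannelState expected = CHANNEL_RUNNING;
        /* only RUNNING -> QUITTING; never resurrect an exited channel */
        atomic_compare_exchange_strong(&c->state, &expected,
                                       CHANNEL_QUITTING);
    }

    int main(void)
    {
        Channel c = { .state = CHANNEL_RUNNING };
        channel_request_quit(&c);
        return channel_should_quit(&c) ? 0 : 1;
    }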

I'll give it some more thought. Thanks



* Re: [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore
  2023-09-22 14:53 ` [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore Fabiano Rosas
  2023-09-22 22:33   ` Elena Ufimtseva
@ 2023-10-10 21:00   ` Peter Xu
  2023-10-10 21:40     ` Peter Xu
  2023-10-10 21:43     ` Fabiano Rosas
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Xu @ 2023-10-10 21:00 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Leonardo Bras, Elena Ufimtseva

On Fri, Sep 22, 2023 at 11:53:17AM -0300, Fabiano Rosas wrote:
> Commit d2026ee117 ("multifd: Fix the number of channels ready") moved
> the "post" of channels_ready to the start of the multifd_send_thread()
> loop and added a missing "wait" at multifd_send_sync_main(). While it
> does work, the placement of the wait goes against what the rest of the
> code does.
> 
> The sequence at multifd_send_thread() is:
> 
>     qemu_sem_post(&multifd_send_state->channels_ready);
>     qemu_sem_wait(&p->sem);
>     <work>
>     if (flags & MULTIFD_FLAG_SYNC) {
>         qemu_sem_post(&p->sem_sync);
>     }
> 
> Which means that the sending thread makes itself available
> (channels_ready) and waits for more work (sem). So the sequence in the
> migration thread should be to check if any channel is available
> (channels_ready), give it some work and set it off (sem):
> 
>     qemu_sem_wait(&multifd_send_state->channels_ready);

Here it means we have at least 1 free send thread, then...

>     <enqueue work>
>     qemu_sem_post(&p->sem);

... here we enqueue some work to the current thread (pointed by "i"), no
matter it's free or not, as "i" may not always point to the free thread.

>     if (flags & MULTIFD_FLAG_SYNC) {
>         qemu_sem_wait(&p->sem_sync);
>     }

So I must confess I never fully digest how these sem/mutex/.. worked in
multifd, since the 1st day it's introduced.. so please take below comment
with a grain of salt..

It seems to me that the current design allows >1 pending_job for a thread.
Here the current code didn't do "wait(channels_ready)" because it doesn't
need to - it simply always queue an MULTIFD_FLAG_SYNC pending job over the
thread, and wait for it to run.

From that POV I think I can understand why "wait(channels_ready)" is not
needed here.  But then I'm confused because we don't have a real QUEUE to
put those requests; we simply apply this:

multifd_send_sync_main():
        p->flags |= MULTIFD_FLAG_SYNC;

Even if this send thread can be busy handling a batch of pages and
accessing p->flags.  I think it can actually race with the send thread
reading the flag at the exact same time:

multifd_send_thread():
            multifd_send_fill_packet(p);
            flags = p->flags;  <-------------- here

And whether it sees MULTIFD_FLAG_SYNC is unpredictable.  If it sees it,
it'll post(sem_sync) in this round.  If it doesn't see it, it'll
post(sem_sync) in the next round.  In whatever way, we'll generate an empty
multifd packet to the wire I think, even though I don't know whether that's
needed at all...

I'm not sure whether we should fix it in a more complete form, by not
sending that empty multifd packet at all? Because that only contains the
header without any real page inside, IIUC, so it seems to be a waste of
resource.  Here what we want is only to kick sem_sync?
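
To sketch what I mean with a standalone analog (nothing below is from
multifd.c; the Job struct and the names are invented for this example,
and whether the destination really needs that header-only packet is
exactly the open question), a request that carries no pages could just
wake the waiter instead of producing a packet:

    /* Standalone sketch, not a patch -- the Job struct is invented. */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        int npages;   /* 0 means "sync only", nothing to transmit */
        bool sync;    /* a waiter is blocked on sem_sync          */
    } Job;

    static sem_t sem, sem_sync;
    static Job job;

    static void *send_thread(void *opaque)
    {
        (void)opaque;
        sem_wait(&sem);
        if (job.npages > 0) {
            printf("writing a packet with %d pages\n", job.npages);
        }
        /* else: skip the write entirely, no empty packet on the wire */
        if (job.sync) {
            sem_post(&sem_sync);        /* just wake the waiter        */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        sem_init(&sem, 0, 0);
        sem_init(&sem_sync, 0, 0);
        pthread_create(&t, NULL, send_thread, NULL);

        job = (Job){ .npages = 0, .sync = true };   /* sync-only job   */
        sem_post(&sem);
        sem_wait(&sem_sync);            /* woken without any packet    */
        pthread_join(t, NULL);
        return 0;
    }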

> 
> The reason there's no deadlock today is that the migration thread
> enqueues the SYNC packet right before the wait on channels_ready and
> we end up taking advantage of the out-of-order post to sem:
> 
>         ...
>         qemu_sem_post(&p->sem);
>     }
>     for (i = 0; i < migrate_multifd_channels(); i++) {
>         MultiFDSendParams *p = &multifd_send_state->params[i];
> 
>         qemu_sem_wait(&multifd_send_state->channels_ready);
>         trace_multifd_send_sync_main_wait(p->id);
>         qemu_sem_wait(&p->sem_sync);
> 	...
> 
> Move the channels_ready wait before the sem post to keep the sequence
> consistent. Also fix the error path to post to channels_ready and
> sem_sync in the correct order.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/multifd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index a7c7a947e3..d626740f2f 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -618,6 +618,7 @@ int multifd_send_sync_main(QEMUFile *f)
>  
>          trace_multifd_send_sync_main_signal(p->id);
>  
> +        qemu_sem_wait(&multifd_send_state->channels_ready);
>          qemu_mutex_lock(&p->mutex);
>  
>          if (p->quit) {
> @@ -635,7 +636,6 @@ int multifd_send_sync_main(QEMUFile *f)
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
>  
> -        qemu_sem_wait(&multifd_send_state->channels_ready);
>          trace_multifd_send_sync_main_wait(p->id);
>          qemu_sem_wait(&p->sem_sync);
>  
> @@ -763,8 +763,8 @@ out:
>       * who pay attention to me.
>       */
>      if (ret != 0) {
> -        qemu_sem_post(&p->sem_sync);
>          qemu_sem_post(&multifd_send_state->channels_ready);
> +        qemu_sem_post(&p->sem_sync);

I'm not sure why such movement will have a difference; afaiu on the
semaphore semantics, post() to two sems don't matter on order?

>      }
>  
>      qemu_mutex_lock(&p->mutex);
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore
  2023-10-10 21:00   ` Peter Xu
@ 2023-10-10 21:40     ` Peter Xu
  2023-10-10 21:43     ` Fabiano Rosas
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Xu @ 2023-10-10 21:40 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Leonardo Bras, Elena Ufimtseva

On Tue, Oct 10, 2023 at 05:00:37PM -0400, Peter Xu wrote:
> On Fri, Sep 22, 2023 at 11:53:17AM -0300, Fabiano Rosas wrote:
> > Commit d2026ee117 ("multifd: Fix the number of channels ready") moved
> > the "post" of channels_ready to the start of the multifd_send_thread()
> > loop and added a missing "wait" at multifd_send_sync_main(). While it
> > does work, the placement of the wait goes against what the rest of the
> > code does.
> > 
> > The sequence at multifd_send_thread() is:
> > 
> >     qemu_sem_post(&multifd_send_state->channels_ready);
> >     qemu_sem_wait(&p->sem);
> >     <work>
> >     if (flags & MULTIFD_FLAG_SYNC) {
> >         qemu_sem_post(&p->sem_sync);
> >     }
> > 
> > Which means that the sending thread makes itself available
> > (channels_ready) and waits for more work (sem). So the sequence in the
> > migration thread should be to check if any channel is available
> > (channels_ready), give it some work and set it off (sem):
> > 
> >     qemu_sem_wait(&multifd_send_state->channels_ready);
> 
> Here it means we have at least 1 free send thread, then...
> 
> >     <enqueue work>
> >     qemu_sem_post(&p->sem);
> 
> ... here we enqueue some work to the current thread (pointed by "i"), no
> matter it's free or not, as "i" may not always point to the free thread.
> 
> >     if (flags & MULTIFD_FLAG_SYNC) {
> >         qemu_sem_wait(&p->sem_sync);
> >     }
> 
> So I must confess I never fully digest how these sem/mutex/.. worked in
> multifd, since the 1st day it's introduced.. so please take below comment
> with a grain of salt..
> 
> It seems to me that the current design allows >1 pending_job for a thread.
> Here the current code didn't do "wait(channels_ready)" because it doesn't
> need to - it simply always queue an MULTIFD_FLAG_SYNC pending job over the
> thread, and wait for it to run.
> 
> From that POV I think I can understand why "wait(channels_ready)" is not
> needed here.  But then I'm confused because we don't have a real QUEUE to
> put those requests; we simply apply this:
> 
> multifd_send_sync_main():
>         p->flags |= MULTIFD_FLAG_SYNC;
> 
> Even if this send thread can be busy handling a batch of pages and
> accessing p->flags.  I think it can actually race with the send thread
> reading the flag at the exact same time:
> 
> multifd_send_thread():
>             multifd_send_fill_packet(p);
>             flags = p->flags;  <-------------- here
> 
> And whether it sees MULTIFD_FLAG_SYNC is unpredictable.  If it sees it,
> it'll post(sem_sync) in this round.  If it doesn't see it, it'll
> post(sem_sync) in the next round.  In whatever way, we'll generate an empty
> multifd packet to the wire I think, even though I don't know whether that's
> needed at all...

A correction: since it's protected by p->mutex, I think we will only get an
empty multifd packet when we have pending_job==2. In that case we see
pending_job==2 with p->flags==SYNC, send the pages along with flags=SYNC to
dest, and after that kick sem_sync on src; then we find another
pending_job==1 even though p->flags is already zero. That next multifd
packet will contain only the header (flags=0) and no pages.

> 
> I'm not sure whether we should fix it in a more complete form, by not
> sending that empty multifd packet at all? Because that only contains the
> header without any real page inside, IIUC, so it seems to be a waste of
> resource.  Here what we want is only to kick sem_sync?

Thinking about it some more, I'm now unsure whether sync is really
working as expected in general.

IIUC the SYNC message is used to flush all pages from source to
destination. We need that because we want to order the different versions
of guest pages, making sure the new version of a page always arrives later
than its old version; hence, after all pages are migrated, we can be sure
all guest pages on dest are the latest.

Let's define "version X for page Y" as PyVx.  Version 1 of page 2 is P2V1.

So if without SYNC, a race can happen like this:

  sender 1         sender 2           receiver 1            receiver 2
  --------         --------           ----------            ----------

  send P1V1
          ...P1 changed content.. queued again in sender 2...
                   send P1V2
          ...If we got unlucky on receiving order of P1 versions...
                                                            receive P1V2
                                      receive P1V1

So if receiver 1 got P1V1 after receiver 2 got P1V2, we'll ultimately have
P1V1 on dst, which is an old data, causing data corrupt after migration.

Now we have the SYNC packet, but would it always work?  I'll discuss with
the latest RAM_SAVE_FLAG_MULTIFD_FLUSH sync message:

  src main    sender 1     sender 2    dst main  receiver 1  receiver 2
  --------    --------     --------    --------  ----------  ----------

              send P1V1
  send MULTIFD_FLUSH   
          ...P1 changed.. queued again in sender 2...
                           send P1V2
                                       receive MULTIFD_FLUSH
                                       (but since nothing received, flush nothing)
                                                             receive P1V2
                                                 receive P1V1

IIUC the problem is that MULTIFD_FLUSH currently does not rely on dest
QEMU having received all the pages sent so far. Since the main channel is
also a separate channel from the other multifd channels, I don't see why
the above cannot happen.

I think the problem would go away if, e.g., src QEMU required a SYNC_ACK
from dest QEMU, making sure dest QEMU has digested all the sent pages. Or
if we always sent the same page via the same channel, e.g. chosen by a
hash(page_index).
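
To illustrate the second idea (a toy sketch; channel_for_page() is a
made-up helper, with a modulo standing in for the hash): if the channel
is a pure function of the page index, two versions of the same page can
never take different channels, so they cannot be reordered against each
other:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper: pin every page to one channel forever. */
    static inline int channel_for_page(uint64_t page_index, int nchannels)
    {
        return (int)(page_index % nchannels);
    }

    int main(void)
    {
        int nchannels = 4;
        uint64_t page = 1;

        /* P1V1 and P1V2 always pick the same channel... */
        printf("P1V1 -> channel %d\n", channel_for_page(page, nchannels));
        printf("P1V2 -> channel %d\n", channel_for_page(page, nchannels));
        /*
         * ...and a single channel is an in-order byte stream, so V2 is
         * written after V1 and therefore received after V1.  The
         * cross-channel reordering in the diagram above cannot happen
         * for any single page.
         */
        return 0;
    }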

I have a feeling that this is a real bug that we just never hit, because
we don't send P1V1 and P1V2 that close together; we only do that once per
whole iteration over all ramblocks in find_dirty_block(). So the bug seems
possible, but I'll be very happy to be proven wrong by anyone.

-- 
Peter Xu




* Re: [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore
  2023-10-10 21:00   ` Peter Xu
  2023-10-10 21:40     ` Peter Xu
@ 2023-10-10 21:43     ` Fabiano Rosas
  2023-10-10 21:59       ` Peter Xu
  1 sibling, 1 reply; 10+ messages in thread
From: Fabiano Rosas @ 2023-10-10 21:43 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Leonardo Bras, Elena Ufimtseva

Peter Xu <peterx@redhat.com> writes:

> On Fri, Sep 22, 2023 at 11:53:17AM -0300, Fabiano Rosas wrote:
>> Commit d2026ee117 ("multifd: Fix the number of channels ready") moved
>> the "post" of channels_ready to the start of the multifd_send_thread()
>> loop and added a missing "wait" at multifd_send_sync_main(). While it
>> does work, the placement of the wait goes against what the rest of the
>> code does.
>> 
>> The sequence at multifd_send_thread() is:
>> 
>>     qemu_sem_post(&multifd_send_state->channels_ready);
>>     qemu_sem_wait(&p->sem);
>>     <work>
>>     if (flags & MULTIFD_FLAG_SYNC) {
>>         qemu_sem_post(&p->sem_sync);
>>     }
>> 
>> Which means that the sending thread makes itself available
>> (channels_ready) and waits for more work (sem). So the sequence in the
>> migration thread should be to check if any channel is available
>> (channels_ready), give it some work and set it off (sem):
>> 
>>     qemu_sem_wait(&multifd_send_state->channels_ready);
>
> Here it means we have at least 1 free send thread, then...
>
>>     <enqueue work>
>>     qemu_sem_post(&p->sem);
>
> ... here we enqueue some work to the current thread (pointed by "i"), no
> matter it's free or not, as "i" may not always point to the free thread.
>

Yes. Which means channels_ready is currently useless. Since I posted
this I realized that and have been working on a series to remove it
completely.

... I'm not opposed to "fixing" whatever needs to be fixed here as well, but
I think removing it makes sense. I'll try to focus on that and post a v2
here.

>>     if (flags & MULTIFD_FLAG_SYNC) {
>>         qemu_sem_wait(&p->sem_sync);
>>     }
>
> So I must confess I never fully digest how these sem/mutex/.. worked in
> multifd, since the 1st day it's introduced.. so please take below comment
> with a grain of salt..

We definitely need to clarify some things in the multifd
design. Especially if we're going to use it as the main migration
infrastructure moving forward.

I think what we lack is a design direction. I'm not really interested in
how things work currently, but in how they *should* work based on the
design.

I'm confused about:

1) why does channels_ready exist? Were we trying to do some lockstep
movement of: populate MultiFDPages -> release the sender thread -> move
to next channel -> wait for it to become ready -> repeat. If so, that
semaphore should be per-channel I think.

(my future proposal will be to remove the channels_ready semaphore)

2) why do we need sem_sync? The SYNC flag makes sense, but why does the
source need to sync with itself when syncing with dst?

(my proposal in this series is to rename sem_sync to sem_done and use it
to track sending completion)

3) why do we need to take the params lock? Shouldn't the semaphores
already ensure that only one of the main thread and the sender thread
will touch the params? The comment in multifd_send_pages says that we
don't take locks for the pages structure, but that seems pointless to
me since we still lock the params structure.

> It seems to me that the current design allows >1 pending_job for a thread.
> Here the current code didn't do "wait(channels_ready)" because it doesn't
> need to - it simply always queue an MULTIFD_FLAG_SYNC pending job over the
> thread, and wait for it to run.
>
> From that POV I think I can understand why "wait(channels_ready)" is not
> needed here.  But then I'm confused because we don't have a real QUEUE to
> put those requests; we simply apply this:
>
> multifd_send_sync_main():
>         p->flags |= MULTIFD_FLAG_SYNC;
>
> Even if this send thread can be busy handling a batch of pages and
> accessing p->flags.  I think it can actually race with the send thread
> reading the flag at the exact same time:
>
> multifd_send_thread():
>             multifd_send_fill_packet(p);
>             flags = p->flags;  <-------------- here

It doesn't race in reading due to the p->mutex lock. But it looks like
it could miss a newly set flag when it unlocks to start sending
(qio_channel_write*).
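
To spell that window out with a toy demo (not multifd code; the sleeps
only force the interleaving, and FLAG_SYNC is a stand-in for
MULTIFD_FLAG_SYNC): the thread snapshots the flags under the lock,
drops the lock to do the send, and a flag set during that send is only
observed on the next pass:

    /* Toy demo of the missed-flag window -- not multifd code. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define FLAG_SYNC 1

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int flags;                    /* stand-in for p->flags       */

    static void *send_thread(void *opaque)
    {
        (void)opaque;
        for (int round = 0; round < 2; round++) {
            pthread_mutex_lock(&lock);
            int snapshot = flags;        /* the "flags = p->flags" moment */
            flags = 0;
            pthread_mutex_unlock(&lock);

            usleep(100 * 1000);          /* unlocked: the "send" window */
            printf("round %d sent with%s SYNC\n", round,
                   (snapshot & FLAG_SYNC) ? "" : "out");
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_create(&t, NULL, send_thread, NULL);
        usleep(50 * 1000);               /* land inside round 0's send  */
        pthread_mutex_lock(&lock);
        flags |= FLAG_SYNC;              /* what the sync path does     */
        pthread_mutex_unlock(&lock);
        pthread_join(t, NULL);           /* SYNC only goes out in round 1 */
        return 0;
    }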

> And whether it sees MULTIFD_FLAG_SYNC is unpredictable.  If it sees it,
> it'll post(sem_sync) in this round.  If it doesn't see it, it'll
> post(sem_sync) in the next round.  In whatever way, we'll generate an empty
> multifd packet to the wire I think, even though I don't know whether that's
> needed at all...
>
> I'm not sure whether we should fix it in a more complete form, by not
> sending that empty multifd packet at all? Because that only contains the
> header without any real page inside, IIUC, so it seems to be a waste of
> resource.  Here what we want is only to kick sem_sync?
>
>> 
>> The reason there's no deadlock today is that the migration thread
>> enqueues the SYNC packet right before the wait on channels_ready and
>> we end up taking advantage of the out-of-order post to sem:
>> 
>>         ...
>>         qemu_sem_post(&p->sem);
>>     }
>>     for (i = 0; i < migrate_multifd_channels(); i++) {
>>         MultiFDSendParams *p = &multifd_send_state->params[i];
>> 
>>         qemu_sem_wait(&multifd_send_state->channels_ready);
>>         trace_multifd_send_sync_main_wait(p->id);
>>         qemu_sem_wait(&p->sem_sync);
>> 	...
>> 
>> Move the channels_ready wait before the sem post to keep the sequence
>> consistent. Also fix the error path to post to channels_ready and
>> sem_sync in the correct order.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/multifd.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index a7c7a947e3..d626740f2f 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -618,6 +618,7 @@ int multifd_send_sync_main(QEMUFile *f)
>>  
>>          trace_multifd_send_sync_main_signal(p->id);
>>  
>> +        qemu_sem_wait(&multifd_send_state->channels_ready);
>>          qemu_mutex_lock(&p->mutex);
>>  
>>          if (p->quit) {
>> @@ -635,7 +636,6 @@ int multifd_send_sync_main(QEMUFile *f)
>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>>          MultiFDSendParams *p = &multifd_send_state->params[i];
>>  
>> -        qemu_sem_wait(&multifd_send_state->channels_ready);
>>          trace_multifd_send_sync_main_wait(p->id);
>>          qemu_sem_wait(&p->sem_sync);
>>  
>> @@ -763,8 +763,8 @@ out:
>>       * who pay attention to me.
>>       */
>>      if (ret != 0) {
>> -        qemu_sem_post(&p->sem_sync);
>>          qemu_sem_post(&multifd_send_state->channels_ready);
>> +        qemu_sem_post(&p->sem_sync);
>
> I'm not sure why such movement will have a difference; afaiu on the
> semaphore semantics, post() to two sems don't matter on order?

You're right, there's no difference here. I have been working on
centralizing these "cleanup posts" and ended up convincing myself that
this was needed.



* Re: [RFC PATCH 1/3] migration/multifd: Move channels_ready semaphore
  2023-10-10 21:43     ` Fabiano Rosas
@ 2023-10-10 21:59       ` Peter Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2023-10-10 21:59 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Leonardo Bras, Elena Ufimtseva

On Tue, Oct 10, 2023 at 06:43:05PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Sep 22, 2023 at 11:53:17AM -0300, Fabiano Rosas wrote:
> >> Commit d2026ee117 ("multifd: Fix the number of channels ready") moved
> >> the "post" of channels_ready to the start of the multifd_send_thread()
> >> loop and added a missing "wait" at multifd_send_sync_main(). While it
> >> does work, the placement of the wait goes against what the rest of the
> >> code does.
> >> 
> >> The sequence at multifd_send_thread() is:
> >> 
> >>     qemu_sem_post(&multifd_send_state->channels_ready);
> >>     qemu_sem_wait(&p->sem);
> >>     <work>
> >>     if (flags & MULTIFD_FLAG_SYNC) {
> >>         qemu_sem_post(&p->sem_sync);
> >>     }
> >> 
> >> Which means that the sending thread makes itself available
> >> (channels_ready) and waits for more work (sem). So the sequence in the
> >> migration thread should be to check if any channel is available
> >> (channels_ready), give it some work and set it off (sem):
> >> 
> >>     qemu_sem_wait(&multifd_send_state->channels_ready);
> >
> > Here it means we have at least 1 free send thread, then...
> >
> >>     <enqueue work>
> >>     qemu_sem_post(&p->sem);
> >
> > ... here we enqueue some work to the current thread (pointed by "i"), no
> > matter it's free or not, as "i" may not always point to the free thread.
> >
> 
> Yes. Which means channels_ready is currently useless. Since I posted
> this I realized that and have been working on a series to remove it
> completely.
> 
> ... I'm not opposed to "fixing" whatever needs to be fixed here as well, but
> I think removing it makes sense. I'll try to focus on that and post a v2
> here.

Happy to read it.

> 
> >>     if (flags & MULTIFD_FLAG_SYNC) {
> >>         qemu_sem_wait(&p->sem_sync);
> >>     }
> >
> > So I must confess I never fully digest how these sem/mutex/.. worked in
> > multifd, since the 1st day it's introduced.. so please take below comment
> > with a grain of salt..
> 
> We definitely need to clarify some things in the multifd
> design. Especially if we're going to use it as the main migration
> infrastructure moving forward.

Exactly.

> 
> I think what we lack is a design direction. I'm not really interested in
> how things work currently, but in how they *should* work based on the
> design.

Unfortunately we can't ignore how the old code works; normally old code
has its reasons for working like that. So the best way forward is to
figure out with the author exactly how it works, unless after that
conversation we reach the consensus that it was a design mistake.

The lucky thing here is that we have the author around (Juan). Let's
discuss it thoroughly with him to make sure nothing is overlooked.

> 
> I'm confused about:
> 
> 1) why does channels_ready exist? Were we trying to do some lockstep
> movement of: populate MultiFDPages -> release the sender thread -> move
> to next channel -> wait for it to become ready -> repeat. If so, that
> semaphore should be per-channel I think.
> 
> (my future proposal will be to remove the channels_ready semaphore)
> 
> 2) why do we need sem_sync? The SYNC flag makes sense, but why does the
> source need to sync with itself when syncing with dst?
> 
> (my proposal in this series is to rename sem_sync to sem_done and use it
> to track sending completion)
> 
> 3) why do we need to take the params lock? Shouldn't the semaphores
> already ensure that only one of the main thread and the sender thread
> will touch the params? The comment in multifd_send_pages says that we
> don't take locks for the pages structure, but that seems pointless to
> me since we still lock the params structure.

Heh, so I'm not the only one who is confused by all of this. :) You
reminded me of the days when I was reviewing the initial versions of
multifd, back when I failed to understand the code.

It's great to start discussing this again.  I'd say go ahead and propose
your patches; I'll read them.

> 
> > It seems to me that the current design allows >1 pending_job for a thread.
> > Here the current code didn't do "wait(channels_ready)" because it doesn't
> > need to - it simply always queue an MULTIFD_FLAG_SYNC pending job over the
> > thread, and wait for it to run.
> >
> > From that POV I think I can understand why "wait(channels_ready)" is not
> > needed here.  But then I'm confused because we don't have a real QUEUE to
> > put those requests; we simply apply this:
> >
> > multifd_send_sync_main():
> >         p->flags |= MULTIFD_FLAG_SYNC;
> >
> > Even if this send thread can be busy handling a batch of pages and
> > accessing p->flags.  I think it can actually race with the send thread
> > reading the flag at the exact same time:
> >
> > multifd_send_thread():
> >             multifd_send_fill_packet(p);
> >             flags = p->flags;  <-------------- here
> 
> It doesn't race in reading due to the p->mutex lock. But it looks like
> it could miss a newly set flag when it unlocks to start sending
> (qio_channel_write*).

Right. See my follow-up email: I think the "race" is that what will
happen is unpredictable, and I think it's possible that src QEMU
generates a useless packet. Not a real race.

To be clear, if I were doing the multifd work, I'd avoid any form of
modifying p->flags when a job has already been assigned.

Also please feel free to have a look at the SYNC issue I raised in that
same follow-up email.  I hope I'm wrong somewhere.

-- 
Peter Xu



