* [PATCH 00/14] migration: Postcopy Preempt-Full
@ 2022-09-20 22:50 Peter Xu
  2022-09-20 22:50 ` [PATCH 01/14] migration: Add postcopy_preempt_active() Peter Xu
                   ` (13 more replies)
  0 siblings, 14 replies; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Dr . David Alan Gilbert, Manish Mishra, peterx, Juan Quintela,
	Leonardo Bras Soares Passos, ani, Daniel P . Berrange

Based-on: <20220920223800.47467-1-peterx@redhat.com>
  [PATCH 0/5] migration: Bug fixes (prepare for preempt-full)

Tree is here:
  https://github.com/xzpeter/qemu/tree/preempt-full

RFC:
  https://lore.kernel.org/qemu-devel/20220829165659.96046-1-peterx@redhat.com

This patchset is the formal v1 of the preempt-full series.  The RFC tag
is removed as more testing has been done, and all items previously listed
in the TODO section of the RFC cover letter are implemented in this
patchset.

A few patches are added.  Most of the patches are still the same as the
RFC ones, with some trivial touch-ups here and there, e.g. the comment
touch-ups Dave suggested on bitmap_mutex.  Looking at the diff stat, the
number of added lines roughly matches the number of removed lines this
time, because with the rp-return thread change we can drop a lot of the
complicated preempt logic previously maintained in the migration thread:

  3 files changed, 371 insertions(+), 399 deletions(-)

Feel free to have a look at patch "migration: Remove old preempt code
around state maintainance", where we drop a lot of old code for preempt
state maintenance in the migration thread (the major part of the old
preempt code still needs to stay, e.g. channel management), along with the
break-huge parameter (we never need to break huge pages anymore, because
we already run in parallel).

Compared to the recently merged preempt mode, I call this one
"preempt-full" because it threadifies the postcopy channels, so urgent
pages can now be handled fully separately, outside of the ram save loop.

The existing preempt code, which has already landed, reduced random page
request latency over a 10Gbps network from ~12ms to ~500us.

This preempt-full series further reduces that ~500us to ~230us in my
initial test.  More to share below.

Note that no new capability is needed, IOW it's fully compatible with the
existing preempt mode.  So the naming is not really important; it is just
to identify the difference between the binaries.

The logic of the series is simple: send urgent pages in the rp-return
thread rather than in the migration thread.  It also means the rp-return
thread takes over ownership of the newly created preempt channel.  This
can slow down the rp-return thread when it receives page requests, but so
far I see only benefits and no major issue with it.
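
Conceptually it looks like below (an illustrative sketch only; the names
used here are made up and do not match the actual functions in the
series):

    /* rp-return thread: serve urgent page requests directly */
    while (wait_for_page_request(&block, &offset)) {
        /*
         * Send the requested host page over the preempt channel ourselves,
         * instead of queueing it for the migration thread to pick up.
         */
        send_host_page(preempt_channel, block, offset);
    }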

For detailed performance numbers, please refer to the RFC cover letter.

Please have a look, thanks.

Peter Xu (14):
  migration: Add postcopy_preempt_active()
  migration: Cleanup xbzrle zero page cache update logic
  migration: Trivial cleanup save_page_header() on same block check
  migration: Remove RAMState.f references in compression code
  migration: Yield bitmap_mutex properly when sending/sleeping
  migration: Use atomic ops properly for page accountings
  migration: Teach PSS about host page
  migration: Introduce pss_channel
  migration: Add pss_init()
  migration: Make PageSearchStatus part of RAMState
  migration: Move last_sent_block into PageSearchStatus
  migration: Send requested page directly in rp-return thread
  migration: Remove old preempt code around state maintainance
  migration: Drop rs->f

 migration/migration.c |  47 +--
 migration/multifd.c   |   2 +-
 migration/ram.c       | 721 ++++++++++++++++++++----------------------
 3 files changed, 371 insertions(+), 399 deletions(-)

-- 
2.32.0




* [PATCH 01/14] migration: Add postcopy_preempt_active()
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
@ 2022-09-20 22:50 ` Peter Xu
  2022-09-20 22:50 ` [PATCH 02/14] migration: Cleanup xbzrle zero page cache update logic Peter Xu
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Dr . David Alan Gilbert, Manish Mishra, peterx, Juan Quintela,
	Leonardo Bras Soares Passos, ani, Daniel P . Berrange

Add a helper to check whether postcopy preempt is enabled and currently
active.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 1d42414ecc..d8cf7cc901 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -162,6 +162,11 @@ out:
     return ret;
 }
 
+static bool postcopy_preempt_active(void)
+{
+    return migrate_postcopy_preempt() && migration_in_postcopy();
+}
+
 bool ramblock_is_ignored(RAMBlock *block)
 {
     return !qemu_ram_is_migratable(block) ||
@@ -2434,7 +2439,7 @@ static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
 /* We need to make sure rs->f always points to the default channel elsewhere */
 static void postcopy_preempt_reset_channel(RAMState *rs)
 {
-    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
+    if (postcopy_preempt_active()) {
         rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
         rs->f = migrate_get_current()->to_dst_file;
         trace_postcopy_preempt_reset_channel();
@@ -2472,7 +2477,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
         return 0;
     }
 
-    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
+    if (postcopy_preempt_active()) {
         postcopy_preempt_choose_channel(rs, pss);
     }
 
-- 
2.32.0




* [PATCH 02/14] migration: Cleanup xbzrle zero page cache update logic
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
  2022-09-20 22:50 ` [PATCH 01/14] migration: Add postcopy_preempt_active() Peter Xu
@ 2022-09-20 22:50 ` Peter Xu
  2022-10-04 10:33   ` Dr. David Alan Gilbert
  2022-09-20 22:50 ` [PATCH 03/14] migration: Trivial cleanup save_page_header() on same block check Peter Xu
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Dr . David Alan Gilbert, Manish Mishra, peterx, Juan Quintela,
	Leonardo Bras Soares Passos, ani, Daniel P . Berrange

The major change is to replace "!save_page_use_compression()" with
"xbzrle_enabled" to make the intent clearer.

Reasoning:

(1) When compression is enabled, "!save_page_use_compression()" is exactly
    the same as checking "xbzrle_enabled".

(2) When compression is disabled, "!save_page_use_compression()" always
    returns true.  We used to try calling the xbzrle code, but after this
    change we won't, and we shouldn't need to.

While at it, drop the xbzrle_enabled check in xbzrle_cache_zero_page(),
because with this change it's not needed anymore.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d8cf7cc901..fc59c052cf 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -741,10 +741,6 @@ void mig_throttle_counter_reset(void)
  */
 static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
 {
-    if (!rs->xbzrle_enabled) {
-        return;
-    }
-
     /* We don't care if this fails to allocate a new cache page
      * as long as it updated an old one */
     cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page,
@@ -2301,7 +2297,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         /* Must let xbzrle know, otherwise a previous (now 0'd) cached
          * page would be stale
          */
-        if (!save_page_use_compression(rs)) {
+        if (rs->xbzrle_enabled) {
             XBZRLE_cache_lock();
             xbzrle_cache_zero_page(rs, block->offset + offset);
             XBZRLE_cache_unlock();
-- 
2.32.0




* [PATCH 03/14] migration: Trivial cleanup save_page_header() on same block check
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
  2022-09-20 22:50 ` [PATCH 01/14] migration: Add postcopy_preempt_active() Peter Xu
  2022-09-20 22:50 ` [PATCH 02/14] migration: Cleanup xbzrle zero page cache update logic Peter Xu
@ 2022-09-20 22:50 ` Peter Xu
  2022-10-04 10:41   ` Dr. David Alan Gilbert
  2022-09-20 22:50 ` [PATCH 04/14] migration: Remove RAMState.f references in compression code Peter Xu
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Dr . David Alan Gilbert, Manish Mishra, peterx, Juan Quintela,
	Leonardo Bras Soares Passos, ani, Daniel P . Berrange

The 2nd check on RAM_SAVE_FLAG_CONTINUE is a bit redundant.  Use a boolean
to be clearer.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index fc59c052cf..62ff2c1469 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -661,14 +661,15 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
                                ram_addr_t offset)
 {
     size_t size, len;
+    bool same_block = (block == rs->last_sent_block);
 
-    if (block == rs->last_sent_block) {
+    if (same_block) {
         offset |= RAM_SAVE_FLAG_CONTINUE;
     }
     qemu_put_be64(f, offset);
     size = 8;
 
-    if (!(offset & RAM_SAVE_FLAG_CONTINUE)) {
+    if (!same_block) {
         len = strlen(block->idstr);
         qemu_put_byte(f, len);
         qemu_put_buffer(f, (uint8_t *)block->idstr, len);
-- 
2.32.0




* [PATCH 04/14] migration: Remove RAMState.f references in compression code
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (2 preceding siblings ...)
  2022-09-20 22:50 ` [PATCH 03/14] migration: Trivial cleanup save_page_header() on same block check Peter Xu
@ 2022-09-20 22:50 ` Peter Xu
  2022-10-04 10:54   ` Dr. David Alan Gilbert
  2022-09-20 22:52 ` [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping Peter Xu
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Dr . David Alan Gilbert, Manish Mishra, peterx, Juan Quintela,
	Leonardo Bras Soares Passos, ani, Daniel P . Berrange

Remove the references to RAMState.f in compress_page_with_multi_thread()
and flush_compressed_data().

The compression code by default isn't compatible with having more than one
channel (it currently wouldn't know which channel to flush the compressed
data to), so to keep it simple we always flush to the default to_dst_file
until someone wants to add multi-channel support, especially since rs->f
can now really change (after postcopy preempt was introduced).

There should be no functional change at all after this patch is applied,
since as long as rs->f is referenced in the compression code, it must be
to_dst_file.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 62ff2c1469..8303252b6d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1461,6 +1461,7 @@ static bool save_page_use_compression(RAMState *rs);
 
 static void flush_compressed_data(RAMState *rs)
 {
+    MigrationState *ms = migrate_get_current();
     int idx, len, thread_count;
 
     if (!save_page_use_compression(rs)) {
@@ -1479,7 +1480,7 @@ static void flush_compressed_data(RAMState *rs)
     for (idx = 0; idx < thread_count; idx++) {
         qemu_mutex_lock(&comp_param[idx].mutex);
         if (!comp_param[idx].quit) {
-            len = qemu_put_qemu_file(rs->f, comp_param[idx].file);
+            len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file);
             /*
              * it's safe to fetch zero_page without holding comp_done_lock
              * as there is no further request submitted to the thread,
@@ -1498,11 +1499,11 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
     param->offset = offset;
 }
 
-static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
-                                           ram_addr_t offset)
+static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 {
     int idx, thread_count, bytes_xmit = -1, pages = -1;
     bool wait = migrate_compress_wait_thread();
+    MigrationState *ms = migrate_get_current();
 
     thread_count = migrate_compress_threads();
     qemu_mutex_lock(&comp_done_lock);
@@ -1510,7 +1511,8 @@ retry:
     for (idx = 0; idx < thread_count; idx++) {
         if (comp_param[idx].done) {
             comp_param[idx].done = false;
-            bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
+            bytes_xmit = qemu_put_qemu_file(ms->to_dst_file,
+                                            comp_param[idx].file);
             qemu_mutex_lock(&comp_param[idx].mutex);
             set_compress_params(&comp_param[idx], block, offset);
             qemu_cond_signal(&comp_param[idx].cond);
@@ -2263,7 +2265,7 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
         return false;
     }
 
-    if (compress_page_with_multi_thread(rs, block, offset) > 0) {
+    if (compress_page_with_multi_thread(block, offset) > 0) {
         return true;
     }
 
-- 
2.32.0




* [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (3 preceding siblings ...)
  2022-09-20 22:50 ` [PATCH 04/14] migration: Remove RAMState.f references in compression code Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-10-04 13:55   ` Dr. David Alan Gilbert
  2022-09-20 22:52 ` [PATCH 06/14] migration: Use atomic ops properly for page accountings Peter Xu
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

Don't take the bitmap mutex when sending pages, or when being throttled by
migration_rate_limit() (which is a bit tricky to call here in the ram code,
but it still seems helpful).

This prepares for the possibility of concurrently sending pages in more
than one thread using ram_save_host_page(): all threads may need the
bitmap_mutex to operate on the bitmaps, so sendmsg() or any kind of
qemu_sem_wait() blocking one thread will not block the others from
progressing.
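
The core pattern, in rough form (a simplified sketch; in the hunk below
the unlock only happens in postcopy-preempt mode, and the dirty bit is
cleared before the lock is dropped):

    int tmppages;

    qemu_mutex_unlock(&rs->bitmap_mutex);      /* let other senders progress */
    tmppages = ram_save_target_page(rs, pss);  /* may block in sendmsg() */
    migration_rate_limit();                    /* may sleep when throttled */
    qemu_mutex_lock(&rs->bitmap_mutex);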

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 8303252b6d..6e7de6087a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2463,6 +2463,7 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
  */
 static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
 {
+    bool page_dirty, release_lock = postcopy_preempt_active();
     int tmppages, pages = 0;
     size_t pagesize_bits =
         qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
@@ -2486,22 +2487,41 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
             break;
         }
 
+        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
+        /*
+         * Properly yield the lock only in postcopy preempt mode because
+         * both migration thread and rp-return thread can operate on the
+         * bitmaps.
+         */
+        if (release_lock) {
+            qemu_mutex_unlock(&rs->bitmap_mutex);
+        }
+
         /* Check the pages is dirty and if it is send it */
-        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
+        if (page_dirty) {
             tmppages = ram_save_target_page(rs, pss);
-            if (tmppages < 0) {
-                return tmppages;
+            if (tmppages >= 0) {
+                pages += tmppages;
+                /*
+                 * Allow rate limiting to happen in the middle of huge pages if
+                 * something is sent in the current iteration.
+                 */
+                if (pagesize_bits > 1 && tmppages > 0) {
+                    migration_rate_limit();
+                }
             }
+        } else {
+            tmppages = 0;
+        }
 
-            pages += tmppages;
-            /*
-             * Allow rate limiting to happen in the middle of huge pages if
-             * something is sent in the current iteration.
-             */
-            if (pagesize_bits > 1 && tmppages > 0) {
-                migration_rate_limit();
-            }
+        if (release_lock) {
+            qemu_mutex_lock(&rs->bitmap_mutex);
         }
+
+        if (tmppages < 0) {
+            return tmppages;
+        }
+
         pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
     } while ((pss->page < hostpage_boundary) &&
              offset_in_ramblock(pss->block,
-- 
2.32.0




* [PATCH 06/14] migration: Use atomic ops properly for page accountings
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (4 preceding siblings ...)
  2022-09-20 22:52 ` [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-10-04 16:59   ` Dr. David Alan Gilbert
  2022-09-20 22:52 ` [PATCH 07/14] migration: Teach PSS about host page Peter Xu
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

To prepare for thread-safety of the page accounting, at least the counters
below need to be accessed only atomically; they are:

        ram_counters.transferred
        ram_counters.duplicate
        ram_counters.normal
        ram_counters.postcopy_bytes

There are a lot of other counters, but they won't be accessed outside the
migration thread, so they're still safe to access without atomic ops.
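
In practice the updates and reads of these counters go through QEMU's
qatomic_* helpers, roughly like below (taken from the hunks that follow):

    /* sender side (may later run in more than one thread) */
    qatomic_inc(&ram_counters.normal);
    qatomic_add(&ram_counters.transferred, bytes);

    /* reader side, e.g. when filling in MigrationInfo */
    info->ram->normal = qatomic_read(&ram_counters.normal);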

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 10 +++++-----
 migration/multifd.c   |  2 +-
 migration/ram.c       | 29 +++++++++++++++--------------
 3 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 07c74a79a2..0eacc0c99b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1048,13 +1048,13 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 
     info->has_ram = true;
     info->ram = g_malloc0(sizeof(*info->ram));
-    info->ram->transferred = ram_counters.transferred;
+    info->ram->transferred = qatomic_read(&ram_counters.transferred);
     info->ram->total = ram_bytes_total();
-    info->ram->duplicate = ram_counters.duplicate;
+    info->ram->duplicate = qatomic_read(&ram_counters.duplicate);
     /* legacy value.  It is not used anymore */
     info->ram->skipped = 0;
-    info->ram->normal = ram_counters.normal;
-    info->ram->normal_bytes = ram_counters.normal * page_size;
+    info->ram->normal = qatomic_read(&ram_counters.normal);
+    info->ram->normal_bytes = info->ram->normal * page_size;
     info->ram->mbps = s->mbps;
     info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
     info->ram->dirty_sync_missed_zero_copy =
@@ -1065,7 +1065,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     info->ram->pages_per_second = s->pages_per_second;
     info->ram->precopy_bytes = ram_counters.precopy_bytes;
     info->ram->downtime_bytes = ram_counters.downtime_bytes;
-    info->ram->postcopy_bytes = ram_counters.postcopy_bytes;
+    info->ram->postcopy_bytes = qatomic_read(&ram_counters.postcopy_bytes);
 
     if (migrate_use_xbzrle()) {
         info->has_xbzrle_cache = true;
diff --git a/migration/multifd.c b/migration/multifd.c
index 586ddc9d65..460326acd4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -437,7 +437,7 @@ static int multifd_send_pages(QEMUFile *f)
                 + p->packet_len;
     qemu_file_acct_rate_limit(f, transferred);
     ram_counters.multifd_bytes += transferred;
-    ram_counters.transferred += transferred;
+    qatomic_add(&ram_counters.transferred, transferred);
     qemu_mutex_unlock(&p->mutex);
     qemu_sem_post(&p->sem);
 
diff --git a/migration/ram.c b/migration/ram.c
index 6e7de6087a..5bd3d76bf0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -432,11 +432,11 @@ static void ram_transferred_add(uint64_t bytes)
     if (runstate_is_running()) {
         ram_counters.precopy_bytes += bytes;
     } else if (migration_in_postcopy()) {
-        ram_counters.postcopy_bytes += bytes;
+        qatomic_add(&ram_counters.postcopy_bytes, bytes);
     } else {
         ram_counters.downtime_bytes += bytes;
     }
-    ram_counters.transferred += bytes;
+    qatomic_add(&ram_counters.transferred, bytes);
 }
 
 void dirty_sync_missed_zero_copy(void)
@@ -725,7 +725,7 @@ void mig_throttle_counter_reset(void)
 
     rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     rs->num_dirty_pages_period = 0;
-    rs->bytes_xfer_prev = ram_counters.transferred;
+    rs->bytes_xfer_prev = qatomic_read(&ram_counters.transferred);
 }
 
 /**
@@ -1085,8 +1085,9 @@ uint64_t ram_pagesize_summary(void)
 
 uint64_t ram_get_total_transferred_pages(void)
 {
-    return  ram_counters.normal + ram_counters.duplicate +
-                compression_counters.pages + xbzrle_counters.pages;
+    return  qatomic_read(&ram_counters.normal) +
+        qatomic_read(&ram_counters.duplicate) +
+        compression_counters.pages + xbzrle_counters.pages;
 }
 
 static void migration_update_rates(RAMState *rs, int64_t end_time)
@@ -1145,8 +1146,8 @@ static void migration_trigger_throttle(RAMState *rs)
 {
     MigrationState *s = migrate_get_current();
     uint64_t threshold = s->parameters.throttle_trigger_threshold;
-
-    uint64_t bytes_xfer_period = ram_counters.transferred - rs->bytes_xfer_prev;
+    uint64_t bytes_xfer_period =
+        qatomic_read(&ram_counters.transferred) - rs->bytes_xfer_prev;
     uint64_t bytes_dirty_period = rs->num_dirty_pages_period * TARGET_PAGE_SIZE;
     uint64_t bytes_dirty_threshold = bytes_xfer_period * threshold / 100;
 
@@ -1285,7 +1286,7 @@ static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
     int len = save_zero_page_to_file(rs, rs->f, block, offset);
 
     if (len) {
-        ram_counters.duplicate++;
+        qatomic_inc(&ram_counters.duplicate);
         ram_transferred_add(len);
         return 1;
     }
@@ -1322,9 +1323,9 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
     }
 
     if (bytes_xmit > 0) {
-        ram_counters.normal++;
+        qatomic_inc(&ram_counters.normal);
     } else if (bytes_xmit == 0) {
-        ram_counters.duplicate++;
+        qatomic_inc(&ram_counters.duplicate);
     }
 
     return true;
@@ -1354,7 +1355,7 @@ static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
         qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
     }
     ram_transferred_add(TARGET_PAGE_SIZE);
-    ram_counters.normal++;
+    qatomic_inc(&ram_counters.normal);
     return 1;
 }
 
@@ -1448,7 +1449,7 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
     ram_transferred_add(bytes_xmit);
 
     if (param->zero_page) {
-        ram_counters.duplicate++;
+        qatomic_inc(&ram_counters.duplicate);
         return;
     }
 
@@ -2620,9 +2621,9 @@ void acct_update_position(QEMUFile *f, size_t size, bool zero)
     uint64_t pages = size / TARGET_PAGE_SIZE;
 
     if (zero) {
-        ram_counters.duplicate += pages;
+        qatomic_add(&ram_counters.duplicate, pages);
     } else {
-        ram_counters.normal += pages;
+        qatomic_add(&ram_counters.normal, pages);
         ram_transferred_add(size);
         qemu_file_credit_transfer(f, size);
     }
-- 
2.32.0




* [PATCH 07/14] migration: Teach PSS about host page
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (5 preceding siblings ...)
  2022-09-20 22:52 ` [PATCH 06/14] migration: Use atomic ops properly for page accountings Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-10-05 11:12   ` Dr. David Alan Gilbert
  2022-09-20 22:52 ` [PATCH 08/14] migration: Introduce pss_channel Peter Xu
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

Migration code has a lot to do with host pages.  Teaching the PSS core
about the idea of a host page helps a lot and makes the code cleaner.
Meanwhile, this prepares for future changes that can leverage the new PSS
helpers introduced here to send host pages in another thread.

Three more fields are introduced for this:

  (1) host_page_sending: this is set to true when QEMU is sending a host
      page, false otherwise.

  (2) host_page_{start|end}: these point to the start/end of the host page
      we're sending; they are only valid when host_page_sending==true.

For example, when we look up the next dirty page on the ramblock with
host_page_sending==true, we won't look for anything beyond the current
host page boundary.  This can be slightly more efficient than the current
code, which sets pss->page to the next dirty bit (which can be beyond the
current host page boundary) and then resets it to the host page boundary
if it turns out to go beyond that.

With the above, we can easily make migration_bitmap_find_dirty()
self-contained by updating pss->page properly.  The rs* parameter is
removed because it's not even used in the old code.

When sending a host page, we should use the pss helpers like this:

  - pss_host_page_prepare(pss): called before sending host page
  - pss_within_range(pss): whether we're still within the current host page
  - pss_host_page_finish(pss): called after sending a host page

Then we can use ram_save_target_page() to save one small page.
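
Put together, the sending loop roughly becomes (a simplified sketch; the
real ram_save_host_page() below also handles errors, the bitmap lock and
preemption):

    int tmppages;

    pss_host_page_prepare(pss);
    do {
        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
            tmppages = ram_save_target_page(rs, pss);
        }
        pss_find_next_dirty(pss);
    } while (pss_within_range(pss));
    pss_host_page_finish(pss);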

Currently ram_save_host_page() is still the only user.  If another
function is added to send host pages (e.g. in the return path thread) in
the future, it should follow the same style.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 95 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 76 insertions(+), 19 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5bd3d76bf0..3f720b6de2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -474,6 +474,11 @@ struct PageSearchStatus {
      * postcopy pages via postcopy preempt channel.
      */
     bool         postcopy_target_channel;
+    /* Whether we're sending a host page */
+    bool          host_page_sending;
+    /* The start/end of current host page.  Only valid if host_page_sending==true */
+    unsigned long host_page_start;
+    unsigned long host_page_end;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -851,26 +856,38 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
 }
 
 /**
- * migration_bitmap_find_dirty: find the next dirty page from start
+ * pss_find_next_dirty: find the next dirty page of current ramblock
  *
- * Returns the page offset within memory region of the start of a dirty page
+ * This function updates pss->page to point to the next dirty page index
+ * within the ramblock to migrate, or the end of ramblock when nothing
+ * found.  Note that when pss->host_page_sending==true it means we're
+ * during sending a host page, so we won't look for dirty page that is
+ * outside the host page boundary.
  *
- * @rs: current RAM state
- * @rb: RAMBlock where to search for dirty pages
- * @start: page where we start the search
+ * @pss: the current page search status
  */
-static inline
-unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
-                                          unsigned long start)
+static void pss_find_next_dirty(PageSearchStatus *pss)
 {
+    RAMBlock *rb = pss->block;
     unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
     unsigned long *bitmap = rb->bmap;
 
     if (ramblock_is_ignored(rb)) {
-        return size;
+        /* Points directly to the end, so we know no dirty page */
+        pss->page = size;
+        return;
+    }
+
+    /*
+     * If during sending a host page, only look for dirty pages within the
+     * current host page being send.
+     */
+    if (pss->host_page_sending) {
+        assert(pss->host_page_end);
+        size = MIN(size, pss->host_page_end);
     }
 
-    return find_next_bit(bitmap, size, start);
+    pss->page = find_next_bit(bitmap, size, pss->page);
 }
 
 static void migration_clear_memory_region_dirty_bitmap(RAMBlock *rb,
@@ -1556,7 +1573,9 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
     pss->postcopy_requested = false;
     pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
 
-    pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
+    /* Update pss->page for the next dirty bit in ramblock */
+    pss_find_next_dirty(pss);
+
     if (pss->complete_round && pss->block == rs->last_seen_block &&
         pss->page >= rs->last_page) {
         /*
@@ -2446,6 +2465,44 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
     }
 }
 
+/* Should be called before sending a host page */
+static void pss_host_page_prepare(PageSearchStatus *pss)
+{
+    /* How many guest pages are there in one host page? */
+    size_t guest_pfns = qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
+
+    pss->host_page_sending = true;
+    pss->host_page_start = ROUND_DOWN(pss->page, guest_pfns);
+    pss->host_page_end = ROUND_UP(pss->page + 1, guest_pfns);
+}
+
+/*
+ * Whether the page pointed by PSS is within the host page being sent.
+ * Must be called after a previous pss_host_page_prepare().
+ */
+static bool pss_within_range(PageSearchStatus *pss)
+{
+    ram_addr_t ram_addr;
+
+    assert(pss->host_page_sending);
+
+    /* Over host-page boundary? */
+    if (pss->page >= pss->host_page_end) {
+        return false;
+    }
+
+    ram_addr = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+
+    return offset_in_ramblock(pss->block, ram_addr);
+}
+
+static void pss_host_page_finish(PageSearchStatus *pss)
+{
+    pss->host_page_sending = false;
+    /* This is not needed, but just to reset it */
+    pss->host_page_start = pss->host_page_end = 0;
+}
+
 /**
  * ram_save_host_page: save a whole host page
  *
@@ -2468,8 +2525,6 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
     int tmppages, pages = 0;
     size_t pagesize_bits =
         qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
-    unsigned long hostpage_boundary =
-        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
     unsigned long start_page = pss->page;
     int res;
 
@@ -2482,6 +2537,9 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
         postcopy_preempt_choose_channel(rs, pss);
     }
 
+    /* Update host page boundary information */
+    pss_host_page_prepare(pss);
+
     do {
         if (postcopy_needs_preempt(rs, pss)) {
             postcopy_do_preempt(rs, pss);
@@ -2520,15 +2578,14 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
         }
 
         if (tmppages < 0) {
+            pss_host_page_finish(pss);
             return tmppages;
         }
 
-        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
-    } while ((pss->page < hostpage_boundary) &&
-             offset_in_ramblock(pss->block,
-                                ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
-    /* The offset we leave with is the min boundary of host page and block */
-    pss->page = MIN(pss->page, hostpage_boundary);
+        pss_find_next_dirty(pss);
+    } while (pss_within_range(pss));
+
+    pss_host_page_finish(pss);
 
     /*
      * When with postcopy preempt mode, flush the data as soon as possible for
-- 
2.32.0




* [PATCH 08/14] migration: Introduce pss_channel
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (6 preceding siblings ...)
  2022-09-20 22:52 ` [PATCH 07/14] migration: Teach PSS about host page Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-10-05 13:03   ` Dr. David Alan Gilbert
  2022-09-20 22:52 ` [PATCH 09/14] migration: Add pss_init() Peter Xu
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

Introduce pss_channel for PageSearchStatus, define it as "the migration
channel to be used to transfer this host page".

We used to have rs->f, which is a mirror to MigrationState.to_dst_file.

After postcopy preempt initial version, rs->f can be dynamically changed
depending on which channel we want to use.

But that later work still doesn't grant full concurrency for sending pages
in e.g. different threads, because rs->f can be either the PRECOPY channel
or the POSTCOPY channel.  This needs to be per-thread too.

PageSearchStatus is actually a good struct which we can leverage if we
want to have multiple threads sending pages.  Sending a single guest page
may not make sense, so we make the granule the "host page", and in the PSS
structure we allow specifying a QEMUFile* used to migrate a specific host
page.  That opens the possibility of specifying different channels in
different threads with different PSS structures.
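
For example, as the hunk in ram_find_and_save_block() below does, the
caller binds the channel to the PSS right before sending:

    /* Cache rs->f in pss_channel (TODO: remove rs->f) */
    pss.pss_channel = rs->f;
    pages = ram_save_host_page(rs, &pss);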

The PSS prefix can be slightly misleading here because e.g. for the
upcoming usage of the postcopy channel/thread it's not "searching" (or
scanning) at all, but sending the explicit page that was requested.
However, since PSS has existed for some years, keep the name as-is until
someone complains.

This patch mostly (simply) replaces rs->f with pss->pss_channel.  No
functional change is intended for this patch yet, but it does prepare for
finally dropping rs->f and making ram_save_guest_page() thread safe.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 70 +++++++++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 32 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 3f720b6de2..40ff5dc49f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -446,6 +446,8 @@ void dirty_sync_missed_zero_copy(void)
 
 /* used by the search for pages to send */
 struct PageSearchStatus {
+    /* The migration channel used for a specific host page */
+    QEMUFile    *pss_channel;
     /* Current block being searched */
     RAMBlock    *block;
     /* Current page to search from */
@@ -768,9 +770,9 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
-                            ram_addr_t current_addr, RAMBlock *block,
-                            ram_addr_t offset)
+static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
+                            uint8_t **current_data, ram_addr_t current_addr,
+                            RAMBlock *block, ram_addr_t offset)
 {
     int encoded_len = 0, bytes_xbzrle;
     uint8_t *prev_cached_page;
@@ -838,11 +840,11 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
     }
 
     /* Send XBZRLE based compressed page */
-    bytes_xbzrle = save_page_header(rs, rs->f, block,
+    bytes_xbzrle = save_page_header(rs, file, block,
                                     offset | RAM_SAVE_FLAG_XBZRLE);
-    qemu_put_byte(rs->f, ENCODING_FLAG_XBZRLE);
-    qemu_put_be16(rs->f, encoded_len);
-    qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
+    qemu_put_byte(file, ENCODING_FLAG_XBZRLE);
+    qemu_put_be16(file, encoded_len);
+    qemu_put_buffer(file, XBZRLE.encoded_buf, encoded_len);
     bytes_xbzrle += encoded_len + 1 + 2;
     /*
      * Like compressed_size (please see update_compress_thread_counts),
@@ -1298,9 +1300,10 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
+static int save_zero_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+                          ram_addr_t offset)
 {
-    int len = save_zero_page_to_file(rs, rs->f, block, offset);
+    int len = save_zero_page_to_file(rs, file, block, offset);
 
     if (len) {
         qatomic_inc(&ram_counters.duplicate);
@@ -1317,15 +1320,15 @@ static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
  *
  * Return true if the pages has been saved, otherwise false is returned.
  */
-static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
-                              int *pages)
+static bool control_save_page(PageSearchStatus *pss, RAMBlock *block,
+                              ram_addr_t offset, int *pages)
 {
     uint64_t bytes_xmit = 0;
     int ret;
 
     *pages = -1;
-    ret = ram_control_save_page(rs->f, block->offset, offset, TARGET_PAGE_SIZE,
-                                &bytes_xmit);
+    ret = ram_control_save_page(pss->pss_channel, block->offset, offset,
+                                TARGET_PAGE_SIZE, &bytes_xmit);
     if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
         return false;
     }
@@ -1359,17 +1362,17 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
  * @buf: the page to be sent
  * @async: send to page asyncly
  */
-static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
-                            uint8_t *buf, bool async)
+static int save_normal_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+                            ram_addr_t offset, uint8_t *buf, bool async)
 {
-    ram_transferred_add(save_page_header(rs, rs->f, block,
+    ram_transferred_add(save_page_header(rs, file, block,
                                          offset | RAM_SAVE_FLAG_PAGE));
     if (async) {
-        qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
+        qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
                               migrate_release_ram() &&
                               migration_in_postcopy());
     } else {
-        qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
+        qemu_put_buffer(file, buf, TARGET_PAGE_SIZE);
     }
     ram_transferred_add(TARGET_PAGE_SIZE);
     qatomic_inc(&ram_counters.normal);
@@ -1402,8 +1405,8 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
 
     XBZRLE_cache_lock();
     if (rs->xbzrle_enabled && !migration_in_postcopy()) {
-        pages = save_xbzrle_page(rs, &p, current_addr, block,
-                                 offset);
+        pages = save_xbzrle_page(rs, pss->pss_channel, &p, current_addr,
+                                 block, offset);
         if (!rs->last_stage) {
             /* Can't send this cached data async, since the cache page
              * might get updated before it gets to the wire
@@ -1414,7 +1417,8 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
 
     /* XBZRLE overflow or normal page */
     if (pages == -1) {
-        pages = save_normal_page(rs, block, offset, p, send_async);
+        pages = save_normal_page(rs, pss->pss_channel, block, offset,
+                                 p, send_async);
     }
 
     XBZRLE_cache_unlock();
@@ -1422,10 +1426,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
     return pages;
 }
 
-static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
+static int ram_save_multifd_page(QEMUFile *file, RAMBlock *block,
                                  ram_addr_t offset)
 {
-    if (multifd_queue_page(rs->f, block, offset) < 0) {
+    if (multifd_queue_page(file, block, offset) < 0) {
         return -1;
     }
     ram_counters.normal++;
@@ -1720,7 +1724,7 @@ static int ram_save_release_protection(RAMState *rs, PageSearchStatus *pss,
         uint64_t run_length = (pss->page - start_page) << TARGET_PAGE_BITS;
 
         /* Flush async buffers before un-protect. */
-        qemu_fflush(rs->f);
+        qemu_fflush(pss->pss_channel);
         /* Un-protect memory range. */
         res = uffd_change_protection(rs->uffdio_fd, page_address, run_length,
                 false, false);
@@ -2307,7 +2311,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
     int res;
 
-    if (control_save_page(rs, block, offset, &res)) {
+    if (control_save_page(pss, block, offset, &res)) {
         return res;
     }
 
@@ -2315,7 +2319,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         return 1;
     }
 
-    res = save_zero_page(rs, block, offset);
+    res = save_zero_page(rs, pss->pss_channel, block, offset);
     if (res > 0) {
         /* Must let xbzrle know, otherwise a previous (now 0'd) cached
          * page would be stale
@@ -2336,7 +2340,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
      */
     if (!save_page_use_compression(rs) && migrate_use_multifd()
         && !migration_in_postcopy()) {
-        return ram_save_multifd_page(rs, block, offset);
+        return ram_save_multifd_page(pss->pss_channel, block, offset);
     }
 
     return ram_save_page(rs, pss);
@@ -2533,10 +2537,6 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
         return 0;
     }
 
-    if (postcopy_preempt_active()) {
-        postcopy_preempt_choose_channel(rs, pss);
-    }
-
     /* Update host page boundary information */
     pss_host_page_prepare(pss);
 
@@ -2597,7 +2597,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
      * explicit flush or it won't flush until the buffer is full.
      */
     if (migrate_postcopy_preempt() && pss->postcopy_requested) {
-        qemu_fflush(rs->f);
+        qemu_fflush(pss->pss_channel);
     }
 
     res = ram_save_release_protection(rs, pss, start_page);
@@ -2663,6 +2663,12 @@ static int ram_find_and_save_block(RAMState *rs)
         }
 
         if (found) {
+            /* Update rs->f with correct channel */
+            if (postcopy_preempt_active()) {
+                postcopy_preempt_choose_channel(rs, &pss);
+            }
+            /* Cache rs->f in pss_channel (TODO: remove rs->f) */
+            pss.pss_channel = rs->f;
             pages = ram_save_host_page(rs, &pss);
         }
     } while (!pages && again);
-- 
2.32.0




* [PATCH 09/14] migration: Add pss_init()
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (7 preceding siblings ...)
  2022-09-20 22:52 ` [PATCH 08/14] migration: Introduce pss_channel Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-10-05 13:09   ` Dr. David Alan Gilbert
  2022-09-20 22:52 ` [PATCH 10/14] migration: Make PageSearchStatus part of RAMState Peter Xu
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

Helper to init PSS structures.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 40ff5dc49f..b4b36ca59e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -535,6 +535,14 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
 static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
                                      bool postcopy_requested);
 
+/* NOTE: page is the PFN not real ram_addr_t. */
+static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
+{
+    pss->block = rb;
+    pss->page = page;
+    pss->complete_round = false;
+}
+
 static void *do_data_compress(void *opaque)
 {
     CompressParam *param = opaque;
@@ -2640,9 +2648,7 @@ static int ram_find_and_save_block(RAMState *rs)
         rs->last_page = 0;
     }
 
-    pss.block = rs->last_seen_block;
-    pss.page = rs->last_page;
-    pss.complete_round = false;
+    pss_init(&pss, rs->last_seen_block, rs->last_page);
 
     do {
         again = true;
-- 
2.32.0




* [PATCH 10/14] migration: Make PageSearchStatus part of RAMState
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (8 preceding siblings ...)
  2022-09-20 22:52 ` [PATCH 09/14] migration: Add pss_init() Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-10-05 18:51   ` Dr. David Alan Gilbert
  2022-10-06  8:37   ` Dr. David Alan Gilbert
  2022-09-20 22:52 ` [PATCH 11/14] migration: Move last_sent_block into PageSearchStatus Peter Xu
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

We used to allocate the PSS structure on the stack for precopy when
sending pages.  Make it static, so that it describes per-channel ram
migration status.

Here we declare RAM_CHANNEL_MAX instances, preparing for postcopy to use
them, even though this patch does not yet start using the 2nd instance.

This should not have any functional change per se, but it already starts to
export PSS information via the RAMState, so that e.g. one PSS channel can
start to reference the other PSS channel.

Always protect PSS access using the same RAMState.bitmap_mutex.  We
already do so, so no code change is needed, just some comment updates.
Maybe we should consider renaming bitmap_mutex some day, as it's becoming
a bigger and more commonly used mutex for ram state, but leave that for
later.
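
Concretely, any access to one of the per-channel PSS instances should be
done with the mutex held, e.g. (a small illustration of the convention,
not a hunk from this patch):

    qemu_mutex_lock(&rs->bitmap_mutex);
    PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
    /* ... read or update pss->block, pss->page, ... */
    qemu_mutex_unlock(&rs->bitmap_mutex);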

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 112 ++++++++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 51 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index b4b36ca59e..dbe11e1ace 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -85,6 +85,46 @@
 
 XBZRLECacheStats xbzrle_counters;
 
+/* used by the search for pages to send */
+struct PageSearchStatus {
+    /* The migration channel used for a specific host page */
+    QEMUFile    *pss_channel;
+    /* Current block being searched */
+    RAMBlock    *block;
+    /* Current page to search from */
+    unsigned long page;
+    /* Set once we wrap around */
+    bool         complete_round;
+    /*
+     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
+     * postcopy.  When set, the request is "urgent" because the dest QEMU
+     * threads are waiting for us.
+     */
+    bool         postcopy_requested;
+    /*
+     * [POSTCOPY-ONLY] The target channel to use to send current page.
+     *
+     * Note: This may _not_ match with the value in postcopy_requested
+     * above. Let's imagine the case where the postcopy request is exactly
+     * the page that we're sending in progress during precopy. In this case
+     * we'll have postcopy_requested set to true but the target channel
+     * will be the precopy channel (so that we don't split brain on that
+     * specific page since the precopy channel already contains partial of
+     * that page data).
+     *
+     * Besides that specific use case, postcopy_target_channel should
+     * always be equal to postcopy_requested, because by default we send
+     * postcopy pages via postcopy preempt channel.
+     */
+    bool         postcopy_target_channel;
+    /* Whether we're sending a host page */
+    bool          host_page_sending;
+    /* The start/end of current host page.  Invalid if host_page_sending==false */
+    unsigned long host_page_start;
+    unsigned long host_page_end;
+};
+typedef struct PageSearchStatus PageSearchStatus;
+
 /* struct contains XBZRLE cache and a static page
    used by the compression */
 static struct {
@@ -319,6 +359,11 @@ typedef struct {
 struct RAMState {
     /* QEMUFile used for this migration */
     QEMUFile *f;
+    /*
+     * PageSearchStatus structures for the channels when send pages.
+     * Protected by the bitmap_mutex.
+     */
+    PageSearchStatus pss[RAM_CHANNEL_MAX];
     /* UFFD file descriptor, used in 'write-tracking' migration */
     int uffdio_fd;
     /* Last block that we have visited searching for dirty pages */
@@ -362,7 +407,12 @@ struct RAMState {
     uint64_t target_page_count;
     /* number of dirty bits in the bitmap */
     uint64_t migration_dirty_pages;
-    /* Protects modification of the bitmap and migration dirty pages */
+    /*
+     * Protects:
+     * - dirty/clear bitmap
+     * - migration_dirty_pages
+     * - pss structures
+     */
     QemuMutex bitmap_mutex;
     /* The RAMBlock used in the last src_page_requests */
     RAMBlock *last_req_rb;
@@ -444,46 +494,6 @@ void dirty_sync_missed_zero_copy(void)
     ram_counters.dirty_sync_missed_zero_copy++;
 }
 
-/* used by the search for pages to send */
-struct PageSearchStatus {
-    /* The migration channel used for a specific host page */
-    QEMUFile    *pss_channel;
-    /* Current block being searched */
-    RAMBlock    *block;
-    /* Current page to search from */
-    unsigned long page;
-    /* Set once we wrap around */
-    bool         complete_round;
-    /*
-     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
-     * postcopy.  When set, the request is "urgent" because the dest QEMU
-     * threads are waiting for us.
-     */
-    bool         postcopy_requested;
-    /*
-     * [POSTCOPY-ONLY] The target channel to use to send current page.
-     *
-     * Note: This may _not_ match with the value in postcopy_requested
-     * above. Let's imagine the case where the postcopy request is exactly
-     * the page that we're sending in progress during precopy. In this case
-     * we'll have postcopy_requested set to true but the target channel
-     * will be the precopy channel (so that we don't split brain on that
-     * specific page since the precopy channel already contains partial of
-     * that page data).
-     *
-     * Besides that specific use case, postcopy_target_channel should
-     * always be equal to postcopy_requested, because by default we send
-     * postcopy pages via postcopy preempt channel.
-     */
-    bool         postcopy_target_channel;
-    /* Whether we're sending a host page */
-    bool          host_page_sending;
-    /* The start/end of current host page.  Only valid if host_page_sending==true */
-    unsigned long host_page_start;
-    unsigned long host_page_end;
-};
-typedef struct PageSearchStatus PageSearchStatus;
-
 CompressionStats compression_counters;
 
 struct CompressParam {
@@ -2627,7 +2637,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
  */
 static int ram_find_and_save_block(RAMState *rs)
 {
-    PageSearchStatus pss;
+    PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
     int pages = 0;
     bool again, found;
 
@@ -2648,11 +2658,11 @@ static int ram_find_and_save_block(RAMState *rs)
         rs->last_page = 0;
     }
 
-    pss_init(&pss, rs->last_seen_block, rs->last_page);
+    pss_init(pss, rs->last_seen_block, rs->last_page);
 
     do {
         again = true;
-        found = get_queued_page(rs, &pss);
+        found = get_queued_page(rs, pss);
 
         if (!found) {
             /*
@@ -2660,27 +2670,27 @@ static int ram_find_and_save_block(RAMState *rs)
              * preempted precopy.  Otherwise find the next dirty bit.
              */
             if (postcopy_preempt_triggered(rs)) {
-                postcopy_preempt_restore(rs, &pss, false);
+                postcopy_preempt_restore(rs, pss, false);
                 found = true;
             } else {
                 /* priority queue empty, so just search for something dirty */
-                found = find_dirty_block(rs, &pss, &again);
+                found = find_dirty_block(rs, pss, &again);
             }
         }
 
         if (found) {
             /* Update rs->f with correct channel */
             if (postcopy_preempt_active()) {
-                postcopy_preempt_choose_channel(rs, &pss);
+                postcopy_preempt_choose_channel(rs, pss);
             }
             /* Cache rs->f in pss_channel (TODO: remove rs->f) */
-            pss.pss_channel = rs->f;
-            pages = ram_save_host_page(rs, &pss);
+            pss->pss_channel = rs->f;
+            pages = ram_save_host_page(rs, pss);
         }
     } while (!pages && again);
 
-    rs->last_seen_block = pss.block;
-    rs->last_page = pss.page;
+    rs->last_seen_block = pss->block;
+    rs->last_page = pss->page;
 
     return pages;
 }
-- 
2.32.0




* [PATCH 11/14] migration: Move last_sent_block into PageSearchStatus
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (9 preceding siblings ...)
  2022-09-20 22:52 ` [PATCH 10/14] migration: Make PageSearchStatus part of RAMState Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-10-06 16:59   ` Dr. David Alan Gilbert
  2022-09-20 22:52 ` [PATCH 12/14] migration: Send requested page directly in rp-return thread Peter Xu
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

Since we use PageSearchStatus to represent a channel, it makes perfect
sense to make last_sent_block (aka, the state behind
RAM_SAVE_FLAG_CONTINUE) per-channel rather than global, because each
channel can be sending different pages on different ramblocks.

Hence move it from RAMState into PageSearchStatus.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 71 ++++++++++++++++++++++++++++---------------------
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index dbe11e1ace..fdcb61a2c8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -89,6 +89,8 @@ XBZRLECacheStats xbzrle_counters;
 struct PageSearchStatus {
     /* The migration channel used for a specific host page */
     QEMUFile    *pss_channel;
+    /* Last block from where we have sent data */
+    RAMBlock *last_sent_block;
     /* Current block being searched */
     RAMBlock    *block;
     /* Current page to search from */
@@ -368,8 +370,6 @@ struct RAMState {
     int uffdio_fd;
     /* Last block that we have visited searching for dirty pages */
     RAMBlock *last_seen_block;
-    /* Last block from where we have sent data */
-    RAMBlock *last_sent_block;
     /* Last dirty target page we have sent */
     ram_addr_t last_page;
     /* last ram version we have seen */
@@ -677,16 +677,17 @@ exit:
  *
  * Returns the number of bytes written
  *
- * @f: QEMUFile where to send the data
+ * @pss: current PSS channel status
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  *          in the lower bits, it contains flags
  */
-static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
+static size_t save_page_header(PageSearchStatus *pss, RAMBlock *block,
                                ram_addr_t offset)
 {
     size_t size, len;
-    bool same_block = (block == rs->last_sent_block);
+    bool same_block = (block == pss->last_sent_block);
+    QEMUFile *f = pss->pss_channel;
 
     if (same_block) {
         offset |= RAM_SAVE_FLAG_CONTINUE;
@@ -699,7 +700,7 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
         qemu_put_byte(f, len);
         qemu_put_buffer(f, (uint8_t *)block->idstr, len);
         size += 1 + len;
-        rs->last_sent_block = block;
+        pss->last_sent_block = block;
     }
     return size;
 }
@@ -783,17 +784,19 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
  *          -1 means that xbzrle would be longer than normal
  *
  * @rs: current RAM state
+ * @pss: current PSS channel
  * @current_data: pointer to the address of the page contents
  * @current_addr: addr of the page
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
+static int save_xbzrle_page(RAMState *rs, PageSearchStatus *pss,
                             uint8_t **current_data, ram_addr_t current_addr,
                             RAMBlock *block, ram_addr_t offset)
 {
     int encoded_len = 0, bytes_xbzrle;
     uint8_t *prev_cached_page;
+    QEMUFile *file = pss->pss_channel;
 
     if (!cache_is_cached(XBZRLE.cache, current_addr,
                          ram_counters.dirty_sync_count)) {
@@ -858,7 +861,7 @@ static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
     }
 
     /* Send XBZRLE based compressed page */
-    bytes_xbzrle = save_page_header(rs, file, block,
+    bytes_xbzrle = save_page_header(pss, block,
                                     offset | RAM_SAVE_FLAG_XBZRLE);
     qemu_put_byte(file, ENCODING_FLAG_XBZRLE);
     qemu_put_be16(file, encoded_len);
@@ -1289,19 +1292,19 @@ static void ram_release_page(const char *rbname, uint64_t offset)
  * Returns the size of data written to the file, 0 means the page is not
  * a zero page
  *
- * @rs: current RAM state
- * @file: the file where the data is saved
+ * @pss: current PSS channel
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
+static int save_zero_page_to_file(PageSearchStatus *pss,
                                   RAMBlock *block, ram_addr_t offset)
 {
     uint8_t *p = block->host + offset;
+    QEMUFile *file = pss->pss_channel;
     int len = 0;
 
     if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
-        len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
+        len += save_page_header(pss, block, offset | RAM_SAVE_FLAG_ZERO);
         qemu_put_byte(file, 0);
         len += 1;
         ram_release_page(block->idstr, offset);
@@ -1314,14 +1317,14 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
  *
  * Returns the number of pages written.
  *
- * @rs: current RAM state
+ * @pss: current PSS channel
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+static int save_zero_page(PageSearchStatus *pss, RAMBlock *block,
                           ram_addr_t offset)
 {
-    int len = save_zero_page_to_file(rs, file, block, offset);
+    int len = save_zero_page_to_file(pss, block, offset);
 
     if (len) {
         qatomic_inc(&ram_counters.duplicate);
@@ -1374,16 +1377,18 @@ static bool control_save_page(PageSearchStatus *pss, RAMBlock *block,
  *
  * Returns the number of pages written.
  *
- * @rs: current RAM state
+ * @pss: current PSS channel
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  * @buf: the page to be sent
  * @async: send to page asyncly
  */
-static int save_normal_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+static int save_normal_page(PageSearchStatus *pss, RAMBlock *block,
                             ram_addr_t offset, uint8_t *buf, bool async)
 {
-    ram_transferred_add(save_page_header(rs, file, block,
+    QEMUFile *file = pss->pss_channel;
+
+    ram_transferred_add(save_page_header(pss, block,
                                          offset | RAM_SAVE_FLAG_PAGE));
     if (async) {
         qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
@@ -1423,7 +1428,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
 
     XBZRLE_cache_lock();
     if (rs->xbzrle_enabled && !migration_in_postcopy()) {
-        pages = save_xbzrle_page(rs, pss->pss_channel, &p, current_addr,
+        pages = save_xbzrle_page(rs, pss, &p, current_addr,
                                  block, offset);
         if (!rs->last_stage) {
             /* Can't send this cached data async, since the cache page
@@ -1435,8 +1440,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
 
     /* XBZRLE overflow or normal page */
     if (pages == -1) {
-        pages = save_normal_page(rs, pss->pss_channel, block, offset,
-                                 p, send_async);
+        pages = save_normal_page(pss, block, offset, p, send_async);
     }
 
     XBZRLE_cache_unlock();
@@ -1459,14 +1463,15 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                  ram_addr_t offset, uint8_t *source_buf)
 {
     RAMState *rs = ram_state;
+    PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
     uint8_t *p = block->host + offset;
     int ret;
 
-    if (save_zero_page_to_file(rs, f, block, offset)) {
+    if (save_zero_page_to_file(pss, block, offset)) {
         return true;
     }
 
-    save_page_header(rs, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
+    save_page_header(pss, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
 
     /*
      * copy it to a internal buffer to avoid it being modified by VM
@@ -2286,7 +2291,8 @@ static bool save_page_use_compression(RAMState *rs)
  * has been properly handled by compression, otherwise needs other
  * paths to handle it
  */
-static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
+static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
+                               RAMBlock *block, ram_addr_t offset)
 {
     if (!save_page_use_compression(rs)) {
         return false;
@@ -2302,7 +2308,7 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
      * We post the fist page as normal page as compression will take
      * much CPU resource.
      */
-    if (block != rs->last_sent_block) {
+    if (block != pss->last_sent_block) {
         flush_compressed_data(rs);
         return false;
     }
@@ -2333,11 +2339,11 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         return res;
     }
 
-    if (save_compress_page(rs, block, offset)) {
+    if (save_compress_page(rs, pss, block, offset)) {
         return 1;
     }
 
-    res = save_zero_page(rs, pss->pss_channel, block, offset);
+    res = save_zero_page(pss, block, offset);
     if (res > 0) {
         /* Must let xbzrle know, otherwise a previous (now 0'd) cached
          * page would be stale
@@ -2469,7 +2475,7 @@ static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
          * If channel switched, reset last_sent_block since the old sent block
          * may not be on the same channel.
          */
-        rs->last_sent_block = NULL;
+        pss->last_sent_block = NULL;
 
         trace_postcopy_preempt_switch_channel(channel);
     }
@@ -2804,8 +2810,13 @@ static void ram_save_cleanup(void *opaque)
 
 static void ram_state_reset(RAMState *rs)
 {
+    int i;
+
+    for (i = 0; i < RAM_CHANNEL_MAX; i++) {
+        rs->pss[i].last_sent_block = NULL;
+    }
+
     rs->last_seen_block = NULL;
-    rs->last_sent_block = NULL;
     rs->last_page = 0;
     rs->last_version = ram_list.version;
     rs->xbzrle_enabled = false;
@@ -2999,8 +3010,8 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
     migration_bitmap_sync(rs);
 
     /* Easiest way to make sure we don't resume in the middle of a host-page */
+    rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;
     rs->last_seen_block = NULL;
-    rs->last_sent_block = NULL;
     rs->last_page = 0;
 
     postcopy_each_ram_send_discard(ms);
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 12/14] migration: Send requested page directly in rp-return thread
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (10 preceding siblings ...)
  2022-09-20 22:52 ` [PATCH 11/14] migration: Move last_sent_block into PageSearchStatus Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-10-06 17:51   ` Dr. David Alan Gilbert
  2022-09-20 22:52 ` [PATCH 13/14] migration: Remove old preempt code around state maintainance Peter Xu
  2022-09-20 22:52 ` [PATCH 14/14] migration: Drop rs->f Peter Xu
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

With all the facilities ready, send the requested page directly in the
rp-return thread rather than queuing it in the request queue, if and only
if postcopy preempt is enabled.  This works because preempt mode uses a
separate channel for sending urgent pages; the only shared data is the
dirty bitmap, and that is protected by the bitmap_mutex.

Note that since we're moving ownership of the urgent channel from the
migration thread to the rp-return thread, the rp-return thread also becomes
responsible for managing the qemufile, e.g. properly closing it when a
migration pause happens.  For this, let migration_release_from_dst_file()
cover shutdown of the urgent channel too, and rename it to
migration_release_dst_files() to better show what it does.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c |  35 +++++++------
 migration/ram.c       | 112 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 131 insertions(+), 16 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 0eacc0c99b..fae8fd378b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2845,8 +2845,11 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
     return 0;
 }
 
-/* Release ms->rp_state.from_dst_file in a safe way */
-static void migration_release_from_dst_file(MigrationState *ms)
+/*
+ * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
+ * existed) in a safe way.
+ */
+static void migration_release_dst_files(MigrationState *ms)
 {
     QEMUFile *file;
 
@@ -2859,6 +2862,18 @@ static void migration_release_from_dst_file(MigrationState *ms)
         ms->rp_state.from_dst_file = NULL;
     }
 
+    /*
+     * Do the same to postcopy fast path socket too if there is.  No
+     * locking needed because this qemufile should only be managed by
+     * return path thread.
+     */
+    if (ms->postcopy_qemufile_src) {
+        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
+        qemu_file_shutdown(ms->postcopy_qemufile_src);
+        qemu_fclose(ms->postcopy_qemufile_src);
+        ms->postcopy_qemufile_src = NULL;
+    }
+
     qemu_fclose(file);
 }
 
@@ -3003,7 +3018,7 @@ out:
              * Maybe there is something we can do: it looks like a
              * network down issue, and we pause for a recovery.
              */
-            migration_release_from_dst_file(ms);
+            migration_release_dst_files(ms);
             rp = NULL;
             if (postcopy_pause_return_path_thread(ms)) {
                 /*
@@ -3021,7 +3036,7 @@ out:
     }
 
     trace_source_return_path_thread_end();
-    migration_release_from_dst_file(ms);
+    migration_release_dst_files(ms);
     rcu_unregister_thread();
     return NULL;
 }
@@ -3544,18 +3559,6 @@ static MigThrError postcopy_pause(MigrationState *s)
         qemu_file_shutdown(file);
         qemu_fclose(file);
 
-        /*
-         * Do the same to postcopy fast path socket too if there is.  No
-         * locking needed because no racer as long as we do this before setting
-         * status to paused.
-         */
-        if (s->postcopy_qemufile_src) {
-            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
-            qemu_file_shutdown(s->postcopy_qemufile_src);
-            qemu_fclose(s->postcopy_qemufile_src);
-            s->postcopy_qemufile_src = NULL;
-        }
-
         migrate_set_state(&s->state, s->state,
                           MIGRATION_STATUS_POSTCOPY_PAUSED);
 
diff --git a/migration/ram.c b/migration/ram.c
index fdcb61a2c8..fd301d793c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -539,6 +539,8 @@ static QemuThread *decompress_threads;
 static QemuMutex decomp_done_lock;
 static QemuCond decomp_done_cond;
 
+static int ram_save_host_page_urgent(PageSearchStatus *pss);
+
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                  ram_addr_t offset, uint8_t *source_buf);
 
@@ -553,6 +555,16 @@ static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
     pss->complete_round = false;
 }
 
+/*
+ * Check whether two PSSs are actively sending the same page.  Return true
+ * if it is, false otherwise.
+ */
+static bool pss_overlap(PageSearchStatus *pss1, PageSearchStatus *pss2)
+{
+    return pss1->host_page_sending && pss2->host_page_sending &&
+        (pss1->host_page_start == pss2->host_page_start);
+}
+
 static void *do_data_compress(void *opaque)
 {
     CompressParam *param = opaque;
@@ -2253,6 +2265,57 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
         return -1;
     }
 
+    /*
+     * When with postcopy preempt, we send back the page directly in the
+     * rp-return thread.
+     */
+    if (postcopy_preempt_active()) {
+        ram_addr_t page_start = start >> TARGET_PAGE_BITS;
+        size_t page_size = qemu_ram_pagesize(ramblock);
+        PageSearchStatus *pss = &ram_state->pss[RAM_CHANNEL_POSTCOPY];
+        int ret = 0;
+
+        qemu_mutex_lock(&rs->bitmap_mutex);
+
+        pss_init(pss, ramblock, page_start);
+        /*
+         * Always use the preempt channel, and make sure it's there.  It's
+         * safe to access without lock, because when rp-thread is running
+         * we should be the only one who operates on the qemufile
+         */
+        pss->pss_channel = migrate_get_current()->postcopy_qemufile_src;
+        pss->postcopy_requested = true;
+        assert(pss->pss_channel);
+
+        /*
+         * It must be either one or multiple of host page size.  Just
+         * assert; if something wrong we're mostly split brain anyway.
+         */
+        assert(len % page_size == 0);
+        while (len) {
+            if (ram_save_host_page_urgent(pss)) {
+                error_report("%s: ram_save_host_page_urgent() failed: "
+                             "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
+                             __func__, ramblock->idstr, start);
+                ret = -1;
+                break;
+            }
+            /*
+             * NOTE: after ram_save_host_page_urgent() succeeded, pss->page
+             * will automatically be moved and point to the next host page
+             * we're going to send, so no need to update here.
+             *
+             * Normally QEMU never sends >1 host page in requests, so
+             * logically we don't even need that as the loop should only
+             * run once, but just to be consistent.
+             */
+            len -= page_size;
+        };
+        qemu_mutex_unlock(&rs->bitmap_mutex);
+
+        return ret;
+    }
+
     struct RAMSrcPageRequest *new_entry =
         g_new0(struct RAMSrcPageRequest, 1);
     new_entry->rb = ramblock;
@@ -2531,6 +2594,55 @@ static void pss_host_page_finish(PageSearchStatus *pss)
     pss->host_page_start = pss->host_page_end = 0;
 }
 
+/*
+ * Send an urgent host page specified by `pss'.  Need to be called with
+ * bitmap_mutex held.
+ *
+ * Returns 0 if save host page succeeded, false otherwise.
+ */
+static int ram_save_host_page_urgent(PageSearchStatus *pss)
+{
+    bool page_dirty, sent = false;
+    RAMState *rs = ram_state;
+    int ret = 0;
+
+    trace_postcopy_preempt_send_host_page(pss->block->idstr, pss->page);
+    pss_host_page_prepare(pss);
+
+    /*
+     * If precopy is sending the same page, let it be done in precopy, or
+     * we could send the same page in two channels and none of them will
+     * receive the whole page.
+     */
+    if (pss_overlap(pss, &ram_state->pss[RAM_CHANNEL_PRECOPY])) {
+        trace_postcopy_preempt_hit(pss->block->idstr,
+                                   pss->page << TARGET_PAGE_BITS);
+        return 0;
+    }
+
+    do {
+        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
+
+        if (page_dirty) {
+            /* Be strict to return code; it must be 1, or what else? */
+            if (ram_save_target_page(rs, pss) != 1) {
+                error_report_once("%s: ram_save_target_page failed", __func__);
+                ret = -1;
+                goto out;
+            }
+            sent = true;
+        }
+        pss_find_next_dirty(pss);
+    } while (pss_within_range(pss));
+out:
+    pss_host_page_finish(pss);
+    /* For urgent requests, flush immediately if sent */
+    if (sent) {
+        qemu_fflush(pss->pss_channel);
+    }
+    return ret;
+}
+
 /**
  * ram_save_host_page: save a whole host page
  *
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 13/14] migration: Remove old preempt code around state maintainance
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (11 preceding siblings ...)
  2022-09-20 22:52 ` [PATCH 12/14] migration: Send requested page directly in rp-return thread Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-09-21  0:47   ` Peter Xu
  2022-09-20 22:52 ` [PATCH 14/14] migration: Drop rs->f Peter Xu
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

With the new code that sends pages in the rp-return thread, there's little
point in keeping the old code that maintained the preempt state in the
migration thread, because the new way should always be faster.

If we'll always send requested pages in the rp-return thread anyway, we no
longer need the logic that maintained the preempt state, because things are
now serialized directly with the mutex instead of through those fields.

It's unfortunate to carry that code for only a short period, but it was an
intermediate step before we noticed the next bottleneck on the migration
thread.  The best we can do now is drop the unnecessary code once the new
code is stable, to reduce the maintenance burden.  It's actually a good
thing, because the new "send pages in the rp-return thread" model is (IMHO)
even cleaner and performs better.

Remove the old code that was responsible for maintaining the preempt state;
in the meantime also remove the x-postcopy-preempt-break-huge parameter,
because with concurrent sender threads we don't really need to break huge
pages anymore.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c |   2 -
 migration/ram.c       | 258 +-----------------------------------------
 2 files changed, 3 insertions(+), 257 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index fae8fd378b..698fd94591 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4399,8 +4399,6 @@ static Property migration_properties[] = {
     DEFINE_PROP_SIZE("announce-step", MigrationState,
                       parameters.announce_step,
                       DEFAULT_MIGRATE_ANNOUNCE_STEP),
-    DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
-                      postcopy_preempt_break_huge, true),
     DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
     DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname),
     DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
diff --git a/migration/ram.c b/migration/ram.c
index fd301d793c..f42efe02fc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -343,20 +343,6 @@ struct RAMSrcPageRequest {
     QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
-typedef struct {
-    /*
-     * Cached ramblock/offset values if preempted.  They're only meaningful if
-     * preempted==true below.
-     */
-    RAMBlock *ram_block;
-    unsigned long ram_page;
-    /*
-     * Whether a postcopy preemption just happened.  Will be reset after
-     * precopy recovered to background migration.
-     */
-    bool preempted;
-} PostcopyPreemptState;
-
 /* State of RAM for migration */
 struct RAMState {
     /* QEMUFile used for this migration */
@@ -419,14 +405,6 @@ struct RAMState {
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
     QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
-
-    /* Postcopy preemption informations */
-    PostcopyPreemptState postcopy_preempt_state;
-    /*
-     * Current channel we're using on src VM.  Only valid if postcopy-preempt
-     * is enabled.
-     */
-    unsigned int postcopy_channel;
 };
 typedef struct RAMState RAMState;
 
@@ -434,11 +412,6 @@ static RAMState *ram_state;
 
 static NotifierWithReturnList precopy_notifier_list;
 
-static void postcopy_preempt_reset(RAMState *rs)
-{
-    memset(&rs->postcopy_preempt_state, 0, sizeof(PostcopyPreemptState));
-}
-
 /* Whether postcopy has queued requests? */
 static bool postcopy_has_request(RAMState *rs)
 {
@@ -544,9 +517,6 @@ static int ram_save_host_page_urgent(PageSearchStatus *pss);
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                  ram_addr_t offset, uint8_t *source_buf);
 
-static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
-                                     bool postcopy_requested);
-
 /* NOTE: page is the PFN not real ram_addr_t. */
 static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
 {
@@ -2062,55 +2032,6 @@ void ram_write_tracking_stop(void)
 }
 #endif /* defined(__linux__) */
 
-/*
- * Check whether two addr/offset of the ramblock falls onto the same host huge
- * page.  Returns true if so, false otherwise.
- */
-static bool offset_on_same_huge_page(RAMBlock *rb, uint64_t addr1,
-                                     uint64_t addr2)
-{
-    size_t page_size = qemu_ram_pagesize(rb);
-
-    addr1 = ROUND_DOWN(addr1, page_size);
-    addr2 = ROUND_DOWN(addr2, page_size);
-
-    return addr1 == addr2;
-}
-
-/*
- * Whether a previous preempted precopy huge page contains current requested
- * page?  Returns true if so, false otherwise.
- *
- * This should really happen very rarely, because it means when we were sending
- * during background migration for postcopy we're sending exactly the page that
- * some vcpu got faulted on on dest node.  When it happens, we probably don't
- * need to do much but drop the request, because we know right after we restore
- * the precopy stream it'll be serviced.  It'll slightly affect the order of
- * postcopy requests to be serviced (e.g. it'll be the same as we move current
- * request to the end of the queue) but it shouldn't be a big deal.  The most
- * imporant thing is we can _never_ try to send a partial-sent huge page on the
- * POSTCOPY channel again, otherwise that huge page will got "split brain" on
- * two channels (PRECOPY, POSTCOPY).
- */
-static bool postcopy_preempted_contains(RAMState *rs, RAMBlock *block,
-                                        ram_addr_t offset)
-{
-    PostcopyPreemptState *state = &rs->postcopy_preempt_state;
-
-    /* No preemption at all? */
-    if (!state->preempted) {
-        return false;
-    }
-
-    /* Not even the same ramblock? */
-    if (state->ram_block != block) {
-        return false;
-    }
-
-    return offset_on_same_huge_page(block, offset,
-                                    state->ram_page << TARGET_PAGE_BITS);
-}
-
 /**
  * get_queued_page: unqueue a page from the postcopy requests
  *
@@ -2150,20 +2071,7 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
 
     } while (block && !dirty);
 
-    if (block) {
-        /* See comment above postcopy_preempted_contains() */
-        if (postcopy_preempted_contains(rs, block, offset)) {
-            trace_postcopy_preempt_hit(block->idstr, offset);
-            /*
-             * If what we preempted previously was exactly what we're
-             * requesting right now, restore the preempted precopy
-             * immediately, boosting its priority as it's requested by
-             * postcopy.
-             */
-            postcopy_preempt_restore(rs, pss, true);
-            return true;
-        }
-    } else {
+    if (!block) {
         /*
          * Poll write faults too if background snapshot is enabled; that's
          * when we have vcpus got blocked by the write protected pages.
@@ -2433,129 +2341,6 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     return ram_save_page(rs, pss);
 }
 
-static bool postcopy_needs_preempt(RAMState *rs, PageSearchStatus *pss)
-{
-    MigrationState *ms = migrate_get_current();
-
-    /* Not enabled eager preempt?  Then never do that. */
-    if (!migrate_postcopy_preempt()) {
-        return false;
-    }
-
-    /* If the user explicitly disabled breaking of huge page, skip */
-    if (!ms->postcopy_preempt_break_huge) {
-        return false;
-    }
-
-    /* If the ramblock we're sending is a small page?  Never bother. */
-    if (qemu_ram_pagesize(pss->block) == TARGET_PAGE_SIZE) {
-        return false;
-    }
-
-    /* Not in postcopy at all? */
-    if (!migration_in_postcopy()) {
-        return false;
-    }
-
-    /*
-     * If we're already handling a postcopy request, don't preempt as this page
-     * has got the same high priority.
-     */
-    if (pss->postcopy_requested) {
-        return false;
-    }
-
-    /* If there's postcopy requests, then check it up! */
-    return postcopy_has_request(rs);
-}
-
-/* Returns true if we preempted precopy, false otherwise */
-static void postcopy_do_preempt(RAMState *rs, PageSearchStatus *pss)
-{
-    PostcopyPreemptState *p_state = &rs->postcopy_preempt_state;
-
-    trace_postcopy_preempt_triggered(pss->block->idstr, pss->page);
-
-    /*
-     * Time to preempt precopy. Cache current PSS into preempt state, so that
-     * after handling the postcopy pages we can recover to it.  We need to do
-     * so because the dest VM will have partial of the precopy huge page kept
-     * over in its tmp huge page caches; better move on with it when we can.
-     */
-    p_state->ram_block = pss->block;
-    p_state->ram_page = pss->page;
-    p_state->preempted = true;
-}
-
-/* Whether we're preempted by a postcopy request during sending a huge page */
-static bool postcopy_preempt_triggered(RAMState *rs)
-{
-    return rs->postcopy_preempt_state.preempted;
-}
-
-static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
-                                     bool postcopy_requested)
-{
-    PostcopyPreemptState *state = &rs->postcopy_preempt_state;
-
-    assert(state->preempted);
-
-    pss->block = state->ram_block;
-    pss->page = state->ram_page;
-
-    /* Whether this is a postcopy request? */
-    pss->postcopy_requested = postcopy_requested;
-    /*
-     * When restoring a preempted page, the old data resides in PRECOPY
-     * slow channel, even if postcopy_requested is set.  So always use
-     * PRECOPY channel here.
-     */
-    pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
-
-    trace_postcopy_preempt_restored(pss->block->idstr, pss->page);
-
-    /* Reset preempt state, most importantly, set preempted==false */
-    postcopy_preempt_reset(rs);
-}
-
-static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
-{
-    MigrationState *s = migrate_get_current();
-    unsigned int channel = pss->postcopy_target_channel;
-    QEMUFile *next;
-
-    if (channel != rs->postcopy_channel) {
-        if (channel == RAM_CHANNEL_PRECOPY) {
-            next = s->to_dst_file;
-        } else {
-            next = s->postcopy_qemufile_src;
-        }
-        /* Update and cache the current channel */
-        rs->f = next;
-        rs->postcopy_channel = channel;
-
-        /*
-         * If channel switched, reset last_sent_block since the old sent block
-         * may not be on the same channel.
-         */
-        pss->last_sent_block = NULL;
-
-        trace_postcopy_preempt_switch_channel(channel);
-    }
-
-    trace_postcopy_preempt_send_host_page(pss->block->idstr, pss->page);
-}
-
-/* We need to make sure rs->f always points to the default channel elsewhere */
-static void postcopy_preempt_reset_channel(RAMState *rs)
-{
-    if (postcopy_preempt_active()) {
-        rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
-        rs->f = migrate_get_current()->to_dst_file;
-        trace_postcopy_preempt_reset_channel();
-    }
-}
-
 /* Should be called before sending a host page */
 static void pss_host_page_prepare(PageSearchStatus *pss)
 {
@@ -2677,11 +2462,6 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
     pss_host_page_prepare(pss);
 
     do {
-        if (postcopy_needs_preempt(rs, pss)) {
-            postcopy_do_preempt(rs, pss);
-            break;
-        }
-
         page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
         /*
          * Properly yield the lock only in postcopy preempt mode because
@@ -2723,19 +2503,6 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
 
     pss_host_page_finish(pss);
 
-    /*
-     * When with postcopy preempt mode, flush the data as soon as possible for
-     * postcopy requests, because we've already sent a whole huge page, so the
-     * dst node should already have enough resource to atomically filling in
-     * the current missing page.
-     *
-     * More importantly, when using separate postcopy channel, we must do
-     * explicit flush or it won't flush until the buffer is full.
-     */
-    if (migrate_postcopy_preempt() && pss->postcopy_requested) {
-        qemu_fflush(pss->pss_channel);
-    }
-
     res = ram_save_release_protection(rs, pss, start_page);
     return (res < 0 ? res : pages);
 }
@@ -2783,24 +2550,11 @@ static int ram_find_and_save_block(RAMState *rs)
         found = get_queued_page(rs, pss);
 
         if (!found) {
-            /*
-             * Recover previous precopy ramblock/offset if postcopy has
-             * preempted precopy.  Otherwise find the next dirty bit.
-             */
-            if (postcopy_preempt_triggered(rs)) {
-                postcopy_preempt_restore(rs, pss, false);
-                found = true;
-            } else {
-                /* priority queue empty, so just search for something dirty */
-                found = find_dirty_block(rs, pss, &again);
-            }
+            /* priority queue empty, so just search for something dirty */
+            found = find_dirty_block(rs, pss, &again);
         }
 
         if (found) {
-            /* Update rs->f with correct channel */
-            if (postcopy_preempt_active()) {
-                postcopy_preempt_choose_channel(rs, pss);
-            }
             /* Cache rs->f in pss_channel (TODO: remove rs->f) */
             pss->pss_channel = rs->f;
             pages = ram_save_host_page(rs, pss);
@@ -2932,8 +2686,6 @@ static void ram_state_reset(RAMState *rs)
     rs->last_page = 0;
     rs->last_version = ram_list.version;
     rs->xbzrle_enabled = false;
-    postcopy_preempt_reset(rs);
-    rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -3577,8 +3329,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     }
     qemu_mutex_unlock(&rs->bitmap_mutex);
 
-    postcopy_preempt_reset_channel(rs);
-
     /*
      * Must occur before EOS (or any QEMUFile operation)
      * because of RDMA protocol.
@@ -3656,8 +3406,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    postcopy_preempt_reset_channel(rs);
-
     ret = multifd_send_sync_main(rs->f);
     if (ret < 0) {
         return ret;
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 14/14] migration: Drop rs->f
  2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
                   ` (12 preceding siblings ...)
  2022-09-20 22:52 ` [PATCH 13/14] migration: Remove old preempt code around state maintainance Peter Xu
@ 2022-09-20 22:52 ` Peter Xu
  2022-10-06 17:57   ` Dr. David Alan Gilbert
  13 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-20 22:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
	Daniel P . Berrange

Now that rs->pss exists we can already cache channels in pss->pss_channel.
That pss_channel contains more information than rs->f because it's
per-channel.  So rs->f can be replaced by
rs->pss[RAM_CHANNEL_PRECOPY].pss_channel, while rs->f itself is a bit vague
now.

Note that vanilla postcopy still sends pages via pss[RAM_CHANNEL_PRECOPY];
that's slightly confusing but it reflects the reality.

Then, after the replacement we can safely drop rs->f.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index f42efe02fc..03bf2324ab 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -345,8 +345,6 @@ struct RAMSrcPageRequest {
 
 /* State of RAM for migration */
 struct RAMState {
-    /* QEMUFile used for this migration */
-    QEMUFile *f;
     /*
      * PageSearchStatus structures for the channels when send pages.
      * Protected by the bitmap_mutex.
@@ -2555,8 +2553,6 @@ static int ram_find_and_save_block(RAMState *rs)
         }
 
         if (found) {
-            /* Cache rs->f in pss_channel (TODO: remove rs->f) */
-            pss->pss_channel = rs->f;
             pages = ram_save_host_page(rs, pss);
         }
     } while (!pages && again);
@@ -3112,7 +3108,7 @@ static void ram_state_resume_prepare(RAMState *rs, QEMUFile *out)
     ram_state_reset(rs);
 
     /* Update RAMState cache of output QEMUFile */
-    rs->f = out;
+    rs->pss[RAM_CHANNEL_PRECOPY].pss_channel = out;
 
     trace_ram_state_resume_prepare(pages);
 }
@@ -3203,7 +3199,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
             return -1;
         }
     }
-    (*rsp)->f = f;
+    (*rsp)->pss[RAM_CHANNEL_PRECOPY].pss_channel = f;
 
     WITH_RCU_READ_LOCK_GUARD() {
         qemu_put_be64(f, ram_bytes_total_common(true) | RAM_SAVE_FLAG_MEM_SIZE);
@@ -3338,7 +3334,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
-        ret = multifd_send_sync_main(rs->f);
+        ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
         if (ret < 0) {
             return ret;
         }
@@ -3406,7 +3402,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    ret = multifd_send_sync_main(rs->f);
+    ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
     if (ret < 0) {
         return ret;
     }
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 13/14] migration: Remove old preempt code around state maintainance
  2022-09-20 22:52 ` [PATCH 13/14] migration: Remove old preempt code around state maintainance Peter Xu
@ 2022-09-21  0:47   ` Peter Xu
  2022-09-21 13:54     ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-21  0:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Manish Mishra, Juan Quintela, ani, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Daniel P . Berrange

On Tue, Sep 20, 2022 at 06:52:27PM -0400, Peter Xu wrote:
> With the new code to send pages in rp-return thread, there's little help to
> keep lots of the old code on maintaining the preempt state in migration
> thread, because the new way should always be faster..
> 
> Then if we'll always send pages in the rp-return thread anyway, we don't
> need those logic to maintain preempt state anymore because now we serialize
> things using the mutex directly instead of using those fields.
> 
> It's very unfortunate to have those code for a short period, but that's
> still one intermediate step that we noticed the next bottleneck on the
> migration thread.  Now what we can do best is to drop unnecessary code as
> long as the new code is stable to reduce the burden.  It's actually a good
> thing because the new "sending page in rp-return thread" model is (IMHO)
> even cleaner and with better performance.
> 
> Remove the old code that was responsible for maintaining preempt states, at
> the meantime also remove x-postcopy-preempt-break-huge parameter because
> with concurrent sender threads we don't really need to break-huge anymore.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c |   2 -
>  migration/ram.c       | 258 +-----------------------------------------
>  2 files changed, 3 insertions(+), 257 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index fae8fd378b..698fd94591 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -4399,8 +4399,6 @@ static Property migration_properties[] = {
>      DEFINE_PROP_SIZE("announce-step", MigrationState,
>                        parameters.announce_step,
>                        DEFAULT_MIGRATE_ANNOUNCE_STEP),
> -    DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
> -                      postcopy_preempt_break_huge, true),

Forgot to drop the variable altogether:

diff --git a/migration/migration.h b/migration/migration.h
index cdad8aceaa..ae4ffd3454 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -340,13 +340,6 @@ struct MigrationState {
     bool send_configuration;
     /* Whether we send section footer during migration */
     bool send_section_footer;
-    /*
-     * Whether we allow break sending huge pages when postcopy preempt is
-     * enabled.  When disabled, we won't interrupt precopy within sending a
-     * host huge page, which is the old behavior of vanilla postcopy.
-     * NOTE: this parameter is ignored if postcopy preempt is not enabled.
-     */
-    bool postcopy_preempt_break_huge;
 
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;

Will squash this in for the next version.

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 13/14] migration: Remove old preempt code around state maintainance
  2022-09-21  0:47   ` Peter Xu
@ 2022-09-21 13:54     ` Peter Xu
  2022-10-06 17:56       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-09-21 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Manish Mishra, Juan Quintela, ani, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Daniel P . Berrange

[-- Attachment #1: Type: text/plain, Size: 2954 bytes --]

On Tue, Sep 20, 2022 at 08:47:20PM -0400, Peter Xu wrote:
> On Tue, Sep 20, 2022 at 06:52:27PM -0400, Peter Xu wrote:
> > With the new code to send pages in rp-return thread, there's little help to
> > keep lots of the old code on maintaining the preempt state in migration
> > thread, because the new way should always be faster..
> > 
> > Then if we'll always send pages in the rp-return thread anyway, we don't
> > need those logic to maintain preempt state anymore because now we serialize
> > things using the mutex directly instead of using those fields.
> > 
> > It's very unfortunate to have those code for a short period, but that's
> > still one intermediate step that we noticed the next bottleneck on the
> > migration thread.  Now what we can do best is to drop unnecessary code as
> > long as the new code is stable to reduce the burden.  It's actually a good
> > thing because the new "sending page in rp-return thread" model is (IMHO)
> > even cleaner and with better performance.
> > 
> > Remove the old code that was responsible for maintaining preempt states, at
> > the meantime also remove x-postcopy-preempt-break-huge parameter because
> > with concurrent sender threads we don't really need to break-huge anymore.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c |   2 -
> >  migration/ram.c       | 258 +-----------------------------------------
> >  2 files changed, 3 insertions(+), 257 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index fae8fd378b..698fd94591 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -4399,8 +4399,6 @@ static Property migration_properties[] = {
> >      DEFINE_PROP_SIZE("announce-step", MigrationState,
> >                        parameters.announce_step,
> >                        DEFAULT_MIGRATE_ANNOUNCE_STEP),
> > -    DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
> > -                      postcopy_preempt_break_huge, true),
> 
> Forgot to drop the variable altogether:
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index cdad8aceaa..ae4ffd3454 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -340,13 +340,6 @@ struct MigrationState {
>      bool send_configuration;
>      /* Whether we send section footer during migration */
>      bool send_section_footer;
> -    /*
> -     * Whether we allow break sending huge pages when postcopy preempt is
> -     * enabled.  When disabled, we won't interrupt precopy within sending a
> -     * host huge page, which is the old behavior of vanilla postcopy.
> -     * NOTE: this parameter is ignored if postcopy preempt is not enabled.
> -     */
> -    bool postcopy_preempt_break_huge;
>  
>      /* Needed by postcopy-pause state */
>      QemuSemaphore postcopy_pause_sem;
> 
> Will squash this in in next version.

Two more variables to drop, as attached.


-- 
Peter Xu

[-- Attachment #2: 0001-fixup-migration-Remove-old-preempt-code-around-state.patch --]
[-- Type: text/plain, Size: 3053 bytes --]

From b3308e34398e21c19bd36ec21aae9c7f9f623d75 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 21 Sep 2022 09:51:55 -0400
Subject: [PATCH] fixup! migration: Remove old preempt code around state
 maintainance
Content-type: text/plain

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 33 ---------------------------------
 1 file changed, 33 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 03bf2324ab..2599eee070 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -97,28 +97,6 @@ struct PageSearchStatus {
     unsigned long page;
     /* Set once we wrap around */
     bool         complete_round;
-    /*
-     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
-     * postcopy.  When set, the request is "urgent" because the dest QEMU
-     * threads are waiting for us.
-     */
-    bool         postcopy_requested;
-    /*
-     * [POSTCOPY-ONLY] The target channel to use to send current page.
-     *
-     * Note: This may _not_ match with the value in postcopy_requested
-     * above. Let's imagine the case where the postcopy request is exactly
-     * the page that we're sending in progress during precopy. In this case
-     * we'll have postcopy_requested set to true but the target channel
-     * will be the precopy channel (so that we don't split brain on that
-     * specific page since the precopy channel already contains partial of
-     * that page data).
-     *
-     * Besides that specific use case, postcopy_target_channel should
-     * always be equal to postcopy_requested, because by default we send
-     * postcopy pages via postcopy preempt channel.
-     */
-    bool         postcopy_target_channel;
     /* Whether we're sending a host page */
     bool          host_page_sending;
     /* The start/end of current host page.  Invalid if host_page_sending==false */
@@ -1573,13 +1551,6 @@ retry:
  */
 static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
 {
-    /*
-     * This is not a postcopy requested page, mark it "not urgent", and use
-     * precopy channel to send it.
-     */
-    pss->postcopy_requested = false;
-    pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
-
     /* Update pss->page for the next dirty bit in ramblock */
     pss_find_next_dirty(pss);
 
@@ -2091,9 +2062,6 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
          * really rare.
          */
         pss->complete_round = false;
-        /* Mark it an urgent request, meanwhile using POSTCOPY channel */
-        pss->postcopy_requested = true;
-        pss->postcopy_target_channel = RAM_CHANNEL_POSTCOPY;
     }
 
     return !!block;
@@ -2190,7 +2158,6 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
          * we should be the only one who operates on the qemufile
          */
         pss->pss_channel = migrate_get_current()->postcopy_qemufile_src;
-        pss->postcopy_requested = true;
         assert(pss->pss_channel);
 
         /*
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/14] migration: Cleanup xbzrle zero page cache update logic
  2022-09-20 22:50 ` [PATCH 02/14] migration: Cleanup xbzrle zero page cache update logic Peter Xu
@ 2022-10-04 10:33   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-04 10:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela,
	Leonardo Bras Soares Passos, ani, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> The major change is to replace "!save_page_use_compression()" with
> "xbzrle_enabled" to make it clear.
> 
> Reasonings:
> 
> (1) When compression enabled, "!save_page_use_compression()" is exactly the
>     same as checking "xbzrle_enabled".
> 
> (2) When compression disabled, "!save_page_use_compression()" always return
>     true.  We used to try calling the xbzrle code, but after this change we
>     won't, and we shouldn't need to.
> 
> Since at it, drop the xbzrle_enabled check in xbzrle_cache_zero_page()
> because with this change it's not needed anymore.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Yes, I think that's right - it took me a bit of thinking to check it.
The important thing is that once xbzrle is enabled we must always take in
the zero pages to squash old data in the cache; otherwise, after a page is
sent as a zero page (so the destination simply zero-fills it), a later
xbzrle delta for that page would still be computed against the stale
non-zero contents left in the cache.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index d8cf7cc901..fc59c052cf 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -741,10 +741,6 @@ void mig_throttle_counter_reset(void)
>   */
>  static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
>  {
> -    if (!rs->xbzrle_enabled) {
> -        return;
> -    }
> -
>      /* We don't care if this fails to allocate a new cache page
>       * as long as it updated an old one */
>      cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page,
> @@ -2301,7 +2297,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
>          /* Must let xbzrle know, otherwise a previous (now 0'd) cached
>           * page would be stale
>           */
> -        if (!save_page_use_compression(rs)) {
> +        if (rs->xbzrle_enabled) {
>              XBZRLE_cache_lock();
>              xbzrle_cache_zero_page(rs, block->offset + offset);
>              XBZRLE_cache_unlock();
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 03/14] migration: Trivial cleanup save_page_header() on same block check
  2022-09-20 22:50 ` [PATCH 03/14] migration: Trivial cleanup save_page_header() on same block check Peter Xu
@ 2022-10-04 10:41   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-04 10:41 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela,
	Leonardo Bras Soares Passos, ani, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> The 2nd check on RAM_SAVE_FLAG_CONTINUE is a bit redundant.  Use a boolean
> to be clearer.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index fc59c052cf..62ff2c1469 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -661,14 +661,15 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
>                                 ram_addr_t offset)
>  {
>      size_t size, len;
> +    bool same_block = (block == rs->last_sent_block);
>  
> -    if (block == rs->last_sent_block) {
> +    if (same_block) {
>          offset |= RAM_SAVE_FLAG_CONTINUE;
>      }
>      qemu_put_be64(f, offset);
>      size = 8;
>  
> -    if (!(offset & RAM_SAVE_FLAG_CONTINUE)) {
> +    if (!same_block) {
>          len = strlen(block->idstr);
>          qemu_put_byte(f, len);
>          qemu_put_buffer(f, (uint8_t *)block->idstr, len);
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/14] migration: Remove RAMState.f references in compression code
  2022-09-20 22:50 ` [PATCH 04/14] migration: Remove RAMState.f references in compression code Peter Xu
@ 2022-10-04 10:54   ` Dr. David Alan Gilbert
  2022-10-04 14:36     ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-04 10:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela,
	Leonardo Bras Soares Passos, ani, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> Removing referencing to RAMState.f in compress_page_with_multi_thread() and
> flush_compressed_data().
> 
> Compression code by default isn't compatible with having >1 channels (or it
> won't currently know which channel to flush the compressed data), so to
> make it simple we always flush on the default to_dst_file port until
> someone wants to add >1 ports support, as rs->f right now can really
> change (after postcopy preempt is introduced).
> 
> There should be no functional change at all after patch applied, since as
> long as rs->f referenced in compression code, it must be to_dst_file.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Yes, I think that's true - although I think we need to add checks to
stop someone trying to enable compression+multifd?
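
Something along these lines is what I have in mind - a rough sketch only,
not part of this series, and the helper name is made up; it would sit
wherever we validate capabilities:

  static bool migrate_compress_compat_check(const bool *cap_list,
                                            Error **errp)
  {
      /* Compression has its own send path and doesn't know which multifd
       * channel to flush to, so refuse to enable both at the same time. */
      if (cap_list[MIGRATION_CAPABILITY_COMPRESS] &&
          cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
          error_setg(errp, "Compression is not compatible with multifd");
          return false;
      }
      return true;
  }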



Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> ---
>  migration/ram.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 62ff2c1469..8303252b6d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1461,6 +1461,7 @@ static bool save_page_use_compression(RAMState *rs);
>  
>  static void flush_compressed_data(RAMState *rs)
>  {
> +    MigrationState *ms = migrate_get_current();
>      int idx, len, thread_count;
>  
>      if (!save_page_use_compression(rs)) {
> @@ -1479,7 +1480,7 @@ static void flush_compressed_data(RAMState *rs)
>      for (idx = 0; idx < thread_count; idx++) {
>          qemu_mutex_lock(&comp_param[idx].mutex);
>          if (!comp_param[idx].quit) {
> -            len = qemu_put_qemu_file(rs->f, comp_param[idx].file);
> +            len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file);
>              /*
>               * it's safe to fetch zero_page without holding comp_done_lock
>               * as there is no further request submitted to the thread,
> @@ -1498,11 +1499,11 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
>      param->offset = offset;
>  }
>  
> -static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
> -                                           ram_addr_t offset)
> +static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
>  {
>      int idx, thread_count, bytes_xmit = -1, pages = -1;
>      bool wait = migrate_compress_wait_thread();
> +    MigrationState *ms = migrate_get_current();
>  
>      thread_count = migrate_compress_threads();
>      qemu_mutex_lock(&comp_done_lock);
> @@ -1510,7 +1511,8 @@ retry:
>      for (idx = 0; idx < thread_count; idx++) {
>          if (comp_param[idx].done) {
>              comp_param[idx].done = false;
> -            bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
> +            bytes_xmit = qemu_put_qemu_file(ms->to_dst_file,
> +                                            comp_param[idx].file);
>              qemu_mutex_lock(&comp_param[idx].mutex);
>              set_compress_params(&comp_param[idx], block, offset);
>              qemu_cond_signal(&comp_param[idx].cond);
> @@ -2263,7 +2265,7 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
>          return false;
>      }
>  
> -    if (compress_page_with_multi_thread(rs, block, offset) > 0) {
> +    if (compress_page_with_multi_thread(block, offset) > 0) {
>          return true;
>      }
>  
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping
  2022-09-20 22:52 ` [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping Peter Xu
@ 2022-10-04 13:55   ` Dr. David Alan Gilbert
  2022-10-04 19:13     ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-04 13:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> Don't take the bitmap mutex when sending pages, or when being throttled by
> migration_rate_limit() (which is a bit tricky to call it here in ram code,
> but seems still helpful).
> 
> It prepares for the possibility of concurrently sending pages in >1 threads
> using the function ram_save_host_page() because all threads may need the
> bitmap_mutex to operate on bitmaps, so that either sendmsg() or any kind of
> qemu_sem_wait() blocking for one thread will not block the other from
> progressing.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

I generally don't like taking locks conditionally, but this kind of looks
OK; I think it needs a big comment at the start of the function saying
that it's called and left with the lock held, but that it might drop it
temporarily.
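
i.e. something along the lines of the below (wording only illustrative):

  /*
   * ram_save_host_page: save a whole host page
   *
   * Called with bitmap_mutex held, and returns with it held; however in
   * postcopy preempt mode the lock may be dropped and re-taken around the
   * actual page sending, so callers must not assume it stays held
   * continuously across the call.
   */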

> ---
>  migration/ram.c | 42 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 31 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 8303252b6d..6e7de6087a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2463,6 +2463,7 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
>   */
>  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>  {
> +    bool page_dirty, release_lock = postcopy_preempt_active();

Could you rename that to something like 'drop_lock'?  You are taking the
lock at the end even when you have 'release_lock' set, which makes the
current name a bit confusing.

>      int tmppages, pages = 0;
>      size_t pagesize_bits =
>          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> @@ -2486,22 +2487,41 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>              break;
>          }
>  
> +        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
> +        /*
> +         * Properly yield the lock only in postcopy preempt mode because
> +         * both migration thread and rp-return thread can operate on the
> +         * bitmaps.
> +         */
> +        if (release_lock) {
> +            qemu_mutex_unlock(&rs->bitmap_mutex);
> +        }

Shouldn't the unlock/lock move inside the 'if (page_dirty) {' ?


>          /* Check the pages is dirty and if it is send it */
> -        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> +        if (page_dirty) {
>              tmppages = ram_save_target_page(rs, pss);
> -            if (tmppages < 0) {
> -                return tmppages;
> +            if (tmppages >= 0) {
> +                pages += tmppages;
> +                /*
> +                 * Allow rate limiting to happen in the middle of huge pages if
> +                 * something is sent in the current iteration.
> +                 */
> +                if (pagesize_bits > 1 && tmppages > 0) {
> +                    migration_rate_limit();

This feels interesting, I know it's no change from before, and it's
difficult to do here, but it seems odd to hold the lock around the
sleeping in the rate limit.

Dave

> +                }
>              }
> +        } else {
> +            tmppages = 0;
> +        }
>  
> -            pages += tmppages;
> -            /*
> -             * Allow rate limiting to happen in the middle of huge pages if
> -             * something is sent in the current iteration.
> -             */
> -            if (pagesize_bits > 1 && tmppages > 0) {
> -                migration_rate_limit();
> -            }
> +        if (release_lock) {
> +            qemu_mutex_lock(&rs->bitmap_mutex);
>          }
> +
> +        if (tmppages < 0) {
> +            return tmppages;
> +        }
> +
>          pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
>      } while ((pss->page < hostpage_boundary) &&
>               offset_in_ramblock(pss->block,
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/14] migration: Remove RAMState.f references in compression code
  2022-10-04 10:54   ` Dr. David Alan Gilbert
@ 2022-10-04 14:36     ` Peter Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2022-10-04 14:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Manish Mishra, Juan Quintela,
	Leonardo Bras Soares Passos, ani, Daniel P . Berrange

On Tue, Oct 04, 2022 at 11:54:02AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Remove references to RAMState.f in compress_page_with_multi_thread() and
> > flush_compressed_data().
> > 
> > Compression code by default isn't compatible with having >1 channels (or it
> > won't currently know which channel to flush the compressed data to), so to
> > make it simple we always flush on the default to_dst_file port until
> > someone wants to add >1 ports support, as rs->f right now can really
> > change (after postcopy preempt is introduced).
> > 
> > There should be no functional change at all after the patch is applied, since
> > as long as rs->f is referenced in the compression code, it must be to_dst_file.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> Yes, I think that's true - although I think we need to add checks to
> stop someone trying to enable compression+multifd?

We could.  I think they can both be enabled and migration will just ignore the
multifd feature, as ram_save_target_page() handles compression earlier.  Even
if save_compress_page() fails with -EBUSY we'll still skip multifd:

    if (!save_page_use_compression(rs) && migrate_use_multifd()
        && !migration_in_postcopy()) {
        return ram_save_multifd_page(rs, block, offset);
    }

Explicitly disabling compression would still be cleaner; then we can even
drop the above check on save_page_use_compression().  There's a slight risk
of breaking someone's config, but I don't think it would affect many people.

If that looks good, I can add one patch for it (probably in the other
patchset, though, which I'll repost today).
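
For illustration only, the check I have in mind would go into
migrate_caps_check() along these lines (treat this as a sketch following
the existing capability-check pattern there, not final wording):

    if (cap_list[MIGRATION_CAPABILITY_COMPRESS] &&
        cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
        error_setg(errp, "Compression is not compatible with multifd");
        return false;
    }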

> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/14] migration: Use atomic ops properly for page accountings
  2022-09-20 22:52 ` [PATCH 06/14] migration: Use atomic ops properly for page accountings Peter Xu
@ 2022-10-04 16:59   ` Dr. David Alan Gilbert
  2022-10-04 19:23     ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-04 16:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> To prepare for thread-safety on page accountings, at least below counters
> need to be accessed only atomically, they are:
> 
>         ram_counters.transferred
>         ram_counters.duplicate
>         ram_counters.normal
>         ram_counters.postcopy_bytes
> 
> There are a lot of other counters but they won't be accessed outside
> migration thread, then they're still safe to be accessed without atomic
> ops.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

I think this is OK; I'm not sure whether the memset 0's of ram_counters
technically need changing.
I'd love to put a comment somewhere saying these fields need to be
atomically read, but they're qapi-defined so I don't think we can.

Finally, we probably need to check these are happy on 32 bit builds,
sometimes it's a bit funny with atomic adds.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 10 +++++-----
>  migration/multifd.c   |  2 +-
>  migration/ram.c       | 29 +++++++++++++++--------------
>  3 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 07c74a79a2..0eacc0c99b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1048,13 +1048,13 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>  
>      info->has_ram = true;
>      info->ram = g_malloc0(sizeof(*info->ram));
> -    info->ram->transferred = ram_counters.transferred;
> +    info->ram->transferred = qatomic_read(&ram_counters.transferred);
>      info->ram->total = ram_bytes_total();
> -    info->ram->duplicate = ram_counters.duplicate;
> +    info->ram->duplicate = qatomic_read(&ram_counters.duplicate);
>      /* legacy value.  It is not used anymore */
>      info->ram->skipped = 0;
> -    info->ram->normal = ram_counters.normal;
> -    info->ram->normal_bytes = ram_counters.normal * page_size;
> +    info->ram->normal = qatomic_read(&ram_counters.normal);
> +    info->ram->normal_bytes = info->ram->normal * page_size;
>      info->ram->mbps = s->mbps;
>      info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
>      info->ram->dirty_sync_missed_zero_copy =
> @@ -1065,7 +1065,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>      info->ram->pages_per_second = s->pages_per_second;
>      info->ram->precopy_bytes = ram_counters.precopy_bytes;
>      info->ram->downtime_bytes = ram_counters.downtime_bytes;
> -    info->ram->postcopy_bytes = ram_counters.postcopy_bytes;
> +    info->ram->postcopy_bytes = qatomic_read(&ram_counters.postcopy_bytes);
>  
>      if (migrate_use_xbzrle()) {
>          info->has_xbzrle_cache = true;
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 586ddc9d65..460326acd4 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -437,7 +437,7 @@ static int multifd_send_pages(QEMUFile *f)
>                  + p->packet_len;
>      qemu_file_acct_rate_limit(f, transferred);
>      ram_counters.multifd_bytes += transferred;
> -    ram_counters.transferred += transferred;
> +    qatomic_add(&ram_counters.transferred, transferred);
>      qemu_mutex_unlock(&p->mutex);
>      qemu_sem_post(&p->sem);
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index 6e7de6087a..5bd3d76bf0 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -432,11 +432,11 @@ static void ram_transferred_add(uint64_t bytes)
>      if (runstate_is_running()) {
>          ram_counters.precopy_bytes += bytes;
>      } else if (migration_in_postcopy()) {
> -        ram_counters.postcopy_bytes += bytes;
> +        qatomic_add(&ram_counters.postcopy_bytes, bytes);
>      } else {
>          ram_counters.downtime_bytes += bytes;
>      }
> -    ram_counters.transferred += bytes;
> +    qatomic_add(&ram_counters.transferred, bytes);
>  }
>  
>  void dirty_sync_missed_zero_copy(void)
> @@ -725,7 +725,7 @@ void mig_throttle_counter_reset(void)
>  
>      rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      rs->num_dirty_pages_period = 0;
> -    rs->bytes_xfer_prev = ram_counters.transferred;
> +    rs->bytes_xfer_prev = qatomic_read(&ram_counters.transferred);
>  }
>  
>  /**
> @@ -1085,8 +1085,9 @@ uint64_t ram_pagesize_summary(void)
>  
>  uint64_t ram_get_total_transferred_pages(void)
>  {
> -    return  ram_counters.normal + ram_counters.duplicate +
> -                compression_counters.pages + xbzrle_counters.pages;
> +    return  qatomic_read(&ram_counters.normal) +
> +        qatomic_read(&ram_counters.duplicate) +
> +        compression_counters.pages + xbzrle_counters.pages;
>  }
>  
>  static void migration_update_rates(RAMState *rs, int64_t end_time)
> @@ -1145,8 +1146,8 @@ static void migration_trigger_throttle(RAMState *rs)
>  {
>      MigrationState *s = migrate_get_current();
>      uint64_t threshold = s->parameters.throttle_trigger_threshold;
> -
> -    uint64_t bytes_xfer_period = ram_counters.transferred - rs->bytes_xfer_prev;
> +    uint64_t bytes_xfer_period =
> +        qatomic_read(&ram_counters.transferred) - rs->bytes_xfer_prev;
>      uint64_t bytes_dirty_period = rs->num_dirty_pages_period * TARGET_PAGE_SIZE;
>      uint64_t bytes_dirty_threshold = bytes_xfer_period * threshold / 100;
>  
> @@ -1285,7 +1286,7 @@ static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
>      int len = save_zero_page_to_file(rs, rs->f, block, offset);
>  
>      if (len) {
> -        ram_counters.duplicate++;
> +        qatomic_inc(&ram_counters.duplicate);
>          ram_transferred_add(len);
>          return 1;
>      }
> @@ -1322,9 +1323,9 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>      }
>  
>      if (bytes_xmit > 0) {
> -        ram_counters.normal++;
> +        qatomic_inc(&ram_counters.normal);
>      } else if (bytes_xmit == 0) {
> -        ram_counters.duplicate++;
> +        qatomic_inc(&ram_counters.duplicate);
>      }
>  
>      return true;
> @@ -1354,7 +1355,7 @@ static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>          qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
>      }
>      ram_transferred_add(TARGET_PAGE_SIZE);
> -    ram_counters.normal++;
> +    qatomic_inc(&ram_counters.normal);
>      return 1;
>  }
>  
> @@ -1448,7 +1449,7 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
>      ram_transferred_add(bytes_xmit);
>  
>      if (param->zero_page) {
> -        ram_counters.duplicate++;
> +        qatomic_inc(&ram_counters.duplicate);
>          return;
>      }
>  
> @@ -2620,9 +2621,9 @@ void acct_update_position(QEMUFile *f, size_t size, bool zero)
>      uint64_t pages = size / TARGET_PAGE_SIZE;
>  
>      if (zero) {
> -        ram_counters.duplicate += pages;
> +        qatomic_add(&ram_counters.duplicate, pages);
>      } else {
> -        ram_counters.normal += pages;
> +        qatomic_add(&ram_counters.normal, pages);
>          ram_transferred_add(size);
>          qemu_file_credit_transfer(f, size);
>      }
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping
  2022-10-04 13:55   ` Dr. David Alan Gilbert
@ 2022-10-04 19:13     ` Peter Xu
  2022-10-05 11:18       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-10-04 19:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

On Tue, Oct 04, 2022 at 02:55:10PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Don't take the bitmap mutex when sending pages, or when being throttled by
> > migration_rate_limit() (which is a bit tricky to call it here in ram code,
> > but seems still helpful).
> > 
> > It prepares for the possibility of concurrently sending pages in >1 threads
> > using the function ram_save_host_page() because all threads may need the
> > bitmap_mutex to operate on bitmaps, so that either sendmsg() or any kind of
> > qemu_sem_wait() blocking for one thread will not block the other from
> > progressing.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> I generally don't like taking locks conditionally, but this kind of looks
> OK; I think it needs a big comment at the start of the function saying
> that it's called and left with the lock held but that it might drop it
> temporarily.

Right, the code is slightly hard to read; I just haven't seen a good and
easy solution for it yet.  It's just that we may still want to keep the
lock as long as possible for precopy in one shot.

> 
> > ---
> >  migration/ram.c | 42 +++++++++++++++++++++++++++++++-----------
> >  1 file changed, 31 insertions(+), 11 deletions(-)
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 8303252b6d..6e7de6087a 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2463,6 +2463,7 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
> >   */
> >  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> >  {
> > +    bool page_dirty, release_lock = postcopy_preempt_active();
> 
> Could you rename that to something like 'drop_lock' - you are taking the
> lock at the end even when you have 'release_lock' set - which is a bit
> strange naming.

Is there any difference between "drop" and "release"?  I'll change the name
anyway since I definitely trust you on any English comments, but please
still let me know - I love to learn more on those! :)

> 
> >      int tmppages, pages = 0;
> >      size_t pagesize_bits =
> >          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> > @@ -2486,22 +2487,41 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> >              break;
> >          }
> >  
> > +        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
> > +        /*
> > +         * Properly yield the lock only in postcopy preempt mode because
> > +         * both migration thread and rp-return thread can operate on the
> > +         * bitmaps.
> > +         */
> > +        if (release_lock) {
> > +            qemu_mutex_unlock(&rs->bitmap_mutex);
> > +        }
> 
> Shouldn't the unlock/lock move inside the 'if (page_dirty) {' ?

I think we can move into it, but it may not be as optimal as keeping it
as-is.

Consider a case where we've got the bitmap with continuous zero bits.
During postcopy, the migration thread could be spinning here with the lock
held even if it doesn't send a thing.  It could still block the other
return path thread on sending urgent pages which may be outside the zero
zones.

> 
> 
> >          /* Check the pages is dirty and if it is send it */
> > -        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> > +        if (page_dirty) {
> >              tmppages = ram_save_target_page(rs, pss);
> > -            if (tmppages < 0) {
> > -                return tmppages;
> > +            if (tmppages >= 0) {
> > +                pages += tmppages;
> > +                /*
> > +                 * Allow rate limiting to happen in the middle of huge pages if
> > +                 * something is sent in the current iteration.
> > +                 */
> > +                if (pagesize_bits > 1 && tmppages > 0) {
> > +                    migration_rate_limit();
> 
> This feels interesting, I know it's no change from before, and it's
> difficult to do here, but it seems odd to hold the lock around the
> sleeping in the rate limit.

Good point.. I think I'll leave it there for this patch because it's
a pre-existing issue, but it seems proper in the future to do the
unlocking too for normal precopy.
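
Something along these lines, purely as a sketch of that possible follow-up
(not part of this patch):

        if (pagesize_bits > 1 && tmppages > 0) {
            /* Sketch: yield the mutex while we may sleep when throttled */
            qemu_mutex_unlock(&rs->bitmap_mutex);
            migration_rate_limit();
            qemu_mutex_lock(&rs->bitmap_mutex);
        }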

Maybe I'll just attach a patch at the end of this series when I repost.
That'll be easier, before things get forgotten again.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/14] migration: Use atomic ops properly for page accountings
  2022-10-04 16:59   ` Dr. David Alan Gilbert
@ 2022-10-04 19:23     ` Peter Xu
  2022-10-05 11:38       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-10-04 19:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

On Tue, Oct 04, 2022 at 05:59:36PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > To prepare for thread-safety on page accountings, at least below counters
> > need to be accessed only atomically, they are:
> > 
> >         ram_counters.transferred
> >         ram_counters.duplicate
> >         ram_counters.normal
> >         ram_counters.postcopy_bytes
> > 
> > There are a lot of other counters but they won't be accessed outside
> > migration thread, then they're still safe to be accessed without atomic
> > ops.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> I think this is OK; I'm not sure whether the memset 0's of ram_counters
> technically need changing.

IMHO they're fine - what we need there should be something like WRITE_ONCE()
just to make sure there's no register caching (actually atomic_write() is
normally implemented with WRITE_ONCE afaik).  But I think that's already
guaranteed by memset() being a function call, so we should be 100% safe.

> I'd love to put a comment somewhere saying these fields need to be
> atomically read, but they're qapi-defined so I don't think we can.

How about I add a comment above ram_counters declarations in ram.c?
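
Something like this above the declaration (writing from memory, the exact
wording still needs polishing):

    /*
     * NOTE: the counters below can be updated concurrently by the migration
     * thread and the return path thread, so the fields listed in this patch
     * must only be accessed using the qatomic_*() helpers.
     */
    MigrationStats ram_counters;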

> 
> Finally, we probably need to check these are happy on 32 bit builds,
> sometimes it's a bit funny with atomic adds.

Yeah.. I hope using qatomic_*() APIs can help me avoid any issues.  Or
anything concerning?  I'd be happy to test on specific things if there are.

> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/14] migration: Teach PSS about host page
  2022-09-20 22:52 ` [PATCH 07/14] migration: Teach PSS about host page Peter Xu
@ 2022-10-05 11:12   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-05 11:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> Migration code has a lot to do with host pages.  Teaching PSS core about
> the idea of host page helps a lot and makes the code clean.  Meanwhile,
> this prepares for the future changes that can leverage the new PSS helpers
> that this patch introduces to send host page in another thread.
> 
> Three more fields are introduced for this:
> 
>   (1) host_page_sending: this is set to true when QEMU is sending a host
>       page, false otherwise.
> 
>   (2) host_page_{start|end}: these point to the start/end of the host page
>       we're sending, and they're only valid when host_page_sending==true.
> 
> For example, when we look up the next dirty page on the ramblock, with
> host_page_sending==true, we'll not try to look for anything beyond the
> current host page boundary.  This can be slightly more efficient than the
> current code, because currently we'll set pss->page to the next dirty bit
> (which can be beyond the current host page boundary) and reset it to the
> host page boundary if we find it goes beyond that.
> 
> With the above, we can easily make migration_bitmap_find_dirty() self-contained
> by updating pss->page properly.  The rs* parameter is removed because it's not
> even used in the old code.
> 
> When sending a host page, we should use the pss helpers like this:
> 
>   - pss_host_page_prepare(pss): called before sending host page
>   - pss_within_range(pss): whether we're still working on the cur host page?
>   - pss_host_page_finish(pss): called after sending a host page
> 
> Then we can use ram_save_target_page() to save one small page.
> 
> Currently ram_save_host_page() is still the only user. If there'll be
> another function to send host page (e.g. in return path thread) in the
> future, it should follow the same style.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 95 +++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 76 insertions(+), 19 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 5bd3d76bf0..3f720b6de2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -474,6 +474,11 @@ struct PageSearchStatus {
>       * postcopy pages via postcopy preempt channel.
>       */
>      bool         postcopy_target_channel;
> +    /* Whether we're sending a host page */
> +    bool          host_page_sending;
> +    /* The start/end of current host page.  Only valid if host_page_sending==true */
> +    unsigned long host_page_start;
> +    unsigned long host_page_end;
>  };
>  typedef struct PageSearchStatus PageSearchStatus;
>  
> @@ -851,26 +856,38 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
>  }
>  
>  /**
> - * migration_bitmap_find_dirty: find the next dirty page from start
> + * pss_find_next_dirty: find the next dirty page of current ramblock
>   *
> - * Returns the page offset within memory region of the start of a dirty page
> + * This function updates pss->page to point to the next dirty page index
> + * within the ramblock to migrate, or the end of ramblock when nothing
> + * found.  Note that when pss->host_page_sending==true it means we're
> + * during sending a host page, so we won't look for dirty page that is
> + * outside the host page boundary.
>   *
> - * @rs: current RAM state
> - * @rb: RAMBlock where to search for dirty pages
> - * @start: page where we start the search
> + * @pss: the current page search status
>   */
> -static inline
> -unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
> -                                          unsigned long start)
> +static void pss_find_next_dirty(PageSearchStatus *pss)
>  {
> +    RAMBlock *rb = pss->block;
>      unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
>      unsigned long *bitmap = rb->bmap;
>  
>      if (ramblock_is_ignored(rb)) {
> -        return size;
> +        /* Points directly to the end, so we know no dirty page */
> +        pss->page = size;
> +        return;
> +    }
> +
> +    /*
> +     * If during sending a host page, only look for dirty pages within the
> +     * current host page being send.
> +     */
> +    if (pss->host_page_sending) {
> +        assert(pss->host_page_end);
> +        size = MIN(size, pss->host_page_end);
>      }
>  
> -    return find_next_bit(bitmap, size, start);
> +    pss->page = find_next_bit(bitmap, size, pss->page);
>  }
>  
>  static void migration_clear_memory_region_dirty_bitmap(RAMBlock *rb,
> @@ -1556,7 +1573,9 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
>      pss->postcopy_requested = false;
>      pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
>  
> -    pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
> +    /* Update pss->page for the next dirty bit in ramblock */
> +    pss_find_next_dirty(pss);
> +
>      if (pss->complete_round && pss->block == rs->last_seen_block &&
>          pss->page >= rs->last_page) {
>          /*
> @@ -2446,6 +2465,44 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
>      }
>  }
>  
> +/* Should be called before sending a host page */
> +static void pss_host_page_prepare(PageSearchStatus *pss)
> +{
> +    /* How many guest pages are there in one host page? */
> +    size_t guest_pfns = qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> +
> +    pss->host_page_sending = true;
> +    pss->host_page_start = ROUND_DOWN(pss->page, guest_pfns);
> +    pss->host_page_end = ROUND_UP(pss->page + 1, guest_pfns);
> +}
> +
> +/*
> + * Whether the page pointed by PSS is within the host page being sent.
> + * Must be called after a previous pss_host_page_prepare().
> + */
> +static bool pss_within_range(PageSearchStatus *pss)
> +{
> +    ram_addr_t ram_addr;
> +
> +    assert(pss->host_page_sending);
> +
> +    /* Over host-page boundary? */
> +    if (pss->page >= pss->host_page_end) {
> +        return false;
> +    }
> +
> +    ram_addr = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> +
> +    return offset_in_ramblock(pss->block, ram_addr);
> +}
> +
> +static void pss_host_page_finish(PageSearchStatus *pss)
> +{
> +    pss->host_page_sending = false;
> +    /* This is not needed, but just to reset it */
> +    pss->host_page_start = pss->host_page_end = 0;
> +}
> +
>  /**
>   * ram_save_host_page: save a whole host page
>   *
> @@ -2468,8 +2525,6 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>      int tmppages, pages = 0;
>      size_t pagesize_bits =
>          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> -    unsigned long hostpage_boundary =
> -        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
>      unsigned long start_page = pss->page;
>      int res;
>  
> @@ -2482,6 +2537,9 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>          postcopy_preempt_choose_channel(rs, pss);
>      }
>  
> +    /* Update host page boundary information */
> +    pss_host_page_prepare(pss);
> +
>      do {
>          if (postcopy_needs_preempt(rs, pss)) {
>              postcopy_do_preempt(rs, pss);
> @@ -2520,15 +2578,14 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>          }
>  
>          if (tmppages < 0) {
> +            pss_host_page_finish(pss);
>              return tmppages;
>          }
>  
> -        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
> -    } while ((pss->page < hostpage_boundary) &&
> -             offset_in_ramblock(pss->block,
> -                                ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
> -    /* The offset we leave with is the min boundary of host page and block */
> -    pss->page = MIN(pss->page, hostpage_boundary);
> +        pss_find_next_dirty(pss);
> +    } while (pss_within_range(pss));
> +
> +    pss_host_page_finish(pss);
>  
>      /*
>       * When with postcopy preempt mode, flush the data as soon as possible for
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping
  2022-10-04 19:13     ` Peter Xu
@ 2022-10-05 11:18       ` Dr. David Alan Gilbert
  2022-10-05 13:40         ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-05 11:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Oct 04, 2022 at 02:55:10PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Don't take the bitmap mutex when sending pages, or when being throttled by
> > > migration_rate_limit() (which is a bit tricky to call it here in ram code,
> > > but seems still helpful).
> > > 
> > > It prepares for the possibility of concurrently sending pages in >1 threads
> > > using the function ram_save_host_page() because all threads may need the
> > > bitmap_mutex to operate on bitmaps, so that either sendmsg() or any kind of
> > > qemu_sem_wait() blocking for one thread will not block the other from
> > > progressing.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > 
> > I generally don't like taking locks conditionally, but this kind of looks
> > OK; I think it needs a big comment at the start of the function saying
> > that it's called and left with the lock held but that it might drop it
> > temporarily.
> 
> Right, the code is slightly hard to read; I just haven't seen a good and
> easy solution for it yet.  It's just that we may still want to keep the
> lock as long as possible for precopy in one shot.
> 
> > 
> > > ---
> > >  migration/ram.c | 42 +++++++++++++++++++++++++++++++-----------
> > >  1 file changed, 31 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 8303252b6d..6e7de6087a 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2463,6 +2463,7 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
> > >   */
> > >  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> > >  {
> > > +    bool page_dirty, release_lock = postcopy_preempt_active();
> > 
> > Could you rename that to something like 'drop_lock' - you are taking the
> > lock at the end even when you have 'release_lock' set - which is a bit
> > strange naming.
> 
> Is there any difference between "drop" and "release"?  I'll change the name
> anyway since I definitely trust you on any English comments, but please
> still let me know - I love to learn more on those! :)

I'm not sure 'drop' is much better either; I was struggling to find a
good name.

> > 
> > >      int tmppages, pages = 0;
> > >      size_t pagesize_bits =
> > >          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> > > @@ -2486,22 +2487,41 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> > >              break;
> > >          }
> > >  
> > > +        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
> > > +        /*
> > > +         * Properly yield the lock only in postcopy preempt mode because
> > > +         * both migration thread and rp-return thread can operate on the
> > > +         * bitmaps.
> > > +         */
> > > +        if (release_lock) {
> > > +            qemu_mutex_unlock(&rs->bitmap_mutex);
> > > +        }
> > 
> > Shouldn't the unlock/lock move inside the 'if (page_dirty) {' ?
> 
> I think we can move into it, but it may not be as optimal as keeping it
> as-is.
> 
> Consider a case where we've got the bitmap with continuous zero bits.
> During postcopy, the migration thread could be spinning here with the lock
> held even if it doesn't send a thing.  It could still block the other
> return path thread on sending urgent pages which may be outside the zero
> zones.

OK, that reason needs commenting then - you're going to do a lot of
release/take pairs in that case which is going to show up as very hot;
so hmm, if it was just for that type of 'yield' behaviour you wouldn't
normally do it for each bit.

> > 
> > 
> > >          /* Check the pages is dirty and if it is send it */
> > > -        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
> > > +        if (page_dirty) {
> > >              tmppages = ram_save_target_page(rs, pss);
> > > -            if (tmppages < 0) {
> > > -                return tmppages;
> > > +            if (tmppages >= 0) {
> > > +                pages += tmppages;
> > > +                /*
> > > +                 * Allow rate limiting to happen in the middle of huge pages if
> > > +                 * something is sent in the current iteration.
> > > +                 */
> > > +                if (pagesize_bits > 1 && tmppages > 0) {
> > > +                    migration_rate_limit();
> > 
> > This feels interesting, I know it's no change from before, and it's
> > difficult to do here, but it seems odd to hold the lock around the
> > sleeping in the rate limit.
> 
> Good point.. I think I'll leave it there for this patch because it's
> a pre-existing issue, but it seems proper in the future to do the
> unlocking too for normal precopy.
> 
> Maybe I'll just attach a patch at the end of this series when I repost.
> That'll be easier before things got forgotten again.

Dave

> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/14] migration: Use atomic ops properly for page accountings
  2022-10-04 19:23     ` Peter Xu
@ 2022-10-05 11:38       ` Dr. David Alan Gilbert
  2022-10-05 13:53         ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-05 11:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Oct 04, 2022 at 05:59:36PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > To prepare for thread-safety on page accountings, at least below counters
> > > need to be accessed only atomically, they are:
> > > 
> > >         ram_counters.transferred
> > >         ram_counters.duplicate
> > >         ram_counters.normal
> > >         ram_counters.postcopy_bytes
> > > 
> > > There are a lot of other counters but they won't be accessed outside
> > > migration thread, then they're still safe to be accessed without atomic
> > > ops.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > 
> > I think this is OK; I'm not sure whether the memset 0's of ram_counters
> > technically need changing.
> 
> IMHO they're fine - what we need there should be something like WRITE_ONCE()
> just to make sure there's no register caching (actually atomic_write() is
> normally implemented with WRITE_ONCE afaik).  But I think that's already
> guaranteed by memset() being a function call, so we should be 100% safe.

I agree you're probably OK.

> > I'd love to put a comment somewhere saying these fields need to be
> > atomically read, but they're qapi-defined so I don't think we can.
> 
> How about I add a comment above ram_counters declarations in ram.c?

Yeh.

> > 
> > Finally, we probably need to check these are happy on 32 bit builds,
> > sometimes it's a bit funny with atomic adds.
> 
> Yeah.. I hope using qatomic_*() APIs can help me avoid any issues.  Or
> anything concerning?  I'd be happy to test on specific things if there are.

I just remember hitting problems in the past; especially if we end up
trying to do a 64 bit atomic on a platform that can only do 32???

Dave

> > 
> > 
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Thanks!
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 08/14] migration: Introduce pss_channel
  2022-09-20 22:52 ` [PATCH 08/14] migration: Introduce pss_channel Peter Xu
@ 2022-10-05 13:03   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-05 13:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> Introduce pss_channel for PageSearchStatus, define it as "the migration
> channel to be used to transfer this host page".
> 
> We used to have rs->f, which is a mirror to MigrationState.to_dst_file.
> 
> After postcopy preempt initial version, rs->f can be dynamically changed
> depending on which channel we want to use.
> 
> But that later work still doesn't grant full concurrency of sending pages
> in e.g. different threads, because rs->f can either be the PRECOPY channel
> or POSTCOPY channel.  This needs to be per-thread too.
> 
> PageSearchStatus is actually a good structure which we can leverage
> if we want to have multiple threads sending pages.  Sending a single guest
> page may not make sense, so we make the granule the "host page", and in
> the PSS structure we allow specifying a QEMUFile* to migrate a specific host
> page.  Then we open the possibility to specify different channels in
> different threads with different PSS structures.
> 
> The PSS prefix can be slightly misleading here because e.g. for the
> upcoming usage of postcopy channel/thread it's not "searching" (or,
> scanning) at all but sending the explicit page that was requested.  However,
> since PSS has existed for some years, keep it as-is until someone complains.
> 
> This patch mostly (simply) replaces rs->f with pss->pss_channel only. No
> functional change intended for this patch yet.  But it does prepare to
> finally drop rs->f, and make ram_save_guest_page() thread safe.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 70 +++++++++++++++++++++++++++----------------------
>  1 file changed, 38 insertions(+), 32 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 3f720b6de2..40ff5dc49f 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -446,6 +446,8 @@ void dirty_sync_missed_zero_copy(void)
>  
>  /* used by the search for pages to send */
>  struct PageSearchStatus {
> +    /* The migration channel used for a specific host page */
> +    QEMUFile    *pss_channel;
>      /* Current block being searched */
>      RAMBlock    *block;
>      /* Current page to search from */
> @@ -768,9 +770,9 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
>   * @block: block that contains the page we want to send
>   * @offset: offset inside the block for the page
>   */
> -static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
> -                            ram_addr_t current_addr, RAMBlock *block,
> -                            ram_addr_t offset)
> +static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
> +                            uint8_t **current_data, ram_addr_t current_addr,
> +                            RAMBlock *block, ram_addr_t offset)
>  {
>      int encoded_len = 0, bytes_xbzrle;
>      uint8_t *prev_cached_page;
> @@ -838,11 +840,11 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
>      }
>  
>      /* Send XBZRLE based compressed page */
> -    bytes_xbzrle = save_page_header(rs, rs->f, block,
> +    bytes_xbzrle = save_page_header(rs, file, block,
>                                      offset | RAM_SAVE_FLAG_XBZRLE);
> -    qemu_put_byte(rs->f, ENCODING_FLAG_XBZRLE);
> -    qemu_put_be16(rs->f, encoded_len);
> -    qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
> +    qemu_put_byte(file, ENCODING_FLAG_XBZRLE);
> +    qemu_put_be16(file, encoded_len);
> +    qemu_put_buffer(file, XBZRLE.encoded_buf, encoded_len);
>      bytes_xbzrle += encoded_len + 1 + 2;
>      /*
>       * Like compressed_size (please see update_compress_thread_counts),
> @@ -1298,9 +1300,10 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
>   * @block: block that contains the page we want to send
>   * @offset: offset inside the block for the page
>   */
> -static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
> +static int save_zero_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
> +                          ram_addr_t offset)
>  {
> -    int len = save_zero_page_to_file(rs, rs->f, block, offset);
> +    int len = save_zero_page_to_file(rs, file, block, offset);
>  
>      if (len) {
>          qatomic_inc(&ram_counters.duplicate);
> @@ -1317,15 +1320,15 @@ static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
>   *
>   * Return true if the pages has been saved, otherwise false is returned.
>   */
> -static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
> -                              int *pages)
> +static bool control_save_page(PageSearchStatus *pss, RAMBlock *block,
> +                              ram_addr_t offset, int *pages)
>  {
>      uint64_t bytes_xmit = 0;
>      int ret;
>  
>      *pages = -1;
> -    ret = ram_control_save_page(rs->f, block->offset, offset, TARGET_PAGE_SIZE,
> -                                &bytes_xmit);
> +    ret = ram_control_save_page(pss->pss_channel, block->offset, offset,
> +                                TARGET_PAGE_SIZE, &bytes_xmit);
>      if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
>          return false;
>      }
> @@ -1359,17 +1362,17 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>   * @buf: the page to be sent
>   * @async: send to page asyncly
>   */
> -static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
> -                            uint8_t *buf, bool async)
> +static int save_normal_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
> +                            ram_addr_t offset, uint8_t *buf, bool async)
>  {
> -    ram_transferred_add(save_page_header(rs, rs->f, block,
> +    ram_transferred_add(save_page_header(rs, file, block,
>                                           offset | RAM_SAVE_FLAG_PAGE));
>      if (async) {
> -        qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
> +        qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
>                                migrate_release_ram() &&
>                                migration_in_postcopy());
>      } else {
> -        qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
> +        qemu_put_buffer(file, buf, TARGET_PAGE_SIZE);
>      }
>      ram_transferred_add(TARGET_PAGE_SIZE);
>      qatomic_inc(&ram_counters.normal);
> @@ -1402,8 +1405,8 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
>  
>      XBZRLE_cache_lock();
>      if (rs->xbzrle_enabled && !migration_in_postcopy()) {
> -        pages = save_xbzrle_page(rs, &p, current_addr, block,
> -                                 offset);
> +        pages = save_xbzrle_page(rs, pss->pss_channel, &p, current_addr,
> +                                 block, offset);
>          if (!rs->last_stage) {
>              /* Can't send this cached data async, since the cache page
>               * might get updated before it gets to the wire
> @@ -1414,7 +1417,8 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
>  
>      /* XBZRLE overflow or normal page */
>      if (pages == -1) {
> -        pages = save_normal_page(rs, block, offset, p, send_async);
> +        pages = save_normal_page(rs, pss->pss_channel, block, offset,
> +                                 p, send_async);
>      }
>  
>      XBZRLE_cache_unlock();
> @@ -1422,10 +1426,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
>      return pages;
>  }
>  
> -static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
> +static int ram_save_multifd_page(QEMUFile *file, RAMBlock *block,
>                                   ram_addr_t offset)
>  {
> -    if (multifd_queue_page(rs->f, block, offset) < 0) {
> +    if (multifd_queue_page(file, block, offset) < 0) {
>          return -1;
>      }
>      ram_counters.normal++;
> @@ -1720,7 +1724,7 @@ static int ram_save_release_protection(RAMState *rs, PageSearchStatus *pss,
>          uint64_t run_length = (pss->page - start_page) << TARGET_PAGE_BITS;
>  
>          /* Flush async buffers before un-protect. */
> -        qemu_fflush(rs->f);
> +        qemu_fflush(pss->pss_channel);
>          /* Un-protect memory range. */
>          res = uffd_change_protection(rs->uffdio_fd, page_address, run_length,
>                  false, false);
> @@ -2307,7 +2311,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
>      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
>      int res;
>  
> -    if (control_save_page(rs, block, offset, &res)) {
> +    if (control_save_page(pss, block, offset, &res)) {
>          return res;
>      }
>  
> @@ -2315,7 +2319,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
>          return 1;
>      }
>  
> -    res = save_zero_page(rs, block, offset);
> +    res = save_zero_page(rs, pss->pss_channel, block, offset);
>      if (res > 0) {
>          /* Must let xbzrle know, otherwise a previous (now 0'd) cached
>           * page would be stale
> @@ -2336,7 +2340,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
>       */
>      if (!save_page_use_compression(rs) && migrate_use_multifd()
>          && !migration_in_postcopy()) {
> -        return ram_save_multifd_page(rs, block, offset);
> +        return ram_save_multifd_page(pss->pss_channel, block, offset);
>      }
>  
>      return ram_save_page(rs, pss);
> @@ -2533,10 +2537,6 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>          return 0;
>      }
>  
> -    if (postcopy_preempt_active()) {
> -        postcopy_preempt_choose_channel(rs, pss);
> -    }
> -
>      /* Update host page boundary information */
>      pss_host_page_prepare(pss);
>  
> @@ -2597,7 +2597,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>       * explicit flush or it won't flush until the buffer is full.
>       */
>      if (migrate_postcopy_preempt() && pss->postcopy_requested) {
> -        qemu_fflush(rs->f);
> +        qemu_fflush(pss->pss_channel);
>      }
>  
>      res = ram_save_release_protection(rs, pss, start_page);
> @@ -2663,6 +2663,12 @@ static int ram_find_and_save_block(RAMState *rs)
>          }
>  
>          if (found) {
> +            /* Update rs->f with correct channel */
> +            if (postcopy_preempt_active()) {
> +                postcopy_preempt_choose_channel(rs, &pss);
> +            }
> +            /* Cache rs->f in pss_channel (TODO: remove rs->f) */
> +            pss.pss_channel = rs->f;
>              pages = ram_save_host_page(rs, &pss);
>          }
>      } while (!pages && again);
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 09/14] migration: Add pss_init()
  2022-09-20 22:52 ` [PATCH 09/14] migration: Add pss_init() Peter Xu
@ 2022-10-05 13:09   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-05 13:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> Helper to init PSS structures.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 40ff5dc49f..b4b36ca59e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -535,6 +535,14 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>  static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
>                                       bool postcopy_requested);
>  
> +/* NOTE: page is the PFN not real ram_addr_t. */
> +static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
> +{
> +    pss->block = rb;
> +    pss->page = page;
> +    pss->complete_round = false;
> +}
> +
>  static void *do_data_compress(void *opaque)
>  {
>      CompressParam *param = opaque;
> @@ -2640,9 +2648,7 @@ static int ram_find_and_save_block(RAMState *rs)
>          rs->last_page = 0;
>      }
>  
> -    pss.block = rs->last_seen_block;
> -    pss.page = rs->last_page;
> -    pss.complete_round = false;
> +    pss_init(&pss, rs->last_seen_block, rs->last_page);
>  
>      do {
>          again = true;
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping
  2022-10-05 11:18       ` Dr. David Alan Gilbert
@ 2022-10-05 13:40         ` Peter Xu
  2022-10-05 19:48           ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-10-05 13:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

On Wed, Oct 05, 2022 at 12:18:00PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Oct 04, 2022 at 02:55:10PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > Don't take the bitmap mutex when sending pages, or when being throttled by
> > > > migration_rate_limit() (which is a bit tricky to call it here in ram code,
> > > > but seems still helpful).
> > > > 
> > > > It prepares for the possibility of concurrently sending pages in >1 threads
> > > > using the function ram_save_host_page() because all threads may need the
> > > > bitmap_mutex to operate on bitmaps, so that either sendmsg() or any kind of
> > > > qemu_sem_wait() blocking for one thread will not block the other from
> > > > progressing.
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > 
> > > I generally don't like taking locks conditionally, but this kind of looks
> > > OK; I think it needs a big comment at the start of the function saying
> > > that it's called and left with the lock held but that it might drop it
> > > temporarily.
> > 
> > Right, the code is slightly hard to read; I just haven't seen a good and
> > easy solution for it yet.  It's just that we may still want to keep the
> > lock as long as possible for precopy in one shot.
> > 
> > > 
> > > > ---
> > > >  migration/ram.c | 42 +++++++++++++++++++++++++++++++-----------
> > > >  1 file changed, 31 insertions(+), 11 deletions(-)
> > > > 
> > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > index 8303252b6d..6e7de6087a 100644
> > > > --- a/migration/ram.c
> > > > +++ b/migration/ram.c
> > > > @@ -2463,6 +2463,7 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
> > > >   */
> > > >  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> > > >  {
> > > > +    bool page_dirty, release_lock = postcopy_preempt_active();
> > > 
> > > Could you rename that to something like 'drop_lock' - you are taking the
> > > lock at the end even when you have 'release_lock' set - which is a bit
> > > strange naming.
> > 
> > Is there any difference between "drop" and "release"?  I'll change the name
> > anyway since I definitely trust you on any English comments, but please
> > still let me know - I love to learn more on those! :)
> 
> I'm not sure 'drop' is much better either; I was struggling to find a
> good name.

I can also call it "preempt_enabled".

Actually I could directly replace it with calling postcopy_preempt_active()
every time, but I just want to make it crystal clear that the value is not
changing and that lock & unlock are always paired - in our case I think it is
not changing, but the variable helps to be 100% sure there can be no bug,
e.g. a deadlock caused by the state changing.
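
To spell out the hazard I want to rule out if we re-evaluated the state
each time (purely illustrative):

    if (postcopy_preempt_active()) {          /* returns true: we unlock */
        qemu_mutex_unlock(&rs->bitmap_mutex);
    }
    ...
    if (postcopy_preempt_active()) {          /* state flipped: returns false */
        qemu_mutex_lock(&rs->bitmap_mutex);   /* never re-taken: unbalanced */
    }

(or the other way round, where we'd end up locking twice).  Caching the
value once in release_lock keeps the unlock/lock strictly paired.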

> 
> > > 
> > > >      int tmppages, pages = 0;
> > > >      size_t pagesize_bits =
> > > >          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> > > > @@ -2486,22 +2487,41 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> > > >              break;
> > > >          }
> > > >  
> > > > +        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
> > > > +        /*
> > > > +         * Properly yield the lock only in postcopy preempt mode because
> > > > +         * both migration thread and rp-return thread can operate on the
> > > > +         * bitmaps.
> > > > +         */
> > > > +        if (release_lock) {
> > > > +            qemu_mutex_unlock(&rs->bitmap_mutex);
> > > > +        }
> > > 
> > > Shouldn't the unlock/lock move inside the 'if (page_dirty) {' ?
> > 
> > I think we can move into it, but it may not be as optimal as keeping it
> > as-is.
> > 
> > Consider a case where we've got the bitmap with continuous zero bits.
> > During postcopy, the migration thread could be spinning here with the lock
> > held even if it doesn't send a thing.  It could still block the other
> > return path thread on sending urgent pages which may be outside the zero
> > zones.
> 
> OK, that reason needs commenting then - you're going to do a lot of
> release/take pairs in that case which is going to show up as very hot;
> so hmm, if it was just for that type of 'yield' behaviour you wouldn't
> normally do it for each bit.

Hold on.. I think my assumption won't easily trigger, because at the end of
the loop we'll look for the next "dirty" page.  So long runs of
clean pages are unlikely - I'd even say impossible, because we're
holding the mutex during scanning and clear-dirty, so no one will be able
to flip the bit.

So yeah I think it's okay to move it into "page_dirty", but since we'll
almost always take the dirty path it probably won't help a lot
either, because it'll be mostly the same as keeping it outside?

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/14] migration: Use atomic ops properly for page accountings
  2022-10-05 11:38       ` Dr. David Alan Gilbert
@ 2022-10-05 13:53         ` Peter Xu
  2022-10-06 20:40           ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-10-05 13:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

On Wed, Oct 05, 2022 at 12:38:05PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Oct 04, 2022 at 05:59:36PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > To prepare for thread-safety on page accountings, at least below counters
> > > > need to be accessed only atomically, they are:
> > > > 
> > > >         ram_counters.transferred
> > > >         ram_counters.duplicate
> > > >         ram_counters.normal
> > > >         ram_counters.postcopy_bytes
> > > > 
> > > > There are a lot of other counters but they won't be accessed outside
> > > > migration thread, then they're still safe to be accessed without atomic
> > > > ops.
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > 
> > > I think this is OK; I'm not sure whether the memset 0's of ram_counters
> > > technically need changing.
> > 
> > IMHO they're fine - what we need there should be something like WRITE_ONCE()
> > just to make sure there's no register caching (actually atomic_write() is
> > normally implemented with WRITE_ONCE afaik).  But I think that's already
> > guaranteed by memset() being a function call, so we should be 100% safe.
> 
> I agree you're probably OK.
> 
> > > I'd love to put a comment somewhere saying these fields need to be
> > > atomically read, but they're qapi-defined so I don't think we can.
> > 
> > How about I add a comment above ram_counters declarations in ram.c?
> 
> Yeh.
> 
> > > 
> > > Finally, we probably need to check these are happy on 32 bit builds,
> > > sometimes it's a bit funny with atomic adds.
> > 
> > Yeah.. I hope using qatomic_*() APIs can help me avoid any issues.  Or
> > anything concerning?  I'd be happy to test on specific things if there are.
> 
> I just remember hitting problems in the past; especially if we end up
> trying to do a 64 bit atomic on a platform that can only do 32???

I see what you meant... when looking at the existing callers of
qatomic_add(), I found that we seem to have Stat64 just for that
!CONFIG_ATOMIC64 problem.

I'll dig a bit into whether and how we can do that; the thing is these
counters are in the qapi, so I need to make sure it can support Stat64
somehow.  Hmm..
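
For reference, the Stat64 usage I have in mind would look roughly like this
(illustrative only - the open question is how to expose such a counter back
through the QAPI-visible stats):

    #include "qemu/stats64.h"

    Stat64 transferred;

    stat64_add(&transferred, bytes);           /* atomic even without CONFIG_ATOMIC64 */
    uint64_t total = stat64_get(&transferred);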

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 10/14] migration: Make PageSearchStatus part of RAMState
  2022-09-20 22:52 ` [PATCH 10/14] migration: Make PageSearchStatus part of RAMState Peter Xu
@ 2022-10-05 18:51   ` Dr. David Alan Gilbert
  2022-10-05 19:41     ` Peter Xu
  2022-10-06  8:37   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-05 18:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> We used to allocate PSS structure on the stack for precopy when sending
> pages.  Make it static, so as to describe per-channel ram migration status.
> 
> Here we declared RAM_CHANNEL_MAX instances, preparing for postcopy to use
> it, even though this patch does not yet start using the 2nd instance.
> 
> This should not have any functional change per se, but it already starts to
> export PSS information via the RAMState, so that e.g. one PSS channel can
> start to reference the other PSS channel.
> 
> Always protect PSS access using the same RAMState.bitmap_mutex.  We already
> do so, so no code change needed, just some comment update.  Maybe we should
> consider renaming bitmap_mutex some day as it's going to be a bigger, more
> commonly used mutex for ram states, but just leave it for later.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/ram.c | 112 ++++++++++++++++++++++++++----------------------
>  1 file changed, 61 insertions(+), 51 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index b4b36ca59e..dbe11e1ace 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -85,6 +85,46 @@
>  
>  XBZRLECacheStats xbzrle_counters;
>  
> +/* used by the search for pages to send */
> +struct PageSearchStatus {
> +    /* The migration channel used for a specific host page */
> +    QEMUFile    *pss_channel;
> +    /* Current block being searched */
> +    RAMBlock    *block;
> +    /* Current page to search from */
> +    unsigned long page;
> +    /* Set once we wrap around */
> +    bool         complete_round;
> +    /*
> +     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
> +     * postcopy.  When set, the request is "urgent" because the dest QEMU
> +     * threads are waiting for us.
> +     */
> +    bool         postcopy_requested;
> +    /*
> +     * [POSTCOPY-ONLY] The target channel to use to send current page.
> +     *
> +     * Note: This may _not_ match with the value in postcopy_requested
> +     * above. Let's imagine the case where the postcopy request is exactly
> +     * the page that we're sending in progress during precopy. In this case
> +     * we'll have postcopy_requested set to true but the target channel
> +     * will be the precopy channel (so that we don't split brain on that
> +     * specific page since the precopy channel already contains partial of
> +     * that page data).
> +     *
> +     * Besides that specific use case, postcopy_target_channel should
> +     * always be equal to postcopy_requested, because by default we send
> +     * postcopy pages via postcopy preempt channel.
> +     */
> +    bool         postcopy_target_channel;
> +    /* Whether we're sending a host page */
> +    bool          host_page_sending;
> +    /* The start/end of current host page.  Invalid if host_page_sending==false */
> +    unsigned long host_page_start;
> +    unsigned long host_page_end;
> +};
> +typedef struct PageSearchStatus PageSearchStatus;
> +
>  /* struct contains XBZRLE cache and a static page
>     used by the compression */
>  static struct {
> @@ -319,6 +359,11 @@ typedef struct {
>  struct RAMState {
>      /* QEMUFile used for this migration */
>      QEMUFile *f;
> +    /*
> > +     * PageSearchStatus structures for the channels when sending pages.
> +     * Protected by the bitmap_mutex.
> +     */
> +    PageSearchStatus pss[RAM_CHANNEL_MAX];

Why statically size this rather than allocate it in ram_state_init ?

Dave

>      /* UFFD file descriptor, used in 'write-tracking' migration */
>      int uffdio_fd;
>      /* Last block that we have visited searching for dirty pages */
> @@ -362,7 +407,12 @@ struct RAMState {
>      uint64_t target_page_count;
>      /* number of dirty bits in the bitmap */
>      uint64_t migration_dirty_pages;
> -    /* Protects modification of the bitmap and migration dirty pages */
> +    /*
> +     * Protects:
> +     * - dirty/clear bitmap
> +     * - migration_dirty_pages
> +     * - pss structures
> +     */
>      QemuMutex bitmap_mutex;
>      /* The RAMBlock used in the last src_page_requests */
>      RAMBlock *last_req_rb;
> @@ -444,46 +494,6 @@ void dirty_sync_missed_zero_copy(void)
>      ram_counters.dirty_sync_missed_zero_copy++;
>  }
>  
> -/* used by the search for pages to send */
> -struct PageSearchStatus {
> -    /* The migration channel used for a specific host page */
> -    QEMUFile    *pss_channel;
> -    /* Current block being searched */
> -    RAMBlock    *block;
> -    /* Current page to search from */
> -    unsigned long page;
> -    /* Set once we wrap around */
> -    bool         complete_round;
> -    /*
> -     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
> -     * postcopy.  When set, the request is "urgent" because the dest QEMU
> -     * threads are waiting for us.
> -     */
> -    bool         postcopy_requested;
> -    /*
> -     * [POSTCOPY-ONLY] The target channel to use to send current page.
> -     *
> -     * Note: This may _not_ match with the value in postcopy_requested
> -     * above. Let's imagine the case where the postcopy request is exactly
> -     * the page that we're sending in progress during precopy. In this case
> -     * we'll have postcopy_requested set to true but the target channel
> -     * will be the precopy channel (so that we don't split brain on that
> -     * specific page since the precopy channel already contains partial of
> -     * that page data).
> -     *
> -     * Besides that specific use case, postcopy_target_channel should
> -     * always be equal to postcopy_requested, because by default we send
> -     * postcopy pages via postcopy preempt channel.
> -     */
> -    bool         postcopy_target_channel;
> -    /* Whether we're sending a host page */
> -    bool          host_page_sending;
> -    /* The start/end of current host page.  Only valid if host_page_sending==true */
> -    unsigned long host_page_start;
> -    unsigned long host_page_end;
> -};
> -typedef struct PageSearchStatus PageSearchStatus;
> -
>  CompressionStats compression_counters;
>  
>  struct CompressParam {
> @@ -2627,7 +2637,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>   */
>  static int ram_find_and_save_block(RAMState *rs)
>  {
> -    PageSearchStatus pss;
> +    PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
>      int pages = 0;
>      bool again, found;
>  
> @@ -2648,11 +2658,11 @@ static int ram_find_and_save_block(RAMState *rs)
>          rs->last_page = 0;
>      }
>  
> -    pss_init(&pss, rs->last_seen_block, rs->last_page);
> +    pss_init(pss, rs->last_seen_block, rs->last_page);
>  
>      do {
>          again = true;
> -        found = get_queued_page(rs, &pss);
> +        found = get_queued_page(rs, pss);
>  
>          if (!found) {
>              /*
> @@ -2660,27 +2670,27 @@ static int ram_find_and_save_block(RAMState *rs)
>               * preempted precopy.  Otherwise find the next dirty bit.
>               */
>              if (postcopy_preempt_triggered(rs)) {
> -                postcopy_preempt_restore(rs, &pss, false);
> +                postcopy_preempt_restore(rs, pss, false);
>                  found = true;
>              } else {
>                  /* priority queue empty, so just search for something dirty */
> -                found = find_dirty_block(rs, &pss, &again);
> +                found = find_dirty_block(rs, pss, &again);
>              }
>          }
>  
>          if (found) {
>              /* Update rs->f with correct channel */
>              if (postcopy_preempt_active()) {
> -                postcopy_preempt_choose_channel(rs, &pss);
> +                postcopy_preempt_choose_channel(rs, pss);
>              }
>              /* Cache rs->f in pss_channel (TODO: remove rs->f) */
> -            pss.pss_channel = rs->f;
> -            pages = ram_save_host_page(rs, &pss);
> +            pss->pss_channel = rs->f;
> +            pages = ram_save_host_page(rs, pss);
>          }
>      } while (!pages && again);
>  
> -    rs->last_seen_block = pss.block;
> -    rs->last_page = pss.page;
> +    rs->last_seen_block = pss->block;
> +    rs->last_page = pss->page;
>  
>      return pages;
>  }
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 10/14] migration: Make PageSearchStatus part of RAMState
  2022-10-05 18:51   ` Dr. David Alan Gilbert
@ 2022-10-05 19:41     ` Peter Xu
  2022-10-06  8:36       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-10-05 19:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

On Wed, Oct 05, 2022 at 07:51:34PM +0100, Dr. David Alan Gilbert wrote:
> >  /* struct contains XBZRLE cache and a static page
> >     used by the compression */
> >  static struct {
> > @@ -319,6 +359,11 @@ typedef struct {
> >  struct RAMState {
> >      /* QEMUFile used for this migration */
> >      QEMUFile *f;
> > +    /*
> > +     * PageSearchStatus structures for the channels when sending pages.
> > +     * Protected by the bitmap_mutex.
> > +     */
> > +    PageSearchStatus pss[RAM_CHANNEL_MAX];
> 
> Why statically size this rather than allocate it in ram_state_init ?

I don't strongly feel like it needs the complexity? As there're only at
most 2 channels anyway, at best we'd save ~56 bytes on the src qemu, and
only during migration (RAMState is allocated only in ram setup).

If you think we should still do the dynamic allocation, definitely doable on
my side too.
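
For completeness, the dynamic variant would look something like below
(hypothetical, just showing the shape; names as in the current series):

  /* RAMState would keep a pointer rather than the embedded array */
  PageSearchStatus *pss;   /* RAM_CHANNEL_MAX entries */

  /* in ram_state_init(): */
  (*rsp)->pss = g_new0(PageSearchStatus, RAM_CHANNEL_MAX);

  /* in the ram state cleanup path: */
  g_free(rs->pss);
  rs->pss = NULL;

So that's one more allocation to manage for ~56 bytes saved, hence my
preference for the static array.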

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping
  2022-10-05 13:40         ` Peter Xu
@ 2022-10-05 19:48           ` Peter Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2022-10-05 19:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

On Wed, Oct 05, 2022 at 09:40:53AM -0400, Peter Xu wrote:
> On Wed, Oct 05, 2022 at 12:18:00PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Tue, Oct 04, 2022 at 02:55:10PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > Don't take the bitmap mutex when sending pages, or when being throttled by
> > > > > migration_rate_limit() (which is a bit tricky to call it here in ram code,
> > > > > but seems still helpful).
> > > > > 
> > > > > It prepares for the possibility of concurrently sending pages in >1 threads
> > > > > using the function ram_save_host_page() because all threads may need the
> > > > > bitmap_mutex to operate on bitmaps, so that either sendmsg() or any kind of
> > > > > qemu_sem_wait() blocking for one thread will not block the other from
> > > > > progressing.
> > > > > 
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > 
> > > > I generally dont like taking locks conditionally; but this kind of looks
> > > > OK; I think it needs a big comment on the start of the function saying
> > > > that it's called and left with the lock held but that it might drop it
> > > > temporarily.
> > > 
> > > Right, the code is slightly hard to read, I just didn't yet see a good and
> > > easy solution for it yet.  It's just that we may still want to keep the
> > > lock as long as possible for precopy in one shot.
> > > 
> > > > 
> > > > > ---
> > > > >  migration/ram.c | 42 +++++++++++++++++++++++++++++++-----------
> > > > >  1 file changed, 31 insertions(+), 11 deletions(-)
> > > > > 
> > > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > > index 8303252b6d..6e7de6087a 100644
> > > > > --- a/migration/ram.c
> > > > > +++ b/migration/ram.c
> > > > > @@ -2463,6 +2463,7 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
> > > > >   */
> > > > >  static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> > > > >  {
> > > > > +    bool page_dirty, release_lock = postcopy_preempt_active();
> > > > 
> > > > Could you rename that to something like 'drop_lock' - you are taking the
> > > > lock at the end even when you have 'release_lock' set - which is a bit
> > > > strange naming.
> > > 
> > > Is there any difference on "drop" or "release"?  I'll change the name
> > > anyway since I definitely trust you on any English comments, but please
> > > still let me know - I love to learn more on those! :)
> > 
> > I'm not sure 'drop' is much better either; I was struggling to find a
> > good name.
> 
> I can also call it "preempt_enabled".
> 
> Actually I can directly replace it with calling postcopy_preempt_active()
> always but I just want to make it crystal clear that the value is not
> changing and lock & unlock are always paired - in our case I think it is
> not changing, but the var helps to be 100% sure there'll be no possible bug
> on e.g. deadlock caused by state changing.
> 
> > 
> > > > 
> > > > >      int tmppages, pages = 0;
> > > > >      size_t pagesize_bits =
> > > > >          qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
> > > > > @@ -2486,22 +2487,41 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> > > > >              break;
> > > > >          }
> > > > >  
> > > > > +        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
> > > > > +        /*
> > > > > +         * Properly yield the lock only in postcopy preempt mode because
> > > > > +         * both migration thread and rp-return thread can operate on the
> > > > > +         * bitmaps.
> > > > > +         */
> > > > > +        if (release_lock) {
> > > > > +            qemu_mutex_unlock(&rs->bitmap_mutex);
> > > > > +        }
> > > > 
> > > > Shouldn't the unlock/lock move inside the 'if (page_dirty) {' ?
> > > 
> > > I think we can move it into it, but it may not be as optimal as keeping it
> > > as-is.
> > > 
> > > Consider a case where we've got the bitmap with continuous zero bits.
> > > During postcopy, the migration thread could be spinning here with the lock
> > > held even if it doesn't send a thing.  It could still block the other
> > > return path thread on sending urgent pages which may be outside the zero
> > > zones.
> > 
> > OK, that reason needs commenting then - you're going to do a lot of
> > release/take pairs in that case which is going to show up as very hot;
> > so hmm, if it was just for that type of 'yield' behaviour you wouldn't
> > normally do it for each bit.
> 
> Hold on.. I think my assumption won't easily trigger, because at the end of
> the loop we'll try to look for the next "dirty" page.  So consecutive
> clean pages are unlikely - I'd even say impossible, because we're holding
> the mutex during scanning and clear-dirty, so no one will be able to flip
> the bit.
> 
> So yeah I think it's okay to move it into "page_dirty", but since we'll
> almost always hit the dirty case it probably won't help much either,
> because it'll be mostly the same as keeping it outside?

IOW, maybe I should drop page_dirty directly and replace it with a check,
failing migration if migration_bitmap_clear_dirty() returned false?
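
Something along these lines (rough idea only, not code from the series):

  /* inside the host page send loop: a clean page here would mean the
   * bitmap changed under bitmap_mutex, which should be impossible */
  if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
      error_report("%s: unexpected clean page %lu in block %s",
                   __func__, pss->page, pss->block->idstr);
      return -1;   /* fail the migration rather than silently skip */
  }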

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 10/14] migration: Make PageSearchStatus part of RAMState
  2022-10-05 19:41     ` Peter Xu
@ 2022-10-06  8:36       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-06  8:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Oct 05, 2022 at 07:51:34PM +0100, Dr. David Alan Gilbert wrote:
> > >  /* struct contains XBZRLE cache and a static page
> > >     used by the compression */
> > >  static struct {
> > > @@ -319,6 +359,11 @@ typedef struct {
> > >  struct RAMState {
> > >      /* QEMUFile used for this migration */
> > >      QEMUFile *f;
> > > +    /*
> > > +     * PageSearchStatus structures for the channels when sending pages.
> > > +     * Protected by the bitmap_mutex.
> > > +     */
> > > +    PageSearchStatus pss[RAM_CHANNEL_MAX];
> > 
> > Why statically size this rather than allocate it in ram_state_init ?
> 
> I don't strongly feel like it needs the complexity? As there're only at
> most 2 channels anyway, at best we'd save ~56 bytes on the src qemu, and
> only during migration (RAMState is allocated only in ram setup).
> 
> If you think we should still do the dynamic allocation, definitely doable on
> my side too.

Oh for 2 channels, yes that's fine - what confused me here was
'RAM_CHANNEL_...' - I forgot that was PRE vs POST - I was getting
confused with multifd channel numbering; too many concepts of 'channel'.
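
To write down how I now understand the two "channel" namespaces (from
memory, so the exact definition may differ):

  /* RAM_CHANNEL_* indexes the per-purpose PSS slots in RAMState */
  enum {
      RAM_CHANNEL_PRECOPY  = 0,   /* main/precopy channel              */
      RAM_CHANNEL_POSTCOPY = 1,   /* postcopy preempt (urgent) channel */
      RAM_CHANNEL_MAX,
  };

  /* multifd "channels", by contrast, are just the N worker connections
   * (0 .. multifd-channels-1) and have nothing to do with pss[] */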

Dave

> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 10/14] migration: Make PageSearchStatus part of RAMState
  2022-09-20 22:52 ` [PATCH 10/14] migration: Make PageSearchStatus part of RAMState Peter Xu
  2022-10-05 18:51   ` Dr. David Alan Gilbert
@ 2022-10-06  8:37   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-06  8:37 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> We used to allocate PSS structure on the stack for precopy when sending
> pages.  Make it static, so as to describe per-channel ram migration status.
> 
> Here we declared RAM_CHANNEL_MAX instances, preparing for postcopy to use
> it, even though this patch does not yet start using the 2nd instance.
> 
> This should not have any functional change per se, but it already starts to
> export PSS information via the RAMState, so that e.g. one PSS channel can
> start to reference the other PSS channel.
> 
> Always protect PSS access using the same RAMState.bitmap_mutex.  We already
> do so, so no code change needed, just some comment update.  Maybe we should
> consider renaming bitmap_mutex some day as it's going to be a bigger, more
> commonly used mutex for ram states, but just leave it for later.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 112 ++++++++++++++++++++++++++----------------------
>  1 file changed, 61 insertions(+), 51 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index b4b36ca59e..dbe11e1ace 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -85,6 +85,46 @@
>  
>  XBZRLECacheStats xbzrle_counters;
>  
> +/* used by the search for pages to send */
> +struct PageSearchStatus {
> +    /* The migration channel used for a specific host page */
> +    QEMUFile    *pss_channel;
> +    /* Current block being searched */
> +    RAMBlock    *block;
> +    /* Current page to search from */
> +    unsigned long page;
> +    /* Set once we wrap around */
> +    bool         complete_round;
> +    /*
> +     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
> +     * postcopy.  When set, the request is "urgent" because the dest QEMU
> +     * threads are waiting for us.
> +     */
> +    bool         postcopy_requested;
> +    /*
> +     * [POSTCOPY-ONLY] The target channel to use to send current page.
> +     *
> +     * Note: This may _not_ match with the value in postcopy_requested
> +     * above. Let's imagine the case where the postcopy request is exactly
> +     * the page that we're sending in progress during precopy. In this case
> +     * we'll have postcopy_requested set to true but the target channel
> +     * will be the precopy channel (so that we don't split brain on that
> +     * specific page since the precopy channel already contains partial of
> +     * that page data).
> +     *
> +     * Besides that specific use case, postcopy_target_channel should
> +     * always be equal to postcopy_requested, because by default we send
> +     * postcopy pages via postcopy preempt channel.
> +     */
> +    bool         postcopy_target_channel;
> +    /* Whether we're sending a host page */
> +    bool          host_page_sending;
> +    /* The start/end of current host page.  Invalid if host_page_sending==false */
> +    unsigned long host_page_start;
> +    unsigned long host_page_end;
> +};
> +typedef struct PageSearchStatus PageSearchStatus;
> +
>  /* struct contains XBZRLE cache and a static page
>     used by the compression */
>  static struct {
> @@ -319,6 +359,11 @@ typedef struct {
>  struct RAMState {
>      /* QEMUFile used for this migration */
>      QEMUFile *f;
> +    /*
> > +     * PageSearchStatus structures for the channels when sending pages.
> +     * Protected by the bitmap_mutex.
> +     */
> +    PageSearchStatus pss[RAM_CHANNEL_MAX];
>      /* UFFD file descriptor, used in 'write-tracking' migration */
>      int uffdio_fd;
>      /* Last block that we have visited searching for dirty pages */
> @@ -362,7 +407,12 @@ struct RAMState {
>      uint64_t target_page_count;
>      /* number of dirty bits in the bitmap */
>      uint64_t migration_dirty_pages;
> -    /* Protects modification of the bitmap and migration dirty pages */
> +    /*
> +     * Protects:
> +     * - dirty/clear bitmap
> +     * - migration_dirty_pages
> +     * - pss structures
> +     */
>      QemuMutex bitmap_mutex;
>      /* The RAMBlock used in the last src_page_requests */
>      RAMBlock *last_req_rb;
> @@ -444,46 +494,6 @@ void dirty_sync_missed_zero_copy(void)
>      ram_counters.dirty_sync_missed_zero_copy++;
>  }
>  
> -/* used by the search for pages to send */
> -struct PageSearchStatus {
> -    /* The migration channel used for a specific host page */
> -    QEMUFile    *pss_channel;
> -    /* Current block being searched */
> -    RAMBlock    *block;
> -    /* Current page to search from */
> -    unsigned long page;
> -    /* Set once we wrap around */
> -    bool         complete_round;
> -    /*
> -     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
> -     * postcopy.  When set, the request is "urgent" because the dest QEMU
> -     * threads are waiting for us.
> -     */
> -    bool         postcopy_requested;
> -    /*
> -     * [POSTCOPY-ONLY] The target channel to use to send current page.
> -     *
> -     * Note: This may _not_ match with the value in postcopy_requested
> -     * above. Let's imagine the case where the postcopy request is exactly
> -     * the page that we're sending in progress during precopy. In this case
> -     * we'll have postcopy_requested set to true but the target channel
> -     * will be the precopy channel (so that we don't split brain on that
> -     * specific page since the precopy channel already contains partial of
> -     * that page data).
> -     *
> -     * Besides that specific use case, postcopy_target_channel should
> -     * always be equal to postcopy_requested, because by default we send
> -     * postcopy pages via postcopy preempt channel.
> -     */
> -    bool         postcopy_target_channel;
> -    /* Whether we're sending a host page */
> -    bool          host_page_sending;
> -    /* The start/end of current host page.  Only valid if host_page_sending==true */
> -    unsigned long host_page_start;
> -    unsigned long host_page_end;
> -};
> -typedef struct PageSearchStatus PageSearchStatus;
> -
>  CompressionStats compression_counters;
>  
>  struct CompressParam {
> @@ -2627,7 +2637,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>   */
>  static int ram_find_and_save_block(RAMState *rs)
>  {
> -    PageSearchStatus pss;
> +    PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
>      int pages = 0;
>      bool again, found;
>  
> @@ -2648,11 +2658,11 @@ static int ram_find_and_save_block(RAMState *rs)
>          rs->last_page = 0;
>      }
>  
> -    pss_init(&pss, rs->last_seen_block, rs->last_page);
> +    pss_init(pss, rs->last_seen_block, rs->last_page);
>  
>      do {
>          again = true;
> -        found = get_queued_page(rs, &pss);
> +        found = get_queued_page(rs, pss);
>  
>          if (!found) {
>              /*
> @@ -2660,27 +2670,27 @@ static int ram_find_and_save_block(RAMState *rs)
>               * preempted precopy.  Otherwise find the next dirty bit.
>               */
>              if (postcopy_preempt_triggered(rs)) {
> -                postcopy_preempt_restore(rs, &pss, false);
> +                postcopy_preempt_restore(rs, pss, false);
>                  found = true;
>              } else {
>                  /* priority queue empty, so just search for something dirty */
> -                found = find_dirty_block(rs, &pss, &again);
> +                found = find_dirty_block(rs, pss, &again);
>              }
>          }
>  
>          if (found) {
>              /* Update rs->f with correct channel */
>              if (postcopy_preempt_active()) {
> -                postcopy_preempt_choose_channel(rs, &pss);
> +                postcopy_preempt_choose_channel(rs, pss);
>              }
>              /* Cache rs->f in pss_channel (TODO: remove rs->f) */
> -            pss.pss_channel = rs->f;
> -            pages = ram_save_host_page(rs, &pss);
> +            pss->pss_channel = rs->f;
> +            pages = ram_save_host_page(rs, pss);
>          }
>      } while (!pages && again);
>  
> -    rs->last_seen_block = pss.block;
> -    rs->last_page = pss.page;
> +    rs->last_seen_block = pss->block;
> +    rs->last_page = pss->page;
>  
>      return pages;
>  }
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 11/14] migration: Move last_sent_block into PageSearchStatus
  2022-09-20 22:52 ` [PATCH 11/14] migration: Move last_sent_block into PageSearchStatus Peter Xu
@ 2022-10-06 16:59   ` Dr. David Alan Gilbert
  2022-10-06 18:34     ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-06 16:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> Since we use PageSearchStatus to represent a channel, it makes perfect
> sense to make last_sent_block (aka, leveraging RAM_SAVE_FLAG_CONTINUE)
> per-channel rather than global, because each channel can be sending
> different pages on different ramblocks.
> 
> Hence move it from RAMState into PageSearchStatus.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/ram.c | 71 ++++++++++++++++++++++++++++---------------------
>  1 file changed, 41 insertions(+), 30 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index dbe11e1ace..fdcb61a2c8 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -89,6 +89,8 @@ XBZRLECacheStats xbzrle_counters;
>  struct PageSearchStatus {
>      /* The migration channel used for a specific host page */
>      QEMUFile    *pss_channel;
> +    /* Last block from where we have sent data */
> +    RAMBlock *last_sent_block;
>      /* Current block being searched */
>      RAMBlock    *block;
>      /* Current page to search from */
> @@ -368,8 +370,6 @@ struct RAMState {
>      int uffdio_fd;
>      /* Last block that we have visited searching for dirty pages */
>      RAMBlock *last_seen_block;
> -    /* Last block from where we have sent data */
> -    RAMBlock *last_sent_block;
>      /* Last dirty target page we have sent */
>      ram_addr_t last_page;
>      /* last ram version we have seen */
> @@ -677,16 +677,17 @@ exit:
>   *
>   * Returns the number of bytes written
>   *
> - * @f: QEMUFile where to send the data
> + * @pss: current PSS channel status
>   * @block: block that contains the page we want to send
>   * @offset: offset inside the block for the page
>   *          in the lower bits, it contains flags
>   */
> -static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
> +static size_t save_page_header(PageSearchStatus *pss, RAMBlock *block,
>                                 ram_addr_t offset)
>  {
>      size_t size, len;
> -    bool same_block = (block == rs->last_sent_block);
> +    bool same_block = (block == pss->last_sent_block);
> +    QEMUFile *f = pss->pss_channel;
>  
>      if (same_block) {
>          offset |= RAM_SAVE_FLAG_CONTINUE;
> @@ -699,7 +700,7 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
>          qemu_put_byte(f, len);
>          qemu_put_buffer(f, (uint8_t *)block->idstr, len);
>          size += 1 + len;
> -        rs->last_sent_block = block;
> +        pss->last_sent_block = block;
>      }
>      return size;
>  }
> @@ -783,17 +784,19 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
>   *          -1 means that xbzrle would be longer than normal
>   *
>   * @rs: current RAM state
> + * @pss: current PSS channel
>   * @current_data: pointer to the address of the page contents
>   * @current_addr: addr of the page
>   * @block: block that contains the page we want to send
>   * @offset: offset inside the block for the page
>   */
> -static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
> +static int save_xbzrle_page(RAMState *rs, PageSearchStatus *pss,
>                              uint8_t **current_data, ram_addr_t current_addr,
>                              RAMBlock *block, ram_addr_t offset)
>  {
>      int encoded_len = 0, bytes_xbzrle;
>      uint8_t *prev_cached_page;
> +    QEMUFile *file = pss->pss_channel;
>  
>      if (!cache_is_cached(XBZRLE.cache, current_addr,
>                           ram_counters.dirty_sync_count)) {
> @@ -858,7 +861,7 @@ static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
>      }
>  
>      /* Send XBZRLE based compressed page */
> -    bytes_xbzrle = save_page_header(rs, file, block,
> +    bytes_xbzrle = save_page_header(pss, block,
>                                      offset | RAM_SAVE_FLAG_XBZRLE);
>      qemu_put_byte(file, ENCODING_FLAG_XBZRLE);
>      qemu_put_be16(file, encoded_len);
> @@ -1289,19 +1292,19 @@ static void ram_release_page(const char *rbname, uint64_t offset)
>   * Returns the size of data written to the file, 0 means the page is not
>   * a zero page
>   *
> - * @rs: current RAM state
> - * @file: the file where the data is saved
> + * @pss: current PSS channel
>   * @block: block that contains the page we want to send
>   * @offset: offset inside the block for the page
>   */
> -static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
> +static int save_zero_page_to_file(PageSearchStatus *pss,
>                                    RAMBlock *block, ram_addr_t offset)
>  {
>      uint8_t *p = block->host + offset;
> +    QEMUFile *file = pss->pss_channel;
>      int len = 0;
>  
>      if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> -        len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
> +        len += save_page_header(pss, block, offset | RAM_SAVE_FLAG_ZERO);
>          qemu_put_byte(file, 0);
>          len += 1;
>          ram_release_page(block->idstr, offset);
> @@ -1314,14 +1317,14 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
>   *
>   * Returns the number of pages written.
>   *
> - * @rs: current RAM state
> + * @pss: current PSS channel
>   * @block: block that contains the page we want to send
>   * @offset: offset inside the block for the page
>   */
> -static int save_zero_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
> +static int save_zero_page(PageSearchStatus *pss, RAMBlock *block,
>                            ram_addr_t offset)
>  {
> -    int len = save_zero_page_to_file(rs, file, block, offset);
> +    int len = save_zero_page_to_file(pss, block, offset);
>  
>      if (len) {
>          qatomic_inc(&ram_counters.duplicate);
> @@ -1374,16 +1377,18 @@ static bool control_save_page(PageSearchStatus *pss, RAMBlock *block,
>   *
>   * Returns the number of pages written.
>   *
> - * @rs: current RAM state
> + * @pss: current PSS channel
>   * @block: block that contains the page we want to send
>   * @offset: offset inside the block for the page
>   * @buf: the page to be sent
>   * @async: send to page asyncly
>   */
> -static int save_normal_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
> +static int save_normal_page(PageSearchStatus *pss, RAMBlock *block,
>                              ram_addr_t offset, uint8_t *buf, bool async)
>  {
> -    ram_transferred_add(save_page_header(rs, file, block,
> +    QEMUFile *file = pss->pss_channel;
> +
> +    ram_transferred_add(save_page_header(pss, block,
>                                           offset | RAM_SAVE_FLAG_PAGE));
>      if (async) {
>          qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
> @@ -1423,7 +1428,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
>  
>      XBZRLE_cache_lock();
>      if (rs->xbzrle_enabled && !migration_in_postcopy()) {
> -        pages = save_xbzrle_page(rs, pss->pss_channel, &p, current_addr,
> +        pages = save_xbzrle_page(rs, pss, &p, current_addr,
>                                   block, offset);
>          if (!rs->last_stage) {
>              /* Can't send this cached data async, since the cache page
> @@ -1435,8 +1440,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
>  
>      /* XBZRLE overflow or normal page */
>      if (pages == -1) {
> -        pages = save_normal_page(rs, pss->pss_channel, block, offset,
> -                                 p, send_async);
> +        pages = save_normal_page(pss, block, offset, p, send_async);
>      }
>  
>      XBZRLE_cache_unlock();
> @@ -1459,14 +1463,15 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>                                   ram_addr_t offset, uint8_t *source_buf)
>  {
>      RAMState *rs = ram_state;
> +    PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
>      uint8_t *p = block->host + offset;
>      int ret;
>  
> -    if (save_zero_page_to_file(rs, f, block, offset)) {
> +    if (save_zero_page_to_file(pss, block, offset)) {
>          return true;
>      }
>  
> -    save_page_header(rs, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
> +    save_page_header(pss, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
>  
>      /*
>       * copy it to a internal buffer to avoid it being modified by VM
> @@ -2286,7 +2291,8 @@ static bool save_page_use_compression(RAMState *rs)
>   * has been properly handled by compression, otherwise needs other
>   * paths to handle it
>   */
> -static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
> +static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
> +                               RAMBlock *block, ram_addr_t offset)
>  {
>      if (!save_page_use_compression(rs)) {
>          return false;
> @@ -2302,7 +2308,7 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
>       * We post the fist page as normal page as compression will take
>       * much CPU resource.
>       */
> -    if (block != rs->last_sent_block) {
> +    if (block != pss->last_sent_block) {
>          flush_compressed_data(rs);
>          return false;
>      }
> @@ -2333,11 +2339,11 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
>          return res;
>      }
>  
> -    if (save_compress_page(rs, block, offset)) {
> +    if (save_compress_page(rs, pss, block, offset)) {
>          return 1;
>      }
>  
> -    res = save_zero_page(rs, pss->pss_channel, block, offset);
> +    res = save_zero_page(pss, block, offset);
>      if (res > 0) {
>          /* Must let xbzrle know, otherwise a previous (now 0'd) cached
>           * page would be stale
> @@ -2469,7 +2475,7 @@ static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
>           * If channel switched, reset last_sent_block since the old sent block
>           * may not be on the same channel.
>           */
> -        rs->last_sent_block = NULL;
> +        pss->last_sent_block = NULL;
>  
>          trace_postcopy_preempt_switch_channel(channel);
>      }
> @@ -2804,8 +2810,13 @@ static void ram_save_cleanup(void *opaque)
>  
>  static void ram_state_reset(RAMState *rs)
>  {
> +    int i;
> +
> +    for (i = 0; i < RAM_CHANNEL_MAX; i++) {
> +        rs->pss[i].last_sent_block = NULL;
> +    }
> +
>      rs->last_seen_block = NULL;
> -    rs->last_sent_block = NULL;
>      rs->last_page = 0;
>      rs->last_version = ram_list.version;
>      rs->xbzrle_enabled = false;
> @@ -2999,8 +3010,8 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
>      migration_bitmap_sync(rs);
>  
>      /* Easiest way to make sure we don't resume in the middle of a host-page */
> +    rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;

Why don't we reset the postcopy one here as well?

Dave

>      rs->last_seen_block = NULL;
> -    rs->last_sent_block = NULL;
>      rs->last_page = 0;
>  
>      postcopy_each_ram_send_discard(ms);
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/14] migration: Send requested page directly in rp-return thread
  2022-09-20 22:52 ` [PATCH 12/14] migration: Send requested page directly in rp-return thread Peter Xu
@ 2022-10-06 17:51   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-06 17:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> With all the facilities ready, send the requested page directly in the
> rp-return thread rather than queuing it in the request queue, if and only
> if postcopy preempt is enabled.  It can do so because it uses a separate
> channel for sending urgent pages.  The only shared data is the bitmap and
> it's protected by the bitmap_mutex.
> 
> Note that since we're moving the ownership of the urgent channel from the
> migration thread to the rp thread, it also means the rp thread is responsible
> for managing the qemufile, e.g. properly closing it when migration pause
> happens.  For this, let migration_release_from_dst_file() cover shutdown
> of the urgent channel too, renaming it to migration_release_dst_files() to
> better show what it does.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Yes, getting a bit complex, but I think OK.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c |  35 +++++++------
>  migration/ram.c       | 112 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 131 insertions(+), 16 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 0eacc0c99b..fae8fd378b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2845,8 +2845,11 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
>      return 0;
>  }
>  
> -/* Release ms->rp_state.from_dst_file in a safe way */
> -static void migration_release_from_dst_file(MigrationState *ms)
> +/*
> + * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
> + * existed) in a safe way.
> + */
> +static void migration_release_dst_files(MigrationState *ms)
>  {
>      QEMUFile *file;
>  
> @@ -2859,6 +2862,18 @@ static void migration_release_from_dst_file(MigrationState *ms)
>          ms->rp_state.from_dst_file = NULL;
>      }
>  
> +    /*
> +     * Do the same to postcopy fast path socket too if there is.  No
> +     * locking needed because this qemufile should only be managed by
> +     * return path thread.
> +     */
> +    if (ms->postcopy_qemufile_src) {
> +        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
> +        qemu_file_shutdown(ms->postcopy_qemufile_src);
> +        qemu_fclose(ms->postcopy_qemufile_src);
> +        ms->postcopy_qemufile_src = NULL;
> +    }
> +
>      qemu_fclose(file);
>  }
>  
> @@ -3003,7 +3018,7 @@ out:
>               * Maybe there is something we can do: it looks like a
>               * network down issue, and we pause for a recovery.
>               */
> -            migration_release_from_dst_file(ms);
> +            migration_release_dst_files(ms);
>              rp = NULL;
>              if (postcopy_pause_return_path_thread(ms)) {
>                  /*
> @@ -3021,7 +3036,7 @@ out:
>      }
>  
>      trace_source_return_path_thread_end();
> -    migration_release_from_dst_file(ms);
> +    migration_release_dst_files(ms);
>      rcu_unregister_thread();
>      return NULL;
>  }
> @@ -3544,18 +3559,6 @@ static MigThrError postcopy_pause(MigrationState *s)
>          qemu_file_shutdown(file);
>          qemu_fclose(file);
>  
> -        /*
> -         * Do the same to postcopy fast path socket too if there is.  No
> -         * locking needed because no racer as long as we do this before setting
> -         * status to paused.
> -         */
> -        if (s->postcopy_qemufile_src) {
> -            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> -            qemu_file_shutdown(s->postcopy_qemufile_src);
> -            qemu_fclose(s->postcopy_qemufile_src);
> -            s->postcopy_qemufile_src = NULL;
> -        }
> -
>          migrate_set_state(&s->state, s->state,
>                            MIGRATION_STATUS_POSTCOPY_PAUSED);
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index fdcb61a2c8..fd301d793c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -539,6 +539,8 @@ static QemuThread *decompress_threads;
>  static QemuMutex decomp_done_lock;
>  static QemuCond decomp_done_cond;
>  
> +static int ram_save_host_page_urgent(PageSearchStatus *pss);
> +
>  static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>                                   ram_addr_t offset, uint8_t *source_buf);
>  
> @@ -553,6 +555,16 @@ static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
>      pss->complete_round = false;
>  }
>  
> +/*
> + * Check whether two PSSs are actively sending the same page.  Return true
> + * if it is, false otherwise.
> + */
> +static bool pss_overlap(PageSearchStatus *pss1, PageSearchStatus *pss2)
> +{
> +    return pss1->host_page_sending && pss2->host_page_sending &&
> +        (pss1->host_page_start == pss2->host_page_start);
> +}
> +
>  static void *do_data_compress(void *opaque)
>  {
>      CompressParam *param = opaque;
> @@ -2253,6 +2265,57 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
>          return -1;
>      }
>  
> +    /*
> +     * When with postcopy preempt, we send back the page directly in the
> +     * rp-return thread.
> +     */
> +    if (postcopy_preempt_active()) {
> +        ram_addr_t page_start = start >> TARGET_PAGE_BITS;
> +        size_t page_size = qemu_ram_pagesize(ramblock);
> +        PageSearchStatus *pss = &ram_state->pss[RAM_CHANNEL_POSTCOPY];
> +        int ret = 0;
> +
> +        qemu_mutex_lock(&rs->bitmap_mutex);
> +
> +        pss_init(pss, ramblock, page_start);
> +        /*
> +         * Always use the preempt channel, and make sure it's there.  It's
> +         * safe to access without lock, because when rp-thread is running
> +         * we should be the only one who operates on the qemufile
> +         */
> +        pss->pss_channel = migrate_get_current()->postcopy_qemufile_src;
> +        pss->postcopy_requested = true;
> +        assert(pss->pss_channel);
> +
> +        /*
> +         * It must be either one or multiple of host page size.  Just
> +         * assert; if something wrong we're mostly split brain anyway.
> +         */
> +        assert(len % page_size == 0);
> +        while (len) {
> +            if (ram_save_host_page_urgent(pss)) {
> +                error_report("%s: ram_save_host_page_urgent() failed: "
> +                             "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
> +                             __func__, ramblock->idstr, start);
> +                ret = -1;
> +                break;
> +            }
> +            /*
> +             * NOTE: after ram_save_host_page_urgent() succeeded, pss->page
> +             * will automatically be moved and point to the next host page
> +             * we're going to send, so no need to update here.
> +             *
> +             * Normally QEMU never sends >1 host page in requests, so
> +             * logically we don't even need that as the loop should only
> +             * run once, but just to be consistent.
> +             */
> +            len -= page_size;
> +        };
> +        qemu_mutex_unlock(&rs->bitmap_mutex);
> +
> +        return ret;
> +    }
> +
>      struct RAMSrcPageRequest *new_entry =
>          g_new0(struct RAMSrcPageRequest, 1);
>      new_entry->rb = ramblock;
> @@ -2531,6 +2594,55 @@ static void pss_host_page_finish(PageSearchStatus *pss)
>      pss->host_page_start = pss->host_page_end = 0;
>  }
>  
> +/*
> + * Send an urgent host page specified by `pss'.  Need to be called with
> + * bitmap_mutex held.
> + *
> + * Returns 0 if saving the host page succeeded, -1 otherwise.
> + */
> +static int ram_save_host_page_urgent(PageSearchStatus *pss)
> +{
> +    bool page_dirty, sent = false;
> +    RAMState *rs = ram_state;
> +    int ret = 0;
> +
> +    trace_postcopy_preempt_send_host_page(pss->block->idstr, pss->page);
> +    pss_host_page_prepare(pss);
> +
> +    /*
> +     * If precopy is sending the same page, let it be done in precopy, or
> +     * we could send the same page in two channels and none of them will
> +     * receive the whole page.
> +     */
> +    if (pss_overlap(pss, &ram_state->pss[RAM_CHANNEL_PRECOPY])) {
> +        trace_postcopy_preempt_hit(pss->block->idstr,
> +                                   pss->page << TARGET_PAGE_BITS);
> +        return 0;
> +    }
> +
> +    do {
> +        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
> +
> +        if (page_dirty) {
> +            /* Be strict to return code; it must be 1, or what else? */
> +            if (ram_save_target_page(rs, pss) != 1) {
> +                error_report_once("%s: ram_save_target_page failed", __func__);
> +                ret = -1;
> +                goto out;
> +            }
> +            sent = true;
> +        }
> +        pss_find_next_dirty(pss);
> +    } while (pss_within_range(pss));
> +out:
> +    pss_host_page_finish(pss);
> +    /* For urgent requests, flush immediately if sent */
> +    if (sent) {
> +        qemu_fflush(pss->pss_channel);
> +    }
> +    return ret;
> +}
> +
>  /**
>   * ram_save_host_page: save a whole host page
>   *
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 13/14] migration: Remove old preempt code around state maintainance
  2022-09-21 13:54     ` Peter Xu
@ 2022-10-06 17:56       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-06 17:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

For the set of 3:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Sep 20, 2022 at 08:47:20PM -0400, Peter Xu wrote:
> > On Tue, Sep 20, 2022 at 06:52:27PM -0400, Peter Xu wrote:
> > > With the new code to send pages in the rp-return thread, there's little
> > > point in keeping lots of the old code that maintains the preempt state in
> > > the migration thread, because the new way should always be faster..
> > > 
> > > Then if we'll always send pages in the rp-return thread anyway, we don't
> > > need those logic to maintain preempt state anymore because now we serialize
> > > things using the mutex directly instead of using those fields.
> > > 
> > > It's very unfortunate to have that code for only a short period, but that
> > > was still the intermediate step where we noticed the next bottleneck on the
> > > migration thread.  Now the best we can do is drop the unnecessary code, as
> > > long as the new code is stable, to reduce the burden.  It's actually a good
> > > thing because the new "sending page in rp-return thread" model is (IMHO)
> > > even cleaner and performs better.
> > > 
> > > Remove the old code that was responsible for maintaining preempt states; in
> > > the meantime also remove the x-postcopy-preempt-break-huge parameter because
> > > with concurrent sender threads we don't really need to break-huge anymore.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c |   2 -
> > >  migration/ram.c       | 258 +-----------------------------------------
> > >  2 files changed, 3 insertions(+), 257 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index fae8fd378b..698fd94591 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -4399,8 +4399,6 @@ static Property migration_properties[] = {
> > >      DEFINE_PROP_SIZE("announce-step", MigrationState,
> > >                        parameters.announce_step,
> > >                        DEFAULT_MIGRATE_ANNOUNCE_STEP),
> > > -    DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
> > > -                      postcopy_preempt_break_huge, true),
> > 
> > Forgot to drop the variable altogether:
> > 
> > diff --git a/migration/migration.h b/migration/migration.h
> > index cdad8aceaa..ae4ffd3454 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -340,13 +340,6 @@ struct MigrationState {
> >      bool send_configuration;
> >      /* Whether we send section footer during migration */
> >      bool send_section_footer;
> > -    /*
> > -     * Whether we allow break sending huge pages when postcopy preempt is
> > -     * enabled.  When disabled, we won't interrupt precopy within sending a
> > -     * host huge page, which is the old behavior of vanilla postcopy.
> > -     * NOTE: this parameter is ignored if postcopy preempt is not enabled.
> > -     */
> > -    bool postcopy_preempt_break_huge;
> >  
> >      /* Needed by postcopy-pause state */
> >      QemuSemaphore postcopy_pause_sem;
> > 
> > Will squash this in in next version.
> 
> Two more variables to drop, as attached..
> 
> 
> -- 
> Peter Xu

> From b3308e34398e21c19bd36ec21aae9c7f9f623d75 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Wed, 21 Sep 2022 09:51:55 -0400
> Subject: [PATCH] fixup! migration: Remove old preempt code around state
>  maintainance
> Content-type: text/plain
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/ram.c | 33 ---------------------------------
>  1 file changed, 33 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 03bf2324ab..2599eee070 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -97,28 +97,6 @@ struct PageSearchStatus {
>      unsigned long page;
>      /* Set once we wrap around */
>      bool         complete_round;
> -    /*
> -     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
> -     * postcopy.  When set, the request is "urgent" because the dest QEMU
> -     * threads are waiting for us.
> -     */
> -    bool         postcopy_requested;
> -    /*
> -     * [POSTCOPY-ONLY] The target channel to use to send current page.
> -     *
> -     * Note: This may _not_ match with the value in postcopy_requested
> -     * above. Let's imagine the case where the postcopy request is exactly
> -     * the page that we're sending in progress during precopy. In this case
> -     * we'll have postcopy_requested set to true but the target channel
> -     * will be the precopy channel (so that we don't split brain on that
> -     * specific page since the precopy channel already contains partial of
> -     * that page data).
> -     *
> -     * Besides that specific use case, postcopy_target_channel should
> -     * always be equal to postcopy_requested, because by default we send
> -     * postcopy pages via postcopy preempt channel.
> -     */
> -    bool         postcopy_target_channel;
>      /* Whether we're sending a host page */
>      bool          host_page_sending;
>      /* The start/end of current host page.  Invalid if host_page_sending==false */
> @@ -1573,13 +1551,6 @@ retry:
>   */
>  static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
>  {
> -    /*
> -     * This is not a postcopy requested page, mark it "not urgent", and use
> -     * precopy channel to send it.
> -     */
> -    pss->postcopy_requested = false;
> -    pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
> -
>      /* Update pss->page for the next dirty bit in ramblock */
>      pss_find_next_dirty(pss);
>  
> @@ -2091,9 +2062,6 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>           * really rare.
>           */
>          pss->complete_round = false;
> -        /* Mark it an urgent request, meanwhile using POSTCOPY channel */
> -        pss->postcopy_requested = true;
> -        pss->postcopy_target_channel = RAM_CHANNEL_POSTCOPY;
>      }
>  
>      return !!block;
> @@ -2190,7 +2158,6 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
>           * we should be the only one who operates on the qemufile
>           */
>          pss->pss_channel = migrate_get_current()->postcopy_qemufile_src;
> -        pss->postcopy_requested = true;
>          assert(pss->pss_channel);
>  
>          /*
> -- 
> 2.32.0
> 

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 14/14] migration: Drop rs->f
  2022-09-20 22:52 ` [PATCH 14/14] migration: Drop rs->f Peter Xu
@ 2022-10-06 17:57   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-06 17:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> Now with rs->pss we can already cache channels in pss->pss_channel.  That
> pss_channel contains more information than rs->f because it's per-channel.
> So rs->f could be replaced by rs->pss[RAM_CHANNEL_PRECOPY].pss_channel,
> while rs->f itself is a bit vague now.
> 
> Note that vanilla postcopy still send pages via pss[RAM_CHANNEL_PRECOPY],
> that's slightly confusing but it reflects the reality.
> 
> Then, after the replacement we can safely drop rs->f.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index f42efe02fc..03bf2324ab 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -345,8 +345,6 @@ struct RAMSrcPageRequest {
>  
>  /* State of RAM for migration */
>  struct RAMState {
> -    /* QEMUFile used for this migration */
> -    QEMUFile *f;
>      /*
>       * PageSearchStatus structures for the channels when sending pages.
>       * Protected by the bitmap_mutex.
> @@ -2555,8 +2553,6 @@ static int ram_find_and_save_block(RAMState *rs)
>          }
>  
>          if (found) {
> -            /* Cache rs->f in pss_channel (TODO: remove rs->f) */
> -            pss->pss_channel = rs->f;
>              pages = ram_save_host_page(rs, pss);
>          }
>      } while (!pages && again);
> @@ -3112,7 +3108,7 @@ static void ram_state_resume_prepare(RAMState *rs, QEMUFile *out)
>      ram_state_reset(rs);
>  
>      /* Update RAMState cache of output QEMUFile */
> -    rs->f = out;
> +    rs->pss[RAM_CHANNEL_PRECOPY].pss_channel = out;
>  
>      trace_ram_state_resume_prepare(pages);
>  }
> @@ -3203,7 +3199,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>              return -1;
>          }
>      }
> -    (*rsp)->f = f;
> +    (*rsp)->pss[RAM_CHANNEL_PRECOPY].pss_channel = f;
>  
>      WITH_RCU_READ_LOCK_GUARD() {
>          qemu_put_be64(f, ram_bytes_total_common(true) | RAM_SAVE_FLAG_MEM_SIZE);
> @@ -3338,7 +3334,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  out:
>      if (ret >= 0
>          && migration_is_setup_or_active(migrate_get_current()->state)) {
> -        ret = multifd_send_sync_main(rs->f);
> +        ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
>          if (ret < 0) {
>              return ret;
>          }
> @@ -3406,7 +3402,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          return ret;
>      }
>  
> -    ret = multifd_send_sync_main(rs->f);
> +    ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
>      if (ret < 0) {
>          return ret;
>      }
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 11/14] migration: Move last_sent_block into PageSearchStatus
  2022-10-06 16:59   ` Dr. David Alan Gilbert
@ 2022-10-06 18:34     ` Peter Xu
  2022-10-06 18:38       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-10-06 18:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

On Thu, Oct 06, 2022 at 05:59:51PM +0100, Dr. David Alan Gilbert wrote:
> > @@ -2999,8 +3010,8 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
> >      migration_bitmap_sync(rs);
> >  
> >      /* Easiest way to make sure we don't resume in the middle of a host-page */
> > +    rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;
> 
> Why don't we reset the postcopy one here as well?

Because ram_postcopy_send_discard_bitmap() is only called before postcopy
starts, the other field should be NULL already.  Thanks,
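
PS: if we wanted to make that invariant explicit, something like the below
could go on top - just a sketch, the assert is not part of this patch:

    /* The discard bitmap is only sent before postcopy starts, so the
     * postcopy PSS cannot have sent anything yet. */
    assert(rs->pss[RAM_CHANNEL_POSTCOPY].last_sent_block == NULL);
    rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;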

-- 
Peter Xu




* Re: [PATCH 11/14] migration: Move last_sent_block into PageSearchStatus
  2022-10-06 18:34     ` Peter Xu
@ 2022-10-06 18:38       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2022-10-06 18:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Oct 06, 2022 at 05:59:51PM +0100, Dr. David Alan Gilbert wrote:
> > > @@ -2999,8 +3010,8 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
> > >      migration_bitmap_sync(rs);
> > >  
> > >      /* Easiest way to make sure we don't resume in the middle of a host-page */
> > > +    rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;
> > 
> > Why don't we reset the postcopy one here as well?
> 
> Because ram_postcopy_send_discard_bitmap() is only called before postcopy
> starts, the other field should be NULL already.  Thanks,

Ah, yes, OK


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 06/14] migration: Use atomic ops properly for page accountings
  2022-10-05 13:53         ` Peter Xu
@ 2022-10-06 20:40           ` Peter Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2022-10-06 20:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Manish Mishra, Juan Quintela, ani,
	Leonardo Bras Soares Passos, Daniel P . Berrange

On Wed, Oct 05, 2022 at 09:53:57AM -0400, Peter Xu wrote:
> On Wed, Oct 05, 2022 at 12:38:05PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Tue, Oct 04, 2022 at 05:59:36PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > To prepare for thread-safety on page accountings, at least the counters
> > > > > below need to be accessed only atomically; they are:
> > > > > 
> > > > >         ram_counters.transferred
> > > > >         ram_counters.duplicate
> > > > >         ram_counters.normal
> > > > >         ram_counters.postcopy_bytes
> > > > > 
> > > > > There are a lot of other counters, but they won't be accessed outside the
> > > > > migration thread, so they're still safe to be accessed without atomic
> > > > > ops.
> > > > > 
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > 
> > > > I think this is OK; I'm not sure whether the memset 0's of ram_counters
> > > > technically need changing.
> > > 
> > > IMHO they're fine - what we need there should be something like WRITE_ONCE()
> > > just to make sure there's no register caching (actually atomic_write() is
> > > normally implemented with WRITE_ONCE afaik).  But I think that's already
> > > guaranteed by memset() being a function call, so we should be 100% safe.
> > 
> > I agree you're probably OK.
> > 
> > > > I'd love to put a comment somewhere saying these fields need to be
> > > > atomically read, but they're QAPI-defined so I don't think we can.
> > > 
> > > How about I add a comment above ram_counters declarations in ram.c?
> > 
> > Yeh.
> > 
> > > > 
> > > > Finally, we probably need to check these are happy on 32-bit builds;
> > > > sometimes it's a bit funny with atomic adds.
> > > 
> > > Yeah.. I hope using qatomic_*() APIs can help me avoid any issues.  Or
> > > anything concerning?  I'd be happy to test on specific things if there are.
> > 
> > I just remember hitting problems in the past; especially if we end up
> > trying to do a 64-bit atomic on a platform that can only do 32?
> 
> I see what you meant... when I was looking in the existing callers of
> qatomic_add(), I do find that we seem to have Stat64 just for that
> !CONFIG_ATOMIC64 problem.
> 
> I'll dig a bit on whether and how we can do that; the thing is these
> counters are in the qapi so I need to make sure it can support Stat64
> somehow.  Hmm..

I don't think I can directly change the qapi MigrationStats to make some of
its fields Stat64, since on !CONFIG_ATOMIC64 systems Stat64 actually takes
more than 64 bits of space (we'll need to do the locking with Stat64.lock),
so it'll definitely break the ABI no matter what..

I don't have a better option but to introduce another ram_counters_internal
to maintain the fields that need atomic access, declared as a Stat64 array.
Then we'd only mirror those values into MigrationStats in QMP queries when
needed.  The mirror will not contain the lock itself, so it'll keep the ABI.

Let me know if there are early comments on that, or I'll go with it.  I'll
definitely add a comment for ram_counters to explain the mirror counters
in that case.
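
To make the idea concrete, a rough sketch follows.  Only the name
ram_counters_internal comes from the paragraph above; RamStatType,
ram_stat_add() and ram_counters_mirror() are made-up illustrative names, the
exact field list may change, and Stat64/stat64_add()/stat64_get() are the
existing helpers from "qemu/stats64.h":

    /* Counters that need cross-thread atomic access, kept as Stat64 so
     * they also work on hosts without CONFIG_ATOMIC64. */
    typedef enum {
        RAM_STAT_TRANSFERRED,
        RAM_STAT_DUPLICATE,
        RAM_STAT_NORMAL,
        RAM_STAT_POSTCOPY_BYTES,
        RAM_STAT_MAX,
    } RamStatType;

    static Stat64 ram_counters_internal[RAM_STAT_MAX];

    /* Updaters, callable from both migration thread and rp-return thread */
    static inline void ram_stat_add(RamStatType type, uint64_t n)
    {
        stat64_add(&ram_counters_internal[type], n);
    }

    /* Mirror into the QAPI-visible ram_counters only when a QMP query
     * needs it, so the MigrationStats ABI stays untouched. */
    static void ram_counters_mirror(void)
    {
        ram_counters.transferred =
            stat64_get(&ram_counters_internal[RAM_STAT_TRANSFERRED]);
        ram_counters.duplicate =
            stat64_get(&ram_counters_internal[RAM_STAT_DUPLICATE]);
        ram_counters.normal =
            stat64_get(&ram_counters_internal[RAM_STAT_NORMAL]);
        ram_counters.postcopy_bytes =
            stat64_get(&ram_counters_internal[RAM_STAT_POSTCOPY_BYTES]);
    }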

Thanks,

-- 
Peter Xu




end of thread, other threads:[~2022-10-06 20:42 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-20 22:50 [PATCH 00/14] migration: Postcopy Preempt-Full Peter Xu
2022-09-20 22:50 ` [PATCH 01/14] migration: Add postcopy_preempt_active() Peter Xu
2022-09-20 22:50 ` [PATCH 02/14] migration: Cleanup xbzrle zero page cache update logic Peter Xu
2022-10-04 10:33   ` Dr. David Alan Gilbert
2022-09-20 22:50 ` [PATCH 03/14] migration: Trivial cleanup save_page_header() on same block check Peter Xu
2022-10-04 10:41   ` Dr. David Alan Gilbert
2022-09-20 22:50 ` [PATCH 04/14] migration: Remove RAMState.f references in compression code Peter Xu
2022-10-04 10:54   ` Dr. David Alan Gilbert
2022-10-04 14:36     ` Peter Xu
2022-09-20 22:52 ` [PATCH 05/14] migration: Yield bitmap_mutex properly when sending/sleeping Peter Xu
2022-10-04 13:55   ` Dr. David Alan Gilbert
2022-10-04 19:13     ` Peter Xu
2022-10-05 11:18       ` Dr. David Alan Gilbert
2022-10-05 13:40         ` Peter Xu
2022-10-05 19:48           ` Peter Xu
2022-09-20 22:52 ` [PATCH 06/14] migration: Use atomic ops properly for page accountings Peter Xu
2022-10-04 16:59   ` Dr. David Alan Gilbert
2022-10-04 19:23     ` Peter Xu
2022-10-05 11:38       ` Dr. David Alan Gilbert
2022-10-05 13:53         ` Peter Xu
2022-10-06 20:40           ` Peter Xu
2022-09-20 22:52 ` [PATCH 07/14] migration: Teach PSS about host page Peter Xu
2022-10-05 11:12   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 08/14] migration: Introduce pss_channel Peter Xu
2022-10-05 13:03   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 09/14] migration: Add pss_init() Peter Xu
2022-10-05 13:09   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 10/14] migration: Make PageSearchStatus part of RAMState Peter Xu
2022-10-05 18:51   ` Dr. David Alan Gilbert
2022-10-05 19:41     ` Peter Xu
2022-10-06  8:36       ` Dr. David Alan Gilbert
2022-10-06  8:37   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 11/14] migration: Move last_sent_block into PageSearchStatus Peter Xu
2022-10-06 16:59   ` Dr. David Alan Gilbert
2022-10-06 18:34     ` Peter Xu
2022-10-06 18:38       ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 12/14] migration: Send requested page directly in rp-return thread Peter Xu
2022-10-06 17:51   ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 13/14] migration: Remove old preempt code around state maintainance Peter Xu
2022-09-21  0:47   ` Peter Xu
2022-09-21 13:54     ` Peter Xu
2022-10-06 17:56       ` Dr. David Alan Gilbert
2022-09-20 22:52 ` [PATCH 14/14] migration: Drop rs->f Peter Xu
2022-10-06 17:57   ` Dr. David Alan Gilbert
