* [PATCH RFC 00/13] migration: Postcopy Preempt-Full
@ 2022-08-29 16:56 Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 01/13] migration: Use non-atomic ops for clear log bitmap Peter Xu
                   ` (13 more replies)
  0 siblings, 14 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

This is an RFC series.  The tree is here:

  https://github.com/xzpeter/qemu/tree/preempt-full

It's not complete because there are still things we need to do, which are
listed at the end of this cover letter; however, this series already safely
passes qtest and all of my tests.

Compared to the recently merged preempt mode, I call this one "preempt-full"
because it threadifies the postcopy channels, so urgent pages can now be
handled fully outside of the ram save loop.  Sorry to share a name with
PREEMPT_FULL in the Linux RT world; it's just that we needed a name for the
capability and it was already named preempt anyway..

The existing preempt code, which has already landed, reduced random page
request latency over a 10Gbps network from ~12ms to ~500us.

This preempt-full series further reduces that ~500us to ~230us per my
initial test.  More to share below.

Note that no new capability is needed; IOW, it's fully compatible with the
existing preempt mode.  So the naming is not really important, it only
identifies the difference between the binaries.  That's because this series
only reworks the sender-side code and does not change the migration
protocol; it just runs faster.

IOW, an old "preempt" QEMU can also migrate to a "preempt-full" QEMU, and
vice versa:

  - When an old "preempt" QEMU migrates to a "preempt-full" QEMU, it'll
    behave the same as if both sides were old "preempt" QEMUs.

  - When a "preempt-full" QEMU migrates to an old "preempt" QEMU, it'll
    behave the same as if both sides were "preempt-full" QEMUs.

The logic of the series is quite simple too: move the existing preempt
channel page sends into the rp-return thread.  This can slow down the
rp-return thread's handling of incoming page requests, but I don't see a
major issue with it so far.
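
As a rough sketch of the idea (hand-written for this cover letter rather
than taken verbatim from the series; the helper name below is made up), the
rp-return thread goes from only queueing urgent requests to sending the
requested host page itself over the preempt channel:

  /*
   * Illustrative sketch only.  With preempt-full, when the rp-return
   * thread receives a page request it sends the host page directly over
   * the postcopy preempt channel instead of queueing it for the
   * migration thread.  rp_send_requested_page() is a made-up name.
   */
  static int rp_send_requested_page(RAMState *rs, RAMBlock *rb,
                                    ram_addr_t start)
  {
      PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_POSTCOPY];
      int ret;

      qemu_mutex_lock(&rs->bitmap_mutex);
      pss_init(pss, rb, start >> TARGET_PAGE_BITS);
      /* Urgent pages go through the dedicated preempt channel */
      pss->pss_channel = migrate_get_current()->postcopy_qemufile_src;
      ret = ram_save_host_page(rs, pss);
      qemu_mutex_unlock(&rs->bitmap_mutex);

      return ret;
  }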

This latency number is getting close to the extreme for a 4K page request:
the latency of a bare TCP round trip on the 10Gbps NIC I have.  That
'extreme number' is something I get from the mig_mon tool, which has a mode
[1] to emulate the bare TCP round trips of page requests.
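
As a back-of-the-envelope check (my own arithmetic, not a number from
mig_mon): the wire time of a 4K page at 10Gbps is only

  4096 bytes * 8 = 32768 bits;  32768 bits / 10 Gbps ~= 3.3 us

so the ~230us above is dominated by the TCP round trip plus sender/receiver
software latency rather than by bandwidth, which is exactly what that
mig_mon mode emulates.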

Performance
===========

Page request latencies are distributed as below, with a VM of 20G mem, 20
cores, a 10Gbps NIC, and 18G of fully random writes:

Postcopy Vanilla
----------------

Average: 12093 (us)
@delay_us:
[1]                    1 |                                                    |
[2, 4)                 0 |                                                    |
[4, 8)                 0 |                                                    |
[8, 16)                0 |                                                    |
[16, 32)               1 |                                                    |
[32, 64)               8 |                                                    |
[64, 128)             11 |                                                    |
[128, 256)            14 |                                                    |
[256, 512)            19 |                                                    |
[512, 1K)             14 |                                                    |
[1K, 2K)              35 |                                                    |
[2K, 4K)              18 |                                                    |
[4K, 8K)              87 |@                                                   |
[8K, 16K)           2397 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K)             7 |                                                    |
[32K, 64K)             2 |                                                    |
[64K, 128K)           20 |                                                    |
[128K, 256K)           6 |                                                    |

Postcopy Preempt
----------------

Average: 496 (us)

@delay_us:
[32, 64)               2 |                                                    |
[64, 128)           2306 |@@@@                                                |
[128, 256)         25422 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256, 512)          8238 |@@@@@@@@@@@@@@@@                                    |
[512, 1K)           1066 |@@                                                  |
[1K, 2K)            2167 |@@@@                                                |
[2K, 4K)            3329 |@@@@@@                                              |
[4K, 8K)             109 |                                                    |
[8K, 16K)             48 |                                                    |

Postcopy Preempt-Full
---------------------

Average: 229 (us)

@delay_us:
[8, 16)                1 |                                                    |
[16, 32)               3 |                                                    |
[32, 64)               2 |                                                    |
[64, 128)          11956 |@@@@@@@@@@                                          |
[128, 256)         60403 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256, 512)         15047 |@@@@@@@@@@@@                                        |
[512, 1K)            846 |                                                    |
[1K, 2K)              25 |                                                    |
[2K, 4K)              41 |                                                    |
[4K, 8K)             131 |                                                    |
[8K, 16K)             72 |                                                    |
[16K, 32K)             2 |                                                    |
[32K, 64K)             8 |                                                    |
[64K, 128K)            6 |                                                    |

For fully sequential page access workloads, I mentioned in the previous
preempt-mode work that such workloads may not benefit much from preempt
mode.  But surprisingly, at least in my sequential write test, the
preempt-full mode also benefits sequential access patterns when I measured
it:

Postcopy Vanilla
----------------

Average: 1487 (us)

@delay_us:
[0]                   93 |@                                                   |
[1]                 1920 |@@@@@@@@@@@@@@@@@@@@@@@                             |
[2, 4)               504 |@@@@@@                                              |
[4, 8)              2234 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
[8, 16)             4199 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32)            3782 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      |
[32, 64)            1016 |@@@@@@@@@@@@                                        |
[64, 128)             81 |@                                                   |
[128, 256)            14 |                                                    |
[256, 512)            26 |                                                    |
[512, 1K)             69 |                                                    |
[1K, 2K)             208 |@@                                                  |
[2K, 4K)             429 |@@@@@                                               |
[4K, 8K)            2779 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
[8K, 16K)            792 |@@@@@@@@@                                           |
[16K, 32K)             9 |                                                    |

Postcopy Preempt-Full
---------------------

Average: 1582 (us)

@delay_us:
[0]                   45 |                                                    |
[1]                 1786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
[2, 4)               423 |@@@@@@@                                             |
[4, 8)              1903 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[8, 16)             2933 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |
[16, 32)            3132 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64)             518 |@@@@@@@@                                            |
[64, 128)             30 |                                                    |
[128, 256)           218 |@@@                                                 |
[256, 512)           214 |@@@                                                 |
[512, 1K)            211 |@@@                                                 |
[1K, 2K)             131 |@@                                                  |
[2K, 4K)             336 |@@@@@                                               |
[4K, 8K)            3023 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |
[8K, 16K)            479 |@@@@@@@                                             |

Postcopy Preempt-Full
---------------------

Average: 439 (us)

@delay_us:
[0]                    3 |                                                    |
[1]                 1058 |@                                                   |
[2, 4)               179 |                                                    |
[4, 8)              1079 |@                                                   |
[8, 16)             2251 |@@@                                                 |
[16, 32)            2345 |@@@@                                                |
[32, 64)             713 |@                                                   |
[64, 128)           5386 |@@@@@@@@@                                           |
[128, 256)         30252 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256, 512)         10789 |@@@@@@@@@@@@@@@@@@                                  |
[512, 1K)            367 |                                                    |
[1K, 2K)              26 |                                                    |
[2K, 4K)             256 |                                                    |
[4K, 8K)            1840 |@@@                                                 |
[8K, 16K)            300 |                                                    |

I've never considered sequential access important for migrations, because
for any not-small VM that poses a migration challenge, multiple sequential
access streams will grow into a random access pattern anyway.  But I'm
laying the data out here for reference.

Comments welcomed, thanks.

TODO List
=========

- Make migration accountings atomic
- Drop rs->f?
- Disable xbzrle for preempt mode?  Is it already perhaps disabled for postcopy?
- If this series can be accepted, we can logically drop some of the old
  (complicated) code from the old preempt series.
- Drop x-postcopy-preempt-break-huge parameter?
- More to come

[1] https://github.com/xzpeter/mig_mon#vm-live-migration-network-emulator

Peter Xu (13):
  migration: Use non-atomic ops for clear log bitmap
  migration: Add postcopy_preempt_active()
  migration: Yield bitmap_mutex properly when sending/sleeping
  migration: Cleanup xbzrle zero page cache update logic
  migration: Disallow postcopy preempt to be used with compress
  migration: Trivial cleanup save_page_header() on same block check
  migration: Remove RAMState.f references in compression code
  migration: Teach PSS about host page
  migration: Introduce pss_channel
  migration: Add pss_init()
  migration: Make PageSearchStatus part of RAMState
  migration: Move last_sent_block into PageSearchStatus
  migration: Send requested page directly in rp-return thread

 include/exec/ram_addr.h |  11 +-
 include/qemu/bitmap.h   |   1 +
 migration/migration.c   |  11 +
 migration/ram.c         | 496 +++++++++++++++++++++++++++++-----------
 util/bitmap.c           |  45 ++++
 5 files changed, 421 insertions(+), 143 deletions(-)

-- 
2.32.0




* [PATCH RFC 01/13] migration: Use non-atomic ops for clear log bitmap
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-09-15 18:49   ` Dr. David Alan Gilbert
  2022-08-29 16:56 ` [PATCH RFC 02/13] migration: Add postcopy_preempt_active() Peter Xu
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

Since we already have bitmap_mutex to protect either the dirty bitmap or
the clear log bitmap, we don't need atomic operations to set/clear/test on
the clear log bitmap.  Switch all ops from atomic to non-atomic versions,
and meanwhile touch up the comments to show which lock is in charge.

Introduce a non-atomic version of bitmap_test_and_clear_atomic(); it is
mostly the same as the atomic version, but simplified in a few places,
e.g. the "old_bits" variable and the explicit memory barriers are dropped.

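For reference, a minimal sketch of the calling convention this relies on
(illustrative only, not a verbatim call site from this series):

  qemu_mutex_lock(&rs->bitmap_mutex);
  if (clear_bmap_test_and_clear(rb, page)) {
      /* ...issue the real dirty log clear for the chunk covering 'page'... */
  }
  qemu_mutex_unlock(&rs->bitmap_mutex);

The non-atomic helpers are sufficient exactly because every such access
happens with bitmap_mutex held.
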
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/exec/ram_addr.h | 11 +++++-----
 include/qemu/bitmap.h   |  1 +
 util/bitmap.c           | 45 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f3e0c78161..5092a2e0ff 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -42,7 +42,8 @@ static inline long clear_bmap_size(uint64_t pages, uint8_t shift)
 }
 
 /**
- * clear_bmap_set: set clear bitmap for the page range
+ * clear_bmap_set: set clear bitmap for the page range.  Must be with
+ * bitmap_mutex held.
  *
  * @rb: the ramblock to operate on
  * @start: the start page number
@@ -55,12 +56,12 @@ static inline void clear_bmap_set(RAMBlock *rb, uint64_t start,
 {
     uint8_t shift = rb->clear_bmap_shift;
 
-    bitmap_set_atomic(rb->clear_bmap, start >> shift,
-                      clear_bmap_size(npages, shift));
+    bitmap_set(rb->clear_bmap, start >> shift, clear_bmap_size(npages, shift));
 }
 
 /**
- * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set
+ * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set.
+ * Must be with bitmap_mutex held.
  *
  * @rb: the ramblock to operate on
  * @page: the page number to check
@@ -71,7 +72,7 @@ static inline bool clear_bmap_test_and_clear(RAMBlock *rb, uint64_t page)
 {
     uint8_t shift = rb->clear_bmap_shift;
 
-    return bitmap_test_and_clear_atomic(rb->clear_bmap, page >> shift, 1);
+    return bitmap_test_and_clear(rb->clear_bmap, page >> shift, 1);
 }
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 82a1d2f41f..3ccb00865f 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -253,6 +253,7 @@ void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
 bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr);
+bool bitmap_test_and_clear(unsigned long *map, long start, long nr);
 void bitmap_copy_and_clear_atomic(unsigned long *dst, unsigned long *src,
                                   long nr);
 unsigned long bitmap_find_next_zero_area(unsigned long *map,
diff --git a/util/bitmap.c b/util/bitmap.c
index f81d8057a7..8d12e90a5a 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -240,6 +240,51 @@ void bitmap_clear(unsigned long *map, long start, long nr)
     }
 }
 
+bool bitmap_test_and_clear(unsigned long *map, long start, long nr)
+{
+    unsigned long *p = map + BIT_WORD(start);
+    const long size = start + nr;
+    int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+    unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+    bool dirty = false;
+
+    assert(start >= 0 && nr >= 0);
+
+    /* First word */
+    if (nr - bits_to_clear > 0) {
+        if ((*p) & mask_to_clear) {
+            dirty = true;
+        }
+        *p &= ~mask_to_clear;
+        nr -= bits_to_clear;
+        bits_to_clear = BITS_PER_LONG;
+        p++;
+    }
+
+    /* Full words */
+    if (bits_to_clear == BITS_PER_LONG) {
+        while (nr >= BITS_PER_LONG) {
+            if (*p) {
+                dirty = true;
+                *p = 0;
+            }
+            nr -= BITS_PER_LONG;
+            p++;
+        }
+    }
+
+    /* Last word */
+    if (nr) {
+        mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+        if ((*p) & mask_to_clear) {
+            dirty = true;
+        }
+        *p &= ~mask_to_clear;
+    }
+
+    return dirty;
+}
+
 bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr)
 {
     unsigned long *p = map + BIT_WORD(start);
-- 
2.32.0




* [PATCH RFC 02/13] migration: Add postcopy_preempt_active()
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 01/13] migration: Use non-atomic ops for clear log bitmap Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-09-15 18:50   ` Dr. David Alan Gilbert
  2022-08-29 16:56 ` [PATCH RFC 03/13] migration: Yield bitmap_mutex properly when sending/sleeping Peter Xu
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

Add a helper to check whether postcopy preempt is both enabled and active.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index dc1de9ddbc..8c5d5332e8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -162,6 +162,11 @@ out:
     return ret;
 }
 
+static bool postcopy_preempt_active(void)
+{
+    return migrate_postcopy_preempt() && migration_in_postcopy();
+}
+
 bool ramblock_is_ignored(RAMBlock *block)
 {
     return !qemu_ram_is_migratable(block) ||
@@ -2434,7 +2439,7 @@ static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
 /* We need to make sure rs->f always points to the default channel elsewhere */
 static void postcopy_preempt_reset_channel(RAMState *rs)
 {
-    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
+    if (postcopy_preempt_active()) {
         rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
         rs->f = migrate_get_current()->to_dst_file;
         trace_postcopy_preempt_reset_channel();
@@ -2472,7 +2477,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
         return 0;
     }
 
-    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
+    if (postcopy_preempt_active()) {
         postcopy_preempt_choose_channel(rs, pss);
     }
 
-- 
2.32.0




* [PATCH RFC 03/13] migration: Yield bitmap_mutex properly when sending/sleeping
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 01/13] migration: Use non-atomic ops for clear log bitmap Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 02/13] migration: Add postcopy_preempt_active() Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 04/13] migration: Cleanup xbzrle zero page cache update logic Peter Xu
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

Don't hold the bitmap mutex while sending pages, or while being throttled
by migration_rate_limit() (which is a bit tricky to call here in the ram
code, but still seems helpful).

This prepares for the possibility of sending pages concurrently in more
than one thread via ram_save_host_page(): all threads may need the
bitmap_mutex to operate on the bitmaps, so a sendmsg() or any kind of
qemu_sem_wait() blocking one thread will not block the others from making
progress.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 8c5d5332e8..9e96a46323 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2470,6 +2470,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
     unsigned long hostpage_boundary =
         QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
     unsigned long start_page = pss->page;
+    bool page_dirty;
     int res;
 
     if (ramblock_is_ignored(pss->block)) {
@@ -2487,22 +2488,41 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
             break;
         }
 
+        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
+        /*
+         * Properly yield the lock only in postcopy preempt mode because
+         * both migration thread and rp-return thread can operate on the
+         * bitmaps.
+         */
+        if (postcopy_preempt_active()) {
+            qemu_mutex_unlock(&rs->bitmap_mutex);
+        }
+
         /* Check the pages is dirty and if it is send it */
-        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
+        if (page_dirty) {
             tmppages = ram_save_target_page(rs, pss);
-            if (tmppages < 0) {
-                return tmppages;
+            if (tmppages >= 0) {
+                pages += tmppages;
+                /*
+                 * Allow rate limiting to happen in the middle of huge pages if
+                 * something is sent in the current iteration.
+                 */
+                if (pagesize_bits > 1 && tmppages > 0) {
+                    migration_rate_limit();
+                }
             }
+        } else {
+            tmppages = 0;
+        }
 
-            pages += tmppages;
-            /*
-             * Allow rate limiting to happen in the middle of huge pages if
-             * something is sent in the current iteration.
-             */
-            if (pagesize_bits > 1 && tmppages > 0) {
-                migration_rate_limit();
-            }
+        if (postcopy_preempt_active()) {
+            qemu_mutex_lock(&rs->bitmap_mutex);
         }
+
+        if (tmppages < 0) {
+            return tmppages;
+        }
+
         pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
     } while ((pss->page < hostpage_boundary) &&
              offset_in_ramblock(pss->block,
-- 
2.32.0




* [PATCH RFC 04/13] migration: Cleanup xbzrle zero page cache update logic
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (2 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 03/13] migration: Yield bitmap_mutex properly when sending/sleeping Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 05/13] migration: Disallow postcopy preempt to be used with compress Peter Xu
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

The major change is to replace "!save_page_use_compression()" with
"xbzrle_enabled" to make it clear.

Reasoning:

(1) When compression is enabled, "!save_page_use_compression()" is exactly
    the same as checking "xbzrle_enabled".

(2) When compression is disabled, "!save_page_use_compression()" always
    returns true.  We used to try calling the xbzrle code, but after this
    change we won't, and we shouldn't need to.

While at it, drop the xbzrle_enabled check in xbzrle_cache_zero_page(),
because with this change it's not needed anymore.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 9e96a46323..612c7dd708 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -741,10 +741,6 @@ void mig_throttle_counter_reset(void)
  */
 static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
 {
-    if (!rs->xbzrle_enabled) {
-        return;
-    }
-
     /* We don't care if this fails to allocate a new cache page
      * as long as it updated an old one */
     cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page,
@@ -2301,7 +2297,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         /* Must let xbzrle know, otherwise a previous (now 0'd) cached
          * page would be stale
          */
-        if (!save_page_use_compression(rs)) {
+        if (rs->xbzrle_enabled) {
             XBZRLE_cache_lock();
             xbzrle_cache_zero_page(rs, block->offset + offset);
             XBZRLE_cache_unlock();
-- 
2.32.0




* [PATCH RFC 05/13] migration: Disallow postcopy preempt to be used with compress
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (3 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 04/13] migration: Cleanup xbzrle zero page cache update logic Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 06/13] migration: Trivial cleanup save_page_header() on same block check Peter Xu
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

Preempt mode requires the ability to assign a channel to each page, while
the compression logic currently assigns pages to different compress
threads/local channels, so the two are potentially incompatible.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index bb8bbddfe4..844bca1ff6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1336,6 +1336,17 @@ static bool migrate_caps_check(bool *cap_list,
             error_setg(errp, "Postcopy preempt requires postcopy-ram");
             return false;
         }
+
+        /*
+         * Preempt mode requires urgent pages to be sent in separate
+         * channel, OTOH compression logic will disorder all pages into
+         * different compression channels, which is not compatible with the
+         * preempt assumptions on channel assignments.
+         */
+        if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
+            error_setg(errp, "Postcopy preempt not compatible with compress");
+            return false;
+        }
     }
 
     return true;
-- 
2.32.0




* [PATCH RFC 06/13] migration: Trivial cleanup save_page_header() on same block check
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (4 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 05/13] migration: Disallow postcopy preempt to be used with compress Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 07/13] migration: Remove RAMState.f references in compression code Peter Xu
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

The 2nd check on RAM_SAVE_FLAG_CONTINUE is a bit redundant.  Use a boolean
to be clearer.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 612c7dd708..43893d0a40 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -661,14 +661,15 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
                                ram_addr_t offset)
 {
     size_t size, len;
+    bool same_block = (block == rs->last_sent_block);
 
-    if (block == rs->last_sent_block) {
+    if (same_block) {
         offset |= RAM_SAVE_FLAG_CONTINUE;
     }
     qemu_put_be64(f, offset);
     size = 8;
 
-    if (!(offset & RAM_SAVE_FLAG_CONTINUE)) {
+    if (!same_block) {
         len = strlen(block->idstr);
         qemu_put_byte(f, len);
         qemu_put_buffer(f, (uint8_t *)block->idstr, len);
-- 
2.32.0




* [PATCH RFC 07/13] migration: Remove RAMState.f references in compression code
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (5 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 06/13] migration: Trivial cleanup save_page_header() on same block check Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 08/13] migration: Teach PSS about host page Peter Xu
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

Remove the references to RAMState.f in compress_page_with_multi_thread()
and flush_compressed_data().

The compression code isn't compatible with having more than one channel (it
currently has no way to know which channel to flush the compressed data
to), so to keep it simple we always flush to the default to_dst_file
channel until someone wants to add multi-channel support.  This matters
because rs->f can now really change (after postcopy preempt was
introduced).

There should be no functional change after this patch is applied, since
wherever rs->f was referenced in the compression code it must have been
to_dst_file.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 43893d0a40..2f37520be4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1461,6 +1461,7 @@ static bool save_page_use_compression(RAMState *rs);
 
 static void flush_compressed_data(RAMState *rs)
 {
+    MigrationState *ms = migrate_get_current();
     int idx, len, thread_count;
 
     if (!save_page_use_compression(rs)) {
@@ -1479,7 +1480,7 @@ static void flush_compressed_data(RAMState *rs)
     for (idx = 0; idx < thread_count; idx++) {
         qemu_mutex_lock(&comp_param[idx].mutex);
         if (!comp_param[idx].quit) {
-            len = qemu_put_qemu_file(rs->f, comp_param[idx].file);
+            len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file);
             /*
              * it's safe to fetch zero_page without holding comp_done_lock
              * as there is no further request submitted to the thread,
@@ -1498,11 +1499,11 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
     param->offset = offset;
 }
 
-static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
-                                           ram_addr_t offset)
+static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 {
     int idx, thread_count, bytes_xmit = -1, pages = -1;
     bool wait = migrate_compress_wait_thread();
+    MigrationState *ms = migrate_get_current();
 
     thread_count = migrate_compress_threads();
     qemu_mutex_lock(&comp_done_lock);
@@ -1510,7 +1511,8 @@ retry:
     for (idx = 0; idx < thread_count; idx++) {
         if (comp_param[idx].done) {
             comp_param[idx].done = false;
-            bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
+            bytes_xmit = qemu_put_qemu_file(ms->to_dst_file,
+                                            comp_param[idx].file);
             qemu_mutex_lock(&comp_param[idx].mutex);
             set_compress_params(&comp_param[idx], block, offset);
             qemu_cond_signal(&comp_param[idx].cond);
@@ -2263,7 +2265,7 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
         return false;
     }
 
-    if (compress_page_with_multi_thread(rs, block, offset) > 0) {
+    if (compress_page_with_multi_thread(block, offset) > 0) {
         return true;
     }
 
-- 
2.32.0




* [PATCH RFC 08/13] migration: Teach PSS about host page
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (6 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 07/13] migration: Remove RAMState.f references in compression code Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 09/13] migration: Introduce pss_channel Peter Xu
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

Migration code has a lot to do with host pages.  Teaching the PSS core
about the idea of a host page helps a lot and makes the code cleaner.
Meanwhile, this prepares for future changes that can leverage the new PSS
helpers introduced here to send host pages in another thread.

Three more fields are introduced for this:

  (1) host_page_sending: set to true when QEMU is sending a host page,
      false otherwise.

  (2) host_page_{start|end}: the start/end of the current host page; only
      valid when host_page_sending==true.

For example, when we look up the next dirty page on the ramblock, with
host_page_sending==true we won't even try to look for anything beyond the
current host page.  This can be more efficient than the current code,
because currently we set pss->page to the next dirty bit (which can be past
the current host page) and reset it to the host page boundary if we find it
overflowed.  The latter is not efficient because we don't need to scan past
the host page boundary at all.

Meanwhile, with the above we can easily make migration_bitmap_find_dirty()
self-contained by updating pss->page properly.  The rs* parameter is
removed because it wasn't even used in the old code.

When sending a host page, we should use the pss helpers like this:

  - pss_host_page_prepare(pss): called before sending the host page
  - pss_within_range(pss): are we still within the current host page?
  - pss_host_page_finish(pss): called after sending the host page

If another function to send host pages is added in the future (e.g. in the
return path thread), it should follow the same style.
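
Put together, a send loop following that style looks roughly like this (a
simplified skeleton of the diff below, not additional code):

  pss_host_page_prepare(pss);
  do {
      if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
          /* ...send the target page at pss->page... */
      }
      pss_find_next_dirty(pss);
  } while (pss_within_range(pss));
  pss_host_page_finish(pss);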

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 91 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 18 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 2f37520be4..e2b922ad59 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -474,6 +474,11 @@ struct PageSearchStatus {
      * postcopy pages via postcopy preempt channel.
      */
     bool         postcopy_target_channel;
+    /* Whether we're sending a host page */
+    bool          host_page_sending;
+    /* The start/end of current host page.  Invalid if host_page_sending==false */
+    unsigned long host_page_start;
+    unsigned long host_page_end;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -851,26 +856,36 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
 }
 
 /**
- * migration_bitmap_find_dirty: find the next dirty page from start
+ * pss_find_next_dirty: find the next dirty page of current ramblock
  *
- * Returns the page offset within memory region of the start of a dirty page
+ * This function updates pss->page to point to the next dirty page index
+ * within the ramblock, or the end of ramblock when nothing found.
  *
  * @rs: current RAM state
- * @rb: RAMBlock where to search for dirty pages
- * @start: page where we start the search
+ * @pss: the current page search status
  */
-static inline
-unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
-                                          unsigned long start)
+static void pss_find_next_dirty(PageSearchStatus *pss)
 {
+    RAMBlock *rb = pss->block;
     unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
     unsigned long *bitmap = rb->bmap;
 
     if (ramblock_is_ignored(rb)) {
-        return size;
+        /* Points directly to the end, so we know no dirty page */
+        pss->page = size;
+        return;
     }
 
-    return find_next_bit(bitmap, size, start);
+    /*
+     * If during sending a host page, only look for dirty pages within the
+     * current host page being send.
+     */
+    if (pss->host_page_sending) {
+        assert(pss->host_page_end);
+        size = MIN(size, pss->host_page_end);
+    }
+
+    pss->page = find_next_bit(bitmap, size, pss->page);
 }
 
 static void migration_clear_memory_region_dirty_bitmap(RAMBlock *rb,
@@ -1555,7 +1570,9 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
     pss->postcopy_requested = false;
     pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
 
-    pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
+    /* Update pss->page for the next dirty bit in ramblock */
+    pss_find_next_dirty(pss);
+
     if (pss->complete_round && pss->block == rs->last_seen_block &&
         pss->page >= rs->last_page) {
         /*
@@ -2445,6 +2462,44 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
     }
 }
 
+/* Should be called before sending a host page */
+static void pss_host_page_prepare(PageSearchStatus *pss)
+{
+    /* How many guest pages are there in one host page? */
+    size_t guest_pfns = qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
+
+    pss->host_page_sending = true;
+    pss->host_page_start = ROUND_DOWN(pss->page, guest_pfns);
+    pss->host_page_end = ROUND_UP(pss->page + 1, guest_pfns);
+}
+
+/*
+ * Whether the page pointed by PSS is within the host page being sent.
+ * Must be called after a previous pss_host_page_prepare().
+ */
+static bool pss_within_range(PageSearchStatus *pss)
+{
+    ram_addr_t ram_addr;
+
+    assert(pss->host_page_sending);
+
+    /* Over host-page boundary? */
+    if (pss->page >= pss->host_page_end) {
+        return false;
+    }
+
+    ram_addr = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+
+    return offset_in_ramblock(pss->block, ram_addr);
+}
+
+static void pss_host_page_finish(PageSearchStatus *pss)
+{
+    pss->host_page_sending = false;
+    /* This is not needed, but just to reset it */
+    pss->host_page_end = 0;
+}
+
 /**
  * ram_save_host_page: save a whole host page
  *
@@ -2466,8 +2521,6 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
     int tmppages, pages = 0;
     size_t pagesize_bits =
         qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
-    unsigned long hostpage_boundary =
-        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
     unsigned long start_page = pss->page;
     bool page_dirty;
     int res;
@@ -2481,6 +2534,9 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
         postcopy_preempt_choose_channel(rs, pss);
     }
 
+    /* Update host page boundary information */
+    pss_host_page_prepare(pss);
+
     do {
         if (postcopy_needs_preempt(rs, pss)) {
             postcopy_do_preempt(rs, pss);
@@ -2519,15 +2575,14 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
         }
 
         if (tmppages < 0) {
+            pss_host_page_finish(pss);
             return tmppages;
         }
 
-        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
-    } while ((pss->page < hostpage_boundary) &&
-             offset_in_ramblock(pss->block,
-                                ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
-    /* The offset we leave with is the min boundary of host page and block */
-    pss->page = MIN(pss->page, hostpage_boundary);
+        pss_find_next_dirty(pss);
+    } while (pss_within_range(pss));
+
+    pss_host_page_finish(pss);
 
     /*
      * When with postcopy preempt mode, flush the data as soon as possible for
-- 
2.32.0




* [PATCH RFC 09/13] migration: Introduce pss_channel
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (7 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 08/13] migration: Teach PSS about host page Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 10/13] migration: Add pss_init() Peter Xu
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

Introduce pss_channel for PageSearchStatus, defined as "the migration
channel to be used to transfer this host page".

We used to have rs->f, which is a mirror of MigrationState.to_dst_file.

After the initial postcopy preempt version, rs->f can change dynamically
depending on which channel we want to use.

But that work still doesn't allow sending pages with full concurrency in
e.g. different threads, because rs->f can only be either the PRECOPY
channel or the POSTCOPY channel.  This needs to be per-thread too.

PageSearchStatus is actually a good structure to leverage if we want
multiple threads sending pages.  Sending a single guest page at a time may
not make sense, so we make the granule the "host page", and in the PSS
structure we allow specifying a QEMUFile* used to migrate a specific host
page.  That opens the possibility of specifying different channels in
different threads, with different PSS structures, as sketched below.
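
For example (a hypothetical sketch, not code from this patch), two sender
threads could then each drive their own PSS bound to a different channel:

  MigrationState *ms = migrate_get_current();

  /* Hypothetical: one PSS per sender thread, each with its own channel */
  PageSearchStatus precopy_pss  = { .pss_channel = ms->to_dst_file };
  PageSearchStatus postcopy_pss = { .pss_channel = ms->postcopy_qemufile_src };

  /* ...each thread would then call ram_save_host_page() with its own PSS... */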

The PSS prefix can be slightly misleading here because e.g. for the
upcoming usage of the postcopy channel/thread it's not "searching" (or
scanning) at all, but sending the explicit page that was requested.
However, since PSS has existed for some years, keep it as-is until someone
complains.

This patch mostly (simply) replaces rs->f with pss->pss_channel.  No
functional change is intended yet.  But it does prepare to finally drop
rs->f, and to make ram_save_guest_page() thread safe.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 70 +++++++++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 32 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index e2b922ad59..adcc57c584 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -446,6 +446,8 @@ void dirty_sync_missed_zero_copy(void)
 
 /* used by the search for pages to send */
 struct PageSearchStatus {
+    /* The migration channel used for a specific host page */
+    QEMUFile    *pss_channel;
     /* Current block being searched */
     RAMBlock    *block;
     /* Current page to search from */
@@ -768,9 +770,9 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
-                            ram_addr_t current_addr, RAMBlock *block,
-                            ram_addr_t offset)
+static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
+                            uint8_t **current_data, ram_addr_t current_addr,
+                            RAMBlock *block, ram_addr_t offset)
 {
     int encoded_len = 0, bytes_xbzrle;
     uint8_t *prev_cached_page;
@@ -838,11 +840,11 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
     }
 
     /* Send XBZRLE based compressed page */
-    bytes_xbzrle = save_page_header(rs, rs->f, block,
+    bytes_xbzrle = save_page_header(rs, file, block,
                                     offset | RAM_SAVE_FLAG_XBZRLE);
-    qemu_put_byte(rs->f, ENCODING_FLAG_XBZRLE);
-    qemu_put_be16(rs->f, encoded_len);
-    qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
+    qemu_put_byte(file, ENCODING_FLAG_XBZRLE);
+    qemu_put_be16(file, encoded_len);
+    qemu_put_buffer(file, XBZRLE.encoded_buf, encoded_len);
     bytes_xbzrle += encoded_len + 1 + 2;
     /*
      * Like compressed_size (please see update_compress_thread_counts),
@@ -1295,9 +1297,10 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
+static int save_zero_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+                          ram_addr_t offset)
 {
-    int len = save_zero_page_to_file(rs, rs->f, block, offset);
+    int len = save_zero_page_to_file(rs, file, block, offset);
 
     if (len) {
         ram_counters.duplicate++;
@@ -1314,15 +1317,15 @@ static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
  *
  * Return true if the pages has been saved, otherwise false is returned.
  */
-static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
-                              int *pages)
+static bool control_save_page(PageSearchStatus *pss, RAMBlock *block,
+                              ram_addr_t offset, int *pages)
 {
     uint64_t bytes_xmit = 0;
     int ret;
 
     *pages = -1;
-    ret = ram_control_save_page(rs->f, block->offset, offset, TARGET_PAGE_SIZE,
-                                &bytes_xmit);
+    ret = ram_control_save_page(pss->pss_channel, block->offset, offset,
+                                TARGET_PAGE_SIZE, &bytes_xmit);
     if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
         return false;
     }
@@ -1356,17 +1359,17 @@ static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
  * @buf: the page to be sent
  * @async: send to page asyncly
  */
-static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
-                            uint8_t *buf, bool async)
+static int save_normal_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+                            ram_addr_t offset, uint8_t *buf, bool async)
 {
-    ram_transferred_add(save_page_header(rs, rs->f, block,
+    ram_transferred_add(save_page_header(rs, file, block,
                                          offset | RAM_SAVE_FLAG_PAGE));
     if (async) {
-        qemu_put_buffer_async(rs->f, buf, TARGET_PAGE_SIZE,
+        qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
                               migrate_release_ram() &&
                               migration_in_postcopy());
     } else {
-        qemu_put_buffer(rs->f, buf, TARGET_PAGE_SIZE);
+        qemu_put_buffer(file, buf, TARGET_PAGE_SIZE);
     }
     ram_transferred_add(TARGET_PAGE_SIZE);
     ram_counters.normal++;
@@ -1399,8 +1402,8 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
 
     XBZRLE_cache_lock();
     if (rs->xbzrle_enabled && !migration_in_postcopy()) {
-        pages = save_xbzrle_page(rs, &p, current_addr, block,
-                                 offset);
+        pages = save_xbzrle_page(rs, pss->pss_channel, &p, current_addr,
+                                 block, offset);
         if (!rs->last_stage) {
             /* Can't send this cached data async, since the cache page
              * might get updated before it gets to the wire
@@ -1411,7 +1414,8 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
 
     /* XBZRLE overflow or normal page */
     if (pages == -1) {
-        pages = save_normal_page(rs, block, offset, p, send_async);
+        pages = save_normal_page(rs, pss->pss_channel, block, offset,
+                                 p, send_async);
     }
 
     XBZRLE_cache_unlock();
@@ -1419,10 +1423,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
     return pages;
 }
 
-static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
+static int ram_save_multifd_page(QEMUFile *file, RAMBlock *block,
                                  ram_addr_t offset)
 {
-    if (multifd_queue_page(rs->f, block, offset) < 0) {
+    if (multifd_queue_page(file, block, offset) < 0) {
         return -1;
     }
     ram_counters.normal++;
@@ -1717,7 +1721,7 @@ static int ram_save_release_protection(RAMState *rs, PageSearchStatus *pss,
         uint64_t run_length = (pss->page - start_page) << TARGET_PAGE_BITS;
 
         /* Flush async buffers before un-protect. */
-        qemu_fflush(rs->f);
+        qemu_fflush(pss->pss_channel);
         /* Un-protect memory range. */
         res = uffd_change_protection(rs->uffdio_fd, page_address, run_length,
                 false, false);
@@ -2304,7 +2308,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
     int res;
 
-    if (control_save_page(rs, block, offset, &res)) {
+    if (control_save_page(pss, block, offset, &res)) {
         return res;
     }
 
@@ -2312,7 +2316,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         return 1;
     }
 
-    res = save_zero_page(rs, block, offset);
+    res = save_zero_page(rs, pss->pss_channel, block, offset);
     if (res > 0) {
         /* Must let xbzrle know, otherwise a previous (now 0'd) cached
          * page would be stale
@@ -2333,7 +2337,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
      */
     if (!save_page_use_compression(rs) && migrate_use_multifd()
         && !migration_in_postcopy()) {
-        return ram_save_multifd_page(rs, block, offset);
+        return ram_save_multifd_page(pss->pss_channel, block, offset);
     }
 
     return ram_save_page(rs, pss);
@@ -2530,10 +2534,6 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
         return 0;
     }
 
-    if (postcopy_preempt_active()) {
-        postcopy_preempt_choose_channel(rs, pss);
-    }
-
     /* Update host page boundary information */
     pss_host_page_prepare(pss);
 
@@ -2594,7 +2594,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
      * explicit flush or it won't flush until the buffer is full.
      */
     if (migrate_postcopy_preempt() && pss->postcopy_requested) {
-        qemu_fflush(rs->f);
+        qemu_fflush(pss->pss_channel);
     }
 
     res = ram_save_release_protection(rs, pss, start_page);
@@ -2652,6 +2652,12 @@ static int ram_find_and_save_block(RAMState *rs)
         }
 
         if (found) {
+            /* Update rs->f with correct channel */
+            if (postcopy_preempt_active()) {
+                postcopy_preempt_choose_channel(rs, &pss);
+            }
+            /* Cache rs->f in pss_channel (TODO: remove rs->f) */
+            pss.pss_channel = rs->f;
             pages = ram_save_host_page(rs, &pss);
         }
     } while (!pages && again);
-- 
2.32.0




* [PATCH RFC 10/13] migration: Add pss_init()
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (8 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 09/13] migration: Introduce pss_channel Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 11/13] migration: Make PageSearchStatus part of RAMState Peter Xu
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

Helper to init PSS structures.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index adcc57c584..bdfcc6171a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -535,6 +535,14 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
 static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
                                      bool postcopy_requested);
 
+/* NOTE: page is the PFN not real ram_addr_t. */
+static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
+{
+    pss->block = rb;
+    pss->page = page;
+    pss->complete_round = false;
+}
+
 static void *do_data_compress(void *opaque)
 {
     CompressParam *param = opaque;
@@ -2625,9 +2633,7 @@ static int ram_find_and_save_block(RAMState *rs)
         return pages;
     }
 
-    pss.block = rs->last_seen_block;
-    pss.page = rs->last_page;
-    pss.complete_round = false;
+    pss_init(&pss, rs->last_seen_block, rs->last_page);
 
     if (!pss.block) {
         pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
-- 
2.32.0




* [PATCH RFC 11/13] migration: Make PageSearchStatus part of RAMState
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (9 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 10/13] migration: Add pss_init() Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 12/13] migration: Move last_sent_block into PageSearchStatus Peter Xu
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

We used to allocate the PSS structure on the stack for precopy when
sending pages.  Make it static, so that it describes per-channel ram
migration status.

Here we declare RAM_CHANNEL_MAX instances, preparing for postcopy to use
them, even though this patch does not yet start using the 2nd instance.

This should not bring any functional change per se, but it already starts
to export PSS information via the RAMState, so that e.g. one PSS channel
can start to reference the other PSS channel.

Always protect PSS access using the same RAMState.bitmap_mutex.  We already
do so, so no code change is needed, just some comment updates.  Maybe we
should consider renaming bitmap_mutex some day, as it's becoming a bigger
and more commonly used mutex for the ram state, but leave that for later.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 116 ++++++++++++++++++++++++++----------------------
 1 file changed, 63 insertions(+), 53 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index bdfcc6171a..2be9b91ffc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -85,6 +85,46 @@
 
 XBZRLECacheStats xbzrle_counters;
 
+/* used by the search for pages to send */
+struct PageSearchStatus {
+    /* The migration channel used for a specific host page */
+    QEMUFile    *pss_channel;
+    /* Current block being searched */
+    RAMBlock    *block;
+    /* Current page to search from */
+    unsigned long page;
+    /* Set once we wrap around */
+    bool         complete_round;
+    /*
+     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
+     * postcopy.  When set, the request is "urgent" because the dest QEMU
+     * threads are waiting for us.
+     */
+    bool         postcopy_requested;
+    /*
+     * [POSTCOPY-ONLY] The target channel to use to send current page.
+     *
+     * Note: This may _not_ match with the value in postcopy_requested
+     * above. Let's imagine the case where the postcopy request is exactly
+     * the page that we're sending in progress during precopy. In this case
+     * we'll have postcopy_requested set to true but the target channel
+     * will be the precopy channel (so that we don't split brain on that
+     * specific page since the precopy channel already contains partial of
+     * that page data).
+     *
+     * Besides that specific use case, postcopy_target_channel should
+     * always be equal to postcopy_requested, because by default we send
+     * postcopy pages via postcopy preempt channel.
+     */
+    bool         postcopy_target_channel;
+    /* Whether we're sending a host page */
+    bool          host_page_sending;
+    /* The start/end of current host page.  Invalid if host_page_sending==false */
+    unsigned long host_page_start;
+    unsigned long host_page_end;
+};
+typedef struct PageSearchStatus PageSearchStatus;
+
 /* struct contains XBZRLE cache and a static page
    used by the compression */
 static struct {
@@ -319,6 +359,11 @@ typedef struct {
 struct RAMState {
     /* QEMUFile used for this migration */
     QEMUFile *f;
+    /*
+     * PageSearchStatus structures for the channels when send pages.
+     * Protected by the bitmap_mutex.
+     */
+    PageSearchStatus pss[RAM_CHANNEL_MAX];
     /* UFFD file descriptor, used in 'write-tracking' migration */
     int uffdio_fd;
     /* Last block that we have visited searching for dirty pages */
@@ -362,7 +407,12 @@ struct RAMState {
     uint64_t target_page_count;
     /* number of dirty bits in the bitmap */
     uint64_t migration_dirty_pages;
-    /* Protects modification of the bitmap and migration dirty pages */
+    /*
+     * Protects:
+     * - dirty/clear bitmap
+     * - migration_dirty_pages
+     * - pss structures
+     */
     QemuMutex bitmap_mutex;
     /* The RAMBlock used in the last src_page_requests */
     RAMBlock *last_req_rb;
@@ -444,46 +494,6 @@ void dirty_sync_missed_zero_copy(void)
     ram_counters.dirty_sync_missed_zero_copy++;
 }
 
-/* used by the search for pages to send */
-struct PageSearchStatus {
-    /* The migration channel used for a specific host page */
-    QEMUFile    *pss_channel;
-    /* Current block being searched */
-    RAMBlock    *block;
-    /* Current page to search from */
-    unsigned long page;
-    /* Set once we wrap around */
-    bool         complete_round;
-    /*
-     * [POSTCOPY-ONLY] Whether current page is explicitly requested by
-     * postcopy.  When set, the request is "urgent" because the dest QEMU
-     * threads are waiting for us.
-     */
-    bool         postcopy_requested;
-    /*
-     * [POSTCOPY-ONLY] The target channel to use to send current page.
-     *
-     * Note: This may _not_ match with the value in postcopy_requested
-     * above. Let's imagine the case where the postcopy request is exactly
-     * the page that we're sending in progress during precopy. In this case
-     * we'll have postcopy_requested set to true but the target channel
-     * will be the precopy channel (so that we don't split brain on that
-     * specific page since the precopy channel already contains partial of
-     * that page data).
-     *
-     * Besides that specific use case, postcopy_target_channel should
-     * always be equal to postcopy_requested, because by default we send
-     * postcopy pages via postcopy preempt channel.
-     */
-    bool         postcopy_target_channel;
-    /* Whether we're sending a host page */
-    bool          host_page_sending;
-    /* The start/end of current host page.  Invalid if host_page_sending==false */
-    unsigned long host_page_start;
-    unsigned long host_page_end;
-};
-typedef struct PageSearchStatus PageSearchStatus;
-
 CompressionStats compression_counters;
 
 struct CompressParam {
@@ -2624,7 +2634,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
  */
 static int ram_find_and_save_block(RAMState *rs)
 {
-    PageSearchStatus pss;
+    PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
     int pages = 0;
     bool again, found;
 
@@ -2633,15 +2643,15 @@ static int ram_find_and_save_block(RAMState *rs)
         return pages;
     }
 
-    pss_init(&pss, rs->last_seen_block, rs->last_page);
+    pss_init(pss, rs->last_seen_block, rs->last_page);
 
-    if (!pss.block) {
-        pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
+    if (!pss->block) {
+        pss->block = QLIST_FIRST_RCU(&ram_list.blocks);
     }
 
     do {
         again = true;
-        found = get_queued_page(rs, &pss);
+        found = get_queued_page(rs, pss);
 
         if (!found) {
             /*
@@ -2649,27 +2659,27 @@ static int ram_find_and_save_block(RAMState *rs)
              * preempted precopy.  Otherwise find the next dirty bit.
              */
             if (postcopy_preempt_triggered(rs)) {
-                postcopy_preempt_restore(rs, &pss, false);
+                postcopy_preempt_restore(rs, pss, false);
                 found = true;
             } else {
                 /* priority queue empty, so just search for something dirty */
-                found = find_dirty_block(rs, &pss, &again);
+                found = find_dirty_block(rs, pss, &again);
             }
         }
 
         if (found) {
             /* Update rs->f with correct channel */
             if (postcopy_preempt_active()) {
-                postcopy_preempt_choose_channel(rs, &pss);
+                postcopy_preempt_choose_channel(rs, pss);
             }
             /* Cache rs->f in pss_channel (TODO: remove rs->f) */
-            pss.pss_channel = rs->f;
-            pages = ram_save_host_page(rs, &pss);
+            pss->pss_channel = rs->f;
+            pages = ram_save_host_page(rs, pss);
         }
     } while (!pages && again);
 
-    rs->last_seen_block = pss.block;
-    rs->last_page = pss.page;
+    rs->last_seen_block = pss->block;
+    rs->last_page = pss->page;
 
     return pages;
 }
-- 
2.32.0




* [PATCH RFC 12/13] migration: Move last_sent_block into PageSearchStatus
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (10 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 11/13] migration: Make PageSearchStatus part of RAMState Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 16:56 ` [PATCH RFC 13/13] migration: Send requested page directly in rp-return thread Peter Xu
  2022-08-29 17:39 ` [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

Since we use PageSearchStatus to represent a channel, it makes perfect
sense to keep last_sent_block (which is what RAM_SAVE_FLAG_CONTINUE
leverages) per-channel rather than global, because each channel can be
sending pages from different ramblocks.

Hence move it from RAMState into PageSearchStatus.
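
As a side note, here is a toy sketch (illustrative names only, not the QEMU
code) of why "last sent block" belongs to the channel: a channel only
spells out the block name when it switches blocks, so sharing that state
between channels that interleave different blocks would keep invalidating
it and defeat the CONTINUE optimization:

#include <stdio.h>
#include <string.h>

typedef struct {
    char last_block[32];        /* per-channel "last block sent from" */
} ToyChannel;

/* Returns bytes "written": 1 flag byte, plus the name when it changes */
static size_t put_page_header(ToyChannel *ch, const char *block)
{
    size_t size = 1;            /* the flags/offset part */

    if (strcmp(ch->last_block, block) != 0) {
        size += strlen(block);  /* block switch: spell the name out */
        snprintf(ch->last_block, sizeof(ch->last_block), "%s", block);
    }                           /* else: CONTINUE, name omitted */
    return size;
}

int main(void)
{
    ToyChannel precopy = { "" }, postcopy = { "" };

    /* Interleaved sends on different blocks stay cheap per channel */
    printf("%zu\n", put_page_header(&precopy,  "ram-node0"));  /* full  */
    printf("%zu\n", put_page_header(&postcopy, "ram-node1"));  /* full  */
    printf("%zu\n", put_page_header(&precopy,  "ram-node0"));  /* short */
    printf("%zu\n", put_page_header(&postcopy, "ram-node1"));  /* short */
    return 0;
}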

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 71 ++++++++++++++++++++++++++++---------------------
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 2be9b91ffc..ef89812c69 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -89,6 +89,8 @@ XBZRLECacheStats xbzrle_counters;
 struct PageSearchStatus {
     /* The migration channel used for a specific host page */
     QEMUFile    *pss_channel;
+    /* Last block from where we have sent data */
+    RAMBlock *last_sent_block;
     /* Current block being searched */
     RAMBlock    *block;
     /* Current page to search from */
@@ -368,8 +370,6 @@ struct RAMState {
     int uffdio_fd;
     /* Last block that we have visited searching for dirty pages */
     RAMBlock *last_seen_block;
-    /* Last block from where we have sent data */
-    RAMBlock *last_sent_block;
     /* Last dirty target page we have sent */
     ram_addr_t last_page;
     /* last ram version we have seen */
@@ -677,16 +677,17 @@ exit:
  *
  * Returns the number of bytes written
  *
- * @f: QEMUFile where to send the data
+ * @pss: current PSS channel status
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  *          in the lower bits, it contains flags
  */
-static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
+static size_t save_page_header(PageSearchStatus *pss, RAMBlock *block,
                                ram_addr_t offset)
 {
     size_t size, len;
-    bool same_block = (block == rs->last_sent_block);
+    bool same_block = (block == pss->last_sent_block);
+    QEMUFile *f = pss->pss_channel;
 
     if (same_block) {
         offset |= RAM_SAVE_FLAG_CONTINUE;
@@ -699,7 +700,7 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
         qemu_put_byte(f, len);
         qemu_put_buffer(f, (uint8_t *)block->idstr, len);
         size += 1 + len;
-        rs->last_sent_block = block;
+        pss->last_sent_block = block;
     }
     return size;
 }
@@ -783,17 +784,19 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
  *          -1 means that xbzrle would be longer than normal
  *
  * @rs: current RAM state
+ * @pss: current PSS channel
  * @current_data: pointer to the address of the page contents
  * @current_addr: addr of the page
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
+static int save_xbzrle_page(RAMState *rs, PageSearchStatus *pss,
                             uint8_t **current_data, ram_addr_t current_addr,
                             RAMBlock *block, ram_addr_t offset)
 {
     int encoded_len = 0, bytes_xbzrle;
     uint8_t *prev_cached_page;
+    QEMUFile *file = pss->pss_channel;
 
     if (!cache_is_cached(XBZRLE.cache, current_addr,
                          ram_counters.dirty_sync_count)) {
@@ -858,7 +861,7 @@ static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
     }
 
     /* Send XBZRLE based compressed page */
-    bytes_xbzrle = save_page_header(rs, file, block,
+    bytes_xbzrle = save_page_header(pss, block,
                                     offset | RAM_SAVE_FLAG_XBZRLE);
     qemu_put_byte(file, ENCODING_FLAG_XBZRLE);
     qemu_put_be16(file, encoded_len);
@@ -1286,19 +1289,19 @@ static void ram_release_page(const char *rbname, uint64_t offset)
  * Returns the size of data written to the file, 0 means the page is not
  * a zero page
  *
- * @rs: current RAM state
- * @file: the file where the data is saved
+ * @pss: current PSS channel
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
+static int save_zero_page_to_file(PageSearchStatus *pss,
                                   RAMBlock *block, ram_addr_t offset)
 {
     uint8_t *p = block->host + offset;
+    QEMUFile *file = pss->pss_channel;
     int len = 0;
 
     if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
-        len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
+        len += save_page_header(pss, block, offset | RAM_SAVE_FLAG_ZERO);
         qemu_put_byte(file, 0);
         len += 1;
         ram_release_page(block->idstr, offset);
@@ -1311,14 +1314,14 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
  *
  * Returns the number of pages written.
  *
- * @rs: current RAM state
+ * @pss: current PSS channel
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+static int save_zero_page(PageSearchStatus *pss, RAMBlock *block,
                           ram_addr_t offset)
 {
-    int len = save_zero_page_to_file(rs, file, block, offset);
+    int len = save_zero_page_to_file(pss, block, offset);
 
     if (len) {
         ram_counters.duplicate++;
@@ -1371,16 +1374,18 @@ static bool control_save_page(PageSearchStatus *pss, RAMBlock *block,
  *
  * Returns the number of pages written.
  *
- * @rs: current RAM state
+ * @pss: current PSS channel
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  * @buf: the page to be sent
  * @async: send to page asyncly
  */
-static int save_normal_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+static int save_normal_page(PageSearchStatus *pss, RAMBlock *block,
                             ram_addr_t offset, uint8_t *buf, bool async)
 {
-    ram_transferred_add(save_page_header(rs, file, block,
+    QEMUFile *file = pss->pss_channel;
+
+    ram_transferred_add(save_page_header(pss, block,
                                          offset | RAM_SAVE_FLAG_PAGE));
     if (async) {
         qemu_put_buffer_async(file, buf, TARGET_PAGE_SIZE,
@@ -1420,7 +1425,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
 
     XBZRLE_cache_lock();
     if (rs->xbzrle_enabled && !migration_in_postcopy()) {
-        pages = save_xbzrle_page(rs, pss->pss_channel, &p, current_addr,
+        pages = save_xbzrle_page(rs, pss, &p, current_addr,
                                  block, offset);
         if (!rs->last_stage) {
             /* Can't send this cached data async, since the cache page
@@ -1432,8 +1437,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
 
     /* XBZRLE overflow or normal page */
     if (pages == -1) {
-        pages = save_normal_page(rs, pss->pss_channel, block, offset,
-                                 p, send_async);
+        pages = save_normal_page(pss, block, offset, p, send_async);
     }
 
     XBZRLE_cache_unlock();
@@ -1456,14 +1460,15 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                  ram_addr_t offset, uint8_t *source_buf)
 {
     RAMState *rs = ram_state;
+    PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
     uint8_t *p = block->host + offset;
     int ret;
 
-    if (save_zero_page_to_file(rs, f, block, offset)) {
+    if (save_zero_page_to_file(pss, block, offset)) {
         return true;
     }
 
-    save_page_header(rs, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
+    save_page_header(pss, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
 
     /*
      * copy it to a internal buffer to avoid it being modified by VM
@@ -2283,7 +2288,8 @@ static bool save_page_use_compression(RAMState *rs)
  * has been properly handled by compression, otherwise needs other
  * paths to handle it
  */
-static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
+static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
+                               RAMBlock *block, ram_addr_t offset)
 {
     if (!save_page_use_compression(rs)) {
         return false;
@@ -2299,7 +2305,7 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
      * We post the fist page as normal page as compression will take
      * much CPU resource.
      */
-    if (block != rs->last_sent_block) {
+    if (block != pss->last_sent_block) {
         flush_compressed_data(rs);
         return false;
     }
@@ -2330,11 +2336,11 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         return res;
     }
 
-    if (save_compress_page(rs, block, offset)) {
+    if (save_compress_page(rs, pss, block, offset)) {
         return 1;
     }
 
-    res = save_zero_page(rs, pss->pss_channel, block, offset);
+    res = save_zero_page(pss, block, offset);
     if (res > 0) {
         /* Must let xbzrle know, otherwise a previous (now 0'd) cached
          * page would be stale
@@ -2466,7 +2472,7 @@ static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
          * If channel switched, reset last_sent_block since the old sent block
          * may not be on the same channel.
          */
-        rs->last_sent_block = NULL;
+        pss->last_sent_block = NULL;
 
         trace_postcopy_preempt_switch_channel(channel);
     }
@@ -2793,8 +2799,13 @@ static void ram_save_cleanup(void *opaque)
 
 static void ram_state_reset(RAMState *rs)
 {
+    int i;
+
+    for (i = 0; i < RAM_CHANNEL_MAX; i++) {
+        rs->pss[i].last_sent_block = NULL;
+    }
+
     rs->last_seen_block = NULL;
-    rs->last_sent_block = NULL;
     rs->last_page = 0;
     rs->last_version = ram_list.version;
     rs->xbzrle_enabled = false;
@@ -2988,8 +2999,8 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
     migration_bitmap_sync(rs);
 
     /* Easiest way to make sure we don't resume in the middle of a host-page */
+    rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;
     rs->last_seen_block = NULL;
-    rs->last_sent_block = NULL;
     rs->last_page = 0;
 
     postcopy_each_ram_send_discard(ms);
-- 
2.32.0




* [PATCH RFC 13/13] migration: Send requested page directly in rp-return thread
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (11 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 12/13] migration: Move last_sent_block into PageSearchStatus Peter Xu
@ 2022-08-29 16:56 ` Peter Xu
  2022-08-29 17:39 ` [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert, peterx

With all the facilities ready, send the requested page directly in the
rp-return thread rather than queuing it in the request queue, if and only
if postcopy preempt is enabled.  This works because a separate channel is
used for sending urgent pages.  The only shared data is the bitmap, and it
is protected by the bitmap_mutex.
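
For illustration only (toy code, not what this patch adds): the general
shape of serving an urgent request inline in the return-path thread, where
the dirty bitmap is the only shared state and the urgent page is sent and
flushed on its own channel:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define NPAGES 8

static pthread_mutex_t bitmap_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool dirty[NPAGES] = { [3] = true, [5] = true };

/* Stand-ins for sending a page on, and flushing, the preempt channel */
static void send_page(int page) { printf("urgent send: page %d\n", page); }
static void flush_channel(void) { printf("flush preempt channel\n"); }

/* Called directly from the return-path thread on a page request */
static int handle_urgent_request(int page)
{
    bool sent = false;

    pthread_mutex_lock(&bitmap_mutex);
    if (dirty[page]) {          /* test-and-clear under bitmap_mutex */
        dirty[page] = false;
        send_page(page);
        sent = true;
    }
    pthread_mutex_unlock(&bitmap_mutex);

    if (sent) {
        flush_channel();        /* urgent pages should not be delayed */
    }
    return 0;
}

int main(void)
{
    handle_urgent_request(3);   /* dirty: sent right here, no queue */
    handle_urgent_request(3);   /* already clean: nothing to send   */
    return 0;
}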

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index ef89812c69..e731a70255 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -539,6 +539,8 @@ static QemuThread *decompress_threads;
 static QemuMutex decomp_done_lock;
 static QemuCond decomp_done_cond;
 
+static int ram_save_host_page_urgent(PageSearchStatus *pss);
+
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                  ram_addr_t offset, uint8_t *source_buf);
 
@@ -553,6 +555,16 @@ static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
     pss->complete_round = false;
 }
 
+/*
+ * Check whether two PSSs are actively sending the same page.  Return true
+ * if it is, false otherwise.
+ */
+static bool pss_overlap(PageSearchStatus *pss1, PageSearchStatus *pss2)
+{
+    return pss1->host_page_sending && pss2->host_page_sending &&
+        (pss1->host_page_start == pss2->host_page_start);
+}
+
 static void *do_data_compress(void *opaque)
 {
     CompressParam *param = opaque;
@@ -2250,6 +2262,53 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
         return -1;
     }
 
+    /*
+     * When with postcopy preempt, we send back the page directly in the
+     * rp-return thread.
+     */
+    if (postcopy_preempt_active()) {
+        ram_addr_t page_start = start >> TARGET_PAGE_BITS;
+        size_t page_size = qemu_ram_pagesize(ramblock);
+        PageSearchStatus *pss = &ram_state->pss[RAM_CHANNEL_POSTCOPY];
+        int ret = 0;
+
+        qemu_mutex_lock(&rs->bitmap_mutex);
+
+        pss_init(pss, ramblock, page_start);
+        /* Always use the preempt channel, and make sure it's there */
+        pss->pss_channel = migrate_get_current()->postcopy_qemufile_src;
+        pss->postcopy_requested = true;
+        assert(pss->pss_channel);
+
+        /*
+         * It must be either one or multiple of host page size.  Just
+         * assert; if something wrong we're mostly split brain anyway.
+         */
+        assert(len % page_size == 0);
+        while (len) {
+            if (ram_save_host_page_urgent(pss)) {
+                error_report("%s: ram_save_host_page_urgent() failed: "
+                             "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
+                             __func__, ramblock->idstr, start);
+                ret = -1;
+                break;
+            }
+            /*
+             * NOTE: after ram_save_host_page_urgent() succeeded, pss->page
+             * will automatically be moved and point to the next host page
+             * we're going to send, so no need to update here.
+             *
+             * Normally QEMU never sends >1 host page in requests, so
+             * logically we don't even need that as the loop should only
+             * run once, but just to be consistent.
+             */
+            len -= page_size;
+        };
+        qemu_mutex_unlock(&rs->bitmap_mutex);
+
+        return ret;
+    }
+
     struct RAMSrcPageRequest *new_entry =
         g_new0(struct RAMSrcPageRequest, 1);
     new_entry->rb = ramblock;
@@ -2528,6 +2587,55 @@ static void pss_host_page_finish(PageSearchStatus *pss)
     pss->host_page_end = 0;
 }
 
+/*
+ * Send an urgent host page specified by `pss'.  Need to be called with
+ * bitmap_mutex held.
+ *
+ * Returns 0 if save host page succeeded, false otherwise.
+ */
+static int ram_save_host_page_urgent(PageSearchStatus *pss)
+{
+    bool page_dirty, sent = false;
+    RAMState *rs = ram_state;
+    int ret = 0;
+
+    trace_postcopy_preempt_send_host_page(pss->block->idstr, pss->page);
+    pss_host_page_prepare(pss);
+
+    /*
+     * If precopy is sending the same page, let it be done in precopy, or
+     * we could send the same page in two channels and none of them will
+     * receive the whole page.
+     */
+    if (pss_overlap(pss, &ram_state->pss[RAM_CHANNEL_PRECOPY])) {
+        trace_postcopy_preempt_hit(pss->block->idstr,
+                                   pss->page << TARGET_PAGE_BITS);
+        return 0;
+    }
+
+    do {
+        page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
+
+        if (page_dirty) {
+            /* Be strict to return code; it must be 1, or what else? */
+            if (ram_save_target_page(rs, pss) != 1) {
+                error_report_once("%s: ram_save_target_page failed", __func__);
+                ret = -1;
+                goto out;
+            }
+            sent = true;
+        }
+        pss_find_next_dirty(pss);
+    } while (pss_within_range(pss));
+out:
+    pss_host_page_finish(pss);
+    /* For urgent requests, flush immediately if sent */
+    if (sent) {
+        qemu_fflush(pss->pss_channel);
+    }
+    return ret;
+}
+
 /**
  * ram_save_host_page: save a whole host page
  *
-- 
2.32.0




* Re: [PATCH RFC 00/13] migration: Postcopy Preempt-Full
  2022-08-29 16:56 [PATCH RFC 00/13] migration: Postcopy Preempt-Full Peter Xu
                   ` (12 preceding siblings ...)
  2022-08-29 16:56 ` [PATCH RFC 13/13] migration: Send requested page directly in rp-return thread Peter Xu
@ 2022-08-29 17:39 ` Peter Xu
  13 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-08-29 17:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrange, Leonardo Bras Soares Passos, Manish Mishra,
	Juan Quintela, Dr . David Alan Gilbert

On Mon, Aug 29, 2022 at 12:56:46PM -0400, Peter Xu wrote:
> This is a RFC series.  Tree is here:
> 
>   https://github.com/xzpeter/qemu/tree/preempt-full
> 
> It's not complete because there're still something we need to do which will
> be attached to the end of this cover letter, however this series can
> already safely pass qtest and any of my test.
> 
> Comparing to the recently merged preempt mode I called it "preempt-full"
> because it threadifies the postcopy channels so now urgent pages can be
> fully handled separately outside of the ram save loop.  Sorry to have the
> same name as the PREEMPT_FULL in the Linux RT world, it's just that we
> needed a name for the capability and it was named as preempt already
> anyway..
> 
> The existing preempt code has reduced ramdom page req latency over 10Gbps
> network from ~12ms to ~500us which has already landed.
> 
> This preempt-full series can further reduces that ~500us to ~230us per my
> initial test.  More to share below.
> 
> Note that no new capability is needed, IOW it's fully compatible with the
> existing preempt mode.  So the naming is actually not important but just to
> identify the difference on the binaries.  It's because this series only
> reworks the sender side code and does not change the migration protocol, it
> just runs faster.
> 
> IOW, old "preempt" QEMU can also migrate to "preempt-full" QEMU, vice versa.
> 
>   - When old "preempt" mode QEMU migrates to "preempt-full" QEMU, it'll be
>     the same as running both old "preempt" QEMUs.
> 
>   - When "preempt-full" QEMU migrates to old "preempt" QEMU, it'll be the
>     same as running both "preempt-full".
> 
> The logic of the series is quite simple too: simply moving the existing
> preempt channel page sends to rp-return thread.  It can slow down rp-return
> thread on receiving pages, but I don't really see a major issue with it so
> far.
> 
> This latency number is getting close to the extreme of 4K page request
> latency of any TCP roundtrip of the 10Gbps nic I have.  The 'extreme
> number' is something I get from mig_mon tool which has a mode [1] to
> emulate the extreme tcp roundtrips of page requests.
> 
> Performance
> ===========
> 
> Page request latencies has distributions as below, with a VM of 20G mem, 20
> cores, 10Gbps nic, 18G fully random writes:
> 
> Postcopy Vanilla
> ----------------
> 
> Average: 12093 (us)
> @delay_us:
> [1]                    1 |                                                    |
> [2, 4)                 0 |                                                    |
> [4, 8)                 0 |                                                    |
> [8, 16)                0 |                                                    |
> [16, 32)               1 |                                                    |
> [32, 64)               8 |                                                    |
> [64, 128)             11 |                                                    |
> [128, 256)            14 |                                                    |
> [256, 512)            19 |                                                    |
> [512, 1K)             14 |                                                    |
> [1K, 2K)              35 |                                                    |
> [2K, 4K)              18 |                                                    |
> [4K, 8K)              87 |@                                                   |
> [8K, 16K)           2397 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K)             7 |                                                    |
> [32K, 64K)             2 |                                                    |
> [64K, 128K)           20 |                                                    |
> [128K, 256K)           6 |                                                    |
> 
> Postcopy Preempt
> ----------------
> 
> Average: 496 (us)
> 
> @delay_us:
> [32, 64)               2 |                                                    |
> [64, 128)           2306 |@@@@                                                |
> [128, 256)         25422 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [256, 512)          8238 |@@@@@@@@@@@@@@@@                                    |
> [512, 1K)           1066 |@@                                                  |
> [1K, 2K)            2167 |@@@@                                                |
> [2K, 4K)            3329 |@@@@@@                                              |
> [4K, 8K)             109 |                                                    |
> [8K, 16K)             48 |                                                    |
> 
> Postcopy Preempt-Full
> ---------------------
> 
> Average: 229 (us)
> 
> @delay_us:
> [8, 16)                1 |                                                    |
> [16, 32)               3 |                                                    |
> [32, 64)               2 |                                                    |
> [64, 128)          11956 |@@@@@@@@@@                                          |
> [128, 256)         60403 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [256, 512)         15047 |@@@@@@@@@@@@                                        |
> [512, 1K)            846 |                                                    |
> [1K, 2K)              25 |                                                    |
> [2K, 4K)              41 |                                                    |
> [4K, 8K)             131 |                                                    |
> [8K, 16K)             72 |                                                    |
> [16K, 32K)             2 |                                                    |
> [32K, 64K)             8 |                                                    |
> [64K, 128K)            6 |                                                    |
> 
> For fully sequential page access workloads, I have described in the
> previous preempt-mode work that such workload may not benefit much from
> preempt mode much, but surprisingly at least in my seq write test the
> preempt-full mode can also benefit sequential access patterns at least when
> I measured it:
> 
> Postcopy Vanilla
> ----------------
> 
> Average: 1487 (us)
> 
> @delay_us:
> [0]                   93 |@                                                   |
> [1]                 1920 |@@@@@@@@@@@@@@@@@@@@@@@                             |
> [2, 4)               504 |@@@@@@                                              |
> [4, 8)              2234 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
> [8, 16)             4199 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16, 32)            3782 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      |
> [32, 64)            1016 |@@@@@@@@@@@@                                        |
> [64, 128)             81 |@                                                   |
> [128, 256)            14 |                                                    |
> [256, 512)            26 |                                                    |
> [512, 1K)             69 |                                                    |
> [1K, 2K)             208 |@@                                                  |
> [2K, 4K)             429 |@@@@@                                               |
> [4K, 8K)            2779 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
> [8K, 16K)            792 |@@@@@@@@@                                           |
> [16K, 32K)             9 |                                                    |
> 
> Postcopy Preempt-Full
> ---------------------
> 
> Average: 1582 (us)
> 
> @delay_us:
> [0]                   45 |                                                    |
> [1]                 1786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
> [2, 4)               423 |@@@@@@@                                             |
> [4, 8)              1903 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
> [8, 16)             2933 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |
> [16, 32)            3132 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32, 64)             518 |@@@@@@@@                                            |
> [64, 128)             30 |                                                    |
> [128, 256)           218 |@@@                                                 |
> [256, 512)           214 |@@@                                                 |
> [512, 1K)            211 |@@@                                                 |
> [1K, 2K)             131 |@@                                                  |
> [2K, 4K)             336 |@@@@@                                               |
> [4K, 8K)            3023 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |
> [8K, 16K)            479 |@@@@@@@                                             |
> 
> Postcopy Preempt-Full
> ---------------------
> 
> Average: 439 (us)
> 
> @delay_us:
> [0]                    3 |                                                    |
> [1]                 1058 |@                                                   |
> [2, 4)               179 |                                                    |
> [4, 8)              1079 |@                                                   |
> [8, 16)             2251 |@@@                                                 |
> [16, 32)            2345 |@@@@                                                |
> [32, 64)             713 |@                                                   |
> [64, 128)           5386 |@@@@@@@@@                                           |
> [128, 256)         30252 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [256, 512)         10789 |@@@@@@@@@@@@@@@@@@                                  |
> [512, 1K)            367 |                                                    |
> [1K, 2K)              26 |                                                    |
> [2K, 4K)             256 |                                                    |
> [4K, 8K)            1840 |@@@                                                 |
> [8K, 16K)            300 |                                                    |
> 
> I always don't think seq access is important in migrations, because for any
> not-small VM that has a migration challenge, any multiple seq accesses will
> also be grown into a random access pattern.  But I'm anyway laying the data
> around for good reference.
> 
> Comments welcomed, thanks.
> 
> TODO List
> =========
> 
> - Make migration accountings atomic
> - Drop rs->f?
> - Disable xbzrle for preempt mode?  Is it already perhaps disabled for postcopy?
> - If this series can be really accepted, we can logically drop some of the
>   old (complcated) code with the old preempt series.
> - Drop x-postcopy-preempt-break-huge parameter?
> - More to come
> 
> [1] https://github.com/xzpeter/mig_mon#vm-live-migration-network-emulator
> 
> Peter Xu (13):
>   migration: Use non-atomic ops for clear log bitmap
>   migration: Add postcopy_preempt_active()
>   migration: Yield bitmap_mutex properly when sending/sleeping
>   migration: Cleanup xbzrle zero page cache update logic
>   migration: Disallow postcopy preempt to be used with compress
>   migration: Trivial cleanup save_page_header() on same block check
>   migration: Remove RAMState.f references in compression code
>   migration: Teach PSS about host page
>   migration: Introduce pss_channel
>   migration: Add pss_init()
>   migration: Make PageSearchStatus part of RAMState
>   migration: Move last_sent_block into PageSearchStatus
>   migration: Send requested page directly in rp-return thread

Side note:

Not all the patches here serve the preempt-full goal.  E.g. we could
consider reviewing/merging patches 1 and 5 earlier: patch 1 is a
long-standing perf improvement on the clear bitmap ops, and patch 5 should
be seen as a fix, I think.

Some other trivial cleanup patches can be picked up too, but those are not
urgent.

-- 
Peter Xu




* Re: [PATCH RFC 01/13] migration: Use non-atomic ops for clear log bitmap
  2022-08-29 16:56 ` [PATCH RFC 01/13] migration: Use non-atomic ops for clear log bitmap Peter Xu
@ 2022-09-15 18:49   ` Dr. David Alan Gilbert
  2022-09-15 19:44     ` Peter Xu
  0 siblings, 1 reply; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2022-09-15 18:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Daniel P . Berrange, Leonardo Bras Soares Passos,
	Manish Mishra, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Since we already have bitmap_mutex to protect either the dirty bitmap or
> the clear log bitmap, we don't need atomic operations to set/clear/test on
> the clear log bitmap.  Switching all ops from atomic to non-atomic
> versions, meanwhile touch up the comments to show which lock is in charge.
> 
> Introduced non-atomic version of bitmap_test_and_clear_atomic(), mostly the
> same as the atomic version but simplified a few places, e.g. dropped the
> "old_bits" variable, and also the explicit memory barriers.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Can you update the comment in ramblock.h above clear_bmap to say it's
always updated under that lock?

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  include/exec/ram_addr.h | 11 +++++-----
>  include/qemu/bitmap.h   |  1 +
>  util/bitmap.c           | 45 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 52 insertions(+), 5 deletions(-)
> 
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index f3e0c78161..5092a2e0ff 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -42,7 +42,8 @@ static inline long clear_bmap_size(uint64_t pages, uint8_t shift)
>  }
>  
>  /**
> - * clear_bmap_set: set clear bitmap for the page range
> + * clear_bmap_set: set clear bitmap for the page range.  Must be with
> + * bitmap_mutex held.
>   *
>   * @rb: the ramblock to operate on
>   * @start: the start page number
> @@ -55,12 +56,12 @@ static inline void clear_bmap_set(RAMBlock *rb, uint64_t start,
>  {
>      uint8_t shift = rb->clear_bmap_shift;
>  
> -    bitmap_set_atomic(rb->clear_bmap, start >> shift,
> -                      clear_bmap_size(npages, shift));
> +    bitmap_set(rb->clear_bmap, start >> shift, clear_bmap_size(npages, shift));
>  }
>  
>  /**
> - * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set
> + * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set.
> + * Must be with bitmap_mutex held.
>   *
>   * @rb: the ramblock to operate on
>   * @page: the page number to check
> @@ -71,7 +72,7 @@ static inline bool clear_bmap_test_and_clear(RAMBlock *rb, uint64_t page)
>  {
>      uint8_t shift = rb->clear_bmap_shift;
>  
> -    return bitmap_test_and_clear_atomic(rb->clear_bmap, page >> shift, 1);
> +    return bitmap_test_and_clear(rb->clear_bmap, page >> shift, 1);
>  }
>  
>  static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index 82a1d2f41f..3ccb00865f 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -253,6 +253,7 @@ void bitmap_set(unsigned long *map, long i, long len);
>  void bitmap_set_atomic(unsigned long *map, long i, long len);
>  void bitmap_clear(unsigned long *map, long start, long nr);
>  bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr);
> +bool bitmap_test_and_clear(unsigned long *map, long start, long nr);
>  void bitmap_copy_and_clear_atomic(unsigned long *dst, unsigned long *src,
>                                    long nr);
>  unsigned long bitmap_find_next_zero_area(unsigned long *map,
> diff --git a/util/bitmap.c b/util/bitmap.c
> index f81d8057a7..8d12e90a5a 100644
> --- a/util/bitmap.c
> +++ b/util/bitmap.c
> @@ -240,6 +240,51 @@ void bitmap_clear(unsigned long *map, long start, long nr)
>      }
>  }
>  
> +bool bitmap_test_and_clear(unsigned long *map, long start, long nr)
> +{
> +    unsigned long *p = map + BIT_WORD(start);
> +    const long size = start + nr;
> +    int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
> +    unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
> +    bool dirty = false;
> +
> +    assert(start >= 0 && nr >= 0);
> +
> +    /* First word */
> +    if (nr - bits_to_clear > 0) {
> +        if ((*p) & mask_to_clear) {
> +            dirty = true;
> +        }
> +        *p &= ~mask_to_clear;
> +        nr -= bits_to_clear;
> +        bits_to_clear = BITS_PER_LONG;
> +        p++;
> +    }
> +
> +    /* Full words */
> +    if (bits_to_clear == BITS_PER_LONG) {
> +        while (nr >= BITS_PER_LONG) {
> +            if (*p) {
> +                dirty = true;
> +                *p = 0;
> +            }
> +            nr -= BITS_PER_LONG;
> +            p++;
> +        }
> +    }
> +
> +    /* Last word */
> +    if (nr) {
> +        mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
> +        if ((*p) & mask_to_clear) {
> +            dirty = true;
> +        }
> +        *p &= ~mask_to_clear;
> +    }
> +
> +    return dirty;
> +}
> +
>  bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr)
>  {
>      unsigned long *p = map + BIT_WORD(start);
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH RFC 02/13] migration: Add postcopy_preempt_active()
  2022-08-29 16:56 ` [PATCH RFC 02/13] migration: Add postcopy_preempt_active() Peter Xu
@ 2022-09-15 18:50   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2022-09-15 18:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Daniel P . Berrange, Leonardo Bras Soares Passos,
	Manish Mishra, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Add the helper to show that postcopy preempt enabled, meanwhile active.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index dc1de9ddbc..8c5d5332e8 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -162,6 +162,11 @@ out:
>      return ret;
>  }
>  
> +static bool postcopy_preempt_active(void)
> +{
> +    return migrate_postcopy_preempt() && migration_in_postcopy();
> +}
> +
>  bool ramblock_is_ignored(RAMBlock *block)
>  {
>      return !qemu_ram_is_migratable(block) ||
> @@ -2434,7 +2439,7 @@ static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
>  /* We need to make sure rs->f always points to the default channel elsewhere */
>  static void postcopy_preempt_reset_channel(RAMState *rs)
>  {
> -    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
> +    if (postcopy_preempt_active()) {
>          rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
>          rs->f = migrate_get_current()->to_dst_file;
>          trace_postcopy_preempt_reset_channel();
> @@ -2472,7 +2477,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>          return 0;
>      }
>  
> -    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
> +    if (postcopy_preempt_active()) {
>          postcopy_preempt_choose_channel(rs, pss);
>      }
>  
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH RFC 01/13] migration: Use non-atomic ops for clear log bitmap
  2022-09-15 18:49   ` Dr. David Alan Gilbert
@ 2022-09-15 19:44     ` Peter Xu
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2022-09-15 19:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Daniel P . Berrange, Leonardo Bras Soares Passos,
	Manish Mishra, Juan Quintela

On Thu, Sep 15, 2022 at 07:49:57PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Since we already have bitmap_mutex to protect either the dirty bitmap or
> > the clear log bitmap, we don't need atomic operations to set/clear/test on
> > the clear log bitmap.  Switching all ops from atomic to non-atomic
> > versions, meanwhile touch up the comments to show which lock is in charge.
> > 
> > Introduced non-atomic version of bitmap_test_and_clear_atomic(), mostly the
> > same as the atomic version but simplified a few places, e.g. dropped the
> > "old_bits" variable, and also the explicit memory barriers.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> Can you update the comment in ramblock.h above clear_bmap to say it's
> always updated under that lock.

I'll squash the following into the same patch:

---8<---
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 6cbedf9e0c..adc03df59c 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -53,6 +53,9 @@ struct RAMBlock {
      * and split clearing of dirty bitmap on the remote node (e.g.,
      * KVM).  The bitmap will be set only when doing global sync.
      *
+     * It is only used during src side of ram migration, and it is
+     * protected by the global ram_state.bitmap_mutex.
+     *
      * NOTE: this bitmap is different comparing to the other bitmaps
      * in that one bit can represent multiple guest pages (which is
      * decided by the `clear_bmap_shift' variable below).  On
---8<---

> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks,

-- 
Peter Xu


