* [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
@ 2021-06-30 20:08 Peter Xu
  2021-07-01  4:42 ` Wang, Wei W
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Peter Xu @ 2021-06-30 20:08 UTC
  To: qemu-devel
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand,
	Dr . David Alan Gilbert, peterx, Wei Wang,
	Leonardo Bras Soares Passos

Taking the mutex every time for each dirty bit to clear is too slow,
especially since we take/release it even when the dirty bit is already
cleared.  So far it's only used to sync the special case of
qemu_guest_free_page_hint() against the migration thread, nothing really
that serious yet.  Let's move the lock up a level.

There are two callers of migration_bitmap_clear_dirty().

For migration, move it into ram_save_iterate().  With the help of the
MAX_WAIT logic, we'll only run ram_save_iterate() for no more than ~50ms at
a time, so we take the lock once at the entry.  It also means any call to
qemu_guest_free_page_hint() can be delayed; but that should be very rare,
only during migration, and I don't see a problem with it.

For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
that lock even when calling ramblock_sync_dirty_bitmap(), whereas
migration_bitmap_sync() takes it correctly.  So let the mutex cover both the
ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.

It's even possible to drop the lock and use atomic operations on rb->bmap
and the variable migration_dirty_pages.  I didn't do that just to stay on
the safe side; it's also not predictable whether the frequent atomic ops
could bring overhead too, e.g. on huge VMs where it happens very often.
When that really becomes a problem, we can keep a local counter and
periodically flush it with atomic ops.  Keep it simple for now.
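
For reference, the atomic alternative could look roughly like this
(untested sketch, not part of this patch; local_dirty_delta is a
hypothetical batching counter):

  /* atomically clear the bit and fetch the old word */
  old = qatomic_fetch_and(&rb->bmap[BIT_WORD(page)], ~BIT_MASK(page));
  if (old & BIT_MASK(page)) {
      /* batch the counter updates locally to limit the atomic ops */
      if (++local_dirty_delta >= 64) {
          qatomic_sub(&rs->migration_dirty_pages, local_dirty_delta);
          local_dirty_delta = 0;
      }
  }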

Cc: Wei Wang <wei.w.wang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 723af67c2e..9f2965675d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -795,8 +795,6 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 {
     bool ret;
 
-    QEMU_LOCK_GUARD(&rs->bitmap_mutex);
-
     /*
      * Clear dirty bitmap if needed.  This _must_ be called before we
      * send any of the page in the chunk because we need to make sure
@@ -2834,6 +2832,14 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         goto out;
     }
 
+    /*
+     * We'll hold this lock for a while, but it's okay for two reasons.
+     * Firstly, the only other thread that can take it is the one calling
+     * qemu_guest_free_page_hint(), which should be rare; secondly, see
+     * MAX_WAIT (if curious, further see commit 4508bd9ed8053ce) below,
+     * which guarantees that we'll release it on a regular basis.
+     */
+    qemu_mutex_lock(&rs->bitmap_mutex);
     WITH_RCU_READ_LOCK_GUARD() {
         if (ram_list.version != rs->last_version) {
             ram_state_reset(rs);
@@ -2893,6 +2899,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
             i++;
         }
     }
+    qemu_mutex_unlock(&rs->bitmap_mutex);
 
     /*
      * Must occur before EOS (or any QEMUFile operation)
@@ -3682,6 +3689,7 @@ void colo_flush_ram_cache(void)
     unsigned long offset = 0;
 
     memory_global_dirty_log_sync();
+    qemu_mutex_lock(&ram_state->bitmap_mutex);
     WITH_RCU_READ_LOCK_GUARD() {
         RAMBLOCK_FOREACH_NOT_IGNORED(block) {
             ramblock_sync_dirty_bitmap(ram_state, block);
@@ -3710,6 +3718,7 @@ void colo_flush_ram_cache(void)
         }
     }
     trace_colo_flush_ram_cache_end();
+    qemu_mutex_unlock(&ram_state->bitmap_mutex);
 }
 
 /**
-- 
2.31.1




* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-06-30 20:08 [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty() Peter Xu
@ 2021-07-01  4:42 ` Wang, Wei W
  2021-07-01 12:51   ` Peter Xu
  2021-07-03 16:31 ` Lukas Straub
  2021-07-13  8:40 ` Wang, Wei W
  2 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-01  4:42 UTC
  To: Peter Xu, qemu-devel
  Cc: Leonardo Bras Soares Passos, Juan Quintela, Hailiang Zhang,
	Dr . David Alan Gilbert, David Hildenbrand

On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
> Taking the mutex every time for each dirty bit to clear is too slow, especially we'll
> take/release even if the dirty bit is cleared.  So far it's only used to sync with
> special cases with qemu_guest_free_page_hint() against migration thread,
> nothing really that serious yet.  Let's move the lock to be upper.
> 
> There're two callers of migration_bitmap_clear_dirty().
> 
> For migration, move it into ram_save_iterate().  With the help of MAX_WAIT
> logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so taking
> the lock once there at the entry.  It also means any call sites to
> qemu_guest_free_page_hint() can be delayed; but it should be very rare, only
> during migration, and I don't see a problem with it.
> 
> For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
> that lock even when calling ramblock_sync_dirty_bitmap(), where another
> example is migration_bitmap_sync() who took it right.  So let the mutex cover
> both the
> ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
> 
> It's even possible to drop the lock so we use atomic operations upon rb->bmap
> and the variable migration_dirty_pages.  I didn't do it just to still be safe, also
> not predictable whether the frequent atomic ops could bring overhead too e.g.
> on huge vms when it happens very often.  When that really comes, we can
> keep a local counter and periodically call atomic ops.  Keep it simple for now.
> 

If free page optimization is enabled, a 50ms wait might be too long for handling just one hint (via qemu_guest_free_page_hint)?
How about making the lock conditional?
e.g.
#define QEMU_LOCK_GUARD_COND(lock, cond) {  \
        if (cond)                           \
                QEMU_LOCK_GUARD(lock);      \
}
Then in migration_bitmap_clear_dirty:
QEMU_LOCK_GUARD_COND(&rs->bitmap_mutex, rs->fpo_enabled);


Best,
Wei



* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-01  4:42 ` Wang, Wei W
@ 2021-07-01 12:51   ` Peter Xu
  2021-07-01 14:21     ` David Hildenbrand
  2021-07-02  2:29     ` Wang, Wei W
  0 siblings, 2 replies; 37+ messages in thread
From: Peter Xu @ 2021-07-01 12:51 UTC
  To: Wang, Wei W
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Thu, Jul 01, 2021 at 04:42:38AM +0000, Wang, Wei W wrote:
> On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
> > Taking the mutex every time for each dirty bit to clear is too slow, especially we'll
> > take/release even if the dirty bit is cleared.  So far it's only used to sync with
> > special cases with qemu_guest_free_page_hint() against migration thread,
> > nothing really that serious yet.  Let's move the lock to be upper.
> > 
> > There're two callers of migration_bitmap_clear_dirty().
> > 
> > For migration, move it into ram_save_iterate().  With the help of MAX_WAIT
> > logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so taking
> > the lock once there at the entry.  It also means any call sites to
> > qemu_guest_free_page_hint() can be delayed; but it should be very rare, only
> > during migration, and I don't see a problem with it.
> > 
> > For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
> > that lock even when calling ramblock_sync_dirty_bitmap(), where another
> > example is migration_bitmap_sync() who took it right.  So let the mutex cover
> > both the
> > ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
> > 
> > It's even possible to drop the lock so we use atomic operations upon rb->bmap
> > and the variable migration_dirty_pages.  I didn't do it just to still be safe, also
> > not predictable whether the frequent atomic ops could bring overhead too e.g.
> > on huge vms when it happens very often.  When that really comes, we can
> > keep a local counter and periodically call atomic ops.  Keep it simple for now.
> > 
> 
> If free page opt is enabled, 50ms waiting time might be too long for handling just one hint (via qemu_guest_free_page_hint)?
> How about making the lock conditionally?
> e.g.
> #define QEMU_LOCK_GUARD_COND (lock, cond) {
> 	if (cond)
> 		QEMU_LOCK_GUARD(lock);
> }
> Then in migration_bitmap_clear_dirty:
> QEMU_LOCK_GUARD_COND(&rs->bitmap_mutex, rs->fpo_enabled);

Yeah, that's indeed the kind of comment I'd like to get from either you or
David when I added the Cc list.. :)

I was curious how that would affect the guest when the free page hint helper
can get stuck for a while.  Per my understanding it's fully async: the
blocked thread here runs asynchronously with the guest, since both
virtio-balloon and virtio-mem are fully async.  If so, would it really
affect the guest a lot?  Is it still tolerable if it only happens during
migration?

Taking that mutex for each dirty bit is still overkill to me, regardless of
whether it's "conditional" or not.  If I were the cloud admin, I would
rather prefer that migration finishes earlier, imho, than freeing some more
pages on the host (after migration all pages will be gone!).  If it still
blocks the guest in some unhealthy way, I'd still prefer to take the lock
here, but maybe make it shorter than 50ms.
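
E.g. something along these lines inside the ram_save_iterate() loop could
bound the hold time further (untested sketch, reusing the loop counter "i"
that this patch's loop already has):

  if ((i & 63) == 0) {
      /* briefly open a window for qemu_guest_free_page_hint() */
      qemu_mutex_unlock(&rs->bitmap_mutex);
      qemu_mutex_lock(&rs->bitmap_mutex);
  }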

Thanks,

-- 
Peter Xu




* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-01 12:51   ` Peter Xu
@ 2021-07-01 14:21     ` David Hildenbrand
  2021-07-02  2:48       ` Wang, Wei W
  2021-07-02  2:29     ` Wang, Wei W
  1 sibling, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2021-07-01 14:21 UTC
  To: Peter Xu, Wang, Wei W
  Cc: Hailiang Zhang, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On 01.07.21 14:51, Peter Xu wrote:
> On Thu, Jul 01, 2021 at 04:42:38AM +0000, Wang, Wei W wrote:
>> On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
>>> Taking the mutex every time for each dirty bit to clear is too slow, especially we'll
>>> take/release even if the dirty bit is cleared.  So far it's only used to sync with
>>> special cases with qemu_guest_free_page_hint() against migration thread,
>>> nothing really that serious yet.  Let's move the lock to be upper.
>>>
>>> There're two callers of migration_bitmap_clear_dirty().
>>>
>>> For migration, move it into ram_save_iterate().  With the help of MAX_WAIT
>>> logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so taking
>>> the lock once there at the entry.  It also means any call sites to
>>> qemu_guest_free_page_hint() can be delayed; but it should be very rare, only
>>> during migration, and I don't see a problem with it.
>>>
>>> For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
>>> that lock even when calling ramblock_sync_dirty_bitmap(), where another
>>> example is migration_bitmap_sync() who took it right.  So let the mutex cover
>>> both the
>>> ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
>>>
>>> It's even possible to drop the lock so we use atomic operations upon rb->bmap
>>> and the variable migration_dirty_pages.  I didn't do it just to still be safe, also
>>> not predictable whether the frequent atomic ops could bring overhead too e.g.
>>> on huge vms when it happens very often.  When that really comes, we can
>>> keep a local counter and periodically call atomic ops.  Keep it simple for now.
>>>
>>
>> If free page opt is enabled, 50ms waiting time might be too long for handling just one hint (via qemu_guest_free_page_hint)?
>> How about making the lock conditionally?
>> e.g.
>> #define QEMU_LOCK_GUARD_COND (lock, cond) {
>> 	if (cond)
>> 		QEMU_LOCK_GUARD(lock);
>> }
>> Then in migration_bitmap_clear_dirty:
>> QEMU_LOCK_GUARD_COND(&rs->bitmap_mutex, rs->fpo_enabled);
> 
> Yeah that's indeed some kind of comment I'd like to get from either you or
> David when I add the cc list.. :)
> 
> I was curious how that would affect the guest when the free page hint helper
> can stuck for a while.  Per my understanding it's fully async as the blocked
> thread here is asynchronously with the guest since both virtio-balloon and
> virtio-mem are fully async. If so, would it really affect the guest a lot?  Is
> it still tolerable if it only happens during migration?

For virtio-mem, we call qemu_guest_free_page_hint() synchronously from 
the migration thread, directly via the precopy notifier. I recently sent 
patches that stop using qemu_guest_free_page_hint() from virtio-mem 
code. Until then, virtio-mem shouldn't care too much about this change 
here I guess, as it doesn't interact with guest requests.

https://lkml.kernel.org/r/20210616162940.28630-1-david@redhat.com

For virtio-balloon, it's called via the (asynchronous) iothread.

> 
> Taking that mutex for each dirty bit is still an overkill to me, irrelevant of
> whether it's "conditional" or not.  If I'm the cloud admin, I would more prefer
> migration finishes earlier, imho, rather than freeing some more pages on the
> host (after migration all pages will be gone!).  If it still blocks the guest
> in some unhealthy way I still prefer to take the lock here, however maybe make
> it shorter than 50ms.

Spoiler alert: the introduction of clear bitmaps partially broke free 
page hinting already (as clearing happens deferred -- and might never 
happen if we don't migrate *any* page within a clear bitmap chunk, so 
pages actually remain dirty ...). "Broke" here means that pages still 
get migrated even though they were reported free by the guest. We'd 
actually not want to use clear bmaps with free page hinting ... long 
story short, free page hinting is a very fragile beast already and some 
of the hints are basically ignored and become pure overhead ...


-- 
Thanks,

David / dhildenb




* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-01 12:51   ` Peter Xu
  2021-07-01 14:21     ` David Hildenbrand
@ 2021-07-02  2:29     ` Wang, Wei W
  2021-07-06 17:59       ` Peter Xu
  1 sibling, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-02  2:29 UTC
  To: Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Thursday, July 1, 2021 8:51 PM, Peter Xu wrote:
> On Thu, Jul 01, 2021 at 04:42:38AM +0000, Wang, Wei W wrote:
> > On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
> > > Taking the mutex every time for each dirty bit to clear is too slow,
> > > especially we'll take/release even if the dirty bit is cleared.  So
> > > far it's only used to sync with special cases with
> > > qemu_guest_free_page_hint() against migration thread, nothing really that
> serious yet.  Let's move the lock to be upper.
> > >
> > > There're two callers of migration_bitmap_clear_dirty().
> > >
> > > For migration, move it into ram_save_iterate().  With the help of
> > > MAX_WAIT logic, we'll only run ram_save_iterate() for no more than
> > > 50ms-ish time, so taking the lock once there at the entry.  It also
> > > means any call sites to
> > > qemu_guest_free_page_hint() can be delayed; but it should be very
> > > rare, only during migration, and I don't see a problem with it.
> > >
> > > For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot
> > > to take that lock even when calling ramblock_sync_dirty_bitmap(),
> > > where another example is migration_bitmap_sync() who took it right.
> > > So let the mutex cover both the
> > > ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
> > >
> > > It's even possible to drop the lock so we use atomic operations upon
> > > rb->bmap and the variable migration_dirty_pages.  I didn't do it
> > > just to still be safe, also not predictable whether the frequent atomic ops
> could bring overhead too e.g.
> > > on huge vms when it happens very often.  When that really comes, we
> > > can keep a local counter and periodically call atomic ops.  Keep it simple for
> now.
> > >
> >
> > If free page opt is enabled, 50ms waiting time might be too long for handling
> just one hint (via qemu_guest_free_page_hint)?
> > How about making the lock conditionally?
> > e.g.
> > #define QEMU_LOCK_GUARD_COND (lock, cond) {
> > 	if (cond)
> > 		QEMU_LOCK_GUARD(lock);
> > }
> > Then in migration_bitmap_clear_dirty:
> > QEMU_LOCK_GUARD_COND(&rs->bitmap_mutex, rs->fpo_enabled);
> 
> Yeah that's indeed some kind of comment I'd like to get from either you or David
> when I add the cc list.. :)
> 
> I was curious how that would affect the guest when the free page hint helper can
> stuck for a while.  Per my understanding it's fully async as the blocked thread
> here is asynchronously with the guest since both virtio-balloon and virtio-mem
> are fully async. If so, would it really affect the guest a lot?  Is it still tolerable if it
> only happens during migration?

Yes, it is async and won't block the guest. But it makes the optimization not work as expected.
The intention is to have the migration thread skip the transfer of free pages, but now the migration
thread is kind of using the 50ms lock to prevent the clearing of free pages, while likely just sending free pages inside the lock.
(the reported free pages had better be cleared from the bitmap in time, before they get sent)

> 
> Taking that mutex for each dirty bit is still an overkill to me, irrelevant of whether
> it's "conditional" or not.  

With that, if free page opt is off, the mutex is skipped, isn't it?

> If I'm the cloud admin, I would more prefer migration
> finishes earlier, imho, rather than freeing some more pages on the host (after
> migration all pages will be gone!).  If it still blocks the guest in some unhealthy
> way I still prefer to take the lock here, however maybe make it shorter than
> 50ms.
> 

Yes, with the optimization, migration will finish earlier.
Why would it need to free pages on the host?
(it just skips sending the pages)

Best,
Wei





* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-01 14:21     ` David Hildenbrand
@ 2021-07-02  2:48       ` Wang, Wei W
  2021-07-02  7:06         ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-02  2:48 UTC
  To: David Hildenbrand, Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> On 01.07.21 14:51, Peter Xu wrote:
> Spoiler alert: the introduction of clean bitmaps partially broke free page hinting
> already (as clearing happens deferred -- and might never happen if we don't
> migrate *any* page within a clean bitmap chunk, so pages actually remain
> dirty ...). "broke" here means that pages still get migrated even though they
> were reported by the guest. We'd actually not want to use clean bmaps with free
> page hinting ... long story short, free page hinting is a very fragile beast already
> and some of the hints are basically ignored and pure overhead ...

Not really. Both clear_bmap and free page hinting are there to "clear" the bitmap.
No matter which comes first to clear it, it won’t cause more (unexpected) pages to be sent.

Best,
Wei


* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-02  2:48       ` Wang, Wei W
@ 2021-07-02  7:06         ` David Hildenbrand
  2021-07-03  2:53           ` Wang, Wei W
  0 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2021-07-02  7:06 UTC
  To: Wang, Wei W, Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On 02.07.21 04:48, Wang, Wei W wrote:
> On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
>> On 01.07.21 14:51, Peter Xu wrote:
>> Spoiler alert: the introduction of clean bitmaps partially broke free page hinting
>> already (as clearing happens deferred -- and might never happen if we don't
>> migrate *any* page within a clean bitmap chunk, so pages actually remain
>> dirty ...). "broke" here means that pages still get migrated even though they
>> were reported by the guest. We'd actually not want to use clean bmaps with free
>> page hinting ... long story short, free page hinting is a very fragile beast already
>> and some of the hints are basically ignored and pure overhead ...
> 
> Not really. Both clear_bmap and free page hint are to "clear" the bitmap.
> No matter who comes first to clear it, it won’t cause more (unexpected) pages to be sent.

I was able to reproduce something like that before my vacation by 
starting an 8GB VM, waiting until Linux is up, and migrating it.

With clear_bmap:

QEMU 6.0.50 monitor - type 'help' for more information
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 4532 ms
downtime: 4 ms
setup: 3 ms
transferred ram: 578480 kbytes
throughput: 1047.05 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 1080349 pages
skipped: 0 pages
normal: 141969 pages
normal bytes: 567876 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 4102921
(qemu)

Without clear_bmap:

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 4891 ms
downtime: 61 ms
setup: 29 ms
transferred ram: 666400 kbytes
throughput: 1123.47 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 33427 pages
skipped: 0 pages
normal: 166202 pages
normal bytes: 664808 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 32386
(qemu)

Without free page hinting:

QEMU 6.0.50 monitor - type 'help' for more information
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 4975 ms
downtime: 48 ms
setup: 3 ms
transferred ram: 639982 kbytes
throughput: 1055.09 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 1942431 pages
skipped: 0 pages
normal: 155424 pages
normal bytes: 621696 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 32386
(qemu)


Take a look at "duplicate". We seem to end up reading+migrating a lot 
more pages (all filled with zero, as the guest never touched them) with 
clear_bmap, but not as much as with free page hinting disabled.


Repeating the same experiment (not performing the "no free page hinting" 
step) by running "memhog 7g" in the guest before migrating, such that we 
don't have all-zero pages:

With clear_bmap:

QEMU 6.0.50 monitor - type 'help' for more information
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 28574 ms
downtime: 73 ms
setup: 4 ms
transferred ram: 3715000 kbytes
throughput: 1065.33 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 21472 pages
skipped: 0 pages
normal: 926892 pages
normal bytes: 3707568 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 32710
(qemu)

Without clear_bmap:

QEMU 6.0.50 monitor - type 'help' for more information
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 5818 ms
downtime: 24 ms
setup: 29 ms
transferred ram: 672486 kbytes
throughput: 952.18 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 19449 pages
skipped: 0 pages
normal: 167751 pages
normal bytes: 671004 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 32710
(qemu)



I think that clearly shows the issue.

My theory, which I did not verify yet: Assume we have 1GB chunks in the 
clear bmap. Assume the VM reports all pages within a 1GB chunk as free 
(easy with a fresh VM). While processing hints, we will clear the bits 
from the dirty bmap in the RAMBlock. As we will never migrate any page 
of that 1GB chunk, we will not actually clear the dirty bitmap of the 
memory region. When re-syncing, we will set all the bits in the dirty 
bmap again from the dirty bitmap in the memory region. Thus, many of our 
hints end up being mostly ignored. The smaller the clear bmap chunk, the 
more extreme the issue.
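
Spelled out as a hypothetical trace (under the assumptions above, with a 
1GB clear bmap chunk):

1. migration_bitmap_sync(): sets the chunk's bits in rb->bmap and marks 
   the chunk in rb->clear_bmap ("KVM's bitmap not cleared yet").
2. qemu_guest_free_page_hint(): clears the hinted bits in rb->bmap only.
3. Migration thread: finds no dirty page left in the chunk, so 
   migration_bitmap_clear_dirty() never runs for it and 
   memory_region_clear_dirty_bitmap() is never called.
4. Next migration_bitmap_sync(): the memory region still reports the 
   whole chunk dirty, so the hinted bits get set again in rb->bmap.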

-- 
Thanks,

David / dhildenb




* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-02  7:06         ` David Hildenbrand
@ 2021-07-03  2:53           ` Wang, Wei W
  2021-07-05 13:41             ` David Hildenbrand
  2021-07-06 17:47             ` Peter Xu
  0 siblings, 2 replies; 37+ messages in thread
From: Wang, Wei W @ 2021-07-03  2:53 UTC
  To: David Hildenbrand, Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> On 02.07.21 04:48, Wang, Wei W wrote:
> > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> >> On 01.07.21 14:51, Peter Xu wrote:
> 
> I think that clearly shows the issue.
> 
> My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
> Assume the VM reports all pages within a 1GB chunk as free (easy with a fresh
> VM). While processing hints, we will clear the bits from the dirty bmap in the
> RAMBlock. As we will never migrate any page of that 1GB chunk, we will not
> actually clear the dirty bitmap of the memory region. When re-syncing, we will
> set all bits bits in the dirty bmap again from the dirty bitmap in the memory
> region. Thus, many of our hints end up being mostly ignored. The smaller the
> clear bmap chunk, the more extreme the issue.

OK, that looks possible. We need to clear the related bits from the memory region
bitmap before skipping the free pages. Could you try the patch below:

diff --git a/migration/ram.c b/migration/ram.c
index ace8ad431c..a1f6df3e6c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -811,6 +811,26 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     return next;
 }

+
+static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
+                                                       RAMBlock *rb,
+                                                      unsigned long page)
+{
+    uint8_t shift;
+    hwaddr size, start;
+
+    if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page))
+        return;
+
+    shift = rb->clear_bmap_shift;
+    assert(shift >= 6);
+
+    size = 1ULL << (TARGET_PAGE_BITS + shift);
+    start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
+    trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
+    memory_region_clear_dirty_bitmap(rb->mr, start, size);
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                 RAMBlock *rb,
                                                 unsigned long page)
@@ -827,26 +847,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
      * the page in the chunk we clear the remote dirty bitmap for all.
      * Clearing it earlier won't be a problem, but too late will.
      */
-    if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
-        uint8_t shift = rb->clear_bmap_shift;
-        hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
-        hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
-
-        /*
-         * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
-         * can make things easier sometimes since then start address
-         * of the small chunk will always be 64 pages aligned so the
-         * bitmap will always be aligned to unsigned long.  We should
-         * even be able to remove this restriction but I'm simply
-         * keeping it.
-         */
-        assert(shift >= 6);
-        trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
-        memory_region_clear_dirty_bitmap(rb->mr, start, size);
-    }
+    migration_clear_memory_region_dirty_bitmap(rs, rb, page);

     ret = test_and_clear_bit(page, rb->bmap);
-
     if (ret) {
         rs->migration_dirty_pages--;
     }
@@ -2746,7 +2749,7 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
 {
     RAMBlock *block;
     ram_addr_t offset;
-    size_t used_len, start, npages;
+    size_t used_len, start, npages, page_to_clear, i = 0;
     MigrationState *s = migrate_get_current();

     /* This function is currently expected to be used during live migration */
@@ -2775,6 +2778,19 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
         start = offset >> TARGET_PAGE_BITS;
         npages = used_len >> TARGET_PAGE_BITS;

+        /*
+         * The skipped free pages are equivalent to being sent from clear_bmap's
+         * perspective, so clear the bits from the memory region bitmap which
+         * are initially set.  Otherwise those skipped pages will be sent in
+         * the next round after syncing from the memory region bitmap.
+         */
+        do {
+            page_to_clear = start + (i++ << block->clear_bmap_shift);
+            migration_clear_memory_region_dirty_bitmap(ram_state,
+                                                       block,
+                                                       page_to_clear);
+        } while (i <= npages >> block->clear_bmap_shift);
+
         qemu_mutex_lock(&ram_state->bitmap_mutex);
         ram_state->migration_dirty_pages -=
                       bitmap_count_one_with_offset(block->bmap, start, npages);

Best,
Wei


* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-06-30 20:08 [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty() Peter Xu
  2021-07-01  4:42 ` Wang, Wei W
@ 2021-07-03 16:31 ` Lukas Straub
  2021-07-04 14:14   ` Lukas Straub
  2021-07-13  8:40 ` Wang, Wei W
  2 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2021-07-03 16:31 UTC
  To: Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Wei Wang, Leonardo Bras Soares Passos


On Wed, 30 Jun 2021 16:08:05 -0400
Peter Xu <peterx@redhat.com> wrote:

> Taking the mutex every time for each dirty bit to clear is too slow, especially
> we'll take/release even if the dirty bit is cleared.  So far it's only used to
> sync with special cases with qemu_guest_free_page_hint() against migration
> thread, nothing really that serious yet.  Let's move the lock to be upper.
> 
> There're two callers of migration_bitmap_clear_dirty().
> 
> For migration, move it into ram_save_iterate().  With the help of MAX_WAIT
> logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so
> taking the lock once there at the entry.  It also means any call sites to
> qemu_guest_free_page_hint() can be delayed; but it should be very rare, only
> during migration, and I don't see a problem with it.
> 
> For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
> that lock even when calling ramblock_sync_dirty_bitmap(), where another example
> is migration_bitmap_sync() who took it right.  So let the mutex cover both the
> ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.

Hi,
I don't think COLO needs it: colo_flush_ram_cache() only runs on
the secondary (incoming) side, and AFAIK the bitmap is only set in
ram_load_precopy(); the two don't run in parallel.

Although I'm not sure what ramblock_sync_dirty_bitmap() does. I guess
it's only there to make the rest of the migration code happy?

Regards,
Lukas Straub

> It's even possible to drop the lock so we use atomic operations upon rb->bmap
> and the variable migration_dirty_pages.  I didn't do it just to still be safe,
> also not predictable whether the frequent atomic ops could bring overhead too
> e.g. on huge vms when it happens very often.  When that really comes, we can
> keep a local counter and periodically call atomic ops.  Keep it simple for now.
> 
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/ram.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 723af67c2e..9f2965675d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -795,8 +795,6 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>  {
>      bool ret;
>  
> -    QEMU_LOCK_GUARD(&rs->bitmap_mutex);
> -
>      /*
>       * Clear dirty bitmap if needed.  This _must_ be called before we
>       * send any of the page in the chunk because we need to make sure
> @@ -2834,6 +2832,14 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          goto out;
>      }
>  
> +    /*
> +     * We'll take this lock a little bit long, but it's okay for two reasons.
> +     * Firstly, the only possible other thread to take it is who calls
> +     * qemu_guest_free_page_hint(), which should be rare; secondly, see
> +     * MAX_WAIT (if curious, further see commit 4508bd9ed8053ce) below, which
> +     * guarantees that we'll at least released it in a regular basis.
> +     */
> +    qemu_mutex_lock(&rs->bitmap_mutex);
>      WITH_RCU_READ_LOCK_GUARD() {
>          if (ram_list.version != rs->last_version) {
>              ram_state_reset(rs);
> @@ -2893,6 +2899,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>              i++;
>          }
>      }
> +    qemu_mutex_unlock(&rs->bitmap_mutex);
>  
>      /*
>       * Must occur before EOS (or any QEMUFile operation)
> @@ -3682,6 +3689,7 @@ void colo_flush_ram_cache(void)
>      unsigned long offset = 0;
>  
>      memory_global_dirty_log_sync();
> +    qemu_mutex_lock(&ram_state->bitmap_mutex);
>      WITH_RCU_READ_LOCK_GUARD() {
>          RAMBLOCK_FOREACH_NOT_IGNORED(block) {
>              ramblock_sync_dirty_bitmap(ram_state, block);
> @@ -3710,6 +3718,7 @@ void colo_flush_ram_cache(void)
>          }
>      }
>      trace_colo_flush_ram_cache_end();
> +    qemu_mutex_unlock(&ram_state->bitmap_mutex);
>  }
>  
>  /**



-- 




* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-03 16:31 ` Lukas Straub
@ 2021-07-04 14:14   ` Lukas Straub
  2021-07-06 18:37     ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Lukas Straub @ 2021-07-04 14:14 UTC
  To: Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Wei Wang, Leonardo Bras Soares Passos


On Sat, 3 Jul 2021 18:31:15 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> On Wed, 30 Jun 2021 16:08:05 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Taking the mutex every time for each dirty bit to clear is too slow, especially
> > we'll take/release even if the dirty bit is cleared.  So far it's only used to
> > sync with special cases with qemu_guest_free_page_hint() against migration
> > thread, nothing really that serious yet.  Let's move the lock to be upper.
> > 
> > There're two callers of migration_bitmap_clear_dirty().
> > 
> > For migration, move it into ram_save_iterate().  With the help of MAX_WAIT
> > logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so
> > taking the lock once there at the entry.  It also means any call sites to
> > qemu_guest_free_page_hint() can be delayed; but it should be very rare, only
> > during migration, and I don't see a problem with it.
> > 
> > For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
> > that lock even when calling ramblock_sync_dirty_bitmap(), where another example
> > is migration_bitmap_sync() who took it right.  So let the mutex cover both the
> > ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.  
> 
> Hi,
> I don't think COLO needs it, colo_flush_ram_cache() only runs on
> the secondary (incoming) side and AFAIK the bitmap is only set in
> ram_load_precopy() and they don't run in parallel.
> 
> Although I'm not sure what ramblock_sync_dirty_bitmap() does. I guess
> it's only there to make the rest of the migration code happy?

To answer myself: it syncs the guest's own dirty bitmap into the
ramblock. Of course, not only changed pages on the primary need to
be overwritten from the cache, but also pages changed on the secondary,
so the ram content exactly matches the primary's.

Now, I still don't know what would run concurrently there, since the
guest is stopped when colo_flush_ram_cache() runs.

Regards,
Lukas Straub

> Regards,
> Lukas Straub
> 
> > It's even possible to drop the lock so we use atomic operations upon rb->bmap
> > and the variable migration_dirty_pages.  I didn't do it just to still be safe,
> > also not predictable whether the frequent atomic ops could bring overhead too
> > e.g. on huge vms when it happens very often.  When that really comes, we can
> > keep a local counter and periodically call atomic ops.  Keep it simple for now.
> > 
> > Cc: Wei Wang <wei.w.wang@intel.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
> > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Cc: Juan Quintela <quintela@redhat.com>
> > Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/ram.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > ...



-- 




* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-03  2:53           ` Wang, Wei W
@ 2021-07-05 13:41             ` David Hildenbrand
  2021-07-06  9:41               ` Wang, Wei W
  2021-07-06 17:47             ` Peter Xu
  1 sibling, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2021-07-05 13:41 UTC
  To: Wang, Wei W, Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On 03.07.21 04:53, Wang, Wei W wrote:
> On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
>> On 02.07.21 04:48, Wang, Wei W wrote:
>>> On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
>>>> On 01.07.21 14:51, Peter Xu wrote:
>>
>> I think that clearly shows the issue.
>>
>> My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
>> Assume the VM reports all pages within a 1GB chunk as free (easy with a fresh
>> VM). While processing hints, we will clear the bits from the dirty bmap in the
>> RAMBlock. As we will never migrate any page of that 1GB chunk, we will not
>> actually clear the dirty bitmap of the memory region. When re-syncing, we will
>> set all bits bits in the dirty bmap again from the dirty bitmap in the memory
>> region. Thus, many of our hints end up being mostly ignored. The smaller the
>> clear bmap chunk, the more extreme the issue.
> 
> OK, that looks possible. We need to clear the related bits from the memory region
> bitmap before skipping the free pages. Could you try with below patch:

I did a quick test (with the memhog example) and it seems like it mostly 
works. However, we're now doing the bitmap clearing from another thread, 
racing with the migration thread. Especially:

1. Racing with clear_bmap_set() via cpu_physical_memory_sync_dirty_bitmap()
2. Racing with migration_bitmap_clear_dirty()

So that might need some thought, if I'm not wrong.

The simplest approach would be removing/freeing the clear_bmap via 
PRECOPY_NOTIFY_SETUP(), similar to 
precopy_enable_free_page_optimization() we had before. Of course, this 
will skip the clear_bmap optimization.

-- 
Thanks,

David / dhildenb




* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-05 13:41             ` David Hildenbrand
@ 2021-07-06  9:41               ` Wang, Wei W
  2021-07-06 10:05                 ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-06  9:41 UTC
  To: David Hildenbrand, Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Monday, July 5, 2021 9:42 PM, David Hildenbrand wrote:
> On 03.07.21 04:53, Wang, Wei W wrote:
> > On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> >> On 02.07.21 04:48, Wang, Wei W wrote:
> >>> On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> >>>> On 01.07.21 14:51, Peter Xu wrote:
> >>
> >> I think that clearly shows the issue.
> >>
> >> My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
> >> Assume the VM reports all pages within a 1GB chunk as free (easy with
> >> a fresh VM). While processing hints, we will clear the bits from the
> >> dirty bmap in the RAMBlock. As we will never migrate any page of that
> >> 1GB chunk, we will not actually clear the dirty bitmap of the memory
> >> region. When re-syncing, we will set all bits bits in the dirty bmap
> >> again from the dirty bitmap in the memory region. Thus, many of our
> >> hints end up being mostly ignored. The smaller the clear bmap chunk, the
> more extreme the issue.
> >
> > OK, that looks possible. We need to clear the related bits from the
> > memory region bitmap before skipping the free pages. Could you try with
> below patch:
> 
> I did a quick test (with the memhog example) and it seems like it mostly works.
> However, we're now doing the bitmap clearing from another, racing with the
> migration thread. Especially:
> 
> 1. Racing with clear_bmap_set() via cpu_physical_memory_sync_dirty_bitmap()
> 2. Racing with migration_bitmap_clear_dirty()
> 
> So that might need some thought, if I'm not wrong.

I think this is similar to the non-clear_bmap case, where rb->bmap is set or cleared by
the migration thread and qemu_guest_free_page_hint. For example, the migration thread
could find a bit set in rb->bmap before qemu_guest_free_page_hint gets that bit cleared in time.
The result is that the free page is transferred, which isn't necessary, but won't cause any issue.
This is very rare in practice.
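
For illustration, a hypothetical interleaving:

  migration thread                   qemu_guest_free_page_hint()
  --------------------------------   --------------------------------
  sees bit set in rb->bmap
                                     clears the same bit in rb->bmap
  sends the (free) page anyway

The transfer is wasted work, but harmless for correctness.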

> 
> The simplest approach would be removing/freeing the clear_bmap via
> PRECOPY_NOTIFY_SETUP(), similar to
> precopy_enable_free_page_optimization() we had before. Of course, this will
> skip the clear_bmap optimization.

Not necessarily, I think. The two optimizations are not mutually exclusive, essentially.
At least, they work well together now. If other implementation issues are reported,
we can look into them then.

Best,
Wei


* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-06  9:41               ` Wang, Wei W
@ 2021-07-06 10:05                 ` David Hildenbrand
  2021-07-06 17:39                   ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2021-07-06 10:05 UTC
  To: Wang, Wei W, Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On 06.07.21 11:41, Wang, Wei W wrote:
> On Monday, July 5, 2021 9:42 PM, David Hildenbrand wrote:
>> On 03.07.21 04:53, Wang, Wei W wrote:
>>> On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
>>>> On 02.07.21 04:48, Wang, Wei W wrote:
>>>>> On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
>>>>>> On 01.07.21 14:51, Peter Xu wrote:
>>>>
>>>> I think that clearly shows the issue.
>>>>
>>>> My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
>>>> Assume the VM reports all pages within a 1GB chunk as free (easy with
>>>> a fresh VM). While processing hints, we will clear the bits from the
>>>> dirty bmap in the RAMBlock. As we will never migrate any page of that
>>>> 1GB chunk, we will not actually clear the dirty bitmap of the memory
>>>> region. When re-syncing, we will set all bits bits in the dirty bmap
>>>> again from the dirty bitmap in the memory region. Thus, many of our
>>>> hints end up being mostly ignored. The smaller the clear bmap chunk, the
>> more extreme the issue.
>>>
>>> OK, that looks possible. We need to clear the related bits from the
>>> memory region bitmap before skipping the free pages. Could you try with
>> below patch:
>>
>> I did a quick test (with the memhog example) and it seems like it mostly works.
>> However, we're now doing the bitmap clearing from another, racing with the
>> migration thread. Especially:
>>
>> 1. Racing with clear_bmap_set() via cpu_physical_memory_sync_dirty_bitmap()
>> 2. Racing with migration_bitmap_clear_dirty()
>>
>> So that might need some thought, if I'm not wrong.
> 
> I think this is similar to the non-clear_bmap case, where the rb->bmap is set or cleared by
> the migration thread and qemu_guest_free_page_hint. For example, the migration thread
> could find a bit is set from rb->bmap before qemu_guest_free_page_hint gets that bit cleared in time.
> The result is that the free page is transferred, which isn't necessary, but won't cause any issue.
> This is very rare in practice.

Here are my concerns regarding races:

a) We now end up calling migration_clear_memory_region_dirty_bitmap() 
without holding the bitmap_mutex. We have to clarify if that is ok. At 
least migration_bitmap_clear_dirty() holds it *while* clearing the log 
and migration_bitmap_sync() while setting bits in the clear_bmap. I 
think we also have to hold the mutex on the new path.

b) We now can end up calling memory_region_clear_dirty_bitmap() 
concurrently, not sure if all notifiers can handle that. I'd assume it 
is okay, but I wouldn't bet on it.

Maybe Peter can help clarifying.

-- 
Thanks,

David / dhildenb




* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-06 10:05                 ` David Hildenbrand
@ 2021-07-06 17:39                   ` Peter Xu
  2021-07-07 12:45                     ` Wang, Wei W
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2021-07-06 17:39 UTC
  To: David Hildenbrand
  Cc: Hailiang Zhang, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Wang, Wei W,
	Leonardo Bras Soares Passos

On Tue, Jul 06, 2021 at 12:05:49PM +0200, David Hildenbrand wrote:
> On 06.07.21 11:41, Wang, Wei W wrote:
> > On Monday, July 5, 2021 9:42 PM, David Hildenbrand wrote:
> > > On 03.07.21 04:53, Wang, Wei W wrote:
> > > > On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> > > > > On 02.07.21 04:48, Wang, Wei W wrote:
> > > > > > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> > > > > > > On 01.07.21 14:51, Peter Xu wrote:
> > > > > 
> > > > > I think that clearly shows the issue.
> > > > > 
> > > > > My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
> > > > > Assume the VM reports all pages within a 1GB chunk as free (easy with
> > > > > a fresh VM). While processing hints, we will clear the bits from the
> > > > > dirty bmap in the RAMBlock. As we will never migrate any page of that
> > > > > 1GB chunk, we will not actually clear the dirty bitmap of the memory
> > > > > region. When re-syncing, we will set all bits bits in the dirty bmap
> > > > > again from the dirty bitmap in the memory region. Thus, many of our
> > > > > hints end up being mostly ignored. The smaller the clear bmap chunk, the
> > > more extreme the issue.
> > > > 
> > > > OK, that looks possible. We need to clear the related bits from the
> > > > memory region bitmap before skipping the free pages. Could you try with
> > > below patch:
> > > 
> > > I did a quick test (with the memhog example) and it seems like it mostly works.
> > > However, we're now doing the bitmap clearing from another, racing with the
> > > migration thread. Especially:
> > > 
> > > 1. Racing with clear_bmap_set() via cpu_physical_memory_sync_dirty_bitmap()
> > > 2. Racing with migration_bitmap_clear_dirty()
> > > 
> > > So that might need some thought, if I'm not wrong.
> > 
> > I think this is similar to the non-clear_bmap case, where the rb->bmap is set or cleared by
> > the migration thread and qemu_guest_free_page_hint. For example, the migration thread
> > could find a bit is set from rb->bmap before qemu_guest_free_page_hint gets that bit cleared in time.
> > The result is that the free page is transferred, which isn't necessary, but won't cause any issue.
> > This is very rare in practice.
> 
> Here are my concerns regarding races:
> 
> a) We now end up calling migration_clear_memory_region_dirty_bitmap()
> without holding the bitmap_mutex. We have to clarify if that is ok. At least
> migration_bitmap_clear_dirty() holds it *while* clearing the log and
> migration_bitmap_sync() while setting bits in the clear_bmap. I think we
> also have to hold the mutex on the new path.

Makes sense; I think we can let bitmap_mutex protect both the dirty and clear
bitmaps, and also the dirty pages counter.  I'll comment on Wei's patch later too.
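
E.g., an untested sketch of what qemu_guest_free_page_hint() could do under
that rule (chunk_pages standing for 1UL << block->clear_bmap_shift):

    qemu_mutex_lock(&ram_state->bitmap_mutex);
    /* clear the covered clear_bmap chunks first... */
    for (i = 0; i < npages; i += chunk_pages) {
        migration_clear_memory_region_dirty_bitmap(ram_state, block,
                                                   start + i);
    }
    /* ...then the dirty bitmap and the counter, all under bitmap_mutex */
    ram_state->migration_dirty_pages -=
        bitmap_count_one_with_offset(block->bmap, start, npages);
    bitmap_clear(block->bmap, start, npages);
    qemu_mutex_unlock(&ram_state->bitmap_mutex);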

> 
> b) We now can end up calling memory_region_clear_dirty_bitmap()
> concurrently, not sure if all notifiers can handle that. I'd assume it is
> okay, but I wouldn't bet on it.

Yes, KVM should be the only one that uses it for real, and there's the slots
lock in kvm_physical_log_clear(), so it looks okay so far.

We'd better not turn off clear dirty log for this; it's a feature that I know
brought lots of benefits to dirty logging, and we're even considering enabling
it for dirty ring so dirty ring can also benefit from it.

Thanks,

-- 
Peter Xu




* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-03  2:53           ` Wang, Wei W
  2021-07-05 13:41             ` David Hildenbrand
@ 2021-07-06 17:47             ` Peter Xu
  2021-07-07  8:34               ` Wang, Wei W
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Xu @ 2021-07-06 17:47 UTC
  To: Wang, Wei W
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> > On 02.07.21 04:48, Wang, Wei W wrote:
> > > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> > >> On 01.07.21 14:51, Peter Xu wrote:
> > 
> > I think that clearly shows the issue.
> > 
> > My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
> > Assume the VM reports all pages within a 1GB chunk as free (easy with a fresh
> > VM). While processing hints, we will clear the bits from the dirty bmap in the
> > RAMBlock. As we will never migrate any page of that 1GB chunk, we will not
> > actually clear the dirty bitmap of the memory region. When re-syncing, we will
> > set all bits bits in the dirty bmap again from the dirty bitmap in the memory
> > region. Thus, many of our hints end up being mostly ignored. The smaller the
> > clear bmap chunk, the more extreme the issue.
> 
> OK, that looks possible. We need to clear the related bits from the memory region
> bitmap before skipping the free pages. Could you try with below patch:
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index ace8ad431c..a1f6df3e6c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -811,6 +811,26 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>      return next;
>  }
> 
> +
> +static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
> +                                                       RAMBlock *rb,
> +                                                      unsigned long page)
> +{
> +    uint8_t shift;
> +    hwaddr size, start;
> +
> +    if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page))
> +        return;
> +
> +    shift = rb->clear_bmap_shift;
> +    assert(shift >= 6);
> +
> +    size = 1ULL << (TARGET_PAGE_BITS + shift);
> +    start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
> +    trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
> +    memory_region_clear_dirty_bitmap(rb->mr, start, size);
> +}
> +
>  static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>                                                  RAMBlock *rb,
>                                                  unsigned long page)
> @@ -827,26 +847,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>       * the page in the chunk we clear the remote dirty bitmap for all.
>       * Clearing it earlier won't be a problem, but too late will.
>       */
> -    if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
> -        uint8_t shift = rb->clear_bmap_shift;
> -        hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
> -        hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
> -
> -        /*
> -         * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
> -         * can make things easier sometimes since then start address
> -         * of the small chunk will always be 64 pages aligned so the
> -         * bitmap will always be aligned to unsigned long.  We should
> -         * even be able to remove this restriction but I'm simply
> -         * keeping it.
> -         */
> -        assert(shift >= 6);
> -        trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
> -        memory_region_clear_dirty_bitmap(rb->mr, start, size);
> -    }
> +    migration_clear_memory_region_dirty_bitmap(rs, rb, page);
> 
>      ret = test_and_clear_bit(page, rb->bmap);
> -
>      if (ret) {
>          rs->migration_dirty_pages--;
>      }
> @@ -2746,7 +2749,7 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
>  {
>      RAMBlock *block;
>      ram_addr_t offset;
> -    size_t used_len, start, npages;
> +    size_t used_len, start, npages, page_to_clear, i = 0;
>      MigrationState *s = migrate_get_current();
> 
>      /* This function is currently expected to be used during live migration */
> @@ -2775,6 +2778,19 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
>          start = offset >> TARGET_PAGE_BITS;
>          npages = used_len >> TARGET_PAGE_BITS;
> 
> +        /*
> +         * The skipped free pages are equavelent to be sent from clear_bmap's
> +        * perspective, so clear the bits from the memory region bitmap which
> +        * are initially set. Otherwise those skipped pages will be sent in
> +        * the next round after syncing from the memory region bitmap.
> +        */
> +        /*
> +         * The skipped free pages are equavelent to be sent from clear_bmap's
> +        * perspective, so clear the bits from the memory region bitmap which
> +        * are initially set. Otherwise those skipped pages will be sent in
> +        * the next round after syncing from the memory region bitmap.
> +        */
> +       do {
> +            page_to_clear = start + (i++ << block->clear_bmap_shift);

Why "i" needs to be shifted?

> +            migration_clear_memory_region_dirty_bitmap(ram_state,
> +                                                       block,
> +                                                       page_to_clear);
> +       } while (i <= npages >> block->clear_bmap_shift);

I agree with David that this would be better put into the mutex section; if so
we'd also touch up the comment for bitmap_mutex.  Or is there a reason to
explicitly not do so?

> +
>          qemu_mutex_lock(&ram_state->bitmap_mutex);
>          ram_state->migration_dirty_pages -=
>                        bitmap_count_one_with_offset(block->bmap, start, npages);

After my patch (move mutex out of migration_bitmap_clear_dirty()), maybe we can
call migration_bitmap_clear_dirty() directly here rather than introducing a new
helper?  It'd do all the dirty/clear bmap ops, including dirty page accounting.
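
E.g. (a quick, untested sketch just to illustrate; it assumes the hint path
takes bitmap_mutex itself, as it would after this patch):

    size_t i;

    qemu_mutex_lock(&ram_state->bitmap_mutex);
    for (i = 0; i < npages; i++) {
        /*
         * One helper does it all: clears the clear_bmap chunk if still
         * pending, clears the rb->bmap bit, and updates the
         * migration_dirty_pages accounting.
         */
        migration_bitmap_clear_dirty(ram_state, block, start + i);
    }
    qemu_mutex_unlock(&ram_state->bitmap_mutex);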

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-02  2:29     ` Wang, Wei W
@ 2021-07-06 17:59       ` Peter Xu
  2021-07-07  8:33         ` Wang, Wei W
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2021-07-06 17:59 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Hailiang Zhang, David Hildenbrand, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Fri, Jul 02, 2021 at 02:29:41AM +0000, Wang, Wei W wrote:
> On Thursday, July 1, 2021 8:51 PM, Peter Xu wrote:
> > On Thu, Jul 01, 2021 at 04:42:38AM +0000, Wang, Wei W wrote:
> > > On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
> > > > Taking the mutex every time for each dirty bit to clear is too slow,
> > > > especially we'll take/release even if the dirty bit is cleared.  So
> > > > far it's only used to sync with special cases with
> > > > qemu_guest_free_page_hint() against migration thread, nothing really that
> > serious yet.  Let's move the lock to be upper.
> > > >
> > > > There're two callers of migration_bitmap_clear_dirty().
> > > >
> > > > For migration, move it into ram_save_iterate().  With the help of
> > > > MAX_WAIT logic, we'll only run ram_save_iterate() for no more than
> > > > 50ms-ish time, so taking the lock once there at the entry.  It also
> > > > means any call sites to
> > > > qemu_guest_free_page_hint() can be delayed; but it should be very
> > > > rare, only during migration, and I don't see a problem with it.
> > > >
> > > > For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot
> > > > to take that lock even when calling ramblock_sync_dirty_bitmap(),
> > > > where another example is migration_bitmap_sync() who took it right.
> > > > So let the mutex cover both the
> > > > ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
> > > >
> > > > It's even possible to drop the lock so we use atomic operations upon
> > > > rb->bmap and the variable migration_dirty_pages.  I didn't do it
> > > > just to still be safe, also not predictable whether the frequent atomic ops
> > could bring overhead too e.g.
> > > > on huge vms when it happens very often.  When that really comes, we
> > > > can keep a local counter and periodically call atomic ops.  Keep it simple for
> > now.
> > > >
> > >
> > > If free page opt is enabled, 50ms waiting time might be too long for handling
> > just one hint (via qemu_guest_free_page_hint)?
> > > How about making the lock conditional?
> > > e.g.
> > > #define QEMU_LOCK_GUARD_COND(lock, cond) {
> > > 	if (cond)
> > > 		QEMU_LOCK_GUARD(lock);
> > > }
> > > Then in migration_bitmap_clear_dirty:
> > > QEMU_LOCK_GUARD_COND(&rs->bitmap_mutex, rs->fpo_enabled);
> > 
> > Yeah that's indeed some kind of comment I'd like to get from either you or David
> > when I add the cc list.. :)
> > 
> > I was curious how that would affect the guest when the free page hint helper can
> > get stuck for a while.  Per my understanding it's fully async, as the blocked thread
> > here is asynchronous with the guest since both virtio-balloon and virtio-mem
> > are fully async. If so, would it really affect the guest a lot?  Is it still tolerable if it
> > only happens during migration?
> 
> Yes, it is async and won't block the guest. But it will make the optimization not run as expected.
> The intention is to have the migration thread skip the transfer of the free pages, but now the migration
> thread is kind of using the 50ms lock to prevent the clearing of free pages, while it is likely just sending free pages inside the lock.
> (the reported free pages are best cleared from the bitmap in time, before they end up being sent)

Yes it would be great if the page can be hinted clean in time.  However I think
that's a tradeoff - the migration thread is no less important from that pov.
50ms is not some random number I chose, it's something that already existed
there, so I didn't even bother thinking up a saner period. :) However if
there's a known good suggested number from any balloon developers I'd love to
cooperate.

> 
> > 
> > Taking that mutex for each dirty bit is still overkill to me, irrespective of whether
> > it's "conditional" or not.
> 
> With that, if free page opt is off, the mutex is skipped, isn't it?

Yes, but when free page is on, it'll check once per page.  As I mentioned I
still don't think it's the right thing to do.

We encountered this problem when migrating a 3tb vm and the mutex spins and
eats tons of cpu resources.  It shouldn't happen with/without balloon, imho.

Not to mention the hard migration issues are mostly with non-idle guest, in
that case having the balloon in the guest will be disastrous from this pov
since it'll start to take mutex for each page, while balloon would hardly
report anything valid since most guest pages are being used.

> 
> > If I'm the cloud admin, I would prefer that migration
> > finishes earlier, imho, rather than freeing some more pages on the host (after
> > migration all pages will be gone!).  If it still blocks the guest in some unhealthy
> > way I'd still prefer to take the lock here, however maybe make it shorter than
> > 50ms.
> > 
> 
> Yes, with the optimization, migration will be finished earlier.
> Why would it need to free pages on the host?
> (just skip sending the page)

However, again, I still think locking once per page is too much.  If 50ms is
not acceptable, do you have a better interval number in mind?

Or it would be great if we can start with the 50ms interval, because that
introduces the fewest changes, so we separate the problems: (1) resolving the
mutex spinning issue with this patch, and (2) finding a suitable period for the
migration thread to release the mutex so the balloon can perform at its best.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-04 14:14   ` Lukas Straub
@ 2021-07-06 18:37     ` Peter Xu
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Xu @ 2021-07-06 18:37 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Hailiang Zhang, David Hildenbrand, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Wei Wang, Leonardo Bras Soares Passos

On Sun, Jul 04, 2021 at 04:14:57PM +0200, Lukas Straub wrote:
> On Sat, 3 Jul 2021 18:31:15 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > On Wed, 30 Jun 2021 16:08:05 -0400
> > Peter Xu <peterx@redhat.com> wrote:
> > 
> > > Taking the mutex every time for each dirty bit to clear is too slow, especially
> > > we'll take/release even if the dirty bit is cleared.  So far it's only used to
> > > sync with special cases with qemu_guest_free_page_hint() against migration
> > > thread, nothing really that serious yet.  Let's move the lock to be upper.
> > > 
> > > There're two callers of migration_bitmap_clear_dirty().
> > > 
> > > For migration, move it into ram_save_iterate().  With the help of MAX_WAIT
> > > logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so
> > > taking the lock once there at the entry.  It also means any call sites to
> > > qemu_guest_free_page_hint() can be delayed; but it should be very rare, only
> > > during migration, and I don't see a problem with it.
> > > 
> > > For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
> > > that lock even when calling ramblock_sync_dirty_bitmap(), where another example
> > > is migration_bitmap_sync() who took it right.  So let the mutex cover both the
> > > ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.  
> > 
> > Hi,
> > I don't think COLO needs it, colo_flush_ram_cache() only runs on
> > the secondary (incoming) side and AFAIK the bitmap is only set in
> > ram_load_precopy() and they don't run in parallel.
> > 
> > Although I'm not sure what ramblock_sync_dirty_bitmap() does. I guess
> > it's only there to make the rest of the migration code happy?
> 
> To answer myself, it syncs the dirty bitmap of the guest itself with
> the ramblock. Of course not only changed pages on the primary need to
> be overwritten from the cache, but also changed pages on the secondary
> so the ram content exactly matches the primary's.
> 
> Now, I still don't know what would run concurrently there since the
> guest is stopped when colo_flush_ram_cache() runs.

Indeed I know little about COLO so I don't know whether it's needed in practice.
It's just easier to always take the mutex as long as those protected fields are
modified; mutexes always work with single threaded apps anyways.

Or do you prefer me to drop it?  I'll need to rely on your colo knowledge to
know whether it's safe..  I don't think common migration code will be run
during colo, then would qemu_guest_free_page_hint() be called for a colo SVM?
If not, it looks safe to drop the mutex indeed.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-06 17:59       ` Peter Xu
@ 2021-07-07  8:33         ` Wang, Wei W
  2021-07-07 16:44           ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-07  8:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hailiang Zhang, David Hildenbrand, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Wednesday, July 7, 2021 2:00 AM, Peter Xu wrote:
> On Fri, Jul 02, 2021 at 02:29:41AM +0000, Wang, Wei W wrote:
> > With that, if free page opt is off, the mutex is skipped, isn't it?
> 
> Yes, but when free page is on, it'll check once per page.  As I mentioned I still
> don't think it's the right thing to do.

With free page opt on, if the migration thread waits to acquire the lock on a page, it actually means that it is trying to skip the transfer of that page.
For example, if waiting for the lock takes 100ns and skipping the send of a page saves 1000ns, then overall we save 900ns per page (i.e. pay less and earn more).

> 
> We encountered this problem when migrating a 3tb vm and the mutex spins and
> eats tons of cpu resources.  It shouldn't happen with/without balloon, imho.

I think we should compare the overall migration time.

> 
> Not to mention the hard migration issues are mostly with non-idle guest, in that
> case having the balloon in the guest will be disastrous from this pov since it'll start
> to take mutex for each page, while balloon would hardly report anything valid
> since most guest pages are being used.

If no pages are reported, migration thread wouldn't wait on the lock then.

To conclude: to decide whether the per-page lock hurts performance, considering that the lock in some sense actually prevents the migration thread from sending free pages which it shouldn't send, we need to compare the overall migration time.
(previous data can be found here: https://patchwork.kernel.org/project/kvm/cover/1535333539-32420-1-git-send-email-wei.w.wang@intel.com/; I think the situation should be the same for either an 8GB or a 3TB guest, in terms of the overall migration time comparison)

Best,
Wei

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-06 17:47             ` Peter Xu
@ 2021-07-07  8:34               ` Wang, Wei W
  2021-07-07 16:54                 ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-07  8:34 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Wednesday, July 7, 2021 1:47 AM, Peter Xu wrote:
> On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> > +       do {
> > +            page_to_clear = start + (i++ << block->clear_bmap_shift);
> 
> Why "i" needs to be shifted?

Just move to the next clear chunk, no?
For example, (1 << 18) pages chunk (i.e. 1GB).

> 
> > +            migration_clear_memory_region_dirty_bitmap(ram_state,
> > +                                                       block,
> > +
> page_to_clear);
> > +       } while (i <= npages >> block->clear_bmap_shift);
> 
> > I agree with David that this would be better put into the mutex section; if so
> > we'd also touch up the comment for bitmap_mutex.  Or is there a reason to
> > explicitly not do so?

clear_bmap_test_and_clear() already uses atomic operations on the clear_bmap.
But it's also OK to me if you guys feel safer to move it under the lock.

Best,
Wei


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-06 17:39                   ` Peter Xu
@ 2021-07-07 12:45                     ` Wang, Wei W
  2021-07-07 16:45                       ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-07 12:45 UTC (permalink / raw)
  To: Peter Xu, David Hildenbrand
  Cc: Hailiang Zhang, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Wednesday, July 7, 2021 1:40 AM, Peter Xu wrote:
> On Tue, Jul 06, 2021 at 12:05:49PM +0200, David Hildenbrand wrote:
> > On 06.07.21 11:41, Wang, Wei W wrote:
> > > On Monday, July 5, 2021 9:42 PM, David Hildenbrand wrote:
> > > > On 03.07.21 04:53, Wang, Wei W wrote:
> > > > > On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> > > > > > On 02.07.21 04:48, Wang, Wei W wrote:
> > > > > > > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> > > > > > > > On 01.07.21 14:51, Peter Xu wrote:
> > > > > >
> > > > > > I think that clearly shows the issue.
> > > > > >
> > > > > > My theory I did not verify yet: Assume we have 1GB chunks in the clear
> bmap.
> > > > > > Assume the VM reports all pages within a 1GB chunk as free
> > > > > > (easy with a fresh VM). While processing hints, we will clear
> > > > > > the bits from the dirty bmap in the RAMBlock. As we will never
> > > > > > migrate any page of that 1GB chunk, we will not actually clear
> > > > > > the dirty bitmap of the memory region. When re-syncing, we
> > > > > > will set all bits in the dirty bmap again from the dirty
> > > > > > bitmap in the memory region. Thus, many of our hints end up
> > > > > > being mostly ignored. The smaller the clear bmap chunk, the
> > > > more extreme the issue.
> > > > >
> > > > > OK, that looks possible. We need to clear the related bits from
> > > > > the memory region bitmap before skipping the free pages. Could
> > > > > you try with
> > > > below patch:
> > > >
> > > > I did a quick test (with the memhog example) and it seems like it mostly
> works.
> > > > However, we're now doing the bitmap clearing from another, racing
> > > > with the migration thread. Especially:
> > > >
> > > > 1. Racing with clear_bmap_set() via
> > > > cpu_physical_memory_sync_dirty_bitmap()
> > > > 2. Racing with migration_bitmap_clear_dirty()
> > > >
> > > > So that might need some thought, if I'm not wrong.
> > >
> > > I think this is similar to the non-clear_bmap case, where the
> > > rb->bmap is set or cleared by the migration thread and
> > > qemu_guest_free_page_hint. For example, the migration thread could find a
> bit is set from rb->bmap before qemu_guest_free_page_hint gets that bit
> cleared in time.
> > > The result is that the free page is transferred, which isn't necessary, but won't
> cause any issue.
> > > This is very rare in practice.
> >
> > Here are my concerns regarding races:
> >
> > a) We now end up calling migration_clear_memory_region_dirty_bitmap()
> > without holding the bitmap_mutex. We have to clarify if that is ok. At
> > least
> > migration_bitmap_clear_dirty() holds it *while* clearing the log and
> > migration_bitmap_sync() while setting bits in the clear_bmap. I think
> > we also have to hold the mutex on the new path.
> 
Makes sense; I think we can let bitmap_mutex protect both the dirty/clear
bitmaps, and also the dirty pages counter.  I'll comment on Wei's patch later too.

Btw, what would you think if we change mutex to QemuSpin? It will also reduce the overhead, I think.

Best,
Wei

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-07  8:33         ` Wang, Wei W
@ 2021-07-07 16:44           ` Peter Xu
  2021-07-08  2:49             ` Wang, Wei W
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2021-07-07 16:44 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Hailiang Zhang, David Hildenbrand, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Wed, Jul 07, 2021 at 08:33:21AM +0000, Wang, Wei W wrote:
> On Wednesday, July 7, 2021 2:00 AM, Peter Xu wrote:
> > On Fri, Jul 02, 2021 at 02:29:41AM +0000, Wang, Wei W wrote:
> > > With that, if free page opt is off, the mutex is skipped, isn't it?
> > 
> > Yes, but when free page is on, it'll check once per page.  As I mentioned I still
> > don't think it's the right thing to do.
> 
> With free page opt on, if the migration thread waits to acquire the lock on a page, it actually means that it is trying to skip the transfer of that page.
> For example, if waiting for the lock takes 100ns and skipping the send of a page saves 1000ns, then overall we save 900ns per page (i.e. pay less and earn more).

The overhead we measured is purely for taking the lock, without sleeping.  The
case you mentioned happens very rarely, while the cpu cycles to take the lock
(even if it's a cmpxchg) are spent constantly for every guest page.

> 
> > 
> > We encountered this problem when migrating a 3tb vm and the mutex spins and
> > eats tons of cpu resources.  It shouldn't happen with/without balloon, imho.
> 
> I think we should compare the overall migration time.

In reality, we've already applied this patch in the 3tb migration test and it
allows us to start migrating the 3tb vm with some light workload, while we
can't do so without this patch.  I don't know whether balloon is enabled or
not, but.. it means, if virtio balloon is enabled, we can't migrate either even
if we make it a conditional lock, because the guest is using 2tb+ memory so
there aren't a lot of free pages.

> 
> > 
> > Not to mention the hard migration issues are mostly with non-idle guest, in that
> > case having the balloon in the guest will be disastrous from this pov since it'll start
> > to take mutex for each page, while balloon would hardly report anything valid
> > since most guest pages are being used.
> 
> If no pages are reported, migration thread wouldn't wait on the lock then.

Yes I think this is the place I didn't make myself clear.  It's not about
sleeping, it's about the cmpxchg being expensive already when the vm is huge.

> 
> To conclude: to decide whether the per-page lock hurts performance, considering that the lock in some sense actually prevents the migration thread from sending free pages which it shouldn't send, we need to compare the overall migration time.
> (previous data can be found here: https://patchwork.kernel.org/project/kvm/cover/1535333539-32420-1-git-send-email-wei.w.wang@intel.com/; I think the situation should be the same for either an 8GB or a 3TB guest, in terms of the overall migration time comparison)

We can't compare migration time if it can't even converge, can we? :) The
mutex is too expensive there, so this patch already starts to help it converge.

Again, I understand you're worried the patch could make the balloon less
efficient for some use cases.  I think we can take the lock for less than 50ms,
but as I've said multiple times, I still don't think it's good to take it
per-page; I still don't believe we need that granularity.  Or please justify
why per-page locking is necessary.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-07 12:45                     ` Wang, Wei W
@ 2021-07-07 16:45                       ` Peter Xu
  2021-07-07 23:25                         ` Wang, Wei W
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2021-07-07 16:45 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Wed, Jul 07, 2021 at 12:45:32PM +0000, Wang, Wei W wrote:
> Btw, what would you think if we change mutex to QemuSpin? It will also reduce the overhead, I think.

As I replied at the other place, the bottleneck we encountered is the lock
taking itself, not sleeping, so I'm afraid a spinlock will have the same issue. Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-07  8:34               ` Wang, Wei W
@ 2021-07-07 16:54                 ` Peter Xu
  2021-07-08  2:55                   ` Wang, Wei W
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2021-07-07 16:54 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Wed, Jul 07, 2021 at 08:34:50AM +0000, Wang, Wei W wrote:
> On Wednesday, July 7, 2021 1:47 AM, Peter Xu wrote:
> > On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> > > +       do {
> > > +            page_to_clear = start + (i++ << block->clear_bmap_shift);
> > 
> > Why "i" needs to be shifted?
> 
> Just move to the next clear chunk, no?
> For example, (1 << 18) pages chunk (i.e. 1GB).

But migration_clear_memory_region_dirty_bitmap() has done the shifting?

> 
> > 
> > > +            migration_clear_memory_region_dirty_bitmap(ram_state,
> > > +                                                       block,
> > > +
> > page_to_clear);
> > > +       } while (i <= npages >> block->clear_bmap_shift);
> > 
> > > I agree with David that this would be better put into the mutex section; if so
> > > we'd also touch up the comment for bitmap_mutex.  Or is there a reason to
> > > explicitly not do so?
> 
> clear_bmap_test_and_clear() already uses atomic operations on the clear_bmap.
> But it's also OK to me if you guys feel safer to move it under the lock.

I see, yes it seems ok to be out of the lock.  Or we could use the mutex to
protect all of them, then make the clear_bmap* helpers non-atomic too, just
like the dirty bmap.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-07 16:45                       ` Peter Xu
@ 2021-07-07 23:25                         ` Wang, Wei W
  2021-07-08  0:21                           ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-07 23:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Thursday, July 8, 2021 12:45 AM, Peter Xu wrote:
> On Wed, Jul 07, 2021 at 12:45:32PM +0000, Wang, Wei W wrote:
> > Btw, what would you think if we change mutex to QemuSpin? It will also reduce
> the overhead, I think.
> 
> As I replied at the other place, the bottleneck we encountered is the lock taking
> itself, not sleeping, so I'm afraid a spinlock will have the same issue. Thanks,

I suspect the overhead you observed might be caused by the syscalls for the mutex; a per-page syscall might be too much.
If possible, you could run another test of the 3TB guest migration using QemuSpin.

Best,
Wei


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-07 23:25                         ` Wang, Wei W
@ 2021-07-08  0:21                           ` Peter Xu
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Xu @ 2021-07-08  0:21 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Wed, Jul 07, 2021 at 11:25:50PM +0000, Wang, Wei W wrote:
> On Thursday, July 8, 2021 12:45 AM, Peter Xu wrote:
> > On Wed, Jul 07, 2021 at 12:45:32PM +0000, Wang, Wei W wrote:
> > > Btw, what would you think if we change mutex to QemuSpin? It will also reduce
> > the overhead, I think.
> > 
> > As I replied at the other place, the bottleneck we encountered is the lock taking
> > itself, not sleeping, so I'm afraid a spinlock will have the same issue. Thanks,
> 
> I suspect the overhead you observed might be caused by the syscalls for the mutex; a per-page syscall might be too much.
> If possible, you could run another test of the 3TB guest migration using QemuSpin.

I'm a bit confused.. why would taking a mutex with no contention need a syscall?

Please feel free to strace a stock glibc impl of pthread mutex, which is what
qemu uses; I believe it doesn't need to.
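
To illustrate, a simplified model of a futex-style mutex fast path (a sketch,
not actual glibc source):

    /* Toy futex-style mutex; 0 = free, 1 = locked */
    typedef struct { int state; } toy_mutex;

    static void toy_mutex_lock(toy_mutex *m)
    {
        int expected = 0;

        /* Uncontended case: one userspace cmpxchg, no syscall at all */
        if (__atomic_compare_exchange_n(&m->state, &expected, 1, false,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
            return;
        }
        /*
         * Only on contention would it fall back to the kernel, e.g. via
         * futex(&m->state, FUTEX_WAIT, ...); that's where syscalls show up.
         */
    }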

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-07 16:44           ` Peter Xu
@ 2021-07-08  2:49             ` Wang, Wei W
  2021-07-08 18:30               ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-08  2:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hailiang Zhang, David Hildenbrand, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Thursday, July 8, 2021 12:44 AM, Peter Xu wrote:
> > > Not to mention the hard migration issues are mostly with non-idle
> > > guest, in that case having the balloon in the guest will be
> > > disastrous from this pov since it'll start to take mutex for each
> > > page, while balloon would hardly report anything valid since most guest pages
> are being used.
> >
> > If no pages are reported, migration thread wouldn't wait on the lock then.
> 
> Yes I think this is the place I didn't make myself clear.  It's not about sleeping, it's
> about the cmpxchg being expensive already when the vm is huge.

OK.
How did you root-cause it to the cmpxchg, instead of lock contention (i.e. syscall and sleep) or
some other code inside pthread_mutex_lock()? Do you have cycle counts for the cmpxchg vs. for pthread_mutex_lock()?

I checked the implementation of pthread_mutex_lock(). The code path for lock acquisition is long. QemuSpin looks more efficient.
(probably we also don’t want the migration thread to sleep in any case)
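
Roughly what I have in mind (an untested sketch; the field name and init spot
are just for illustration):

    /* In RAMState, e.g. replacing bitmap_mutex on the per-page path */
    QemuSpin bitmap_lock;   /* would protect rb->bmap + migration_dirty_pages */

    /* Initialized once, e.g. from ram_state_init() */
    qemu_spin_init(&rs->bitmap_lock);

    /* The per-page critical section stays tiny, so spinning stays cheap */
    qemu_spin_lock(&rs->bitmap_lock);
    ret = test_and_clear_bit(page, rb->bmap);
    if (ret) {
        rs->migration_dirty_pages--;
    }
    qemu_spin_unlock(&rs->bitmap_lock);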

I think it's also better to see the comparison of migration throughput data (i.e. pages per second) in the following cases, before we make a decision:
- per-page mutex
- per-page spinlock
- 50-ms mutex

Best,
Wei


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-07 16:54                 ` Peter Xu
@ 2021-07-08  2:55                   ` Wang, Wei W
  2021-07-08 18:10                     ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-08  2:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Thursday, July 8, 2021 12:55 AM, Peter Xu wrote:
> On Wed, Jul 07, 2021 at 08:34:50AM +0000, Wang, Wei W wrote:
> > On Wednesday, July 7, 2021 1:47 AM, Peter Xu wrote:
> > > On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> > > > +       do {
> > > > +            page_to_clear = start + (i++ <<
> > > > + block->clear_bmap_shift);
> > >
> > > Why "i" needs to be shifted?
> >
> > Just move to the next clear chunk, no?
> > For example, (1 << 18) pages chunk (i.e. 1GB).
> 
> But migration_clear_memory_region_dirty_bitmap() has done the shifting?
> 

Please see this example: start=0, npages = 2 * (1 << 18), i.e. we have 2 chunks of pages to clear, starting from 0.
First chunk: from 0 to (1 << 18);
Second chunk: from (1 << 18) to 2 * (1 << 18).

To clear the second chunk, we need to pass (start + (1 << 18)) to migration_clear_memory_region_dirty_bitmap(),
and clear_bmap_test_and_clear() there will do ">> 18" to transform it into the clear_bitmap id, which is 1.

Best,
Wei
 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-08  2:55                   ` Wang, Wei W
@ 2021-07-08 18:10                     ` Peter Xu
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Xu @ 2021-07-08 18:10 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Thu, Jul 08, 2021 at 02:55:23AM +0000, Wang, Wei W wrote:
> On Thursday, July 8, 2021 12:55 AM, Peter Xu wrote:
> > On Wed, Jul 07, 2021 at 08:34:50AM +0000, Wang, Wei W wrote:
> > > On Wednesday, July 7, 2021 1:47 AM, Peter Xu wrote:
> > > > On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> > > > > +       do {
> > > > > +            page_to_clear = start + (i++ <<
> > > > > + block->clear_bmap_shift);
> > > >
> > > > Why "i" needs to be shifted?
> > >
> > > Just move to the next clear chunk, no?
> > > For example, (1 << 18) pages chunk (i.e. 1GB).
> > 
> > But migration_clear_memory_region_dirty_bitmap() has done the shifting?
> > 
> 
> Please see this example: start=0, npages = 2 * (1 << 18), i.e. we have 2 chunks of pages to clear, starting from 0.
> First chunk: from 0 to (1 << 18);
> Second chunk: from (1 << 18) to 2 * (1 << 18).
> 
> To clear the second chunk, we need to pass (start + (1 << 18)) to migration_clear_memory_region_dirty_bitmap(),
> and clear_bmap_test_and_clear() there will do ">> 18" to transform it into the clear_bitmap id, which is 1.

I see what you meant, thanks; however I still think it's wrong.

+       do {
+            page_to_clear = start + (i++ << block->clear_bmap_shift);
+            migration_clear_memory_region_dirty_bitmap(ram_state,
+                                                       block,
+                                                       page_to_clear);
+       } while (i <= npages >> block->clear_bmap_shift);

Consider start=1G-1M, npages=2M: this code will only clear the log for the 1st
1G chunk, while we need to clear both the 1st and 2nd 1G chunks.

If we want to go this route, I think a helper that clears the log for a range
of memory would be great too.
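
E.g. something like below (a rough, untested sketch; the helper name and the
QEMU_ALIGN_DOWN/UP rounding are just illustrative):

    static void
    migration_clear_memory_region_dirty_bitmap_range(RAMState *rs, RAMBlock *rb,
                                                     unsigned long start,
                                                     unsigned long npages)
    {
        unsigned long chunk_pages = 1UL << rb->clear_bmap_shift;
        unsigned long chunk_start = QEMU_ALIGN_DOWN(start, chunk_pages);
        unsigned long chunk_end = QEMU_ALIGN_UP(start + npages, chunk_pages);
        unsigned long i;

        /*
         * Round the range out to clear_bmap chunk boundaries, so e.g.
         * start=1G-1M, npages=2M clears the log for both 1G chunks it
         * overlaps; the per-chunk helper is a no-op when the chunk's
         * clear_bmap bit is not set.
         */
        for (i = chunk_start; i < chunk_end; i += chunk_pages) {
            migration_clear_memory_region_dirty_bitmap(rs, rb, i);
        }
    }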

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-08  2:49             ` Wang, Wei W
@ 2021-07-08 18:30               ` Peter Xu
  2021-07-09  8:58                 ` Wang, Wei W
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2021-07-08 18:30 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Hailiang Zhang, David Hildenbrand, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Thu, Jul 08, 2021 at 02:49:51AM +0000, Wang, Wei W wrote:
> On Thursday, July 8, 2021 12:44 AM, Peter Xu wrote:
> > > > Not to mention the hard migration issues are mostly with non-idle
> > > > guest, in that case having the balloon in the guest will be
> > > > disastrous from this pov since it'll start to take mutex for each
> > > > page, while balloon would hardly report anything valid since most guest pages
> > are being used.
> > >
> > > If no pages are reported, migration thread wouldn't wait on the lock then.
> > 
> > Yes I think this is the place I didn't make myself clear.  It's not about sleeping, it's
> > about the cmpxchg being expensive already when the vm is huge.
> 
> OK.
> How did you root-cause it to the cmpxchg, instead of lock contention (i.e. syscall and sleep) or
> some other code inside pthread_mutex_lock()? Do you have cycle counts for the cmpxchg vs. for pthread_mutex_lock()?

We've got "perf top -g" showing a huge amount of stacks lying in
pthread_mutex_lock().

> 
> I checked the implementation of pthread_mutex_lock(). The code path for lock acquisition is long. QemuSpin looks more efficient.
> (probably we also don’t want the migration thread to sleep in any case)

I didn't check it, but I'd really hope it is very close to a spinlock
version when it's not contended.  We should be careful about using spin locks
in userspace, e.g., with that, moving the clear log into the critical section
would be too much and actually close to "wrong", imho, because the kvm memslot
lock inside is sleepable.

I think it's very fine to have migration thread sleep.  IMHO we need explicit
justification for each mutex to be converted to a spinlock in userspace.  So
far I don't see it yet for this bitmap lock.

Frankly I also don't know how a spinlock would work reliably without being
able to disable scheduling; could the thread get scheduled out while taking
the spinlock?  Would another thread trying to take this lock spin on the
sleeping task?

> 
> I think it's also better to see the comparison of migration throughput data (i.e. pages per second) in the following cases, before we make a decision:
> - per-page mutex
> - per-page spinlock
> - 50-ms mutex

I don't have the machines, so I can't do this.  I also see using a spinlock
as a separate issue to solve; as I said before, I would prefer some
justification first to use it, rather than blindly running tests and picking
the patch with the higher number.

I also hope we can reach a consensus that we fix one issue at a time.  This
patch already proves itself with some real workloads; I hope we can merge it
first, either with the 50ms period or less.

Per-page locking is already against at least my instinct of how this should
be done; I just regretted that I didn't see it as an issue when reviewing the
balloon patch, as I offered the r-b myself.  However I want to make it work as
before, then we can figure out a better approach on how the lock is taken, and
which lock we should use.  I don't see it as a must to do all things in one patch.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-08 18:30               ` Peter Xu
@ 2021-07-09  8:58                 ` Wang, Wei W
  2021-07-09 14:48                   ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Wang, Wei W @ 2021-07-09  8:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hailiang Zhang, David Hildenbrand, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Friday, July 9, 2021 2:31 AM, Peter Xu wrote:
> > > Yes I think this is the place I didn't make myself clear.  It's not
> > > about sleeping, it's about the cmpxchg being expensive already when the vm
> is huge.
> >
> > OK.
> > How did you root cause that it's caused by cmpxchg, instead of lock
> > contention (i.e. syscall and sleep) or some other code inside
> pthread_mutex_lock(). Do you have cycles about cmpxchg v.s. cycles of
> pthread_mutex_lock()?
> 
> We've got "perf top -g" showing a huge amount of stacks lying in
> pthread_mutex_lock().

This only shows that pthread_mutex_lock is the cause; it doesn't root-cause it to the cmpxchg.

> 
> >
> > I check the implementation of pthread_mutex_lock(). The code path for lock
> acquire is long. QemuSpin looks more efficient.
> > (probably we also don’t want migration thread to sleep in any case)
> 
> I didn't check it, but I'd really hope it is very close to a spinlock version
> when it's not contended.  We should be careful about using spin locks in userspace,
> e.g., with that, moving the clear log into the critical section would be too much
> and actually close to "wrong", imho, because the kvm memslot lock inside is sleepable.

OK, that might be a good reason to keep the clear log out of the lock.
We also don’t want the lock holder to have more chances to go to sleep, even though it is a mutex.

> 
> I think it's very fine to have migration thread sleep.  IMHO we need explicit
> justification for each mutex to be converted to a spinlock in userspace.  So far I
> don't see it yet for this bitmap lock.

What if the guest gets stopped and then the migration thread goes to sleep?

> 
> Frankly I also don't know how a spinlock would work reliably without being able to
> disable scheduling; could the thread get scheduled out while taking the
> spinlock?  Would another thread trying to take this lock spin on the sleeping
> task?

Yes, and it needs good justification:
If it's a per-page spinlock, the granularity is very small, so there is very little chance that a lock holder gets scheduled out as its time slice runs out. Even if that happens once, it seems like no big issue; the waiter just wastes some CPU cycles, which is better than having the migration thread scheduled out by the mutex, I think.

> 
> >
> > I think it's also better to see the comparison of migration throughput data (i.e.
> pages per second) in the following cases, before we make a decision:
> > - per-page mutex
> > - per-page spinlock
> > - 50-ms mutex
> 
> I don't have the machines, so I can't do this.  I also see using a spinlock as a
> separate issue to solve; as I said before, I would prefer some justification first
> to use it, rather than blindly running tests and picking the patch with the higher number.
> 
> I also hope we can reach a consensus that we fix one issue at a time.  This patch
> already proves itself with some real workloads; I hope we can merge it first,
> either with the 50ms period or less.
> 
> Per-page locking is already against at least my instinct of how this should be
> done; I just regretted that I didn't see it as an issue when reviewing the balloon
> patch, as I offered the r-b myself.  However I want to make it work as before,
> then we can figure out a better approach on how the lock is taken, and which lock
> we should use.  I don't see it as a must to do all things in one patch.

Sorry, it wasn't my intention to be a barrier to merging your patch.
I was just trying to come up with better solutions if possible:
- Option 1: a QemuSpin lock would reduce the lock acquiring overhead, but we need to test whether the migration could still converge;
- Option 2: a conditional lock. We haven't seen test results on whether, with the free page optimization turned on, that per-page lock could still let your 2TB guest migration converge.

Seems we lack resources for those tests right now. If you urgently need a decision to have it work first, I'm also OK with merging it.

Best,
Wei


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-09  8:58                 ` Wang, Wei W
@ 2021-07-09 14:48                   ` Peter Xu
  2021-07-13  8:20                     ` Wang, Wei W
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2021-07-09 14:48 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Hailiang Zhang, David Hildenbrand, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Fri, Jul 09, 2021 at 08:58:08AM +0000, Wang, Wei W wrote:
> On Friday, July 9, 2021 2:31 AM, Peter Xu wrote:
> > > > Yes I think this is the place I didn't make myself clear.  It's not
> > > > about sleeping, it's about the cmpxchg being expensive already when the vm
> > is huge.
> > >
> > > OK.
> > > How did you root cause that it's caused by cmpxchg, instead of lock
> > > contention (i.e. syscall and sleep) or some other code inside
> > pthread_mutex_lock(). Do you have cycles about cmpxchg v.s. cycles of
> > pthread_mutex_lock()?
> > 
> > We've got "perf top -g" showing a huge amount of stacks lying in
> > pthread_mutex_lock().
> 
> This only shows that pthread_mutex_lock is the cause; it doesn't root-cause it to the cmpxchg.

I think that's enough already to prove we can move the lock elsewhere.

It's not really a heavy race between threads; it's the pure overhead of
calling it too many times.  So it's not really a problem yet about "what type
of lock we should use" (mutex or spin lock) or "how this lock is implemented"
(say, whether using cmpxchg only or optimizing with test + test-and-set, as
that sounds like a good optimization of pure test-and-set spinlocks), because
the lock is not busy at all.
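
To sketch the "test + test-and-set" idea for clarity, a generic illustration
using GCC atomic builtins (not QEMU code):

    static char lock;   /* hypothetical raw spin byte */

    void spin_acquire(void)
    {
        /* Pure test-and-set: every retry is an atomic RMW on the line */
        while (__atomic_test_and_set(&lock, __ATOMIC_ACQUIRE)) {
            /*
             * The "test" refinement: spin on a plain load until the lock
             * looks free, then retry the atomic RMW, so waiters don't
             * keep the cache line bouncing between cores.
             */
            while (__atomic_load_n(&lock, __ATOMIC_RELAXED)) {
                /* optionally a cpu pause hint here */
            }
        }
    }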

If we have two busy threads contending on the lock for real, that'll be
another problem, imho.  It's not yet the case.

> 
> > 
> > >
> > > I check the implementation of pthread_mutex_lock(). The code path for lock
> > acquire is long. QemuSpin looks more efficient.
> > > (probably we also don’t want migration thread to sleep in any case)
> > 
> > I didn't check it, but I really hoped it should be very close to a spinlock version
> > when it's not contended.  We should be careful on using spin locks in userspace,
> > e.g., with that moving clear log into critical section will be too much and actuall
> > close to "wrong", imho, because the kvm memslot lock inside is sleepable.
> 
> OK, that might be a good reason to keep the clear log out of the lock.
> We also don’t want the lock holder to have more chances to go to sleep, even though it is a mutex.

Yes that looks fine to me, but that's really another topic.  It's not a
requirement either before spinlock is justified to be better than mutex here.

> 
> > 
> > I think it's very fine to have migration thread sleep.  IMHO we need explicit
> > justification for each mutex to be converted to a spinlock in userspace.  So far I
> > don't see it yet for this bitmap lock.
> 
> What if the guest gets stopped and then the migration thread goes to sleep?

Isn't the balloon code run in a standalone iothread?  Why would the guest
being stopped stop the migration thread?

> 
> > 
> > Frankly I also don't know how a spinlock would work reliably without being able to
> > disable scheduling; could the thread get scheduled out while taking the
> > spinlock?  Would another thread trying to take this lock spin on the sleeping
> > task?
> 
> Yes, and it needs good justification:
> If it's a per-page spinlock, the granularity is very small, so there is very little chance that a lock holder gets scheduled out as its time slice runs out. Even if that happens once, it seems like no big issue; the waiter just wastes some CPU cycles, which is better than having the migration thread scheduled out by the mutex, I think.

Yes I believe there was good reason when Emilio introduced the lock; I didn't
measure it myself so I can't tell.  It's just that we need more justification
that this mutex needs to be replaced with a spinlock.

> 
> > 
> > >
> > > I think it's also better to see the comparison of migration throughput data (i.e.
> > pages per second) in the following cases, before we make a decision:
> > > - per-page mutex
> > > - per-page spinlock
> > > - 50-ms mutex
> > 
> > I don't have the machines, so I can't do this.  I also see using a spinlock as a
> > separate issue to solve; as I said before, I would prefer some justification first
> > to use it, rather than blindly running tests and picking the patch with the higher number.
> > 
> > I also hope we can reach a consensus that we fix one issue at a time.  This patch
> > already proves itself with some real workloads; I hope we can merge it first,
> > either with the 50ms period or less.
> > 
> > Per-page locking is already against at least my instinct of how this should be
> > done; I just regretted that I didn't see it as an issue when reviewing the balloon
> > patch, as I offered the r-b myself.  However I want to make it work as before,
> > then we can figure out a better approach on how the lock is taken, and which lock
> > we should use.  I don't see it as a must to do all things in one patch.
> 
> Sorry, it wasn't my intention to be a barrier to merging your patch.

No problem, and I appreciate all the discussions.  It's just that, as you
see, I still prefer to keep it like this first.

free-page-hint is a great feature, though if we grep for it in libvirt we find
that it's not yet enabled anywhere.  It means it's not in majority use yet.
Meanwhile the code path this patch changes happens for every migration that
qemu starts.  What I'm afraid of is that we worry too much about what most
people are not using, while not really making the majority case work better.

From what I learned in the past few years, funnily "speed of migration" is
normally not what people care about the most.  Issues are mostly with convergence and
being transparent to users using the VMs so they aren't even aware.

I admit this patch may not be the perfect one, but that's the reason I wanted
to do it in two steps, as a spinlock definitely needs more justification here,
which I can't provide myself, while this patch is verified to help.

> I was just trying to come up with better solutions if possible:
> - Option 1: a QemuSpin lock would reduce the lock acquiring overhead, but we need to test whether the migration could still converge;
> - Option 2: a conditional lock. We haven't seen test results on whether, with the free page optimization turned on, that per-page lock could still let your 2TB guest migration converge.

Yes, after I checked libvirt I am more confident the issue is not with
free-page-hint, and as I said I wonder why you think free-page-hint could help
a heavy-memory-usage guest at all, since as I mentioned most of its memory is in use.

If we want to further optimize it, I don't think it makes sense to choose
either option 1 or option 2 alone rather than taking them together; but when
that comes, it'll be great if whoever proposes the new change also compares it
against the current patch's 50ms period, even if sleepable.  I'm wondering
whether that'll really make a measurable difference, especially if we can also
shrink that 50ms period, which is what I asked about initially.

> 
> Seems we lack resources for those tests right now. If you urgently need a decision to have it work first, I'm also OK with merging it.

No I can't merge it myself as I'm not the maintainer. :) I haven't received any
ack yet, so at least I'll need to see what Dave and Juan think.  It's just that
I don't think QemuSpin could help much in this case, and I don't want to mess
up the issue either.

Thanks for being considerate.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-09 14:48                   ` Peter Xu
@ 2021-07-13  8:20                     ` Wang, Wei W
  0 siblings, 0 replies; 37+ messages in thread
From: Wang, Wei W @ 2021-07-13  8:20 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hailiang Zhang, David Hildenbrand, Juan Quintela, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Friday, July 9, 2021 10:48 PM, Peter Xu wrote:
> On Fri, Jul 09, 2021 at 08:58:08AM +0000, Wang, Wei W wrote:
> > On Friday, July 9, 2021 2:31 AM, Peter Xu wrote:
> > > > > Yes I think this is the place I didn't make myself clear.  It's
> > > > > not about sleeping, it's about the cmpxchg being expensive
> > > > > already when the vm
> > > is huge.
> > > >
> > > > OK.
> > > > How did you root cause that it's caused by cmpxchg, instead of
> > > > lock contention (i.e. syscall and sleep) or some other code inside
> > > pthread_mutex_lock(). Do you have cycles about cmpxchg v.s. cycles
> > > of pthread_mutex_lock()?
> > >
> > > We've got "perf top -g" showing a huge amount of stacks lying in
> > > pthread_mutex_lock().
> >
> > This only explains pthread_mutex_lock is the cause, not root caused to
> cmpxchg.
> 
> I think that's enough already to prove we can move the lock elsewhere.
> 
> It's not really a heavy race between threads; it's the pure overhead we called it
> too many times.  So it's not really a problem yet about "what type of lock we
> should use" (mutex or spin lock) or "how this lock is implemented" (say, whether
> using cmpxchg only or optimize using test + test-and-set, as that sounds like a
> good optimization of pure test-and-set spinlocks) because the lock is not busy at
> all.

Just FYI:
there is a big while (1) {} loop inside pthread_mutex_lock; I'm not sure if the hotspot is in that loop.


> > What if the guest gets stopped and then the migration thread goes to sleep?
> 
> > Isn't the balloon code run in a standalone iothread?  Why would the guest
> > being stopped stop the migration thread?

Yes, it is async as you know. The guest puts hints into the vq, then gets
paused; the device then takes the mutex, and the migration thread gets blocked.
In general, when we use a mutex, we need to consider the case where it could be blocked.

> From what I learned in the past few years, funnily "speed of migration" is
> > normally not what people care about the most.  Issues are mostly with convergence
> and being transparent to users using the VMs so they aren't even aware.

Yes, migration time isn’t that critically important, but shorter is better than longer.
Skipping those free pages saves network bandwidth, which is also good.
Otherwise, the 0-page optimization in migration would also be meaningless.
In theory, free pages in the last round could also be skipped to reduce downtime
(just haven't got a good testcase to show it).

> >
> > Seems we lack resources for those tests right now. If you are urgent for a
> decision to have it work first, I'm also OK with you to merge it.
> 
> No I can't merge it myself as I'm not the maintainer. :) I haven't received any ack
> yet, so at least I'll need to see what Dave and Juan think.  It's just that I don't
> think QemuSpin could help much in this case, and I don't want to mess up the
> issue either.
> 

Yes, I'm also OK if they want to merge it.
If it is possible for anyone from your testing team (QA?) to help do a regression test of free page hints,
checking the difference (e.g. migration time of an idle guest) after applying this patch, that would be great.
Thanks!

Best,
Wei

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-06-30 20:08 [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty() Peter Xu
  2021-07-01  4:42 ` Wang, Wei W
  2021-07-03 16:31 ` Lukas Straub
@ 2021-07-13  8:40 ` Wang, Wei W
  2021-07-13 10:22   ` David Hildenbrand
  2021-07-13 15:59   ` Peter Xu
  2 siblings, 2 replies; 37+ messages in thread
From: Wang, Wei W @ 2021-07-13  8:40 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Leonardo Bras Soares Passos, Juan Quintela, Hailiang Zhang,
	Dr . David Alan Gilbert, David Hildenbrand

On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
> Taking the mutex every time for each dirty bit to clear is too slow, especially we'll
> take/release even if the dirty bit is cleared.  So far it's only used to sync with
> special cases with qemu_guest_free_page_hint() against migration thread,
> nothing really that serious yet.  Let's move the lock to be upper.
> 
> There're two callers of migration_bitmap_clear_dirty().
> 
> For migration, move it into ram_save_iterate().  With the help of MAX_WAIT
> logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so taking
> the lock once there at the entry.  It also means any call sites to
> qemu_guest_free_page_hint() can be delayed; but it should be very rare, only
> during migration, and I don't see a problem with it.
> 
> For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
> that lock even when calling ramblock_sync_dirty_bitmap(), where another
> example is migration_bitmap_sync() who took it right.  So let the mutex cover
> both the
> ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
> 
> It's even possible to drop the lock so we use atomic operations upon rb->bmap
> and the variable migration_dirty_pages.  I didn't do it just to still be safe, also
> not predictable whether the frequent atomic ops could bring overhead too e.g.
> on huge vms when it happens very often.  When that really comes, we can
> keep a local counter and periodically call atomic ops.  Keep it simple for now.
> 
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

FWIW
Reviewed-by: Wei Wang <wei.w.wang@intel.com>

If no one can help do a regression test of free page hints, please document something like this in the patch:
Locking at the coarser granularity may reduce the improvement brought by free page hints, but it does not seem to cause critical issues.
We will let users of free page hints report back any requirements and come up with a better solution later.

Best,
Wei




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-13  8:40 ` Wang, Wei W
@ 2021-07-13 10:22   ` David Hildenbrand
  2021-07-14  5:03     ` Wang, Wei W
  2021-07-13 15:59   ` Peter Xu
  1 sibling, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2021-07-13 10:22 UTC (permalink / raw)
  To: Wang, Wei W, Peter Xu, qemu-devel
  Cc: Hailiang Zhang, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Juan Quintela

On 13.07.21 10:40, Wang, Wei W wrote:
> On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
>> Taking the mutex every time for each dirty bit to clear is too slow, especially we'll
>> take/release even if the dirty bit is cleared.  So far it's only used to sync with
>> special cases with qemu_guest_free_page_hint() against migration thread,
>> nothing really that serious yet.  Let's move the lock to be upper.
>>
>> There're two callers of migration_bitmap_clear_dirty().
>>
>> For migration, move it into ram_save_iterate().  With the help of MAX_WAIT
>> logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so taking
>> the lock once there at the entry.  It also means any call sites to
>> qemu_guest_free_page_hint() can be delayed; but it should be very rare, only
>> during migration, and I don't see a problem with it.
>>
>> For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
>> that lock even when calling ramblock_sync_dirty_bitmap(), where another
>> example is migration_bitmap_sync() who took it right.  So let the mutex cover
>> both the
>> ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
>>
>> It's even possible to drop the lock so we use atomic operations upon rb->bmap
>> and the variable migration_dirty_pages.  I didn't do it just to still be safe, also
>> not predictable whether the frequent atomic ops could bring overhead too e.g.
>> on huge vms when it happens very often.  When that really comes, we can
>> keep a local counter and periodically call atomic ops.  Keep it simple for now.
>>
>> Cc: Wei Wang <wei.w.wang@intel.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Cc: Juan Quintela <quintela@redhat.com>
>> Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> FWIW
> Reviewed-by: Wei Wang <wei.w.wang@intel.com>
> 
> If no one can help do a regression test of free page hints, please
> document something like this in the patch:
> Locking at the coarser granularity may reduce the improvement brought
> by free page hints, but it does not seem to cause critical issues.
> We will let users of free page hints report back any requirements and
> come up with a better solution later.

Can you send an official patch for the free page hinting clean_bmap 
handling I reported?

I can then give both changes in combination a quick test (before/after
this patch here).

Cheers

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-13  8:40 ` Wang, Wei W
  2021-07-13 10:22   ` David Hildenbrand
@ 2021-07-13 15:59   ` Peter Xu
  2021-07-14  5:04     ` Wang, Wei W
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Xu @ 2021-07-13 15:59 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Tue, Jul 13, 2021 at 08:40:21AM +0000, Wang, Wei W wrote:
> On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
> > Taking the mutex every time for each dirty bit to clear is too slow, especially we'll
> > take/release even if the dirty bit is cleared.  So far it's only used to sync with
> > special cases with qemu_guest_free_page_hint() against migration thread,
> > nothing really that serious yet.  Let's move the lock to be upper.
> > 
> > There're two callers of migration_bitmap_clear_dirty().
> > 
> > For migration, move it into ram_save_iterate().  With the help of MAX_WAIT
> > logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so taking
> > the lock once there at the entry.  It also means any call sites to
> > qemu_guest_free_page_hint() can be delayed; but it should be very rare, only
> > during migration, and I don't see a problem with it.
> > 
> > For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to take
> > that lock even when calling ramblock_sync_dirty_bitmap(), where another
> > example is migration_bitmap_sync() who took it right.  So let the mutex cover
> > both the
> > ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
> > 
> > It's even possible to drop the lock so we use atomic operations upon rb->bmap
> > and the variable migration_dirty_pages.  I didn't do it just to still be safe, also
> > not predictable whether the frequent atomic ops could bring overhead too e.g.
> > on huge vms when it happens very often.  When that really comes, we can
> > keep a local counter and periodically call atomic ops.  Keep it simple for now.
> > 
> > Cc: Wei Wang <wei.w.wang@intel.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
> > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Cc: Juan Quintela <quintela@redhat.com>
> > Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> FWIW
> Reviewed-by: Wei Wang <wei.w.wang@intel.com>

Thanks, Wei.

> 
> If no one can help do a regression test of free page hints, please
> document something like this in the patch:
> Locking at the coarser granularity may reduce the improvement brought
> by free page hints, but it does not seem to cause critical issues.
> We will let users of free page hints report back any requirements and
> come up with a better solution later.

Didn't get a chance to document it, as it's in a pull request now; but as
long as you're okay with no per-page lock (which I still don't agree
with), I can follow up on this.

Are the parameters below enough for me to enable free-page-hint?

     -object iothread,id=io1 \
     -device virtio-balloon,free-page-hint=on,iothread=io1 \

I tried to verify it's running by tracing report_free_page_func() inside
the guest with a kprobe, but it never triggered.  The guest has kernel
5.11.12-300.fc34.x86_64, which should be new enough to have that
supported.  Do you know what I'm missing?
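
For reference, the tracing setup meant above was along these lines
(assuming debugfs is mounted at /sys/kernel/debug and the guest kernel
has CONFIG_KPROBE_EVENTS; the probe name "fph" is arbitrary):

    cd /sys/kernel/debug/tracing
    echo 'p:fph report_free_page_func' >> kprobe_events
    echo 1 > events/kprobes/fph/enable
    cat trace_pipe    # expect hits once migration starts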

P.S. This also reminds me that maybe we want an entry in qemu-options.hx
for the balloon device, as it has lots of options, some of which may not
be easy to set up right.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-13 10:22   ` David Hildenbrand
@ 2021-07-14  5:03     ` Wang, Wei W
  0 siblings, 0 replies; 37+ messages in thread
From: Wang, Wei W @ 2021-07-14  5:03 UTC (permalink / raw)
  To: David Hildenbrand, Peter Xu, qemu-devel
  Cc: Hailiang Zhang, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Juan Quintela

On Tuesday, July 13, 2021 6:22 PM, David Hildenbrand wrote:
> Can you send an official patch for the free page hinting clean_bmap handling I
> reported?
> 
> I can then give both changes in combination a quick test (before/after
> this patch here).
> 

Yes, I'll send, thanks!

Best,
Wei

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
  2021-07-13 15:59   ` Peter Xu
@ 2021-07-14  5:04     ` Wang, Wei W
  0 siblings, 0 replies; 37+ messages in thread
From: Wang, Wei W @ 2021-07-14  5:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hailiang Zhang, Juan Quintela, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Tuesday, July 13, 2021 11:59 PM, Peter Xu wrote:
> On Tue, Jul 13, 2021 at 08:40:21AM +0000, Wang, Wei W wrote:
> 
> Didn't get a chance to document it, as it's in a pull request now; but as
> long as you're okay with no per-page lock (which I still don't agree
> with), I can follow up on this.
> 
> Are the parameters below enough for me to enable free-page-hint?
> 
>      -object iothread,id=io1 \
>      -device virtio-balloon,free-page-hint=on,iothread=io1 \
> 
> I tried to verify it's running by tracing report_free_page_func() inside
> the guest with a kprobe, but it never triggered.  The guest has kernel
> 5.11.12-300.fc34.x86_64, which should be new enough to have that
> supported.  Do you know what I'm missing?

Please check if you have the virtio-balloon driver loaded in the guest.
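
A few quick ways to check from inside the guest -- illustrative only,
with sysfs paths as on recent kernels:

    lsmod | grep virtio_balloon        # only if built as a module
    # Find the balloon device (virtio device ID 0x0005) ...
    grep -l 0x0005 /sys/bus/virtio/devices/virtio*/device
    # ... and inspect its negotiated feature bits; bit 3 is
    # VIRTIO_BALLOON_F_FREE_PAGE_HINT.
    cat /sys/bus/virtio/devices/virtio*/features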

> 
> P.S. This also reminds me that maybe we want an entry in qemu-options.hx
> for the balloon device, as it has lots of options, some of which may not
> be easy to set up right.
> 

Sounds good to me.

Best,
Wei

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2021-07-14  5:05 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-30 20:08 [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty() Peter Xu
2021-07-01  4:42 ` Wang, Wei W
2021-07-01 12:51   ` Peter Xu
2021-07-01 14:21     ` David Hildenbrand
2021-07-02  2:48       ` Wang, Wei W
2021-07-02  7:06         ` David Hildenbrand
2021-07-03  2:53           ` Wang, Wei W
2021-07-05 13:41             ` David Hildenbrand
2021-07-06  9:41               ` Wang, Wei W
2021-07-06 10:05                 ` David Hildenbrand
2021-07-06 17:39                   ` Peter Xu
2021-07-07 12:45                     ` Wang, Wei W
2021-07-07 16:45                       ` Peter Xu
2021-07-07 23:25                         ` Wang, Wei W
2021-07-08  0:21                           ` Peter Xu
2021-07-06 17:47             ` Peter Xu
2021-07-07  8:34               ` Wang, Wei W
2021-07-07 16:54                 ` Peter Xu
2021-07-08  2:55                   ` Wang, Wei W
2021-07-08 18:10                     ` Peter Xu
2021-07-02  2:29     ` Wang, Wei W
2021-07-06 17:59       ` Peter Xu
2021-07-07  8:33         ` Wang, Wei W
2021-07-07 16:44           ` Peter Xu
2021-07-08  2:49             ` Wang, Wei W
2021-07-08 18:30               ` Peter Xu
2021-07-09  8:58                 ` Wang, Wei W
2021-07-09 14:48                   ` Peter Xu
2021-07-13  8:20                     ` Wang, Wei W
2021-07-03 16:31 ` Lukas Straub
2021-07-04 14:14   ` Lukas Straub
2021-07-06 18:37     ` Peter Xu
2021-07-13  8:40 ` Wang, Wei W
2021-07-13 10:22   ` David Hildenbrand
2021-07-14  5:03     ` Wang, Wei W
2021-07-13 15:59   ` Peter Xu
2021-07-14  5:04     ` Wang, Wei W

This is a public inbox; see mirroring instructions for how to clone and
mirror all data and code used for this inbox, as well as URLs for NNTP
newsgroup(s).