On Wed, 30 Jun 2021 16:08:05 -0400
Peter Xu wrote:

> Taking the mutex every time for each dirty bit to clear is too slow,
> especially since we take/release it even if the dirty bit is already
> cleared.  So far it's only used to sync the special case of
> qemu_guest_free_page_hint() against the migration thread, nothing really
> that serious yet.  Let's move the lock up a level.
>
> There are two callers of migration_bitmap_clear_dirty().
>
> For migration, move it into ram_save_iterate().  With the help of the
> MAX_WAIT logic, we'll only run ram_save_iterate() for roughly 50ms at a
> time, so take the lock once there at the entry.  It also means any call
> site of qemu_guest_free_page_hint() can be delayed; but that should be
> very rare, only during migration, and I don't see a problem with it.
>
> For COLO, move it up to colo_flush_ram_cache().  I think COLO forgot to
> take that lock even when calling ramblock_sync_dirty_bitmap(); another
> example is migration_bitmap_sync(), which takes it correctly.  So let the
> mutex cover both the ramblock_sync_dirty_bitmap() and
> migration_bitmap_clear_dirty() calls.

Hi,
I don't think COLO needs it: colo_flush_ram_cache() only runs on the
secondary (incoming) side, and AFAIK the bitmap is only set in
ram_load_precopy(), and the two don't run in parallel.  Although I'm not
sure what ramblock_sync_dirty_bitmap() does there; I guess it's only there
to make the rest of the migration code happy?

Regards,
Lukas Straub

> It's even possible to drop the lock and use atomic operations on rb->bmap
> and the variable migration_dirty_pages.  I didn't do that just to stay
> safe; it's also not predictable whether the frequent atomic ops could
> bring overhead too, e.g. on huge VMs when it happens very often.  When
> that really becomes a problem, we can keep a local counter and
> periodically apply it with atomic ops.  Keep it simple for now.
>
> Cc: Wei Wang
> Cc: David Hildenbrand
> Cc: Hailiang Zhang
> Cc: Dr. David Alan Gilbert
> Cc: Juan Quintela
> Cc: Leonardo Bras Soares Passos
> Signed-off-by: Peter Xu
> ---
>  migration/ram.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 723af67c2e..9f2965675d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -795,8 +795,6 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>  {
>      bool ret;
>
> -    QEMU_LOCK_GUARD(&rs->bitmap_mutex);
> -
>      /*
>       * Clear dirty bitmap if needed.  This _must_ be called before we
>       * send any of the page in the chunk because we need to make sure
> @@ -2834,6 +2832,14 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          goto out;
>      }
>
> +    /*
> +     * We'll hold this lock for a while, but that's okay for two reasons.
> +     * Firstly, the only other thread that can take it is whoever calls
> +     * qemu_guest_free_page_hint(), which should be rare; secondly, see
> +     * MAX_WAIT (if curious, further see commit 4508bd9ed8053ce) below,
> +     * which guarantees that we'll at least release it on a regular basis.
> +     */
> +    qemu_mutex_lock(&rs->bitmap_mutex);
>      WITH_RCU_READ_LOCK_GUARD() {
>          if (ram_list.version != rs->last_version) {
>              ram_state_reset(rs);
> @@ -2893,6 +2899,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>              i++;
>          }
>      }
> +    qemu_mutex_unlock(&rs->bitmap_mutex);
>
>      /*
>       * Must occur before EOS (or any QEMUFile operation)
> @@ -3682,6 +3689,7 @@ void colo_flush_ram_cache(void)
>      unsigned long offset = 0;
>
>      memory_global_dirty_log_sync();
> +    qemu_mutex_lock(&ram_state->bitmap_mutex);
>      WITH_RCU_READ_LOCK_GUARD() {
>          RAMBLOCK_FOREACH_NOT_IGNORED(block) {
>              ramblock_sync_dirty_bitmap(ram_state, block);
> @@ -3710,6 +3718,7 @@ void colo_flush_ram_cache(void)
>          }
>      }
>      trace_colo_flush_ram_cache_end();
> +    qemu_mutex_unlock(&ram_state->bitmap_mutex);
>  }
>
>  /**

--
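To make the MAX_WAIT argument concrete for readers who don't have the tree
handy, here is a minimal, self-contained sketch of the pattern (illustrative
C only, not the actual QEMU code; MAX_WAIT_MS, now_ms() and send_one_chunk()
are made-up names):

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define MAX_WAIT_MS 50          /* rough budget, per the commit message */

static uint64_t now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

/* Stand-in for "send one chunk of dirty pages"; returns false when done. */
static bool send_one_chunk(void)
{
    return true;
}

static void bounded_iteration(void)
{
    uint64_t t0 = now_ms();

    for (uint64_t i = 0; send_one_chunk(); i++) {
        /* Only look at the clock every 64 chunks: clock reads aren't free. */
        if ((i & 63) == 0 && now_ms() - t0 > MAX_WAIT_MS) {
            break;              /* budget exceeded; the lock is released soon */
        }
    }
}

Because the clock is consulted every 64 iterations and the loop bails out
once the budget is exceeded, the mutex taken at the top of ram_save_iterate()
is held for at most the budget plus a few chunks' worth of work.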
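And for the lock-free alternative floated in the last quoted paragraph (keep
a local counter, periodically apply it with atomic ops), a hedged sketch of
how it could look; this is an assumption about a possible shape, not what the
patch does, and bmap, dirty_pages and FLUSH_EVERY are illustrative names:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_ULONG  (8 * sizeof(unsigned long))
#define FLUSH_EVERY     256     /* batch size before touching the global */

static _Atomic uint64_t dirty_pages;    /* shared dirty-page count */

/* Atomically clear one bit of the bitmap; return whether it was set. */
static bool clear_dirty_atomic(unsigned long *bmap, size_t page)
{
    unsigned long mask = 1UL << (page % BITS_PER_ULONG);
    unsigned long old = __atomic_fetch_and(&bmap[page / BITS_PER_ULONG],
                                           ~mask, __ATOMIC_SEQ_CST);

    return old & mask;
}

/* Scan a bitmap, clearing dirty bits; flush the local count periodically. */
static void scan_and_clear(unsigned long *bmap, size_t npages)
{
    uint64_t local = 0;

    for (size_t page = 0; page < npages; page++) {
        if (clear_dirty_atomic(bmap, page) && ++local == FLUSH_EVERY) {
            atomic_fetch_sub(&dirty_pages, local);
            local = 0;
        }
    }
    if (local) {
        atomic_fetch_sub(&dirty_pages, local);  /* flush the remainder */
    }
}

The design point is exactly the one the commit message worries about: a
per-page atomic read-modify-write on the shared counter is the part that
could hurt on huge VMs, so the updates are absorbed by a thread-local
counter and only flushed every FLUSH_EVERY clears.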