* [Qemu-devel] migration: broken ram_save_pending
From: Alexey Kardashevskiy @ 2014-02-04  7:15 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Alex Graf

Hi!

I hit a problem with migration which I would like to know how to fix.

I run the source QEMU as this:
./qemu-system-ppc64 -enable-kvm -m 1024 -machine pseries \
	-nographic -vga none

For the destination, I add "-incoming tcp:localhost:4000".

Both run on the same POWER8 machine, using the latest QEMU with my
"hpratio=1" and "fix SLB migration" patches applied. The host kernel is
3.12 with a 64K system page size (which does not matter much here).

Since the source QEMU is not given any kernel, disk or network device, it
stays at the SLOF prompt. A very simple config.

Now I do the migration. The first "bulk" iteration goes pretty quickly, all
good. When it is done, we enter the "while (pending()) iterate()" loop in
the migration_thread() function.

The idea is that once the number of changes becomes small enough to be
transferred within the "maximum downtime" timeout, the migration will finish.
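
For reference, the loop I mean looks roughly like this (simplified from
migration.c, quoting from memory, so the exact shape may be slightly off):

    while (s->state == MIG_STATE_ACTIVE) {
        if (!qemu_file_rate_limit(s->file)) {
            uint64_t pending_size = qemu_savevm_state_pending(s->file, max_size);
            if (pending_size && pending_size >= max_size) {
                qemu_savevm_state_iterate(s->file);   /* send another batch */
            } else {
                /* small enough: stop the guest, send the rest, complete */
                qemu_savevm_state_complete(s->file);
                break;
            }
        }
        /* once per BUFFER_DELAY: recompute bandwidth and max_size (see below) */
    }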

However, something a bit different happens in this configuration. For some
reason (which I do not really know; I would like to, but it is irrelevant
here) SLOF keeps dirtying a few pages (for example, 6), each 64K, which is
up to 96 4K pages in QEMU terms (393216 bytes). Every time ram_save_pending()
is called, 96 pages are dirty. This is not a huge number, but
ram_save_iterate() moves the migration file pointer only 287544 bytes
further, because that is what was actually transferred, and this number is
smaller due to the "is_zero_range(p, TARGET_PAGE_SIZE)" optimization.

So migration_thread() gets the number of dirty pages and tries to send them
in a loop, but every iteration resets the number of pages back to 96 and we
start again. After several tries we cross the BUFFER_DELAY timeout and
calculate a new @max_size; if the host machine is fast enough, it is bigger
than 393216 and the next loop will finally finish the migration.
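
The recalculation I am talking about is roughly this bit of migration_thread()
(again simplified, from memory; it runs once per BUFFER_DELAY, i.e. every
100ms):

    current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
    if (current_time >= initial_time + BUFFER_DELAY) {
        uint64_t transferred_bytes = qemu_ftell(s->file) - initial_bytes;
        uint64_t time_spent = current_time - initial_time;          /* ms */
        double bandwidth = transferred_bytes / (double)time_spent;  /* bytes/ms */

        /* migrate_max_downtime() is in nanoseconds */
        max_size = bandwidth * migrate_max_downtime() / 1000000;

        qemu_file_reset_rate_limit(s->file);
        initial_time = current_time;
        initial_bytes = qemu_ftell(s->file);
    }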

How to fix this misbehavior?

I can only think of something simple like the patch below, and I am not sure
it does not break other things. I would expect ram_save_pending() to return
the correct number of bytes QEMU is going to send rather than the number of
pages multiplied by 4096, but checking whether all these pages are really
empty is not too cheap.

Thanks!


diff --git a/arch_init.c b/arch_init.c
index 2ba297e..90949b0 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -537,16 +537,17 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
                         acct_info.dup_pages++;
                     }
                 }
             } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
                 acct_info.dup_pages++;
                 bytes_sent = save_block_hdr(f, block, offset, cont,
                                             RAM_SAVE_FLAG_COMPRESS);
                 qemu_put_byte(f, 0);
+                qemu_update_position(f, TARGET_PAGE_SIZE);
                 bytes_sent++;
             } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
                 current_addr = block->offset + offset;
                 bytes_sent = save_xbzrle_page(f, p, current_addr, block,
                                               offset, cont, last_stage);
                 if (!last_stage) {
                     p = get_cached_data(XBZRLE.cache, current_addr);
                 }


-- 
Alexey

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Paolo Bonzini @ 2014-02-04 10:46 UTC
  To: Alexey Kardashevskiy, qemu-devel; +Cc: Alex Graf

On 04/02/2014 08:15, Alexey Kardashevskiy wrote:
> So. migration_thread() gets dirty pages number, tries to send them in a
> loop but every iteration resets the number of pages to 96 and we start
> again. After several tries we cross BUFFER_DELAY timeout and calculate new
> @max_size and if the host machine is fast enough it is bigger than 393216
> and next loop will finally finish the migration.

This should have happened pretty much immediately, because it's not 
while (pending()) but rather

             while (pending_size && pending_size >= max_size)

(it's an "if" in the code, but the idea is the same).  And max_size is 
the following:

             max_size = bandwidth * migrate_max_downtime() / 1000000;

With the default throttling of 32 MiB/s, bandwidth must be something 
like 33000 (expressed in bytes/ms) with the default settings, and then 
max_size should be 33000*3*10^9 / 10^6 = 6000000.  Where is my 
computation wrong?

Also, did you profile it to find the hotspot?  Perhaps the bitmap 
operations are taking a lot of time.  How big is the guest?  Juan's 
patches were optimizing the bitmaps but not all of them apply to your 
case because of hpratio.

> I can only think of something simple like below and not sure it does not
> break other things. I would expect ram_save_pending() to return correct
> number of bytes QEMU is going to send rather than number of pages
> multiplied by 4096 but checking if all these pages are really empty is not
> too cheap.

If you use qemu_update_position you will use very little bandwidth in 
the case where a lot of pages are zero.

What you mention in ram_save_pending() is not problematic just because 
of finding if the pages are empty, but also because you have to find the 
nonzero spots in the bitmap!

Paolo

> Thanks!
>
>
> diff --git a/arch_init.c b/arch_init.c
> index 2ba297e..90949b0 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -537,16 +537,17 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>                          acct_info.dup_pages++;
>                      }
>                  }
>              } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>                  acct_info.dup_pages++;
>                  bytes_sent = save_block_hdr(f, block, offset, cont,
>                                              RAM_SAVE_FLAG_COMPRESS);
>                  qemu_put_byte(f, 0);
> +                qemu_update_position(f, TARGET_PAGE_SIZE);
>                  bytes_sent++;
>              } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
>                  current_addr = block->offset + offset;
>                  bytes_sent = save_xbzrle_page(f, p, current_addr, block,
>                                                offset, cont, last_stage);
>                  if (!last_stage) {
>                      p = get_cached_data(XBZRLE.cache, current_addr);
>                  }
>
>

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Alexey Kardashevskiy @ 2014-02-04 11:59 UTC
  To: Paolo Bonzini, qemu-devel; +Cc: Alex Graf

On 02/04/2014 09:46 PM, Paolo Bonzini wrote:
> Il 04/02/2014 08:15, Alexey Kardashevskiy ha scritto:
>> So. migration_thread() gets dirty pages number, tries to send them in a
>> loop but every iteration resets the number of pages to 96 and we start
>> again. After several tries we cross BUFFER_DELAY timeout and calculate new
>> @max_size and if the host machine is fast enough it is bigger than 393216
>> and next loop will finally finish the migration.
> 
> This should have happened pretty much immediately, because it's not while
> (pending()) but rather
> 
>             while (pending_size && pending_size >= max_size)
> 
> (it's an "if" in the code, but the idea is the same).  And max_size is the
> following:
> 
>             max_size = bandwidth * migrate_max_downtime() / 1000000;
> 
> With the default throttling of 32 MiB/s, bandwidth must be something like
> 33000 (expressed in bytes/ms) with the default settings, and then max_size
> should be 33000*3*10^9 / 10^6 = 6000000.  Where is my computation wrong?


migrate_max_downtime() = 30000000 = 3*10^7.

When the migration is in the iterating stage, bandwidth is the speed over
the last 100ms, which is usually 5 blocks of 250KB each, i.e.
1250000/100 = 12500 bytes/ms, and max_size = 12500*30000000/10^6 = 375000,
which is less than the last chunk.
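
Spelling the same numbers out:

    bandwidth = 1250000 bytes / 100 ms           = 12500 bytes/ms
    max_size  = 12500 * 30000000 (ns) / 10^6     = 375000 bytes
    pending   = 96 pages * 4096                  = 393216 bytes  > max_size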


> 
> Also, did you profile it to find the hotspot?  Perhaps the bitmap
> operations are taking a lot of time.  How big is the guest?

1024MB.


>  Juan's patches
> were optimizing the bitmaps but not all of them apply to your case because
> of hpratio.

This I had to disable :)


>> I can only think of something simple like below and not sure it does not
>> break other things. I would expect ram_save_pending() to return correct
>> number of bytes QEMU is going to send rather than number of pages
>> multiplied by 4096 but checking if all these pages are really empty is not
>> too cheap.
> 
> If you use qemu_update_position you will use very little bandwidth in the
> case where a lot of pages are zero.

My guest migrates in a second or so. I guess in this case
qemu_file_rate_limit() limits the speed and it does not look at QEMUFile::pos.


> What you mention in ram_save_pending() is not problematic just because of
> finding if the pages are empty, but also because you have to find the
> nonzero spots in the bitmap!

Sure.


> 
> Paolo
> 
>> Thanks!
>>
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 2ba297e..90949b0 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -537,16 +537,17 @@ static int ram_save_block(QEMUFile *f, bool
>> last_stage)
>>                          acct_info.dup_pages++;
>>                      }
>>                  }
>>              } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>>                  acct_info.dup_pages++;
>>                  bytes_sent = save_block_hdr(f, block, offset, cont,
>>                                              RAM_SAVE_FLAG_COMPRESS);
>>                  qemu_put_byte(f, 0);
>> +                qemu_update_position(f, TARGET_PAGE_SIZE);
>>                  bytes_sent++;
>>              } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
>>                  current_addr = block->offset + offset;
>>                  bytes_sent = save_xbzrle_page(f, p, current_addr, block,
>>                                                offset, cont, last_stage);
>>                  if (!last_stage) {
>>                      p = get_cached_data(XBZRLE.cache, current_addr);
>>                  }
>>
>>
> 
> 


-- 
Alexey

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Paolo Bonzini @ 2014-02-04 12:07 UTC
  To: Alexey Kardashevskiy, qemu-devel; +Cc: Alex Graf

On 04/02/2014 12:59, Alexey Kardashevskiy wrote:
>> > With the default throttling of 32 MiB/s, bandwidth must be something like
>> > 33000 (expressed in bytes/ms) with the default settings, and then max_size
>> > should be 33000*3*10^9 / 10^6 = 6000000.  Where is my computation wrong?
>
> migrate_max_downtime() = 30000000 = 3*10^7.

Oops, that's the mistake.

> When the migration is in iterating stage, bandwidth is a speed in last
> 100ms which is usually 5 blocks 250KB each so it is
> 1250000/100=12500bytes/s and max_size=12500*30000000/10^6=375000 which is
> less than the last chunk is.
>
>

Perhaps our default maximum downtime is too low.  30 ms doesn't seem 
achievable in practice with 32 MiB/s bandwidth.  Just making it 300 ms 
or so should fix your problem.
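
For experimenting it can also be raised at run time from the HMP monitor, if
I recall the syntax correctly (the value is in seconds):

    (qemu) migrate_set_downtime 0.3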

Paolo

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Alexey Kardashevskiy @ 2014-02-04 12:16 UTC
  To: Paolo Bonzini, qemu-devel; +Cc: Alex Graf

On 02/04/2014 11:07 PM, Paolo Bonzini wrote:
> Il 04/02/2014 12:59, Alexey Kardashevskiy ha scritto:
>>> > With the default throttling of 32 MiB/s, bandwidth must be something like
>>> > 33000 (expressed in bytes/ms) with the default settings, and then
>>> max_size
>>> > should be 33000*3*10^9 / 10^6 = 6000000.  Where is my computation wrong?
>>
>> migrate_max_downtime() = 30000000 = 3*10^7.
> 
> Oops, that's the mistake.

Make a patch? :)


>> When the migration is in iterating stage, bandwidth is a speed in last
>> 100ms which is usually 5 blocks 250KB each so it is
>> 1250000/100=12500bytes/s and max_size=12500*30000000/10^6=375000 which is
>> less than the last chunk is.
>>
>>
> 
> Perhaps our default maximum downtime is too low.  30 ms doesn't seem
> achievable in practice with 32 MiB/s bandwidth.  Just making it 300 ms or
> so should fix your problem.

Well, it will fix it in my particular case, but in the long run this does
not feel like a real fix - there should be a way for migration_thread() to
know that ram_save_iterate() has sent all the dirty pages it had to send, no?



-- 
Alexey

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Paolo Bonzini @ 2014-02-04 14:00 UTC
  To: Alexey Kardashevskiy, qemu-devel; +Cc: Alex Graf

On 04/02/2014 13:16, Alexey Kardashevskiy wrote:
> On 02/04/2014 11:07 PM, Paolo Bonzini wrote:
>> Il 04/02/2014 12:59, Alexey Kardashevskiy ha scritto:
>>>>> With the default throttling of 32 MiB/s, bandwidth must be something like
>>>>> 33000 (expressed in bytes/ms) with the default settings, and then
>>>> max_size
>>>>> should be 33000*3*10^9 / 10^6 = 6000000.  Where is my computation wrong?
>>>
>>> migrate_max_downtime() = 30000000 = 3*10^7.
>>
>> Oops, that's the mistake.
>
> Make a patch? :)

I mean, my mistake. :)  I assumed 3000 ms = 3*10^9.

30 ms is too little, but 3000 ms is probably too much for a default.

>>> When the migration is in iterating stage, bandwidth is a speed in last
>>> 100ms which is usually 5 blocks 250KB each so it is
>>> 1250000/100=12500bytes/s and max_size=12500*30000000/10^6=375000 which is
>>> less than the last chunk is.
>>
>> Perhaps our default maximum downtime is too low.  30 ms doesn't seem
>> achievable in practice with 32 MiB/s bandwidth.  Just making it 300 ms or
>> so should fix your problem.
>
> Well, it will fix it in my particular case but in a long run this does not
> feel like a fix - there should be a way for migration_thread() to know that
> ram_save_iterate() sent all dirty pages it had to send, no?

No, because new pages might be dirtied while ram_save_iterate() was running.

Paolo

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Alexey Kardashevskiy @ 2014-02-04 22:17 UTC
  To: Paolo Bonzini, qemu-devel; +Cc: Alex Graf

On 02/05/2014 01:00 AM, Paolo Bonzini wrote:
> Il 04/02/2014 13:16, Alexey Kardashevskiy ha scritto:
>> On 02/04/2014 11:07 PM, Paolo Bonzini wrote:
>>> Il 04/02/2014 12:59, Alexey Kardashevskiy ha scritto:
>>>>>> With the default throttling of 32 MiB/s, bandwidth must be something
>>>>>> like
>>>>>> 33000 (expressed in bytes/ms) with the default settings, and then
>>>>> max_size
>>>>>> should be 33000*3*10^9 / 10^6 = 6000000.  Where is my computation wrong?
>>>>
>>>> migrate_max_downtime() = 30000000 = 3*10^7.
>>>
>>> Oops, that's the mistake.
>>
>> Make a patch? :)
> 
> I mean, my mistake. :)  I assumed 3000 ms = 3*10^9.
> 
> 30 ms is too little, but 3000 ms is probably too much for a default.

So - the default is bad and we need a patch (to make it 300ms), no?

> 
>>>> When the migration is in iterating stage, bandwidth is a speed in last
>>>> 100ms which is usually 5 blocks 250KB each so it is
>>>> 1250000/100=12500bytes/s and max_size=12500*30000000/10^6=375000 which is
>>>> less than the last chunk is.
>>>
>>> Perhaps our default maximum downtime is too low.  30 ms doesn't seem
>>> achievable in practice with 32 MiB/s bandwidth.  Just making it 300 ms or
>>> so should fix your problem.
>>
>> Well, it will fix it in my particular case but in a long run this does not
>> feel like a fix - there should be a way for migration_thread() to know that
>> ram_save_iterate() sent all dirty pages it had to send, no?
> 
> No, because new pages might be dirtied while ram_save_iterate() was running.


I do not get it, sorry. In my example ram_save_iterate() sends everything in
one go but its caller thinks that it did not and tries again. This is not
because something got dirty in between; it is only because zero pages are
sent as 8+1 byte chunks (not as 4096+1 bytes).


-- 
Alexey

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Paolo Bonzini @ 2014-02-05  7:18 UTC
  To: Alexey Kardashevskiy, qemu-devel; +Cc: Alex Graf

On 04/02/2014 23:17, Alexey Kardashevskiy wrote:
>>> >> Well, it will fix it in my particular case but in a long run this does not
>>> >> feel like a fix - there should be a way for migration_thread() to know that
>>> >> ram_save_iterate() sent all dirty pages it had to send, no?
>> >
>> > No, because new pages might be dirtied while ram_save_iterate() was running.
>
> I do not get it, sorry. In my example the ram_save_iterate() sends
> everything in one go but its caller thinks that it did not and tries again.

It's not that "the caller thinks that it did not".  The caller knows 
what happens, because migration_bitmap_find_and_reset_dirty updates the 
migration_dirty_pages count that ram_save_pending uses.  So 
migration_dirty_pages should be 0 when ram_save_pending is entered.

However, something gets dirty in between so remaining_size is again 
393216 when ram_save_pending returns, after the migration_bitmap_sync 
call.  Because of this the migration thread thinks that 
ram_save_iterate() _will_ not send everything in one go.
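
For reference, ram_save_pending() is essentially this (simplified, as I
remember arch_init.c):

    static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
    {
        uint64_t remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;

        if (remaining_size < max_size) {
            /* we look like we can converge: resync the dirty bitmap once more */
            qemu_mutex_lock_iothread();
            migration_bitmap_sync();
            qemu_mutex_unlock_iothread();
            remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
        }
        return remaining_size;
    }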

At least, this is how I read the code.  Perhaps I'm wrong. ;)

Paolo

> This is is not because something got dirty in between, this is only because
> of sending zero pages as 8+1 bytes chunk (not as 4096+1 bytes).

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Dr. David Alan Gilbert @ 2014-02-05  9:09 UTC
  To: Paolo Bonzini; +Cc: Alexey Kardashevskiy, qemu-devel, Alex Graf

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 04/02/2014 23:17, Alexey Kardashevskiy ha scritto:
> >>>>> Well, it will fix it in my particular case but in a long run this does not
> >>>>> feel like a fix - there should be a way for migration_thread() to know that
> >>>>> ram_save_iterate() sent all dirty pages it had to send, no?
> >>>
> >>> No, because new pages might be dirtied while ram_save_iterate() was running.
> >
> >I do not get it, sorry. In my example the ram_save_iterate() sends
> >everything in one go but its caller thinks that it did not and tries again.
> 
> It's not that "the caller thinks that it did not".  The caller knows
> what happens, because migration_bitmap_find_and_reset_dirty updates
> the migration_dirty_pages count that ram_save_pending uses.  So
> migration_dirty_pages should be 0 when ram_save_pending is entered.
> 
> However, something gets dirty in between so remaining_size is again
> 393216 when ram_save_pending returns, after the
> migration_bitmap_sync call.  Because of this the migration thread
> thinks that ram_save_iterate() _will_ not send everything in one go.
> 
> At least, this is how I read the code.  Perhaps I'm wrong. ;)

My reading was a bit different.

I think the case Alexey is hitting is:
   1 A few dirtied pages
   2 but because of the hpratio most of the data is actually zero
     - indeed most of the target-page sized chunks are zero
   3 Thus the data compresses very heavily
   4 When the bandwidth/delay calculation happens, we have spent a reasonable
     amount of time transferring a reasonable number of pages but not
     actually many bytes on the wire, so the estimate of the available
     bandwidth is lower than reality.
   5 The max-downtime calculation is a comparison of pending-dirty uncompressed
     bytes with compressed bandwidth

(5) is bound to fail if the compression ratio is particularly high, which,
because of the hpratio, it is when we're only dirtying one word in an entire
host page.
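
Putting rough numbers on (5), using the figures from earlier in the thread
(96 dirty target pages, a zero page going out as an 8+1 byte chunk):

    estimate used against max_size : 96 * 4096 = 393216 bytes (uncompressed)
    what actually hits the wire    : 96 * 9    =    864 bytes (all zero pages)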

What I'm not too sure of is this: you'd think that if only a few pages were
dirtied, the loop would complete quite quickly, so the delay would also be
small, and bytes-on-wire would be divided by a small value and thus not be
too bad.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Paolo Bonzini @ 2014-02-05 16:35 UTC
  To: Dr. David Alan Gilbert; +Cc: Alexey Kardashevskiy, qemu-devel, Alex Graf

On 05/02/2014 10:09, Dr. David Alan Gilbert wrote:
> I think the case Alexey is hitting is:
>    1 A few dirtied pages
>    2 but because of the hpratio most of the data is actually zero
>      - indeed most of the target-page sized chunks are zero
>    3 Thus the data compresses very heavily
>    4 When the bandwidth/delay calculation happens it's spent a reasonable
>      amount of time transferring a reasonable amount of pages but not
>      actually many bytes on the wire, so the estimate of the available
>      bandwidth available is lower than reality.
>    5 The max-downtime calculation is a comparison of pending-dirty uncompressed
>      bytes with compressed bandwidth
>
> (5) is bound to fail if the compression ratio is particularly high, which
> because of the hpratio it is if we're just dirtying one word in an entire
> host page.

So far so good, but why isn't pending-dirty (aka migration_dirty_pages 
in the code) zero?

Paolo

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Dr. David Alan Gilbert @ 2014-02-05 16:42 UTC
  To: Paolo Bonzini; +Cc: Alexey Kardashevskiy, qemu-devel, Alex Graf

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 05/02/2014 10:09, Dr. David Alan Gilbert ha scritto:
> >I think the case Alexey is hitting is:
> >   1 A few dirtied pages
> >   2 but because of the hpratio most of the data is actually zero
> >     - indeed most of the target-page sized chunks are zero
> >   3 Thus the data compresses very heavily
> >   4 When the bandwidth/delay calculation happens it's spent a reasonable
> >     amount of time transferring a reasonable amount of pages but not
> >     actually many bytes on the wire, so the estimate of the available
> >     bandwidth available is lower than reality.
> >   5 The max-downtime calculation is a comparison of pending-dirty uncompressed
> >     bytes with compressed bandwidth
> >
> >(5) is bound to fail if the compression ratio is particularly high, which
> >because of the hpratio it is if we're just dirtying one word in an entire
> >host page.
> 
> So far so good, but why isn't pending-dirty (aka
> migration_dirty_pages in the code) zero?

Because:
    * the code is still running and keeps redirtying a small handful of pages
    * but because we've underestimated our available bandwidth we never stop
      it and just throw those pages across immediately

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Paolo Bonzini @ 2014-02-05 16:45 UTC
  To: Dr. David Alan Gilbert; +Cc: Alexey Kardashevskiy, qemu-devel, Alex Graf

On 05/02/2014 17:42, Dr. David Alan Gilbert wrote:
> Because:
>     * the code is still running and keeps redirtying a small handful of pages
>     * but because we've underestimated our available bandwidth we never stop
>       it and just throw those pages across immediately

Ok, I thought Alexey was saying we are not redirtying that handful of pages.

And in turn, this is because the max downtime we have is too low (especially
with the default 32 MB/sec bandwidth, which is also pretty low).

Paolo

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Alexey Kardashevskiy @ 2014-02-06  3:10 UTC
  To: Paolo Bonzini, Dr. David Alan Gilbert; +Cc: qemu-devel, Alex Graf

On 02/06/2014 03:45 AM, Paolo Bonzini wrote:
> Il 05/02/2014 17:42, Dr. David Alan Gilbert ha scritto:
>> Because:
>>     * the code is still running and keeps redirtying a small handful of
>> pages
>>     * but because we've underestimated our available bandwidth we never stop
>>       it and just throw those pages across immediately
> 
> Ok, I thought Alexey was saying we are not redirtying that handful of pages.


Every iteration we read the dirty map from KVM and send all dirty pages
across the stream.


> And in turn, this is because the max downtime we have is too low
> (especially for the default 32 MB/sec default bandwidth; that's also pretty
> low).


My understanding now is that in order to finish the migration, QEMU waits
for the earliest 100ms window (BUFFER_DELAY) of continuously low traffic,
but because those pages get dirty every time we read the dirty map, we
transfer more in these 100ms than we are actually allowed (>32MB/s, i.e.
>3200KB/100ms). So we transfer-transfer-transfer, detect that we transferred
too much, do the delay, and if max_size (calculated from the actual transfer
and downtime) for the next iteration happens (by luck) to be bigger than
those 96 pages (uncompressed) - we finish.

Increasing the speed and/or the downtime will help, but still - we would not
need that if migration did not expect all 96 pages to have to be sent, but
instead had some smart way to detect that many of them are empty (and
therefore compressed).

Literally, move is_zero_range() from ram_save_block() to
migration_bitmap_sync() and store this bit in some new pages_zero_map, for
example. But does it make a lot of sense?
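
Something along these lines (completely untested; everything below except
is_zero_range(), migration_dirty_pages and TARGET_PAGE_SIZE is a made-up
name, just to illustrate):

    /* in migration_bitmap_sync(), for every page we are about to mark dirty */
    if (is_zero_range(host_addr, TARGET_PAGE_SIZE)) {
        set_bit(page_nr, pages_zero_map);     /* hypothetical zero-page bitmap */
        zero_pages++;
    } else {
        clear_bit(page_nr, pages_zero_map);
    }

    /* then ram_save_pending() could count a zero page as ~9 bytes, not 4096 */
    remaining_size = (migration_dirty_pages - zero_pages) * TARGET_PAGE_SIZE
                     + zero_pages * 9;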


-- 
Alexey

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Dr. David Alan Gilbert @ 2014-02-06 11:24 UTC
  To: Alexey Kardashevskiy
  Cc: Paolo Bonzini, Alex Graf, Dr. David Alan Gilbert, qemu-devel

* Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> On 02/06/2014 03:45 AM, Paolo Bonzini wrote:
> > Il 05/02/2014 17:42, Dr. David Alan Gilbert ha scritto:
> >> Because:
> >>     * the code is still running and keeps redirtying a small handful of
> >> pages
> >>     * but because we've underestimated our available bandwidth we never stop
> >>       it and just throw those pages across immediately
> > 
> > Ok, I thought Alexey was saying we are not redirtying that handful of pages.
> 
> 
> Every iteration we read the dirty map from KVM and send all dirty pages
> across the stream.
> 
> 
> > And in turn, this is because the max downtime we have is too low
> > (especially for the default 32 MB/sec default bandwidth; that's also pretty
> > low).
> 
> 
> My understanding nooow is that in order to finish migration QEMU waits for
> the earliest 100ms (BUFFER_DELAY) of continuously low trafic but due to
> those pages getting dirty every time we read the dirty map, we transfer
> more in these 100ms than we are actually allowed (>32MB/s or 320KB/100ms).
> So we transfer-transfer-transfer, detect than we transfer too much, do
> delay() and if max_size (calculated from actual transfer and downtime) for
> the next iteration is less (by luck) than those 96 pages (uncompressed) -
> we finish.

How about turning on some of the debug in migration.c; I suggest not all of
it, but how about this one:

            DPRINTF("transferred %" PRIu64 " time_spent %" PRIu64
                    " bandwidth %g max_size %" PRId64 "\n",
                    transferred_bytes, time_spent, bandwidth, max_size);

and also the s->dirty_bytes_rate value.  It would help check our assumptions.

> Increasing speed or/and downtime will help but still - we would not need
> that if migration did not expect all 96 pages to have to be sent but did
> have some smart way to detect that many are empty (so - compressed).

I think the other way would be to keep track of the compression ratio;
if we knew how many pages we'd sent, and how much bandwidth that had used,
we could divide the pending_bytes by that to get a *different* approximation.
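
Something like (hypothetical, just to illustrate the idea - none of these
variables exist as such):

    double ratio = (double)bytes_on_wire / (pages_sent * TARGET_PAGE_SIZE);
    uint64_t estimated_wire_bytes = pending_size * ratio;
    /* compare estimated_wire_bytes rather than pending_size against max_size */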

However, the problem is that my understanding is we're trying to _guarantee_
a maximum downtime, and to do that we have to use the calculation that
assumes that all the pages we have are going to take the maximum time to
transfer, and only go into downtime then.

> Literally, move is_zero_range() from ram_save_block() to
> migration_bitmap_sync() and store this bit in some new pages_zero_map, for
> example. But does it make a lot of sense?

The problem is that means checking whether it's zero more often; at the moment
we check it's zero once during sending; to do what you're suggesting would
mean we'd have to check every page is zero, every time we sync, and I think
that's more often than we send.

Have you tried disabling the call to is_zero_range in arch_init.c's
ram_save_block so that (as long as you have XBZRLE off) we don't do any
compression? If the theory is right then your problem should go away.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Paolo Bonzini @ 2014-02-06 23:49 UTC
  To: Alexey Kardashevskiy, Dr. David Alan Gilbert; +Cc: qemu-devel, Alex Graf

On 06/02/2014 04:10, Alexey Kardashevskiy wrote:
>> > Ok, I thought Alexey was saying we are not redirtying that handful of pages.
> 
> Every iteration we read the dirty map from KVM and send all dirty pages
> across the stream.

But we never finish because qemu_savevm_state_pending is only called _after_
the g_usleep?  And thus there's time for the guest to redirty those pages.
Does something like this fix it (of course for a proper patch the goto
should be eliminated)?

diff --git a/migration.c b/migration.c
index 7235c23..804c3bd 100644
--- a/migration.c
+++ b/migration.c
@@ -589,6 +589,7 @@ static void *migration_thread(void *opaque)
             } else {
                 int ret;
 
+final_phase:
                 DPRINTF("done iterating\n");
                 qemu_mutex_lock_iothread();
                 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -640,10 +641,16 @@ static void *migration_thread(void *opaque)
             qemu_file_reset_rate_limit(s->file);
             initial_time = current_time;
             initial_bytes = qemu_ftell(s->file);
-        }
-        if (qemu_file_rate_limit(s->file)) {
-            /* usleep expects microseconds */
-            g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
+        } else if (qemu_file_rate_limit(s->file)) {
+            pending_size = qemu_savevm_state_pending(s->file, max_size);
+            DPRINTF("pending size %" PRIu64 " max %" PRIu64 "\n",
+                    pending_size, max_size);
+            if (pending_size >= max_size) {
+                /* usleep expects microseconds */
+                g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
+	    } else {
+                goto final_phase;
+	    }
         }
     }
 

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Alexey Kardashevskiy @ 2014-02-07  5:39 UTC
  To: Dr. David Alan Gilbert; +Cc: Paolo Bonzini, qemu-devel, Alex Graf

On 02/06/2014 10:24 PM, Dr. David Alan Gilbert wrote:
> * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
>> On 02/06/2014 03:45 AM, Paolo Bonzini wrote:
>>> Il 05/02/2014 17:42, Dr. David Alan Gilbert ha scritto:
>>>> Because:
>>>>     * the code is still running and keeps redirtying a small handful of
>>>> pages
>>>>     * but because we've underestimated our available bandwidth we never stop
>>>>       it and just throw those pages across immediately
>>>
>>> Ok, I thought Alexey was saying we are not redirtying that handful of pages.
>>
>>
>> Every iteration we read the dirty map from KVM and send all dirty pages
>> across the stream.
>>
>>
>>> And in turn, this is because the max downtime we have is too low
>>> (especially for the default 32 MB/sec default bandwidth; that's also pretty
>>> low).
>>
>>
>> My understanding nooow is that in order to finish migration QEMU waits for
>> the earliest 100ms (BUFFER_DELAY) of continuously low trafic but due to
>> those pages getting dirty every time we read the dirty map, we transfer
>> more in these 100ms than we are actually allowed (>32MB/s or 320KB/100ms).
>> So we transfer-transfer-transfer, detect than we transfer too much, do
>> delay() and if max_size (calculated from actual transfer and downtime) for
>> the next iteration is less (by luck) than those 96 pages (uncompressed) -
>> we finish.
> 
> How about turning on some of the debug in migration.c; I suggest not all of
> it, but how about the :
> 
>             DPRINTF("transferred %" PRIu64 " time_spent %" PRIu64
>                     " bandwidth %g max_size %" PRId64 "\n",
>                     transferred_bytes, time_spent, bandwidth, max_size);
> 
> and also the s->dirty_bytes_rate value.  It would help check our assumptions.


It is always zero.


>> Increasing speed or/and downtime will help but still - we would not need
>> that if migration did not expect all 96 pages to have to be sent but did
>> have some smart way to detect that many are empty (so - compressed).
> 
> I think the other way would be to keep track of the compression ratio;
> if we knew how many pages we'd sent, and how much bandwidth that had used,
> we could divide the pending_bytes by that to get a *different* approximation.
> 
> However, the problem is that my understanding is we're trying to 
> _gurantee_ a maximum downtime, and to do that we have to use the calculation
> that assumes that all the pages we have are going to take the maximum time
> to transfer, and only go into downtime then.
> 
>> Literally, move is_zero_range() from ram_save_block() to
>> migration_bitmap_sync() and store this bit in some new pages_zero_map, for
>> example. But does it make a lot of sense?
> 
> The problem is that means checking whether it's zero more often; at the moment
> we check it's zero once during sending; to do what you're suggesting would
> mean we'd have to check every page is zero, every time we sync, and I think
> that's more often than we send.
> 
> Have you tried disabling the call to is_zero_range in arch_init.c's ram_block
> so that (as long as you have XBZRLE off) we don't do any compression; if 
> the theory is right then your problem should go away.


That was what I did first :)



> 
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


-- 
Alexey

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Alexey Kardashevskiy @ 2014-02-07  5:42 UTC
  To: Paolo Bonzini, Dr. David Alan Gilbert; +Cc: qemu-devel, Alex Graf

On 02/07/2014 10:49 AM, Paolo Bonzini wrote:
> Il 06/02/2014 04:10, Alexey Kardashevskiy ha scritto:
>>>> Ok, I thought Alexey was saying we are not redirtying that handful of pages.
>>
>> Every iteration we read the dirty map from KVM and send all dirty pages
>> across the stream.
> 
> But we never finish because qemu_savevm_state_pending is only called _after_
> the g_usleep?  And thus there's time for the guest to redirty those pages.
> Does something like this fix it (of course for a proper pages the goto
> should be eliminated)?
> 
> diff --git a/migration.c b/migration.c
> index 7235c23..804c3bd 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -589,6 +589,7 @@ static void *migration_thread(void *opaque)
>              } else {
>                  int ret;
>  
> +final_phase:
>                  DPRINTF("done iterating\n");
>                  qemu_mutex_lock_iothread();
>                  start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> @@ -640,10 +641,16 @@ static void *migration_thread(void *opaque)
>              qemu_file_reset_rate_limit(s->file);
>              initial_time = current_time;
>              initial_bytes = qemu_ftell(s->file);
> -        }
> -        if (qemu_file_rate_limit(s->file)) {
> -            /* usleep expects microseconds */
> -            g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
> +        } else if (qemu_file_rate_limit(s->file)) {
> +            pending_size = qemu_savevm_state_pending(s->file, max_size);
> +            DPRINTF("pending size %" PRIu64 " max %" PRIu64 "\n",
> +                    pending_size, max_size);
> +            if (pending_size >= max_size) {
> +                /* usleep expects microseconds */
> +                g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
> +	    } else {
> +                goto final_phase;
> +	    }
>          }
>      }


It does not make any difference, will have a closer look on Monday.


-- 
Alexey

* Re: [Qemu-devel] migration: broken ram_save_pending
From: Dr. David Alan Gilbert @ 2014-02-07  8:55 UTC
  To: Alexey Kardashevskiy; +Cc: Paolo Bonzini, qemu-devel, Alex Graf

* Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> On 02/06/2014 10:24 PM, Dr. David Alan Gilbert wrote:
> > * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> >> On 02/06/2014 03:45 AM, Paolo Bonzini wrote:
> >>> Il 05/02/2014 17:42, Dr. David Alan Gilbert ha scritto:
> >>>> Because:
> >>>>     * the code is still running and keeps redirtying a small handful of
> >>>> pages
> >>>>     * but because we've underestimated our available bandwidth we never stop
> >>>>       it and just throw those pages across immediately
> >>>
> >>> Ok, I thought Alexey was saying we are not redirtying that handful of pages.
> >>
> >>
> >> Every iteration we read the dirty map from KVM and send all dirty pages
> >> across the stream.
> >>
> >>
> >>> And in turn, this is because the max downtime we have is too low
> >>> (especially for the default 32 MB/sec default bandwidth; that's also pretty
> >>> low).
> >>
> >>
> >> My understanding nooow is that in order to finish migration QEMU waits for
> >> the earliest 100ms (BUFFER_DELAY) of continuously low trafic but due to
> >> those pages getting dirty every time we read the dirty map, we transfer
> >> more in these 100ms than we are actually allowed (>32MB/s or 320KB/100ms).
> >> So we transfer-transfer-transfer, detect than we transfer too much, do
> >> delay() and if max_size (calculated from actual transfer and downtime) for
> >> the next iteration is less (by luck) than those 96 pages (uncompressed) -
> >> we finish.
> > 
> > How about turning on some of the debug in migration.c; I suggest not all of
> > it, but how about the :
> > 
> >             DPRINTF("transferred %" PRIu64 " time_spent %" PRIu64
> >                     " bandwidth %g max_size %" PRId64 "\n",
> >                     transferred_bytes, time_spent, bandwidth, max_size);
> > 
> > and also the s->dirty_bytes_rate value.  It would help check our assumptions.
> 
> 
> It is always zero.
> 
> 
> >> Increasing speed or/and downtime will help but still - we would not need
> >> that if migration did not expect all 96 pages to have to be sent but did
> >> have some smart way to detect that many are empty (so - compressed).
> > 
> > I think the other way would be to keep track of the compression ratio;
> > if we knew how many pages we'd sent, and how much bandwidth that had used,
> > we could divide the pending_bytes by that to get a *different* approximation.
> > 
> > However, the problem is that my understanding is we're trying to 
> > _gurantee_ a maximum downtime, and to do that we have to use the calculation
> > that assumes that all the pages we have are going to take the maximum time
> > to transfer, and only go into downtime then.
> > 
> >> Literally, move is_zero_range() from ram_save_block() to
> >> migration_bitmap_sync() and store this bit in some new pages_zero_map, for
> >> example. But does it make a lot of sense?
> > 
> > The problem is that means checking whether it's zero more often; at the moment
> > we check it's zero once during sending; to do what you're suggesting would
> > mean we'd have to check every page is zero, every time we sync, and I think
> > that's more often than we send.
> > 
> > Have you tried disabling the call to is_zero_range in arch_init.c's ram_block
> > so that (as long as you have XBZRLE off) we don't do any compression; if 
> > the theory is right then your problem should go away.
> 
> That was what I did first :)

Oh ok, then my theory on it is toast.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
