[Qemu-devel] dirty page count problem

From: Dr. David Alan Gilbert @ 2017-07-21 17:28 UTC
  To: haozhong.zhang; +Cc: qemu-devel, alex.benee, peterx, lvivier, quintela

Hi,
  Git bisect is pointing to your patch 084140bd49:
  exec: fix access to ram_list.dirty_memory when sync dirty bitmap

I'm trying to diagnose a bug I'm seeing; it looks like the dirty page count
is wrong for some reason.

Alex Bennée spotted a problem where the postcopy test would occasionally
fail under very heavy load. Attaching a debugger, it looks like the
problem is that we have a migration_dirty_pages count stuck at 2.
In the normal migration tests we don't spot this, because 2 pages is
smaller than the threshold to end migration, so an extra 2 pages
doesn't block it from finishing. However, with a very
small downtime setting (like we use in the postcopy test) and with
very low bandwidth (as when Alex ran the test on a very heavily loaded
machine), we end up never calling the bitmap sync again and never
completing the iteration.
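
To make the threshold argument concrete, here is a rough sketch of the
completion decision. It is a simplified model, not the actual QEMU code:
the function names, the 4096-byte page size and the "pending < threshold"
comparison are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed target page size, illustration only */

/* Bytes we expect to be able to transfer within the allowed downtime. */
static uint64_t toy_threshold_bytes(uint64_t bandwidth_bytes_per_ms,
                                    uint64_t downtime_ms)
{
    return bandwidth_bytes_per_ms * downtime_ms;
}

/*
 * Hypothetical completion test: finish once the remaining dirty data is
 * smaller than what fits in the downtime window (or nothing is left).
 */
static bool toy_can_complete(uint64_t dirty_pages,
                             uint64_t bandwidth_bytes_per_ms,
                             uint64_t downtime_ms)
{
    /* Guard against overflow in the multiplication below. */
    assert(dirty_pages <= UINT64_MAX / TOY_PAGE_SIZE);

    uint64_t pending = dirty_pages * TOY_PAGE_SIZE;

    return pending == 0 ||
           pending < toy_threshold_bytes(bandwidth_bytes_per_ms, downtime_ms);
}
```

With plenty of bandwidth and a generous downtime, 2 leftover pages (8192
bytes in this model) sit well under the threshold and go unnoticed. With a
tiny downtime and very low bandwidth the threshold can drop below 8192
bytes, and a counter stuck at 2 then keeps the check false forever.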

I'm using the following addition to spot the problem:

diff --git a/migration/ram.c b/migration/ram.c
index e75f1050e4..3ddf884952 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1350,6 +1350,13 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage)
         }
     } while (!pages && again);

+    if (!pages && !again && pss.complete_round && rs->migration_dirty_pages)
+    {
+        /* Should make this fail migration ? */
+        fprintf(stderr, "%s: no page found, yet dirty_pages=%"PRIu64"\n",
+                __func__, rs->migration_dirty_pages);
+    }
+
     rs->last_seen_block = pss.block;
     rs->last_page = pss.page;

(which I might add as a test to fail a migration)

That check fires easily even on an unloaded machine:
tests/postcopy-test
/x86_64/postcopy: ram_find_and_save_block: no page found, yet dirty_pages=2
ram_find_and_save_block: no page found, yet dirty_pages=2
ram_find_and_save_block: no page found, yet dirty_pages=2
OK
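
The check above relies on an invariant: once a complete round over the
bitmap finds no dirty page, the cached counter ought to be zero. A toy
model of that invariant, using a single 64-bit word as the bitmap, might
look like the following. The struct and function names are hypothetical
and none of this is QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the migration state: a bitmap plus a cached count. */
struct toy_ram_state {
    uint64_t bitmap;       /* one bit per page, 1 = dirty */
    uint64_t dirty_pages;  /* should always equal the number of set bits */
};

/* Count set bits by clearing the lowest one until none remain. */
static uint64_t toy_count_dirty_bits(uint64_t bitmap)
{
    uint64_t n = 0;

    while (bitmap) {
        bitmap &= bitmap - 1;
        n++;
    }
    return n;
}

/* "Send" one dirty page: clear its bit and decrement the cached count. */
static bool toy_send_one_page(struct toy_ram_state *rs)
{
    if (rs->bitmap == 0) {
        return false;          /* a full scan found nothing to send */
    }
    rs->bitmap &= rs->bitmap - 1;
    rs->dirty_pages--;
    return true;
}

/* True when the cached count disagrees with the bitmap. */
static bool toy_counter_out_of_sync(const struct toy_ram_state *rs)
{
    return rs->dirty_pages != toy_count_dirty_bits(rs->bitmap);
}

/* Send until the bitmap is empty; return whatever the counter still claims. */
static uint64_t toy_drain(struct toy_ram_state *rs)
{
    assert(rs != NULL);
    while (toy_send_one_page(rs)) {
        /* keep sending */
    }
    return rs->dirty_pages;
}
```

If the counter starts out 2 higher than the number of set bits, draining
the bitmap leaves it at 2 with no page left to send, which is the same
shape as the "no page found, yet dirty_pages=2" message.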


I'll try and debug where our extra two pages are coming from.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
