* [Qemu-devel] [RFC 0/7] Fix migration with lots of memory
@ 2012-05-22 18:32 Juan Quintela
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 1/7] Add spent time for migration Juan Quintela
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Juan Quintela @ 2012-05-22 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: chegu_vinod

Hi

After a long, long time, this is v2.

These are basically the changes that we carry for RHEL, due to the
problems that we have with big-memory machines.  I just rebased the
patches and fixed the easy parts:

- buffered_file_limit is gone: we just use 50ms and call it a day

- I kept ram_addr_t as a valid type for a counter (no, I still don't
  agree with Anthony on this, but it is not important).

- Always print the total time of migration.  Notice that I also print
  it when migration is completed.  Luiz, could you take a look to see
  if I did something wrong (probably)?

- Moved debug printfs to tracepoints.  Thanks a lot to Stefan for
  helping with it.  While at it, I had to put the traces in the middle
  of the trace-events file; if I put them at the end of the file, when
  I enable them I get the previous two tracepoints generated instead
  of the ones I just defined.  Stefan is looking into that.  The
  workaround is to define them anywhere else.

- Exit early from cpu_physical_memory_reset_dirty().  Anthony wanted me
  to create an empty stub for KVM and keep the code for TCG.  The
  problem is that we can have both KVM and TCG running from the same
  binary.  Instead of exiting in the middle of the function, I just
  refactored the code out.  Is there a struct where I could add a new
  function pointer for this behaviour?

- Exit if we have been too long in the ram_save_live() loop (a sketch
  of the idea follows this list).  Anthony didn't like this; I will
  send a version based on the migration thread in the following days,
  but for now I just need something working for other people to test.

  Notice that I still get "lots" of more-than-50ms printfs.  (Yes,
  there is a debugging printf there.)

- Bitmap handling.  There is still ad-hoc code to count dirty pages;
  I will try to get something saner based on bitmap optimizations.
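
For reference, here is a minimal sketch of the ram_save_live() loop
exit mentioned above (patch 6 in this series has the real code; the
variable names and the start-of-loop timestamp are simplified here, so
treat this as an illustration rather than the exact patch):

    /* Sketch: bound the time spent in the stage 2 send loop.  The clock
     * is only read every 64 pages because qemu_get_clock_ns() is not
     * free, and we bail out after 50ms so that the 100ms buffered_file
     * timer is never missed. */
    int i = 0;
    int64_t t0 = qemu_get_clock_ns(rt_clock);

    while (!qemu_file_rate_limit(f)) {
        int bytes_sent = ram_save_block(f);
        if (bytes_sent == 0) {          /* no more dirty blocks */
            break;
        }
        if ((i & 63) == 0) {            /* check the clock every 64 pages */
            uint64_t elapsed_ms = (qemu_get_clock_ns(rt_clock) - t0) / 1000000;
            if (elapsed_ms > 50) {      /* half of the buffered_file timer */
                break;
            }
        }
        i++;
    }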

Comments?

Later, Juan.




v1:
---

Executive Summary
-----------------

This series of patches fixes migration with lots of memory.  With them,
stalls are removed and we honor max_downtime.
I also add infrastructure to measure what is happening during migration
(#define DEBUG_MIGRATION and DEBUG_SAVEVM).

Migration is broken at the moment in the qemu tree; Michael's patch is
needed to fix virtio migration.  Measurements are given for the qemu-kvm
tree.  At the end, there are some measurements with the qemu tree.

Long Version with measurements (for those that like numbers O:-)
------------------------------

8 vCPUs and 64GB RAM, a RHEL5 guest that is completely idle

initial
-------

   savevm: save live iterate section id 3 name ram took 3266 milliseconds 46 times

We have 46 stalls, and missed the 100ms deadline 46 times.
Stalls took around 3.5 to 3.6 seconds each.

   savevm: save devices took 1 milliseconds

In case you had any doubt, the rest of the devices (i.e. not RAM) took
less than 1ms, so we don't care about optimizing them for now.

   migration: ended after 207411 milliseconds

total migration took 207 seconds for this guest

samples  %        image name               symbol name
2161431  72.8297  qemu-system-x86_64       cpu_physical_memory_reset_dirty
379416   12.7845  qemu-system-x86_64       ram_save_live
367880   12.3958  qemu-system-x86_64       ram_save_block
16647     0.5609  qemu-system-x86_64       qemu_put_byte
10416     0.3510  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
9013      0.3037  qemu-system-x86_64       qemu_put_be32

Clearly, we are spending too much time on cpu_physical_memory_reset_dirty.

ping results during the migration.

rtt min/avg/max/mdev = 474.395/39772.087/151843.178/55413.633 ms, pipe 152

You can see that the mean and maximum values are quite big.

In the guest we got the dreaded "CPU soft lockup for 10s".

No need to iterate if we already are over the limit
---------------------------------------------------

   Numbers similar to previous ones.

KVM doesn't care about TLB handling
-----------------------------------

   savevm: save live iterate section id 3 name ram took 466 milliseconds 56 times

56 stalls, but much smaller, between 0.5 and 1.4 seconds.

    migration: ended after 115949 milliseconds

Total time has improved a lot: 115 seconds.

samples  %        image name               symbol name
431530   52.1152  qemu-system-x86_64       ram_save_live
355568   42.9414  qemu-system-x86_64       ram_save_block
14446     1.7446  qemu-system-x86_64       qemu_put_byte
11856     1.4318  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
3281      0.3962  qemu-system-x86_64       qemu_put_be32
2426      0.2930  qemu-system-x86_64       cpu_physical_memory_reset_dirty
2180      0.2633  qemu-system-x86_64       qemu_put_be64

Notice how cpu_physical_memory_reset_dirty() uses much less time.

rtt min/avg/max/mdev = 474.438/1529.387/15578.055/2595.186 ms, pipe 16

Ping values from outside to the guest have improved a bit, but they are
still bad.

Exit loop if we have been there too long
----------------------------------------

not a single stall bigger than 100ms

   migration: ended after 157511 milliseconds

Not as good a time as the previous one, but we have removed the stalls.

samples  %        image name               symbol name
1104546  71.8260  qemu-system-x86_64       ram_save_live
370472   24.0909  qemu-system-x86_64       ram_save_block
30419     1.9781  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
16252     1.0568  qemu-system-x86_64       qemu_put_byte
3400      0.2211  qemu-system-x86_64       qemu_put_be32
2657      0.1728  qemu-system-x86_64       cpu_physical_memory_reset_dirty
2206      0.1435  qemu-system-x86_64       qemu_put_be64
1559      0.1014  qemu-system-x86_64       qemu_file_rate_limit


You can see that ping times are improving
  rtt min/avg/max/mdev = 474.422/504.416/628.508/35.366 ms

Now the maximum is near the minimum, at reasonable values.

The limit in the stage 2 loop has been set to 50ms because
buffered_file runs a timer every 100ms.  If we miss that timer, we end
up in trouble.  So, I used 100/2.

I tried other values: 15ms (max_downtime/2, so it could be set by the
user), but that gave too long a total time (~400 seconds).

I tried bigger values, 75ms and 100ms, but with either of them we got
stalls, sometimes as big as 1s, as we miss some timer runs and then
the calculations go wrong.

With this patch, the soft lockups are gone.

Change calculation to exit live migration
-----------------------------------------

We spent too much time in ram_save_live(); the problem is the
calculation of the number of dirty pages (ram_save_remaining()).
Instead of walking the bitmap each time that we need the value, we
just maintain the number of dirty pages, updating it each time that we
change a bit in the bitmap (a sketch follows).
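
Here is a simplified sketch of the idea (patch 7 in this series has the
real code; the helper name mark_page_dirty below is illustrative, the
real hook lives in cpu_physical_memory_set_dirty() and friends):

    /* Sketch: maintain ram_list.dirty_pages instead of walking the
     * whole dirty bitmap.  The counter only moves on clean<->dirty
     * transitions of the MIGRATION dirty flag. */
    static void mark_page_dirty(ram_addr_t addr)
    {
        if (!cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
                                           MIGRATION_DIRTY_FLAG)) {
            ram_list.dirty_pages++;             /* clean -> dirty */
        }
        ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] = 0xff;
    }

    static ram_addr_t ram_save_remaining(void)
    {
        return ram_list.dirty_pages;            /* O(1), no bitmap walk */
    }

With this, ram_save_live() no longer has to scan every page of every
RAMBlock just to know how much work is left.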

   migration: ended after 151187 milliseconds

same total time.

samples  %        image name               symbol name
365104   84.1659  qemu-system-x86_64       ram_save_block
32048     7.3879  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
16033     3.6960  qemu-system-x86_64       qemu_put_byte
3383      0.7799  qemu-system-x86_64       qemu_put_be32
3028      0.6980  qemu-system-x86_64       cpu_physical_memory_reset_dirty
2174      0.5012  qemu-system-x86_64       qemu_put_be64
1953      0.4502  qemu-system-x86_64       ram_save_live
1408      0.3246  qemu-system-x86_64       qemu_file_rate_limit

time is spent in ram_save_block() as expected.

rtt min/avg/max/mdev = 474.412/492.713/539.419/21.896 ms

std deviation is still better than without this.


and now, with load on the guest!!!
----------------------------------

I will only show the run without my patches applied and the run with
all of them applied at the end (as, with load, it takes more time to
run the tests).

load is synthetic:

 stress -c 2 -m 4 --vm-bytes 256M

(2 CPU threads and 4 memory threads, each dirtying 256MB of RAM)

Notice that we are dirtying too much memory to be able to converge with
the default downtime of 30ms: to finish, the remaining dirty data would
have to be small enough to send within 30ms at the current bandwidth,
and this workload keeps re-dirtying far more than that.  What the
migration should do is keep looping, but without stalls.  To get the
migration to end, I just kill the stress process after several
iterations through all of memory.

initial
-------

Same stalls as without load (the stalls are caused when it finds lots
of contiguous zero pages).


samples  %        image name               symbol name
2328320  52.9645  qemu-system-x86_64       cpu_physical_memory_reset_dirty
1504561  34.2257  qemu-system-x86_64       ram_save_live
382838    8.7088  qemu-system-x86_64       ram_save_block
52050     1.1840  qemu-system-x86_64       cpu_get_physical_page_desc
48975     1.1141  qemu-system-x86_64       kvm_client_sync_dirty_bitmap

rtt min/avg/max/mdev = 474.428/21033.451/134818.933/38245.396 ms, pipe 135

You can see that values/results are similar to what we had.

with all patches
----------------

No stalls; I stopped it after 438 seconds.

samples  %        image name               symbol name
387722   56.4676  qemu-system-x86_64       ram_save_block
109500   15.9475  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
92328    13.4466  qemu-system-x86_64       cpu_get_physical_page_desc
43573     6.3459  qemu-system-x86_64       phys_page_find_alloc
18255     2.6586  qemu-system-x86_64       qemu_put_byte
3940      0.5738  qemu-system-x86_64       qemu_put_be32
3621      0.5274  qemu-system-x86_64       cpu_physical_memory_reset_dirty
2591      0.3774  qemu-system-x86_64       ram_save_live

And ping gives values similar to the unloaded case.

rtt min/avg/max/mdev = 474.400/486.094/548.479/15.820 ms

Note:

- I tested a version of these patches/algorithms with 400GB guests on
  an old qemu-kvm version (0.9.1, the one in RHEL5).  With that much
  memory, the handling of the dirty bitmap is the thing that ends up
  causing stalls; I will try to retest when I get access to the
  machines again.


QEMU tree
---------

original qemu
-------------

   savevm: save live iterate section id 2 name ram took 296 milliseconds 47 times

stalls similar to qemu-kvm.

  migration: ended after 205938 milliseconds

similar total time.

samples  %        image name               symbol name
2158149  72.3752  qemu-system-x86_64       cpu_physical_memory_reset_dirty
382016   12.8112  qemu-system-x86_64       ram_save_live
367000   12.3076  qemu-system-x86_64       ram_save_block
18012     0.6040  qemu-system-x86_64       qemu_put_byte
10496     0.3520  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
7366      0.2470  qemu-system-x86_64       qemu_get_ram_ptr

very bad ping times
   rtt min/avg/max/mdev = 474.424/54575.554/159139.429/54473.043 ms, pipe 160


with all patches applied (no load)
----------------------------------

   savevm: save live iterate section id 2 name ram took 109 milliseconds 1 times

Only one mini-stall, and it happens during stage 3 of savevm.

   migration: ended after 149529 milliseconds

similar time (a bit faster indeed)

samples  %        image name               symbol name
366803   73.9172  qemu-system-x86_64       ram_save_block
31717     6.3915  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
16489     3.3228  qemu-system-x86_64       qemu_put_byte
5512      1.1108  qemu-system-x86_64       main_loop_wait
4886      0.9846  qemu-system-x86_64       cpu_exec_all
3418      0.6888  qemu-system-x86_64       qemu_put_be32
3397      0.6846  qemu-system-x86_64       kvm_vcpu_ioctl
3334      0.6719  [vdso] (tgid:18656 range:0x7ffff7ffe000-0x7ffff7fff000) [vdso] (tgid:18656 range:0x7ffff7ffe000-0x7ffff7fff000)
2913      0.5870  qemu-system-x86_64       cpu_physical_memory_reset_dirty

The std deviation is a bit worse than qemu-kvm, but nothing to write home about:
   rtt min/avg/max/mdev = 475.406/485.577/909.463/40.292 ms

Juan Quintela (7):
  Add spent time for migration
  Add tracepoints for savevm section start/end
  No need to iterate if we already are over the limit
  Only TCG needs TLB handling
  Only calculate expected_time for stage 2
  Exit loop if we have been there too long
  Maintaing number of dirty pages

 arch_init.c      |   40 ++++++++++++++++++++++------------------
 cpu-all.h        |    1 +
 exec-obsolete.h  |    8 ++++++++
 exec.c           |   33 +++++++++++++++++++++++----------
 hmp.c            |    2 ++
 migration.c      |   11 +++++++++++
 migration.h      |    1 +
 qapi-schema.json |   12 +++++++++---
 savevm.c         |   11 +++++++++++
 trace-events     |    6 ++++++
 10 files changed, 94 insertions(+), 31 deletions(-)

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 1/7] Add spent time for migration
  2012-05-22 18:32 [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Juan Quintela
@ 2012-05-22 18:32 ` Juan Quintela
  2012-06-14 10:52   ` Orit Wasserman
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 2/7] Add tracepoints for savevm section start/end Juan Quintela
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Juan Quintela @ 2012-05-22 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: chegu_vinod

We add the time spent on migration to the output of the "info migrate"
command.  'total_time' means the time since the start of migration if
migration is 'active', and the total migration time if migration has
completed.  As we are also interested in the transferred RAM when
migration completes, we also add all the RAM statistics in that case.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 hmp.c            |    2 ++
 migration.c      |   11 +++++++++++
 migration.h      |    1 +
 qapi-schema.json |   12 +++++++++---
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/hmp.c b/hmp.c
index bb0952e..25a128b 100644
--- a/hmp.c
+++ b/hmp.c
@@ -142,6 +142,8 @@ void hmp_info_migrate(Monitor *mon)
                        info->ram->remaining >> 10);
         monitor_printf(mon, "total ram: %" PRIu64 " kbytes\n",
                        info->ram->total >> 10);
+        monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
+                       info->ram->total_time);
     }

     if (info->has_disk) {
diff --git a/migration.c b/migration.c
index 3f485d3..599bb6c 100644
--- a/migration.c
+++ b/migration.c
@@ -131,6 +131,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->ram->transferred = ram_bytes_transferred();
         info->ram->remaining = ram_bytes_remaining();
         info->ram->total = ram_bytes_total();
+        info->ram->total_time = qemu_get_clock_ms(rt_clock)
+            - s->total_time;

         if (blk_mig_active()) {
             info->has_disk = true;
@@ -143,6 +145,13 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIG_STATE_COMPLETED:
         info->has_status = true;
         info->status = g_strdup("completed");
+
+        info->has_ram = true;
+        info->ram = g_malloc0(sizeof(*info->ram));
+        info->ram->transferred = ram_bytes_transferred();
+        info->ram->remaining = 0;
+        info->ram->total = ram_bytes_total();
+        info->ram->total_time = s->total_time;
         break;
     case MIG_STATE_ERROR:
         info->has_status = true;
@@ -260,6 +269,7 @@ static void migrate_fd_put_ready(void *opaque)
         } else {
             migrate_fd_completed(s);
         }
+        s->total_time = qemu_get_clock_ms(rt_clock) - s->total_time;
         if (s->state != MIG_STATE_COMPLETED) {
             if (old_vm_running) {
                 vm_start();
@@ -373,6 +383,7 @@ static MigrationState *migrate_init(int blk, int inc)

     s->bandwidth_limit = bandwidth_limit;
     s->state = MIG_STATE_SETUP;
+    s->total_time = qemu_get_clock_ms(rt_clock);

     return s;
 }
diff --git a/migration.h b/migration.h
index 2e9ca2e..165b27b 100644
--- a/migration.h
+++ b/migration.h
@@ -33,6 +33,7 @@ struct MigrationState
     void *opaque;
     int blk;
     int shared;
+    int64_t total_time;
 };

 void process_incoming_migration(QEMUFile *f);
diff --git a/qapi-schema.json b/qapi-schema.json
index 2ca7195..c5a2e99 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -238,10 +238,15 @@
 #
 # @total: total amount of bytes involved in the migration process
 #
+# @total_time: total amount of ms since migration started.  If
+#        migration has ended, it returns the total migration
+#        time. (since 1.2)
+#
 # Since: 0.14.0.
 ##
 { 'type': 'MigrationStats',
-  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' } }
+  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
+           'total_time': 'int' } }

 ##
 # @MigrationInfo
@@ -253,8 +258,9 @@
 #          'cancelled'. If this field is not returned, no migration process
 #          has been initiated
 #
-# @ram: #optional @MigrationStats containing detailed migration status,
-#       only returned if status is 'active'
+# @ram: #optional @MigrationStats containing detailed migration
+#       status, only returned if status is 'active' or
+#       'completed'. 'completed' (since 1.2)
 #
 # @disk: #optional @MigrationStats containing detailed disk migration
 #        status, only returned if status is 'active' and it is a block
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 2/7] Add tracepoints for savevm section start/end
  2012-05-22 18:32 [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Juan Quintela
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 1/7] Add spent time for migration Juan Quintela
@ 2012-05-22 18:32 ` Juan Quintela
  2012-06-14 11:00   ` Orit Wasserman
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 3/7] No need to iterate if we already are over the limit Juan Quintela
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Juan Quintela @ 2012-05-22 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: chegu_vinod

This allows us to know how long each section takes to save.

An awk script like this tells us which sections take more than 10ms:

$1 ~ /savevm_section_end/ {
	# Print the savevm_section_end line when duration > 10ms
	if ($2 > 10000) {
		printf("%s times_missing=%u\n", $0, times_missing++);
	}
}

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 savevm.c     |    8 ++++++++
 trace-events |    6 ++++++
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 2d18bab..779bd22 100644
--- a/savevm.c
+++ b/savevm.c
@@ -85,6 +85,7 @@
 #include "cpus.h"
 #include "memory.h"
 #include "qmp-commands.h"
+#include "trace.h"

 #define SELF_ANNOUNCE_ROUNDS 5

@@ -1624,11 +1625,14 @@ int qemu_savevm_state_iterate(QEMUFile *f)
         if (se->save_live_state == NULL)
             continue;

+        trace_savevm_section_start();
         /* Section type */
         qemu_put_byte(f, QEMU_VM_SECTION_PART);
         qemu_put_be32(f, se->section_id);

         ret = se->save_live_state(f, QEMU_VM_SECTION_PART, se->opaque);
+        trace_savevm_section_end(se->section_id);
+
         if (ret <= 0) {
             /* Do not proceed to the next vmstate before this one reported
                completion of the current stage. This serializes the migration
@@ -1658,11 +1662,13 @@ int qemu_savevm_state_complete(QEMUFile *f)
         if (se->save_live_state == NULL)
             continue;

+        trace_savevm_section_start();
         /* Section type */
         qemu_put_byte(f, QEMU_VM_SECTION_END);
         qemu_put_be32(f, se->section_id);

         ret = se->save_live_state(f, QEMU_VM_SECTION_END, se->opaque);
+        trace_savevm_section_end(se->section_id);
         if (ret < 0) {
             return ret;
         }
@@ -1674,6 +1680,7 @@ int qemu_savevm_state_complete(QEMUFile *f)
 	if (se->save_state == NULL && se->vmsd == NULL)
 	    continue;

+        trace_savevm_section_start();
         /* Section type */
         qemu_put_byte(f, QEMU_VM_SECTION_FULL);
         qemu_put_be32(f, se->section_id);
@@ -1687,6 +1694,7 @@ int qemu_savevm_state_complete(QEMUFile *f)
         qemu_put_be32(f, se->version_id);

         vmstate_save(f, se);
+        trace_savevm_section_end(se->section_id);
     }

     qemu_put_byte(f, QEMU_VM_EOF);
diff --git a/trace-events b/trace-events
index 87cb96c..73d5f81 100644
--- a/trace-events
+++ b/trace-events
@@ -741,6 +741,11 @@ displaysurface_resize(void *display_state, void *display_surface, int width, int
 # vga.c
 ppm_save(const char *filename, void *display_surface) "%s surface=%p"

+# savevm.c
+
+savevm_section_start(void) ""
+savevm_section_end(unsigned int section_id) "section_id %u"
+
 # hw/qxl.c
 disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
 disable qxl_io_write_vga(int qid, const char *mode, uint32_t addr, uint32_t val) "%d %s addr=%u val=%u"
@@ -805,3 +810,4 @@ qxl_render_blit_guest_primary_initialized(void) ""
 qxl_render_blit(int32_t stride, int32_t left, int32_t right, int32_t top, int32_t bottom) "stride=%d [%d, %d, %d, %d]"
 qxl_render_guest_primary_resized(int32_t width, int32_t height, int32_t stride, int32_t bytes_pp, int32_t bits_pp) "%dx%d, stride %d, bpp %d, depth %d"
 qxl_render_update_area_done(void *cookie) "%p"
+
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 3/7] No need to iterate if we already are over the limit
  2012-05-22 18:32 [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Juan Quintela
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 1/7] Add spent time for migration Juan Quintela
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 2/7] Add tracepoints for savevm section start/end Juan Quintela
@ 2012-05-22 18:32 ` Juan Quintela
  2012-06-14 11:03   ` Orit Wasserman
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 4/7] Only TCG needs TLB handling Juan Quintela
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Juan Quintela @ 2012-05-22 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: chegu_vinod

If buffers are full, don't iterate, just exit.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 savevm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 779bd22..6bc71b1 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1625,6 +1625,9 @@ int qemu_savevm_state_iterate(QEMUFile *f)
         if (se->save_live_state == NULL)
             continue;

+        if (qemu_file_rate_limit(f)) {
+            return 0;
+        }
         trace_savevm_section_start();
         /* Section type */
         qemu_put_byte(f, QEMU_VM_SECTION_PART);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 4/7] Only TCG needs TLB handling
  2012-05-22 18:32 [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Juan Quintela
                   ` (2 preceding siblings ...)
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 3/7] No need to iterate if we already are over the limit Juan Quintela
@ 2012-05-22 18:32 ` Juan Quintela
  2012-06-14 11:15   ` Orit Wasserman
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 5/7] Only calculate expected_time for stage 2 Juan Quintela
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Juan Quintela @ 2012-05-22 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: chegu_vinod

Refactor the code that is only needed for TCG into a static function.
Call it only when TCG is enabled.  We can't refactor it into a dummy
function for the KVM case, as qemu can be compiled with both TCG and
KVM at the same time.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 exec.c |   31 +++++++++++++++++++++----------
 1 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/exec.c b/exec.c
index a0494c7..b6c7675 100644
--- a/exec.c
+++ b/exec.c
@@ -1943,11 +1943,29 @@ void tb_flush_jmp_cache(CPUArchState *env, target_ulong addr)
             TB_JMP_PAGE_SIZE * sizeof(TranslationBlock *));
 }

+static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t end,
+                                      uintptr_t length)
+{
+    uintptr_t start1;
+
+    /* we modify the TLB cache so that the dirty bit will be set again
+       when accessing the range */
+    start1 = (uintptr_t)qemu_safe_ram_ptr(start);
+    /* Check that we don't span multiple blocks - this breaks the
+       address comparisons below.  */
+    if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1
+            != (end - 1) - start) {
+        abort();
+    }
+    cpu_tlb_reset_dirty_all(start1, length);
+
+}
+
 /* Note: start and end must be within the same ram block.  */
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
                                      int dirty_flags)
 {
-    uintptr_t length, start1;
+    uintptr_t length;

     start &= TARGET_PAGE_MASK;
     end = TARGET_PAGE_ALIGN(end);
@@ -1957,16 +1975,9 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
         return;
     cpu_physical_memory_mask_dirty_range(start, length, dirty_flags);

-    /* we modify the TLB cache so that the dirty bit will be set again
-       when accessing the range */
-    start1 = (uintptr_t)qemu_safe_ram_ptr(start);
-    /* Check that we don't span multiple blocks - this breaks the
-       address comparisons below.  */
-    if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1
-            != (end - 1) - start) {
-        abort();
+    if (tcg_enabled()) {
+        tlb_reset_dirty_range_all(start, end, length);
     }
-    cpu_tlb_reset_dirty_all(start1, length);
 }

 int cpu_physical_memory_set_dirty_tracking(int enable)
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 5/7] Only calculate expected_time for stage 2
  2012-05-22 18:32 [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Juan Quintela
                   ` (3 preceding siblings ...)
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 4/7] Only TCG needs TLB handling Juan Quintela
@ 2012-05-22 18:32 ` Juan Quintela
  2012-06-14 11:31   ` Orit Wasserman
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long Juan Quintela
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Juan Quintela @ 2012-05-22 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: chegu_vinod

ram_save_remaining() is an expensive operation when there is a lot of memory.
So we only call the function when we need it.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 988adca..76a3d4e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -295,7 +295,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     ram_addr_t addr;
     uint64_t bytes_transferred_last;
     double bwidth = 0;
-    uint64_t expected_time = 0;
     int ret;

     if (stage < 0) {
@@ -372,9 +371,12 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)

     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);

-    expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
-
-    return (stage == 2) && (expected_time <= migrate_max_downtime());
+    if (stage == 2) {
+        uint64_t expected_time;
+        expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
+        return expected_time <= migrate_max_downtime();
+    }
+    return 0;
 }

 static inline void *host_from_stream_offset(QEMUFile *f,
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long
  2012-05-22 18:32 [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Juan Quintela
                   ` (4 preceding siblings ...)
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 5/7] Only calculate expected_time for stage 2 Juan Quintela
@ 2012-05-22 18:32 ` Juan Quintela
  2012-06-14 11:36   ` Orit Wasserman
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 7/7] Maintaing number of dirty pages Juan Quintela
  2012-06-11  3:56 ` [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Chegu Vinod
  7 siblings, 1 reply; 19+ messages in thread
From: Juan Quintela @ 2012-05-22 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: chegu_vinod

Checking every 64 pages is an arbitrary magic number, as good as any
other.  We don't want to test too many times; on the other hand,
qemu_get_clock_ns() is not that expensive either.  We want to be sure
that we spend less than 50ms (half of the buffered_file timer); if we
spend more than 100ms, all the accounting goes wrong.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 76a3d4e..2aa77ff 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -296,6 +296,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     uint64_t bytes_transferred_last;
     double bwidth = 0;
     int ret;
+    int i;

     if (stage < 0) {
         memory_global_dirty_log_stop();
@@ -335,6 +336,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);

+    i = 0;
     while ((ret = qemu_file_rate_limit(f)) == 0) {
         int bytes_sent;

@@ -343,6 +345,19 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
         if (bytes_sent == 0) { /* no more blocks */
             break;
         }
+        /* we want to check in the 1st loop, just in case it was the 1st time
+           and we had to sync the dirty bitmap.
+           qemu_get_clock_ns() is a bit expensive, so we only check every
+           few iterations
+        */
+        if ((i & 63) == 0) {
+            uint64_t t1 = (qemu_get_clock_ns(rt_clock) - bwidth) / 1000000;
+            if (t1 > 50) { /* 50ms, half buffered_file limit */
+                printf("big delay %ld milliseconds, %d iterations\n", t1, i);
+                break;
+            }
+        }
+        i++;
     }

     if (ret < 0) {
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 7/7] Maintaing number of dirty pages
  2012-05-22 18:32 [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Juan Quintela
                   ` (5 preceding siblings ...)
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long Juan Quintela
@ 2012-05-22 18:32 ` Juan Quintela
  2012-06-14 11:42   ` Orit Wasserman
  2012-06-11  3:56 ` [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Chegu Vinod
  7 siblings, 1 reply; 19+ messages in thread
From: Juan Quintela @ 2012-05-22 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: chegu_vinod

Calculating the number of dirty pages takes a long time on hosts with
lots of memory.  Just maintain a count of how many pages are dirty.
Only sync bitmaps if the number is small enough.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch_init.c     |   15 +--------------
 cpu-all.h       |    1 +
 exec-obsolete.h |    8 ++++++++
 exec.c          |    2 ++
 4 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2aa77ff..77c3580 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -224,20 +224,7 @@ static uint64_t bytes_transferred;

 static ram_addr_t ram_save_remaining(void)
 {
-    RAMBlock *block;
-    ram_addr_t count = 0;
-
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        ram_addr_t addr;
-        for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
-            if (memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
-                                        DIRTY_MEMORY_MIGRATION)) {
-                count++;
-            }
-        }
-    }
-
-    return count;
+    return ram_list.dirty_pages;
 }

 uint64_t ram_bytes_remaining(void)
diff --git a/cpu-all.h b/cpu-all.h
index 028528f..897210f 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -502,6 +502,7 @@ typedef struct RAMBlock {
 typedef struct RAMList {
     uint8_t *phys_dirty;
     QLIST_HEAD(, RAMBlock) blocks;
+    uint64_t dirty_pages;
 } RAMList;
 extern RAMList ram_list;

diff --git a/exec-obsolete.h b/exec-obsolete.h
index 792c831..2e54ac9 100644
--- a/exec-obsolete.h
+++ b/exec-obsolete.h
@@ -75,6 +75,10 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start,

 static inline void cpu_physical_memory_set_dirty(ram_addr_t addr)
 {
+    if (!cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
+                                       MIGRATION_DIRTY_FLAG)) {
+        ram_list.dirty_pages++;
+    }
     ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] = 0xff;
 }

@@ -112,6 +116,10 @@ static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start,
     mask = ~dirty_flags;
     p = ram_list.phys_dirty + (start >> TARGET_PAGE_BITS);
     for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) {
+        if (cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
+                                          MIGRATION_DIRTY_FLAG & dirty_flags)) {
+            ram_list.dirty_pages--;
+        }
         *p++ &= mask;
     }
 }
diff --git a/exec.c b/exec.c
index b6c7675..be4865c 100644
--- a/exec.c
+++ b/exec.c
@@ -2687,6 +2687,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
            0xff, size >> TARGET_PAGE_BITS);

+    ram_list.dirty_pages += size >> TARGET_PAGE_BITS;
+
     if (kvm_enabled())
         kvm_setup_guest_memory(new_block->host, size);

-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Fix migration with lots of memory
  2012-05-22 18:32 [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Juan Quintela
                   ` (6 preceding siblings ...)
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 7/7] Maintaing number of dirty pages Juan Quintela
@ 2012-06-11  3:56 ` Chegu Vinod
  7 siblings, 0 replies; 19+ messages in thread
From: Chegu Vinod @ 2012-06-11  3:56 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

Hello,

I did pick up these patches a while back and did run some migration tests while
running simple workloads in the guest. Below are some results.

FYI...
Vinod
----


Config Details:

Guest: 10 vCPUs, 60GB (running on a host that has 6 cores (12 threads)
and 64GB).  The hosts are identical x86_64 blade servers and are
connected via a private 10G link (for migration traffic).

The guest was started using qemu directly (no virsh/virt-manager, etc.).
Migration was initiated at the qemu monitor prompt, and
migrate_set_speed was used to set the speed to 10G.  No changes were
made to the downtime.

Software:
- Guest & host OS: 3.4.0-rc7+
- Vanilla: basic upstream qemu.git
- huge_memory: changes from Juan's qemu.git tree


[Note: BTW, I also tried v11 of the XBZRLE patches, but ran into issues
(the guest crashed after migration); I have reported it to the author.]


Here are the simple "workloads" and results:

1) Idling guest
2) AIM7-compute (with 2000 users).
3) 10way parallel make (of the kernel)
4) 2 instances of memory r/w loop (exactly the same as in docs/xbzrle.txt)
5) SPECJbb2005


Note: in the Vanilla case I instrumented ram_save_live()
to print out the total migration time and the MBs transferred.

1) Idling guest:

Vanilla :
Total Mig. time: 173016 ms
Total MB's transferred : 1606MB

huge_memory:
Total Mig. time:  48821 ms
Total MB's transferred : 1620 MB

2) AIM7-compute  (2000 users)

Vanilla :
Total Mig. time: 241124 ms
Total MB's transferred : 4827MB

huge_memory:
Total Mig. time: 66716 ms
Total MB's transferred : 4022MB


3) 10 way parallel make: (of the linux kernel)

Vanilla :
Total Mig. time: 104319 ms
Total MB's transferred : 2316MB

huge_memory:
Total Mig. time: 55105 ms
Total MB's transferred : 2995MB


4) 2 instances of Memory r/w loop: (refer to docs/xbzrle.txt)

Vanilla :
Total Mig. time: 112102 ms
Total MB's transferred : 1739MB

huge_memory:
Total Mig. time: 85504ms
Total MB's transferred : 1745MB


5) SPECJbb :

Vanilla :
Total Mig. time: 162189 ms
Total MB's transferred : 5461MB

huge_memory:
Total Mig. time: 67787 ms
Total MB's transferred : 8528MB


[Expected] Observation:

Unlike the Vanilla case (and also the XBZRLE case), with these patches I
was still able to interact with the qemu monitor prompt and with the
guest during the migration (i.e. during the iterative pre-copy phase).





------


On 5/22/2012 11:32 AM, Juan Quintela wrote:
> Hi
>
> After a long, long time, this is v2.
>
> This are basically the changes that we have for RHEL, due to the
> problems that we have with big memory machines.  I just rebased the
> patches and fixed the easy parts:
>
> - buffered_file_limit is gone: we just use 50ms and call it a day
>
> - I let ram_addr_t as a valid type for a counter (no, I still don't
>    agree with Anthony on this, but it is not important).
>
> - Print total time of migration always.  Notice that I also print it
>    when migration is completed.  Luiz, could you take a look to see if
>    I did something worng (probably).
>
> - Moved debug printfs to tracepointns.  Thanks a lot to Stefan for
>    helping with it.  Once here, I had to put the traces in the middle
>    of trace-events file, if I put them on the end of the file, when I
>    enable them, I got generated the previous two tracepoints, instead
>    of the ones I just defined.  Stefan is looking on that.  Workaround
>    is defining them anywhere else.
>
> - exit from cpu_physical_memory_reset_dirty().  Anthony wanted that I
>    created an empty stub for kvm, and maintain the code for tcg.  The
>    problem is that we can have both kvm and tcg running from the same
>    binary.  Intead of exiting in the middle of the function, I just
>    refactored the code out.  Is there an struct where I could add a new
>    function pointer for this behaviour?
>
> - exit if we have been too long on ram_save_live() loop.  Anthony
>    didn't like this, I will sent a version based on the migration
>    thread in the following days.  But just need something working for
>    other people to test.
>
>    Notice that I still got "lots" of more than 50ms printf's. (Yes,
>    there is a debugging printf there).
>
> - Bitmap handling.  Still all code to count dirty pages, will try to
>    get something saner based on bitmap optimizations.
>
> Comments?
>
> Later, Juan.
>
>
>
>
> v1:
> ---
>
> Executive Summary
> -----------------
>
> This series of patches fix migration with lots of memory.  With them stalls
> are removed, and we honored max_dowtime.
> I also add infrastructure to measure what is happening during migration
> (#define DEBUG_MIGRATION and DEBUG_SAVEVM).
>
> Migration is broken at the momment in qemu tree, Michael patch is needed to
> fix virtio migration.  Measurements are given for qemu-kvm tree.  At the end, some measurements
> with qemu tree.
>
> Long Version with measurements (for those that like numbers O:-)
> ------------------------------
>
> 8 vCPUS and 64GB RAM, a RHEL5 guest that is completelly idle
>
> initial
> -------
>
>     savevm: save live iterate section id 3 name ram took 3266 milliseconds 46 times
>
> We have 46 stalls, and missed the 100ms deadline 46 times.
> stalls took around 3.5 and 3.6 seconds each.
>
>     savevm: save devices took 1 milliseconds
>
> if you had any doubt (rest of devices, not RAM) took less than 1ms, so
> we don't care for now to optimize them.
>
>     migration: ended after 207411 milliseconds
>
> total migration took 207 seconds for this guest
>
> samples  %        image name               symbol name
> 2161431  72.8297  qemu-system-x86_64       cpu_physical_memory_reset_dirty
> 379416   12.7845  qemu-system-x86_64       ram_save_live
> 367880   12.3958  qemu-system-x86_64       ram_save_block
> 16647     0.5609  qemu-system-x86_64       qemu_put_byte
> 10416     0.3510  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
> 9013      0.3037  qemu-system-x86_64       qemu_put_be32
>
> Clearly, we are spending too much time on cpu_physical_memory_reset_dirty.
>
> ping results during the migration.
>
> rtt min/avg/max/mdev = 474.395/39772.087/151843.178/55413.633 ms, pipe 152
>
> You can see that the mean and maximun values are quite big.
>
> We got in the guests the dreade: CPU softlookup for 10s
>
> No need to iterate if we already are over the limit
> ---------------------------------------------------
>
>     Numbers similar to previous ones.
>
> KVM don't care about TLB handling
> ---------------------------------
>
>     savevm: save livne iterate section id 3 name ram took 466 milliseconds 56 times
>
> 56 stalls, but much smaller, betweenn 0.5 and 1.4 seconds
>
>      migration: ended after 115949 milliseconds
>
> total time has improved a lot. 115 seconds.
>
> samples  %        image name               symbol name
> 431530   52.1152  qemu-system-x86_64       ram_save_live
> 355568   42.9414  qemu-system-x86_64       ram_save_block
> 14446     1.7446  qemu-system-x86_64       qemu_put_byte
> 11856     1.4318  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
> 3281      0.3962  qemu-system-x86_64       qemu_put_be32
> 2426      0.2930  qemu-system-x86_64       cpu_physical_memory_reset_dirty
> 2180      0.2633  qemu-system-x86_64       qemu_put_be64
>
> notice how cpu_physical_memory_dirty() use much less time.
>
> rtt min/avg/max/mdev = 474.438/1529.387/15578.055/2595.186 ms, pipe 16
>
> ping values from outside to the guest have improved a bit, but still
> bad.
>
> Exit loop if we have been there too long
> ----------------------------------------
>
> not a single stall bigger than 100ms
>
>     migration: ended after 157511 milliseconds
>
> not as good time as previous one, but we have removed the stalls.
>
> samples  %        image name               symbol name
> 1104546  71.8260  qemu-system-x86_64       ram_save_live
> 370472   24.0909  qemu-system-x86_64       ram_save_block
> 30419     1.9781  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
> 16252     1.0568  qemu-system-x86_64       qemu_put_byte
> 3400      0.2211  qemu-system-x86_64       qemu_put_be32
> 2657      0.1728  qemu-system-x86_64       cpu_physical_memory_reset_dirty
> 2206      0.1435  qemu-system-x86_64       qemu_put_be64
> 1559      0.1014  qemu-system-x86_64       qemu_file_rate_limit
>
>
> You can see that ping times are improving
>    rtt min/avg/max/mdev = 474.422/504.416/628.508/35.366 ms
>
> now the maximun is near the minimum, in reasonable values.
>
> The limit in the loop in stage loop has been put into 50ms because
> buffered_file run a timer each 100ms.  If we miss that timer, we ended
> having trouble.  So, I put 100/2.
>
> I tried other values: 15ms (max_downtime/2, so it could be set by the
> user), but gave too much total time (~400seconds).
>
> I tried bigger values, 75ms and 100ms, but with any of them we got
> stalls, some times as big as 1s, as we loss some timer run, and then
> calculations are wrong.
>
> With this patch, the softlookups are gone.
>
> Change calculation to exit live migration
> -----------------------------------------
>
> we spent too much time on ram_save_live(), the problem is the
> calculation of number of dirty pages (ram_save_remaining()).  Instead
> of walking the bitmap each time that we need the value, we just
> maintain the number of dirty pages each time that we change one value
> in the bitmap.
>
>     migration: ended after 151187 milliseconds
>
> same total time.
>
> samples  %        image name               symbol name
> 365104   84.1659  qemu-system-x86_64       ram_save_block
> 32048     7.3879  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
> 16033     3.6960  qemu-system-x86_64       qemu_put_byte
> 3383      0.7799  qemu-system-x86_64       qemu_put_be32
> 3028      0.6980  qemu-system-x86_64       cpu_physical_memory_reset_dirty
> 2174      0.5012  qemu-system-x86_64       qemu_put_be64
> 1953      0.4502  qemu-system-x86_64       ram_save_live
> 1408      0.3246  qemu-system-x86_64       qemu_file_rate_limit
>
> time is spent in ram_save_block() as expected.
>
> rtt min/avg/max/mdev = 474.412/492.713/539.419/21.896 ms
>
> std deviation is still better than without this.
>
>
> and now, with load on the guest!!!
> ----------------------------------
>
> will show only without my patches applied, and at the end (as with
> load it takes more time to run the tests).
>
> load is synthetic:
>
>   stress -c 2 -m 4 --vm-bytes 256M
>
> (2 cpu threads and two memory threads dirtying each 256MB RAM)
>
> Notice that we are dirtying too much memory to be able to migrate with
> the default downtime of 30ms.  What the migration should do is loop over
> but without having stalls.  To get the migration ending, I just kill the
> stress process after several iterations through all memory.
>
> initial
> -------
>
> same stalls that without load (stalls are caused when it finds lots of
> contiguous zero pages).
>
>
> samples  %        image name               symbol name
> 2328320  52.9645  qemu-system-x86_64       cpu_physical_memory_reset_dirty
> 1504561  34.2257  qemu-system-x86_64       ram_save_live
> 382838    8.7088  qemu-system-x86_64       ram_save_block
> 52050     1.1840  qemu-system-x86_64       cpu_get_physical_page_desc
> 48975     1.1141  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
>
> rtt min/avg/max/mdev = 474.428/21033.451/134818.933/38245.396 ms, pipe 135
>
> You can see that values/results are similar to what we had.
>
> with all patches
> ----------------
>
> no stalls, I stopped it after 438 seconds
>
> samples  %        image name               symbol name
> 387722   56.4676  qemu-system-x86_64       ram_save_block
> 109500   15.9475  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
> 92328    13.4466  qemu-system-x86_64       cpu_get_physical_page_desc
> 43573     6.3459  qemu-system-x86_64       phys_page_find_alloc
> 18255     2.6586  qemu-system-x86_64       qemu_put_byte
> 3940      0.5738  qemu-system-x86_64       qemu_put_be32
> 3621      0.5274  qemu-system-x86_64       cpu_physical_memory_reset_dirty
> 2591      0.3774  qemu-system-x86_64       ram_save_live
>
> and ping gives similar values to unload one.
>
> rtt min/avg/max/mdev = 474.400/486.094/548.479/15.820 ms
>
> Note:
>
> - I tested a version of this patches/algorithms with 400GB guests with
>    an old qemu-kvm version (0.9.1, the one in RHEL5.  with so many
>    memory the handling of the dirty bitmap is the thing that end
>    causing stalls, will try to retest when I got access to the machines
>    again).
>
>
> QEMU tree
> ---------
>
> original qemu
> -------------
>
>     savevm: save live iterate section id 2 name ram took 296 milliseconds 47 times
>
> stalls similar to qemu-kvm.
>
>    migration: ended after 205938 milliseconds
>
> similar total time.
>
> samples  %        image name               symbol name
> 2158149  72.3752  qemu-system-x86_64       cpu_physical_memory_reset_dirty
> 382016   12.8112  qemu-system-x86_64       ram_save_live
> 367000   12.3076  qemu-system-x86_64       ram_save_block
> 18012     0.6040  qemu-system-x86_64       qemu_put_byte
> 10496     0.3520  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
> 7366      0.2470  qemu-system-x86_64       qemu_get_ram_ptr
>
> very bad ping times
>     rtt min/avg/max/mdev = 474.424/54575.554/159139.429/54473.043 ms, pipe 160
>
>
> with all patches applied (no load)
> ----------------------------------
>
>     savevm: save live iterate section id 2 name ram took 109 milliseconds 1 times
>
> only one mini-stall, it is during stage 3 of savevm.
>
>     migration: ended after 149529 milliseconds
>
> similar time (a bit faster indeed)
>
> samples  %        image name               symbol name
> 366803   73.9172  qemu-system-x86_64       ram_save_block
> 31717     6.3915  qemu-system-x86_64       kvm_client_sync_dirty_bitmap
> 16489     3.3228  qemu-system-x86_64       qemu_put_byte
> 5512      1.1108  qemu-system-x86_64       main_loop_wait
> 4886      0.9846  qemu-system-x86_64       cpu_exec_all
> 3418      0.6888  qemu-system-x86_64       qemu_put_be32
> 3397      0.6846  qemu-system-x86_64       kvm_vcpu_ioctl
> 3334      0.6719  [vdso] (tgid:18656 range:0x7ffff7ffe000-0x7ffff7fff000) [vdso] (tgid:18656 range:0x7ffff7ffe000-0x7ffff7fff000)
> 2913      0.5870  qemu-system-x86_64       cpu_physical_memory_reset_dirty
>
> std deviation is a bit worse than qemu-kvm, but nothing to write home
>     rtt min/avg/max/mdev = 475.406/485.577/909.463/40.292 ms
>
> Juan Quintela (7):
>    Add spent time for migration
>    Add tracepoints for savevm section start/end
>    No need to iterate if we already are over the limit
>    Only TCG needs TLB handling
>    Only calculate expected_time for stage 2
>    Exit loop if we have been there too long
>    Maintaing number of dirty pages
>
>   arch_init.c      |   40 ++++++++++++++++++++++------------------
>   cpu-all.h        |    1 +
>   exec-obsolete.h  |    8 ++++++++
>   exec.c           |   33 +++++++++++++++++++++++----------
>   hmp.c            |    2 ++
>   migration.c      |   11 +++++++++++
>   migration.h      |    1 +
>   qapi-schema.json |   12 +++++++++---
>   savevm.c         |   11 +++++++++++
>   trace-events     |    6 ++++++
>   10 files changed, 94 insertions(+), 31 deletions(-)
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] Add spent time for migration
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 1/7] Add spent time for migration Juan Quintela
@ 2012-06-14 10:52   ` Orit Wasserman
  0 siblings, 0 replies; 19+ messages in thread
From: Orit Wasserman @ 2012-06-14 10:52 UTC (permalink / raw)
  To: Juan Quintela; +Cc: chegu_vinod, qemu-devel

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> We add time spent for migration to the output of "info migrate"
> command.  'total_time' means time since the start fo migration if
> migration is 'active', and total time of migration if migration is
> completed.  As we are also interested in transferred ram when
> migration completes, adding all ram statistics
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  hmp.c            |    2 ++
>  migration.c      |   11 +++++++++++
>  migration.h      |    1 +
>  qapi-schema.json |   12 +++++++++---
>  4 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index bb0952e..25a128b 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -142,6 +142,8 @@ void hmp_info_migrate(Monitor *mon)
>                         info->ram->remaining >> 10);
>          monitor_printf(mon, "total ram: %" PRIu64 " kbytes\n",
>                         info->ram->total >> 10);
> +        monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
> +                       info->ram->total_time);
>      }
> 
>      if (info->has_disk) {
> diff --git a/migration.c b/migration.c
> index 3f485d3..599bb6c 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -131,6 +131,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>          info->ram->transferred = ram_bytes_transferred();
>          info->ram->remaining = ram_bytes_remaining();
>          info->ram->total = ram_bytes_total();
> +        info->ram->total_time = qemu_get_clock_ms(rt_clock)
> +            - s->total_time;
> 
>          if (blk_mig_active()) {
>              info->has_disk = true;
> @@ -143,6 +145,13 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>      case MIG_STATE_COMPLETED:
>          info->has_status = true;
>          info->status = g_strdup("completed");
> +
> +        info->has_ram = true;
> +        info->ram = g_malloc0(sizeof(*info->ram));
> +        info->ram->transferred = ram_bytes_transferred();
> +        info->ram->remaining = 0;
> +        info->ram->total = ram_bytes_total();
> +        info->ram->total_time = s->total_time;
>          break;
>      case MIG_STATE_ERROR:
>          info->has_status = true;
> @@ -260,6 +269,7 @@ static void migrate_fd_put_ready(void *opaque)
>          } else {
>              migrate_fd_completed(s);
>          }
> +        s->total_time = qemu_get_clock_ms(rt_clock) - s->total_time;
>          if (s->state != MIG_STATE_COMPLETED) {
>              if (old_vm_running) {
>                  vm_start();
> @@ -373,6 +383,7 @@ static MigrationState *migrate_init(int blk, int inc)
> 
>      s->bandwidth_limit = bandwidth_limit;
>      s->state = MIG_STATE_SETUP;
> +    s->total_time = qemu_get_clock_ms(rt_clock);
> 
>      return s;
>  }
> diff --git a/migration.h b/migration.h
> index 2e9ca2e..165b27b 100644
> --- a/migration.h
> +++ b/migration.h
> @@ -33,6 +33,7 @@ struct MigrationState
>      void *opaque;
>      int blk;
>      int shared;
> +    int64_t total_time;
>  };
> 
>  void process_incoming_migration(QEMUFile *f);
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 2ca7195..c5a2e99 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -238,10 +238,15 @@
>  #
>  # @total: total amount of bytes involved in the migration process
>  #
> +# @total_time: tota0l amount of ms since migration started.  If
> +#        migration has ended, it returns the total migration
> +#        time. (since 1.2)
> +#
>  # Since: 0.14.0.
>  ##
>  { 'type': 'MigrationStats',
> -  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' } }
> +  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
> +           'total_time': 'int' } }
> 
>  ##
>  # @MigrationInfo
> @@ -253,8 +258,9 @@
>  #          'cancelled'. If this field is not returned, no migration process
>  #          has been initiated
>  #
> -# @ram: #optional @MigrationStats containing detailed migration status,
> -#       only returned if status is 'active'
> +# @ram: #optional @MigrationStats containing detailed migration
> +#       status, only returned if status is 'active' or
> +#       'completed'. 'comppleted' (since 1.2)
>  #
>  # @disk: #optional @MigrationStats containing detailed disk migration
>  #        status, only returned if status is 'active' and it is a block

Reviewed-by: Orit Wasserman <owasserm@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] Add tracepoints for savevm section start/end
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 2/7] Add tracepoints for savevm section start/end Juan Quintela
@ 2012-06-14 11:00   ` Orit Wasserman
  0 siblings, 0 replies; 19+ messages in thread
From: Orit Wasserman @ 2012-06-14 11:00 UTC (permalink / raw)
  To: Juan Quintela; +Cc: chegu_vinod, qemu-devel

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> This allows to know how long each section takes to save.
> 
> An awk script like this tells us sections that takes more that 10ms
> 
> $1 ~ /savevm_state_iterate_end/ {
> 	/* Print savevm_section_end line when > 10ms duration */
> 	if ($2 > 10000) {
> 		printf("%s times_missing=%u\n", $0, times_missing++);
> 	}
> }
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  savevm.c     |    8 ++++++++
>  trace-events |    6 ++++++
>  2 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/savevm.c b/savevm.c
> index 2d18bab..779bd22 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -85,6 +85,7 @@
>  #include "cpus.h"
>  #include "memory.h"
>  #include "qmp-commands.h"
> +#include "trace.h"
> 
>  #define SELF_ANNOUNCE_ROUNDS 5
> 
> @@ -1624,11 +1625,14 @@ int qemu_savevm_state_iterate(QEMUFile *f)
>          if (se->save_live_state == NULL)
>              continue;
> 
> +        trace_savevm_section_start();
>          /* Section type */
>          qemu_put_byte(f, QEMU_VM_SECTION_PART);
>          qemu_put_be32(f, se->section_id);
> 
>          ret = se->save_live_state(f, QEMU_VM_SECTION_PART, se->opaque);
> +        trace_savevm_section_end(se->section_id);
> +
>          if (ret <= 0) {
>              /* Do not proceed to the next vmstate before this one reported
>                 completion of the current stage. This serializes the migration
> @@ -1658,11 +1662,13 @@ int qemu_savevm_state_complete(QEMUFile *f)
>          if (se->save_live_state == NULL)
>              continue;
> 
> +        trace_savevm_section_start();
>          /* Section type */
>          qemu_put_byte(f, QEMU_VM_SECTION_END);
>          qemu_put_be32(f, se->section_id);
> 
>          ret = se->save_live_state(f, QEMU_VM_SECTION_END, se->opaque);
> +        trace_savevm_section_end(se->section_id);
>          if (ret < 0) {
>              return ret;
>          }
> @@ -1674,6 +1680,7 @@ int qemu_savevm_state_complete(QEMUFile *f)
>  	if (se->save_state == NULL && se->vmsd == NULL)
>  	    continue;
> 
> +        trace_savevm_section_start();
>          /* Section type */
>          qemu_put_byte(f, QEMU_VM_SECTION_FULL);
>          qemu_put_be32(f, se->section_id);
> @@ -1687,6 +1694,7 @@ int qemu_savevm_state_complete(QEMUFile *f)
>          qemu_put_be32(f, se->version_id);
> 
>          vmstate_save(f, se);
> +        trace_savevm_section_end(se->section_id);
>      }
> 
>      qemu_put_byte(f, QEMU_VM_EOF);
> diff --git a/trace-events b/trace-events
> index 87cb96c..73d5f81 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -741,6 +741,11 @@ displaysurface_resize(void *display_state, void *display_surface, int width, int
>  # vga.c
>  ppm_save(const char *filename, void *display_surface) "%s surface=%p"
> 
> +# savevm.c
> +
> +savevm_section_start(void) ""
> +savevm_section_end(unsigned int section_id) "section_id %u"
> +
>  # hw/qxl.c
>  disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
>  disable qxl_io_write_vga(int qid, const char *mode, uint32_t addr, uint32_t val) "%d %s addr=%u val=%u"
> @@ -805,3 +810,4 @@ qxl_render_blit_guest_primary_initialized(void) ""
>  qxl_render_blit(int32_t stride, int32_t left, int32_t right, int32_t top, int32_t bottom) "stride=%d [%d, %d, %d, %d]"
>  qxl_render_guest_primary_resized(int32_t width, int32_t height, int32_t stride, int32_t bytes_pp, int32_t bits_pp) "%dx%d, stride %d, bpp %d, depth %d"
>  qxl_render_update_area_done(void *cookie) "%p"
> +
extra ws

Reviewed-by: Orit Wasserman <owasserm@redhat.com>
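
For illustration, a small C sketch of how the per-section duration could
be measured around the new tracepoints, mirroring the 10ms cut-off the
awk script looks for.  The helper names, the fprintf report and the
threshold are assumptions, not part of the posted patch (needs <stdio.h>,
<inttypes.h> and qemu-timer.h).

    /* Sketch: wrap the existing tracepoints and flag slow sections. */
    static int64_t section_start_ns;

    static void savevm_section_timing_start(void)
    {
        trace_savevm_section_start();
        section_start_ns = qemu_get_clock_ns(rt_clock);
    }

    static void savevm_section_timing_end(unsigned int section_id)
    {
        int64_t delta_us = (qemu_get_clock_ns(rt_clock) - section_start_ns) / 1000;

        trace_savevm_section_end(section_id);
        if (delta_us > 10000) {   /* same 10ms cut-off as the awk script */
            fprintf(stderr, "section %u took %" PRId64 " us\n",
                    section_id, delta_us);
        }
    }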

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 3/7] No need to iterate if we already are over the limit
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 3/7] No need to iterate if we already are over the limit Juan Quintela
@ 2012-06-14 11:03   ` Orit Wasserman
  0 siblings, 0 replies; 19+ messages in thread
From: Orit Wasserman @ 2012-06-14 11:03 UTC (permalink / raw)
  To: Juan Quintela; +Cc: chegu_vinod, qemu-devel

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> If buffers are full, don't iterate, just exit.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  savevm.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/savevm.c b/savevm.c
> index 779bd22..6bc71b1 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1625,6 +1625,9 @@ int qemu_savevm_state_iterate(QEMUFile *f)
>          if (se->save_live_state == NULL)
>              continue;
> 
> +        if (qemu_file_rate_limit(f)) {
> +            return 0;
> +        }
>          trace_savevm_section_start();
>          /* Section type */
>          qemu_put_byte(f, QEMU_VM_SECTION_PART);

Reviewed-by: Orit Wasserman <owasserm@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] Only TCG needs TLB handling
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 4/7] Only TCG needs TLB handling Juan Quintela
@ 2012-06-14 11:15   ` Orit Wasserman
  0 siblings, 0 replies; 19+ messages in thread
From: Orit Wasserman @ 2012-06-14 11:15 UTC (permalink / raw)
  To: Juan Quintela; +Cc: chegu_vinod, qemu-devel

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> Refactor the code that is only needed for tcg into a static function.
> Call it only when tcg is enabled.  We can't refactor it into a dummy
> function for the kvm case, as qemu can be compiled with both tcg and
> kvm at the same time.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  exec.c |   31 +++++++++++++++++++++----------
>  1 files changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index a0494c7..b6c7675 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1943,11 +1943,29 @@ void tb_flush_jmp_cache(CPUArchState *env, target_ulong addr)
>              TB_JMP_PAGE_SIZE * sizeof(TranslationBlock *));
>  }
> 
> +static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t end,
> +                                      uintptr_t length)
> +{
> +    uintptr_t start1;
> +
> +    /* we modify the TLB cache so that the dirty bit will be set again
> +       when accessing the range */
> +    start1 = (uintptr_t)qemu_safe_ram_ptr(start);
> +    /* Check that we don't span multiple blocks - this breaks the
> +       address comparisons below.  */
> +    if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1
> +            != (end - 1) - start) {
> +        abort();
> +    }
> +    cpu_tlb_reset_dirty_all(start1, length);
> +
> +}
> +
>  /* Note: start and end must be within the same ram block.  */
>  void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
>                                       int dirty_flags)
>  {
> -    uintptr_t length, start1;
> +    uintptr_t length;
> 
>      start &= TARGET_PAGE_MASK;
>      end = TARGET_PAGE_ALIGN(end);
> @@ -1957,16 +1975,9 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
>          return;
>      cpu_physical_memory_mask_dirty_range(start, length, dirty_flags);
> 
> -    /* we modify the TLB cache so that the dirty bit will be set again
> -       when accessing the range */
> -    start1 = (uintptr_t)qemu_safe_ram_ptr(start);
> -    /* Check that we don't span multiple blocks - this breaks the
> -       address comparisons below.  */
> -    if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1
> -            != (end - 1) - start) {
> -        abort();
> +    if (tcg_enabled()) {
> +        tlb_reset_dirty_range_all(start, end, length);
>      }
> -    cpu_tlb_reset_dirty_all(start1, length);
>  }
> 
>  int cpu_physical_memory_set_dirty_tracking(int enable)

Reviewed-by: Orit Wasserman <owasserm@redhat.com>
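
To make the trade-off in the commit message concrete, this is the shape
of the rejected alternative, shown only for contrast; CONFIG_TCG here is
a hypothetical build symbol, not something this series introduces.

    /* Build-time stub (rejected): because one qemu binary can contain both
     * KVM and TCG and the accelerator is chosen at run time, this would
     * compile the TLB update away even for runs that use TCG.  Hence the
     * run-time tcg_enabled() check in the patch. */
    #if !defined(CONFIG_TCG)
    static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t end,
                                          uintptr_t length)
    {
        /* nothing to do without TCG -- but wrong if TCG is in this binary */
    }
    #endif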

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] Only calculate expected_time for stage 2
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 5/7] Only calculate expected_time for stage 2 Juan Quintela
@ 2012-06-14 11:31   ` Orit Wasserman
  0 siblings, 0 replies; 19+ messages in thread
From: Orit Wasserman @ 2012-06-14 11:31 UTC (permalink / raw)
  To: Juan Quintela; +Cc: chegu_vinod, qemu-devel

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> ram_save_remaining() is an expensive operation when there is a lot of memory.
> So we only call the function when we need it.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  arch_init.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 988adca..76a3d4e 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -295,7 +295,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>      ram_addr_t addr;
>      uint64_t bytes_transferred_last;
>      double bwidth = 0;
> -    uint64_t expected_time = 0;
>      int ret;
> 
>      if (stage < 0) {
> @@ -372,9 +371,12 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
> 
>      qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> 
> -    expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
> -
> -    return (stage == 2) && (expected_time <= migrate_max_downtime());
> +    if (stage == 2) {
> +        uint64_t expected_time;
> +        expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
> +        return expected_time <= migrate_max_downtime();
> +    }
> +    return 0;
>  }
> 
>  static inline void *host_from_stream_offset(QEMUFile *f,

Reviewed-by: Orit Wasserman <owasserm@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long Juan Quintela
@ 2012-06-14 11:36   ` Orit Wasserman
  2012-06-21 19:34     ` Juan Quintela
  0 siblings, 1 reply; 19+ messages in thread
From: Orit Wasserman @ 2012-06-14 11:36 UTC (permalink / raw)
  To: Juan Quintela; +Cc: chegu_vinod, qemu-devel

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> cheking each 64 pages is a random magic number as good as any other.
s/cheking/checking
> We don't want to test too many times, but on the other hand,
> qemu_get_clock_ns() is not so expensive either.  We want to be sure
> that we spend less than 50ms (half of the buffered_file timer); if we
> spend more than 100ms, all the accounting goes wrong.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  arch_init.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 76a3d4e..2aa77ff 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -296,6 +296,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>      uint64_t bytes_transferred_last;
>      double bwidth = 0;
>      int ret;
> +    int i;
> 
>      if (stage < 0) {
>          memory_global_dirty_log_stop();
> @@ -335,6 +336,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>      bytes_transferred_last = bytes_transferred;
>      bwidth = qemu_get_clock_ns(rt_clock);
> 
> +    i = 0;
>      while ((ret = qemu_file_rate_limit(f)) == 0) {
>          int bytes_sent;
> 
> @@ -343,6 +345,19 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
>          if (bytes_sent == 0) { /* no more blocks */
>              break;
>          }
> +        /* we want to check in the 1st loop, just in case it was the 1st time
> +           and we had to sync the dirty bitmap.
> +           qemu_get_clock_ns() is a bit expensive, so we only check every
> +           few iterations.
> +        */
> +        if ((i & 63) == 0) {
> +            uint64_t t1 = (qemu_get_clock_ns(rt_clock) - bwidth) / 1000000;
> +            if (t1 > 50) { /* 50ms, half buffered_file limit */
can't we use a constant ?
> +                printf("big delay %ld milliseconds, %d iterations\n", t1, i);
printf ? 

Orit
> +                break;
> +            }
> +        }
> +        i++;
>      }
> 
>      if (ret < 0) {

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] Maintaing number of dirty pages
  2012-05-22 18:32 ` [Qemu-devel] [PATCH 7/7] Maintaing number of dirty pages Juan Quintela
@ 2012-06-14 11:42   ` Orit Wasserman
  0 siblings, 0 replies; 19+ messages in thread
From: Orit Wasserman @ 2012-06-14 11:42 UTC (permalink / raw)
  To: Juan Quintela; +Cc: chegu_vinod, qemu-devel

On 05/22/2012 09:32 PM, Juan Quintela wrote:
> Calculating the number of dirty pages takes a long time on hosts with lots
> of memory.  Just maintain a count of how many pages are dirty.  Only sync
> the bitmaps if the number is small enough.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  arch_init.c     |   15 +--------------
>  cpu-all.h       |    1 +
>  exec-obsolete.h |    8 ++++++++
>  exec.c          |    2 ++
>  4 files changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 2aa77ff..77c3580 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -224,20 +224,7 @@ static uint64_t bytes_transferred;
> 
>  static ram_addr_t ram_save_remaining(void)
>  {
> -    RAMBlock *block;
> -    ram_addr_t count = 0;
> -
> -    QLIST_FOREACH(block, &ram_list.blocks, next) {
> -        ram_addr_t addr;
> -        for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
> -            if (memory_region_get_dirty(block->mr, addr, TARGET_PAGE_SIZE,
> -                                        DIRTY_MEMORY_MIGRATION)) {
> -                count++;
> -            }
> -        }
> -    }
> -
> -    return count;
> +    return ram_list.dirty_pages;
>  }
> 
>  uint64_t ram_bytes_remaining(void)
> diff --git a/cpu-all.h b/cpu-all.h
> index 028528f..897210f 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -502,6 +502,7 @@ typedef struct RAMBlock {
>  typedef struct RAMList {
>      uint8_t *phys_dirty;
>      QLIST_HEAD(, RAMBlock) blocks;
> +    uint64_t dirty_pages;
>  } RAMList;
>  extern RAMList ram_list;
> 
> diff --git a/exec-obsolete.h b/exec-obsolete.h
> index 792c831..2e54ac9 100644
> --- a/exec-obsolete.h
> +++ b/exec-obsolete.h
> @@ -75,6 +75,10 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t start,
> 
>  static inline void cpu_physical_memory_set_dirty(ram_addr_t addr)
>  {
> +    if (!cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
> +                                       MIGRATION_DIRTY_FLAG)) {
> +        ram_list.dirty_pages++;
> +    }
>      ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] = 0xff;
>  }
> 
> @@ -112,6 +116,10 @@ static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start,
>      mask = ~dirty_flags;
>      p = ram_list.phys_dirty + (start >> TARGET_PAGE_BITS);
>      for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) {
> +        if (cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
> +                                          MIGRATION_DIRTY_FLAG & dirty_flags)) {
> +            ram_list.dirty_pages--;
> +        }
>          *p++ &= mask;
>      }
>  }
> diff --git a/exec.c b/exec.c
> index b6c7675..be4865c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2687,6 +2687,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>      memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
>             0xff, size >> TARGET_PAGE_BITS);
> 
> +    ram_list.dirty_pages += size >> TARGET_PAGE_BITS;
> +
>      if (kvm_enabled())
>          kvm_setup_guest_memory(new_block->host, size);
> 
Juan,
Don't we need to update cpu_physical_memory_set_dirty_range ?
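
Roughly, the kind of update in question would be something like the
following sketch; the signature and body are guesses based on the
helpers quoted above, not code from the series.

    /* Sketch: keep ram_list.dirty_pages in sync when a range is marked
     * dirty in bulk, counting only pages that were not already dirty for
     * migration. */
    static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
                                                           ram_addr_t length)
    {
        ram_addr_t addr, end = start + length;

        for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) {
            if (!cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
                                               MIGRATION_DIRTY_FLAG)) {
                ram_list.dirty_pages++;
            }
            ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] = 0xff;
        }
    }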

Orit

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long
  2012-06-14 11:36   ` Orit Wasserman
@ 2012-06-21 19:34     ` Juan Quintela
  2012-06-22  2:42       ` 陳韋任 (Wei-Ren Chen)
  0 siblings, 1 reply; 19+ messages in thread
From: Juan Quintela @ 2012-06-21 19:34 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: chegu_vinod, qemu-devel

Orit Wasserman <owasserm@redhat.com> wrote:
> On 05/22/2012 09:32 PM, Juan Quintela wrote:
>> cheking each 64 pages is a random magic number as good as any other.
> s/cheking/checking

Done.

>> +        */
>> +        if ((i & 63) == 0) {
>> +            uint64_t t1 = (qemu_get_clock_ns(rt_clock) - bwidth) / 1000000;
>> +            if (t1 > 50) { /* 50ms, half buffered_file limit */
> can't we use a constant ?

50 is a constant already, no?  Or what do you mean.

>> +                printf("big delay %ld milliseconds, %d iterations\n", t1, i);
> printf ? 

This is the kind of "this shouldn't happen" case that still happens.  DPRINTF?

Thanks, Juan.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long
  2012-06-21 19:34     ` Juan Quintela
@ 2012-06-22  2:42       ` 陳韋任 (Wei-Ren Chen)
  2012-06-22 12:44         ` Juan Quintela
  0 siblings, 1 reply; 19+ messages in thread
From: 陳韋任 (Wei-Ren Chen) @ 2012-06-22  2:42 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Orit Wasserman, chegu_vinod, qemu-devel

> >> +        if ((i & 63) == 0) {
> >> +            uint64_t t1 = (qemu_get_clock_ns(rt_clock) - bwidth) / 1000000;
> >> +            if (t1 > 50) { /* 50ms, half buffered_file limit */
> > can't we use a constant ?
> 
> 50 is a constant already, no?  Or what do you mean.

  I guess Orit means,

#define THRESHOLD 50

if (t1 > THRESHOLD) { ... }

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long
  2012-06-22  2:42       ` 陳韋任 (Wei-Ren Chen)
@ 2012-06-22 12:44         ` Juan Quintela
  0 siblings, 0 replies; 19+ messages in thread
From: Juan Quintela @ 2012-06-22 12:44 UTC (permalink / raw)
  To: (Wei-Ren Chen); +Cc: Orit Wasserman, chegu_vinod, qemu-devel

"(Wei-Ren Chen)" <chenwj@iis.sinica.edu.tw> wrote:
>> >> +        if ((i & 63) == 0) {
>> >> +            uint64_t t1 = (qemu_get_clock_ns(rt_clock) - bwidth) / 1000000;
>> >> +            if (t1 > 50) { /* 50ms, half buffered_file limit */
>> > can't we use a constant ?
>> 
>> 50 is a constant already, no?  Or what do you mean.
>
>   I guess Orit means,
>
> #define THRESHOLD 50
>
> if (t1 > THRESHOLD) { ... }

Thanks, Added.
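
Roughly, with both suggestions folded in, the check would look something
like this; MAX_WAIT_MS and the DPRINTF macro are placeholders for
whatever the respin actually uses (note %" PRIu64 " rather than %ld for
the uint64_t).

    #define MAX_WAIT_MS 50   /* half of the buffered_file timer */

    if ((i & 63) == 0) {
        uint64_t t1 = (qemu_get_clock_ns(rt_clock) - bwidth) / 1000000;
        if (t1 > MAX_WAIT_MS) {
            DPRINTF("big delay %" PRIu64 " milliseconds, %d iterations\n",
                    t1, i);
            break;
        }
    }
    i++;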

Later, Juan.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-06-22 12:44 UTC | newest]

Thread overview: 19+ messages
2012-05-22 18:32 [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Juan Quintela
2012-05-22 18:32 ` [Qemu-devel] [PATCH 1/7] Add spent time for migration Juan Quintela
2012-06-14 10:52   ` Orit Wasserman
2012-05-22 18:32 ` [Qemu-devel] [PATCH 2/7] Add tracepoints for savevm section start/end Juan Quintela
2012-06-14 11:00   ` Orit Wasserman
2012-05-22 18:32 ` [Qemu-devel] [PATCH 3/7] No need to iterate if we already are over the limit Juan Quintela
2012-06-14 11:03   ` Orit Wasserman
2012-05-22 18:32 ` [Qemu-devel] [PATCH 4/7] Only TCG needs TLB handling Juan Quintela
2012-06-14 11:15   ` Orit Wasserman
2012-05-22 18:32 ` [Qemu-devel] [PATCH 5/7] Only calculate expected_time for stage 2 Juan Quintela
2012-06-14 11:31   ` Orit Wasserman
2012-05-22 18:32 ` [Qemu-devel] [PATCH 6/7] Exit loop if we have been there too long Juan Quintela
2012-06-14 11:36   ` Orit Wasserman
2012-06-21 19:34     ` Juan Quintela
2012-06-22  2:42       ` 陳韋任 (Wei-Ren Chen)
2012-06-22 12:44         ` Juan Quintela
2012-05-22 18:32 ` [Qemu-devel] [PATCH 7/7] Maintaing number of dirty pages Juan Quintela
2012-06-14 11:42   ` Orit Wasserman
2012-06-11  3:56 ` [Qemu-devel] [RFC 0/7] Fix migration with lots of memory Chegu Vinod
