qemu-devel.nongnu.org archive mirror
* [PATCH 0/2] Migration time prediction using calc-dirty-rate
@ 2023-02-28 13:16 Andrei Gudkov via
  2023-02-28 13:16 ` [PATCH 1/2] migration/calc-dirty-rate: new metrics in sampling mode Andrei Gudkov via
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Andrei Gudkov via @ 2023-02-28 13:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert, Andrei Gudkov

The overall goal of this patch series is to predict the time it would
take to migrate a VM in precopy mode, based on the max allowed downtime,
the network bandwidth, and metrics collected with "calc-dirty-rate".
The predictor itself is a simple Python script that closely follows the
iterations of the migration algorithm: compute how long it would take to
copy the dirty pages, estimate the number of pages dirtied by the VM since
the beginning of the last iteration, and repeat until the estimated
iteration time fits within the max allowed downtime. However, to get
reasonable accuracy, the predictor requires more metrics, which have been
added to "calc-dirty-rate".
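
For illustration, here is a condensed sketch of that loop (abridged from
the predict_migration_time() function added by patch 2/2; deadline
handling is omitted, and num_dirty_pages() stands for the piece-wise
interpolation described below):

def predict_migration_time(model, bandwidth, downtime_ms):
    # bandwidth is in bytes/s, downtime_ms in milliseconds
    header = 8  # bytes placed on the wire for an all-zero page
    zero_pages = model.num_zero_pages()
    pages_to_send = model.num_total_pages() - zero_pages
    total_ms = 0.0
    while True:
        # bytes queued for this iteration: full pages plus per-page headers
        iter_bytes = pages_to_send * (model.page_size() + header) \
                     + zero_pages * header
        iter_ms = iter_bytes * 1000.0 / bandwidth
        total_ms += iter_ms
        if iter_ms <= downtime_ms:
            return total_ms   # converged: the last iteration fits the downtime
        zero_pages = 0        # zero pages are transferred only once
        pages_to_send = model.num_dirty_pages(iter_ms)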

Summary of calc-dirty-rate changes:

1. The most important change is that calc-dirty-rate now produces
   a *vector* of dirty page measurements for progressively increasing time
   periods: 125ms, 250ms, 500ms, 750ms, 1000ms, 1500ms, ..., up to the
   specified calc-time. The motivation behind this change is that the
   number of dirtied pages as a function of time, starting from a "clean
   state" (a new migration iteration), is far from linear. The shape of
   this function depends on the workload type and intensity. Measuring
   the number of dirty pages over progressively increasing periods makes
   it possible to reconstruct this function with piece-wise linear
   interpolation (see the sketch after this list).

2. A new metric was added: the number of all-zero pages.
   The predictor needs to distinguish between zero and non-zero pages
   because during migration only an 8-byte header is placed on the wire
   for an all-zero page.

3. The hashing function was changed from CRC32 to xxHash.
   This reduces the sampling overhead by ~10 times, which is important
   now that some of the measurement periods are sub-second.

4. Other trivial metrics were added for convenience: the total number
   of VM pages, the number of sampled pages, and the page size.
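
As a rough sketch of the piece-wise interpolation mentioned in item 1,
the snippet below mirrors what MemoryModel.num_dirty_pages() in patch 2/2
computes, with sampled counts scaled to the whole VM by
n-total-pages / n-sampled-pages. It is simplified: the actual model also
extrapolates a worst-case non-zero intercept at t=0 and caps the estimate
at the number of non-zero pages.

def dirty_pages_after(millis, periods, n_dirty_pages, scale):
    # Piece-wise linear interpolation between measured points.
    # 'scale' = n_total_pages / n_sampled_pages extrapolates the
    # sampled dirty-page counts to the whole VM.
    pts = [(0.0, 0.0)] + [(float(p), d * scale)
                          for p, d in zip(periods, n_dirty_pages)]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if millis <= x1:
            slope = (y1 - y0) / (x1 - x0)
            return y0 + slope * (millis - x0)
    return pts[-1][1]  # beyond the last measurement: assume saturation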


After these changes, the output of calc-dirty-rate looks like this:

{
  "page-size": 4096,
  "periods": [125, 250, 375, 500, 750, 1000, 1500,
              2000, 3000, 4001, 6000, 8000, 10000,
              15000, 20000, 25000, 30000, 35000,
              40000, 45000, 50000, 60000],
  "status": "measured",
  "sample-pages": 512,
  "dirty-rate": 98,
  "mode": "page-sampling",
  "n-dirty-pages": [33, 78, 119, 151, 217, 236, 293, 336,
                    425, 505, 620, 756, 898, 1204, 1457,
                    1723, 1934, 2141, 2328, 2522, 2675, 2958],
  "n-sampled-pages": 16392,
  "n-zero-pages": 10060,
  "n-total-pages": 8392704,
  "start-time": 2916750,
  "calc-time": 60
}

Passing this data into the prediction script, we get the following estimates:

Downtime> |    125ms |    250ms |    500ms |   1000ms |   5000ms |    unlim
---------------------------------------------------------------------------
 100 Mbps |        - |        - |        - |        - |        - |   16m59s  
   1 Gbps |        - |        - |        - |        - |        - |    1m40s
   2 Gbps |        - |        - |        - |        - |    1m41s |      50s  
 2.5 Gbps |        - |        - |        - |        - |    1m07s |      40s
   5 Gbps |      48s |      46s |      31s |      28s |      25s |      20s
  10 Gbps |      13s |      12s |      12s |      12s |      12s |      10s
  25 Gbps |       5s |       5s |       5s |       5s |       4s |       4s
  40 Gbps |       3s |       3s |       3s |       3s |       3s |       3s
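
As a rough back-of-the-envelope cross-check, the top-right cell follows
directly from the numbers above (interpreting 100 Mbps as 100*1024*1024/8
bytes/s, the same convention the script uses; with unlimited downtime the
estimate is dominated by the first copy pass):

  zero-page fraction  = 10060 / 16392                     ~ 0.614
  non-zero pages      ~ 8392704 * (1 - 0.614)             ~ 3.24e6
  first-pass bytes    ~ 3.24e6 * (4096 + 8) + 5.15e6 * 8  ~ 1.33e10
  time at 100 Mbps    ~ 1.33e10 / 1.31e7 bytes/s          ~ 1018 s ~ 17 min

which matches the 16m59s figure in the table.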


Prediction quality was tested with the YCSB benchmark. A memcached
instance was installed into a 32GiB VM, and a client generated a stream
of requests. Between experiments we varied the request size distribution,
the number of threads, and the location of the client (inside or outside
the VM). After a short preheat phase, we measured calc-dirty-rate:
1. {"execute": "calc-dirty-rate", "arguments":{"calc-time":60}}
2. Wait 60 seconds
3. Collect results with {"execute": "query-dirty-rate"}
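
For reference, the helper script added by patch 2/2 automates this
measurement sequence as well as the prediction step:

$ scripts/predict_migration.py calc-dirty-rate <qmphost> <qmpport> > dirty.json
$ scripts/predict_migration.py predict < dirty.json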

Afterwards we tried to migrate the VM with a randomly selected max
downtime and bandwidth limit. The typical prediction error is 6-7%, with
only 180 out of 5779 experiments failing badly: a prediction error >= 25%,
or incorrectly predicting migration success when in fact the migration
did not converge.


Andrei Gudkov (2):
  migration/calc-dirty-rate: new metrics in sampling mode
  migration/calc-dirty-rate: tool to predict migration time

 MAINTAINERS                  |   1 +
 migration/dirtyrate.c        | 219 +++++++++++++++++++++------
 migration/dirtyrate.h        |  26 +++-
 qapi/migration.json          |  25 ++++
 scripts/predict_migration.py | 283 +++++++++++++++++++++++++++++++++++
 5 files changed, 502 insertions(+), 52 deletions(-)
 create mode 100644 scripts/predict_migration.py

-- 
2.30.2



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] migration/calc-dirty-rate: new metrics in sampling mode
  2023-02-28 13:16 [PATCH 0/2] Migration time prediction using calc-dirty-rate Andrei Gudkov via
@ 2023-02-28 13:16 ` Andrei Gudkov via
  2023-04-18 17:11   ` Daniel P. Berrangé
  2023-02-28 13:16 ` [PATCH 2/2] migration/calc-dirty-rate: tool to predict migration time Andrei Gudkov via
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Andrei Gudkov via @ 2023-02-28 13:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert, Andrei Gudkov

* Collect number of all-zero pages
* Collect vector of number of dirty pages for different time periods
* Report total number of pages, number of sampled pages and page size
* Replaced CRC32 with xxHash for performance reasons

Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
---
 migration/dirtyrate.c | 219 +++++++++++++++++++++++++++++++++---------
 migration/dirtyrate.h |  26 ++++-
 qapi/migration.json   |  25 +++++
 3 files changed, 218 insertions(+), 52 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 575d48c397..cb5dc579c7 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -28,6 +28,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/runstate.h"
 #include "exec/memory.h"
+#include "qemu/xxhash.h"
 
 /*
  * total_dirty_pages is procted by BQL and is used
@@ -222,6 +223,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
     info->calc_time = DirtyStat.calc_time;
     info->sample_pages = DirtyStat.sample_pages;
     info->mode = dirtyrate_mode;
+    info->page_size = TARGET_PAGE_SIZE;
 
     if (qatomic_read(&CalculatingState) == DIRTY_RATE_STATUS_MEASURED) {
         info->has_dirty_rate = true;
@@ -243,6 +245,32 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
             info->vcpu_dirty_rate = head;
         }
 
+        if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING) {
+            int64List *periods_head = NULL;
+            int64List **periods_tail = &periods_head;
+            int64List *n_dirty_pages_head = NULL;
+            int64List **n_dirty_pages_tail = &n_dirty_pages_head;
+
+            info->n_total_pages = DirtyStat.page_sampling.n_total_pages;
+            info->has_n_total_pages = true;
+
+            info->n_sampled_pages = DirtyStat.page_sampling.n_sampled_pages;
+            info->has_n_sampled_pages = true;
+
+            info->n_zero_pages = DirtyStat.page_sampling.n_zero_pages;
+            info->has_n_zero_pages = true;
+
+            for (i = 0; i < DirtyStat.page_sampling.n_readings; i++) {
+                DirtyReading *dr = &DirtyStat.page_sampling.readings[i];
+                QAPI_LIST_APPEND(periods_tail, dr->period);
+                QAPI_LIST_APPEND(n_dirty_pages_tail, dr->n_dirty_pages);
+            }
+            info->n_dirty_pages = n_dirty_pages_head;
+            info->periods = periods_head;
+            info->has_n_dirty_pages = true;
+            info->has_periods = true;
+        }
+
         if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_DIRTY_BITMAP) {
             info->sample_pages = 0;
         }
@@ -263,9 +291,12 @@ static void init_dirtyrate_stat(int64_t start_time,
 
     switch (config.mode) {
     case DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING:
-        DirtyStat.page_sampling.total_dirty_samples = 0;
-        DirtyStat.page_sampling.total_sample_count = 0;
-        DirtyStat.page_sampling.total_block_mem_MB = 0;
+        DirtyStat.page_sampling.n_total_pages = 0;
+        DirtyStat.page_sampling.n_sampled_pages = 0;
+        DirtyStat.page_sampling.n_zero_pages = 0;
+        DirtyStat.page_sampling.n_readings = 0;
+        DirtyStat.page_sampling.readings = g_try_malloc0_n(MAX_DIRTY_READINGS,
+                                                          sizeof(DirtyReading));
         break;
     case DIRTY_RATE_MEASURE_MODE_DIRTY_RING:
         DirtyStat.dirty_ring.nvcpu = -1;
@@ -283,28 +314,58 @@ static void cleanup_dirtyrate_stat(struct DirtyRateConfig config)
         free(DirtyStat.dirty_ring.rates);
         DirtyStat.dirty_ring.rates = NULL;
     }
+    if (DirtyStat.page_sampling.readings) {
+        free(DirtyStat.page_sampling.readings);
+        DirtyStat.page_sampling.readings = NULL;
+    }
 }
 
-static void update_dirtyrate_stat(struct RamblockDirtyInfo *info)
-{
-    DirtyStat.page_sampling.total_dirty_samples += info->sample_dirty_count;
-    DirtyStat.page_sampling.total_sample_count += info->sample_pages_count;
-    /* size of total pages in MB */
-    DirtyStat.page_sampling.total_block_mem_MB += (info->ramblock_pages *
-                                                   TARGET_PAGE_SIZE) >> 20;
+/*
+ * Compute hash of a single page of size TARGET_PAGE_SIZE.
+ * If ptr is NULL, then compute hash of a page entirely filled with zeros.
+ */
+static uint32_t compute_page_hash(void *ptr)
+{
+    uint32_t i;
+    uint64_t v1, v2, v3, v4;
+    uint64_t res;
+    const uint64_t *p = ptr;
+
+    v1 = QEMU_XXHASH_SEED + XXH_PRIME64_1 + XXH_PRIME64_2;
+    v2 = QEMU_XXHASH_SEED + XXH_PRIME64_2;
+    v3 = QEMU_XXHASH_SEED + 0;
+    v4 = QEMU_XXHASH_SEED - XXH_PRIME64_1;
+    if (ptr) {
+        for (i = 0; i < TARGET_PAGE_SIZE / 8; i += 4) {
+            v1 = XXH64_round(v1, p[i + 0]);
+            v2 = XXH64_round(v2, p[i + 1]);
+            v3 = XXH64_round(v3, p[i + 2]);
+            v4 = XXH64_round(v4, p[i + 3]);
+        }
+    } else {
+        for (i = 0; i < TARGET_PAGE_SIZE / 8; i += 4) {
+            v1 = XXH64_round(v1, 0);
+            v2 = XXH64_round(v2, 0);
+            v3 = XXH64_round(v3, 0);
+            v4 = XXH64_round(v4, 0);
+        }
+    }
+    res = XXH64_mergerounds(v1, v2, v3, v4);
+    res += TARGET_PAGE_SIZE;
+    res = XXH64_avalanche(res);
+    return (uint32_t)(res & UINT32_MAX);
 }
 
-static void update_dirtyrate(uint64_t msec)
+static uint32_t get_zero_page_hash(void)
 {
-    uint64_t dirtyrate;
-    uint64_t total_dirty_samples = DirtyStat.page_sampling.total_dirty_samples;
-    uint64_t total_sample_count = DirtyStat.page_sampling.total_sample_count;
-    uint64_t total_block_mem_MB = DirtyStat.page_sampling.total_block_mem_MB;
+    static uint32_t hash;
+    static int is_computed;
 
-    dirtyrate = total_dirty_samples * total_block_mem_MB *
-                1000 / (total_sample_count * msec);
-
-    DirtyStat.dirty_rate = dirtyrate;
+    if (!is_computed) {
+        hash = compute_page_hash(NULL);
+        is_computed = 1;
+    }
+    return hash;
 }
 
 /*
@@ -314,13 +375,10 @@ static void update_dirtyrate(uint64_t msec)
 static uint32_t get_ramblock_vfn_hash(struct RamblockDirtyInfo *info,
                                       uint64_t vfn)
 {
-    uint32_t crc;
-
-    crc = crc32(0, (info->ramblock_addr +
-                vfn * TARGET_PAGE_SIZE), TARGET_PAGE_SIZE);
-
-    trace_get_ramblock_vfn_hash(info->idstr, vfn, crc);
-    return crc;
+    uint32_t hash;
+    hash = compute_page_hash(info->ramblock_addr + vfn * TARGET_PAGE_SIZE);
+    trace_get_ramblock_vfn_hash(info->idstr, vfn, hash);
+    return hash;
 }
 
 static bool save_ramblock_hash(struct RamblockDirtyInfo *info)
@@ -328,6 +386,7 @@ static bool save_ramblock_hash(struct RamblockDirtyInfo *info)
     unsigned int sample_pages_count;
     int i;
     GRand *rand;
+    uint32_t zero_page_hash = get_zero_page_hash();
 
     sample_pages_count = info->sample_pages_count;
 
@@ -349,12 +408,17 @@ static bool save_ramblock_hash(struct RamblockDirtyInfo *info)
         return false;
     }
 
-    rand  = g_rand_new();
+    rand = g_rand_new();
+    DirtyStat.page_sampling.n_total_pages += info->ramblock_pages;
     for (i = 0; i < sample_pages_count; i++) {
         info->sample_page_vfn[i] = g_rand_int_range(rand, 0,
                                                     info->ramblock_pages - 1);
         info->hash_result[i] = get_ramblock_vfn_hash(info,
                                                      info->sample_page_vfn[i]);
+        DirtyStat.page_sampling.n_sampled_pages++;
+        if (info->hash_result[i] == zero_page_hash) {
+            DirtyStat.page_sampling.n_zero_pages++;
+        }
     }
     g_rand_free(rand);
 
@@ -451,18 +515,20 @@ out:
     return ret;
 }
 
-static void calc_page_dirty_rate(struct RamblockDirtyInfo *info)
+static int64_t calc_page_dirty_rate(struct RamblockDirtyInfo *info)
 {
     uint32_t crc;
     int i;
 
+    int64_t n_dirty = 0;
     for (i = 0; i < info->sample_pages_count; i++) {
         crc = get_ramblock_vfn_hash(info, info->sample_page_vfn[i]);
         if (crc != info->hash_result[i]) {
+            n_dirty++;
             trace_calc_page_dirty_rate(info->idstr, crc, info->hash_result[i]);
-            info->sample_dirty_count++;
         }
     }
+    return n_dirty;
 }
 
 static struct RamblockDirtyInfo *
@@ -491,11 +557,12 @@ find_block_matched(RAMBlock *block, int count,
     return &infos[i];
 }
 
-static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
+static int64_t compare_page_hash_info(struct RamblockDirtyInfo *info,
                                   int block_count)
 {
     struct RamblockDirtyInfo *block_dinfo = NULL;
     RAMBlock *block = NULL;
+    int64_t n_dirty = 0;
 
     RAMBLOCK_FOREACH_MIGRATABLE(block) {
         if (skip_sample_ramblock(block)) {
@@ -505,15 +572,10 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
         if (block_dinfo == NULL) {
             continue;
         }
-        calc_page_dirty_rate(block_dinfo);
-        update_dirtyrate_stat(block_dinfo);
-    }
-
-    if (DirtyStat.page_sampling.total_sample_count == 0) {
-        return false;
+        n_dirty += calc_page_dirty_rate(block_dinfo);
     }
 
-    return true;
+    return n_dirty;
 }
 
 static inline void record_dirtypages_bitmap(DirtyPageRecord *dirty_pages,
@@ -544,6 +606,8 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
     int64_t start_time;
     DirtyPageRecord dirty_pages;
 
+
+
     qemu_mutex_lock_iothread();
     memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
 
@@ -614,13 +678,40 @@ static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
     DirtyStat.dirty_rate = dirtyrate_sum;
 }
 
+static int64_t increase_period(int64_t prev_period, int64_t max_period)
+{
+    int64_t delta;
+    int64_t next_period;
+
+    if (prev_period < 500) {
+        delta = 125;
+    } else if (prev_period < 1000) {
+        delta = 250;
+    } else if (prev_period < 2000) {
+        delta = 500;
+    } else if (prev_period < 4000) {
+        delta = 1000;
+    } else if (prev_period < 10000) {
+        delta = 2000;
+    } else {
+        delta = 5000;
+    }
+
+    next_period = prev_period + delta;
+    if (next_period + delta >= max_period) {
+        next_period = max_period;
+    }
+    return next_period;
+}
+
 static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
 {
     struct RamblockDirtyInfo *block_dinfo = NULL;
     int block_count = 0;
-    int64_t msec = 0;
     int64_t initial_time;
+    int64_t current_time;
 
+    /* first pass */
     rcu_read_lock();
     initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     if (!record_ramblock_hash_info(&block_dinfo, config, &block_count)) {
@@ -628,20 +719,34 @@ static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
     }
     rcu_read_unlock();
 
-    msec = config.sample_period_seconds * 1000;
-    msec = dirty_stat_wait(msec, initial_time);
-    DirtyStat.start_time = initial_time / 1000;
-    DirtyStat.calc_time = msec / 1000;
+    int64_t period = INITIAL_PERIOD_MS;
+    while (true) {
+        current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+        int64_t delta = initial_time + period - current_time;
+        if (delta > 0) {
+            g_usleep(delta * 1000);
+        }
 
-    rcu_read_lock();
-    if (!compare_page_hash_info(block_dinfo, block_count)) {
-        goto out;
-    }
+        rcu_read_lock();
+        current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+        int64_t n_dirty = compare_page_hash_info(block_dinfo, block_count);
+        rcu_read_unlock();
 
-    update_dirtyrate(msec);
+        SampleVMStat *ps = &DirtyStat.page_sampling;
+        ps->readings[ps->n_readings].period = current_time - initial_time;
+        ps->readings[ps->n_readings].n_dirty_pages = n_dirty;
+        ps->n_readings++;
+
+        if (period >= DirtyStat.calc_time * 1000) {
+            int64_t mb_total = (ps->n_total_pages * TARGET_PAGE_SIZE) >> 20;
+            int64_t mb_dirty = n_dirty * mb_total / ps->n_sampled_pages;
+            DirtyStat.dirty_rate = mb_dirty * 1000 / period;
+            break;
+        }
+        period = increase_period(period, DirtyStat.calc_time * 1000);
+    }
 
 out:
-    rcu_read_unlock();
     free_ramblock_dirty_info(block_dinfo, block_count);
 }
 
@@ -804,11 +909,29 @@ void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict)
                                rate->value->dirty_rate);
             }
         }
+
     } else {
         monitor_printf(mon, "(not ready)\n");
     }
 
+    if (info->has_n_total_pages) {
+        monitor_printf(mon, "Page count (page size %d):\n", TARGET_PAGE_SIZE);
+        monitor_printf(mon, " Total: %"PRIi64"\n", info->n_total_pages);
+        monitor_printf(mon, "  Sampled: %"PRIi64"\n", info->n_sampled_pages);
+        monitor_printf(mon, "   Zero: %"PRIi64"\n", info->n_zero_pages);
+        int64List *periods = info->periods;
+        int64List *n_dirty_pages = info->n_dirty_pages;
+        while (periods) {
+            monitor_printf(mon, "   Dirty(%"PRIi64"ms): %"PRIi64"\n",
+                           periods->value, n_dirty_pages->value);
+            periods = periods->next;
+            n_dirty_pages = n_dirty_pages->next;
+        }
+    }
+
     qapi_free_DirtyRateVcpuList(info->vcpu_dirty_rate);
+    qapi_free_int64List(info->periods);
+    qapi_free_int64List(info->n_dirty_pages);
     g_free(info);
 }
 
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index 594a5c0bb6..e2af72fb8c 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -42,6 +42,18 @@
 #define MIN_SAMPLE_PAGE_COUNT                     128
 #define MAX_SAMPLE_PAGE_COUNT                     16384
 
+/*
+ * Initial sampling period expressed in milliseconds
+ */
+#define INITIAL_PERIOD_MS 125
+
+/*
+ * Upper bound on the number of DirtyReadings calculated based on
+ * INITIAL_PERIOD_MS, MAX_FETCH_DIRTYRATE_TIME_SEC and increase_period()
+ */
+#define MAX_DIRTY_READINGS 32
+
+
 struct DirtyRateConfig {
     uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
     int64_t sample_period_seconds; /* time duration between two sampling */
@@ -57,14 +69,20 @@ struct RamblockDirtyInfo {
     uint64_t ramblock_pages; /* ramblock size in TARGET_PAGE_SIZE */
     uint64_t *sample_page_vfn; /* relative offset address for sampled page */
     uint64_t sample_pages_count; /* count of sampled pages */
-    uint64_t sample_dirty_count; /* count of dirty pages we measure */
     uint32_t *hash_result; /* array of hash result for sampled pages */
 };
 
+typedef struct DirtyReading {
+    int64_t period; /* time period in milliseconds */
+    int64_t n_dirty_pages; /* number of observed dirty pages */
+} DirtyReading;
+
 typedef struct SampleVMStat {
-    uint64_t total_dirty_samples; /* total dirty sampled page */
-    uint64_t total_sample_count; /* total sampled pages */
-    uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
+    int64_t n_total_pages; /* total number of pages */
+    int64_t n_sampled_pages; /* number of sampled pages */
+    int64_t n_zero_pages; /* number of observed zero pages */
+    int64_t n_readings;
+    DirtyReading *readings;
 } SampleVMStat;
 
 /*
diff --git a/qapi/migration.json b/qapi/migration.json
index c84fa10e86..1a1d7cb30a 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1830,6 +1830,25 @@
 # @mode: mode containing method of calculate dirtyrate includes
 #        'page-sampling' and 'dirty-ring' (Since 6.2)
 #
+# @page-size: page size in bytes
+#
+# @n-total-pages: total number of VM pages
+#
+# @n-sampled-pages: number of sampled pages
+#
+# @n-zero-pages: number of observed zero pages among all sampled pages.
+#                Normally all pages are zero when VM starts, but
+#                their number progressively goes down as VM fills more
+#                and more memory with useful data.
+#                Migration of zero pages is optimized: only their headers
+#                are copied but not the (zero) data.
+#
+# @periods: array of time periods expressed in milliseconds for which
+#           dirty-sample measurements are collected
+#
+# @n-dirty-pages: number of pages among all sampled pages that were observed
+#                 as changed after respective time period
+#
 # @vcpu-dirty-rate: dirtyrate for each vcpu if dirty-ring
 #                   mode specified (Since 6.2)
 #
@@ -1842,6 +1861,12 @@
            'calc-time': 'int64',
            'sample-pages': 'uint64',
            'mode': 'DirtyRateMeasureMode',
+           'page-size': 'int64',
+           '*n-total-pages': 'int64',
+           '*n-sampled-pages': 'int64',
+           '*n-zero-pages': 'int64',
+           '*periods': ['int64'],
+           '*n-dirty-pages': ['int64'],
            '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
 
 ##
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] migration/calc-dirty-rate: tool to predict migration time
  2023-02-28 13:16 [PATCH 0/2] Migration time prediction using calc-dirty-rate Andrei Gudkov via
  2023-02-28 13:16 ` [PATCH 1/2] migration/calc-dirty-rate: new metrics in sampling mode Andrei Gudkov via
@ 2023-02-28 13:16 ` Andrei Gudkov via
  2023-03-17 13:29 ` [PATCH 0/2] Migration time prediction using calc-dirty-rate Gudkov Andrei via
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Andrei Gudkov via @ 2023-02-28 13:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert, Andrei Gudkov

Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
---
 MAINTAINERS                  |   1 +
 scripts/predict_migration.py | 283 +++++++++++++++++++++++++++++++++++
 2 files changed, 284 insertions(+)
 create mode 100644 scripts/predict_migration.py

diff --git a/MAINTAINERS b/MAINTAINERS
index c6e6549f06..2fb5b6298a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3107,6 +3107,7 @@ F: docs/devel/migration.rst
 F: qapi/migration.json
 F: tests/migration/
 F: util/userfaultfd.c
+F: scripts/predict_migration.py
 
 D-Bus
 M: Marc-André Lureau <marcandre.lureau@redhat.com>
diff --git a/scripts/predict_migration.py b/scripts/predict_migration.py
new file mode 100644
index 0000000000..c92a97585f
--- /dev/null
+++ b/scripts/predict_migration.py
@@ -0,0 +1,283 @@
+#!/usr/bin/env python3
+#
+# Predicts time required to migrate VM under given max downtime constraint.
+#
+# Copyright (c) 2023 HUAWEI TECHNOLOGIES CO.,LTD.
+#
+# Authors:
+#  Andrei Gudkov <gudkov.andrei@huawei.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+
+# Usage:
+#
+# Step 1. Collect dirty page statistics from live VM:
+# $ scripts/predict_migration.py calc-dirty-rate <qmphost> <qmpport> >dirty.json
+# <...takes 1 minute by default...>
+#
+# Step 2. Run predictor against collected data:
+# $ scripts/predict_migration.py predict < dirty.json
+# Downtime> |    125ms |    250ms |    500ms |   1000ms |   5000ms |    unlim |
+# -----------------------------------------------------------------------------
+#  100 Mbps |        - |        - |        - |        - |        - |   16m45s |
+#    1 Gbps |        - |        - |        - |        - |        - |    1m39s |
+#    2 Gbps |        - |        - |        - |        - |    1m55s |      50s |
+#  2.5 Gbps |        - |        - |        - |        - |    1m12s |      40s |
+#    5 Gbps |        - |        - |        - |      29s |      25s |      20s |
+#   10 Gbps |      13s |      13s |      12s |      12s |      12s |      10s |
+#   25 Gbps |       5s |       5s |       5s |       5s |       4s |       4s |
+#   40 Gbps |       3s |       3s |       3s |       3s |       3s |       3s |
+#
+# The latter prints a table that lists the estimated time it will take to
+# migrate the VM. This time depends on the network bandwidth and the max
+# allowed downtime. A dash indicates that migration does not converge.
+# The prediction covers only RAM migration and only the pre-copy mode.
+# Other features, such as compression or local disk migration, are not
+# supported.
+
+
+import sys
+import os
+import math
+import json
+from dataclasses import dataclass
+import asyncio
+import argparse
+
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'python'))
+from qemu.qmp import QMPClient
+
+async def calc_dirty_rate(host, port, calc_time, sample_pages):
+    client = QMPClient()
+    try:
+        await client.connect((host, port))
+        args = {
+            'calc-time': calc_time,
+            'sample-pages': sample_pages
+        }
+        await client.execute('calc-dirty-rate', args)
+        await asyncio.sleep(calc_time)
+        while True:
+            data = await client.execute('query-dirty-rate')
+            if data['status'] == 'measuring':
+                await asyncio.sleep(0.5)
+            elif data['status'] == 'measured':
+                return data
+            else:
+                raise ValueError(data['status'])
+    finally:
+        await client.disconnect()
+
+
+class MemoryModel:
+    """
+    Models RAM state during pre-copy migration using calc-dirty-rate results.
+    Its primary function is to estimate how many pages will be dirtied
+    after given time starting from "clean" state.
+    This function is non-linear and saturates at some point.
+    """
+
+    @dataclass
+    class Point:
+        period_millis:float
+        dirty_pages:float
+
+    def __init__(self, data):
+        """
+        :param data: dictionary returned by calc-dirty-rate
+        """
+        self.__points = self.__make_points(data)
+        self.__page_size = data['page-size']
+        self.__num_total_pages = data['n-total-pages']
+        self.__num_zero_pages = data['n-zero-pages'] / \
+                (data['n-sampled-pages'] / data['n-total-pages'])
+
+    def __make_points(self, data):
+        points = list()
+
+        # Add observed points
+        sample_ratio = data['n-sampled-pages'] / data['n-total-pages']
+        for millis,dirty_pages in zip(data['periods'], data['n-dirty-pages']):
+            millis = float(millis)
+            dirty_pages = dirty_pages / sample_ratio
+            points.append(MemoryModel.Point(millis, dirty_pages))
+
+        # Extrapolate function to the left.
+        # Assuming that the function is convex, the worst case is achieved
+        # when dirty page count immediately jumps to some value at zero time
+        # (infinite slope), and next keeps the same slope as in the region
+        # between the first two observed points: points[0]..points[1]
+        slope, offset = self.__fit_line(points[0], points[1])
+        points.insert(0, MemoryModel.Point(0.0, max(offset, 0.0)))
+
+        # Extrapolate function to the right.
+        # The worst case is achieved when the function has the same slope
+        # as in the last observed region.
+        slope, offset = self.__fit_line(points[-2], points[-1])
+        max_dirty_pages = \
+                data['n-total-pages'] - (data['n-zero-pages'] / sample_ratio)
+        if slope > 0.0:
+            saturation_millis = (max_dirty_pages - offset) / slope
+            points.append(MemoryModel.Point(saturation_millis, max_dirty_pages))
+        points.append(MemoryModel.Point(math.inf, max_dirty_pages))
+
+        return points
+
+    def __fit_line(self, lhs:Point, rhs:Point):
+        slope = (rhs.dirty_pages - lhs.dirty_pages) / \
+                (rhs.period_millis - lhs.period_millis)
+        offset = lhs.dirty_pages - slope * lhs.period_millis
+        return slope, offset
+
+    def page_size(self):
+        """
+        Return page size in bytes
+        """
+        return self.__page_size
+
+    def num_total_pages(self):
+        return self.__num_total_pages
+
+    def num_zero_pages(self):
+        """
+        Estimated total number of zero pages. Assumed to be constant.
+        """
+        return self.__num_zero_pages
+
+    def num_dirty_pages(self, millis):
+        """
+        Estimate number of dirty pages after given time starting from "clean"
+        state. The estimation is based on piece-wise linear interpolation.
+        """
+        for i in range(len(self.__points)):
+            if self.__points[i].period_millis == millis:
+                return self.__points[i].dirty_pages
+            elif self.__points[i].period_millis > millis:
+                slope, offset = self.__fit_line(self.__points[i-1],
+                                                        self.__points[i])
+                return offset + slope * millis
+        raise RuntimeError("unreachable")
+
+
+def predict_migration_time(model, bandwidth, downtime, deadline=3600*1000):
+    """
+    Predict how much time it will take to migrate the VM under the given
+    deadline constraint.
+
+    :param model: `MemoryModel` object for a given VM
+    :param bandwidth: Bandwidth available for migration [bytes/s]
+    :param downtime: Max allowed downtime [milliseconds]
+    :param deadline: Max total time to migrate VM before timeout [milliseconds]
+    :return: Predicted migration time [milliseconds] or `None`
+             if migration process doesn't converge before given deadline
+    """
+
+    left_zero_pages = model.num_zero_pages()
+    left_normal_pages = model.num_total_pages() - model.num_zero_pages()
+    header_size = 8
+
+    total_millis = 0.0
+    while True:
+        iter_bytes = 0.0
+        iter_bytes += left_normal_pages * (model.page_size() + header_size)
+        iter_bytes += left_zero_pages * header_size
+
+        iter_millis = iter_bytes * 1000.0 / bandwidth
+
+        total_millis += iter_millis
+
+        if iter_millis <= downtime:
+            return int(math.ceil(total_millis))
+        elif total_millis > deadline:
+            return None
+        else:
+            left_zero_pages = 0
+            left_normal_pages = model.num_dirty_pages(iter_millis)
+
+
+def run_predict_cmd(model):
+    @dataclass
+    class ValStr:
+        value:object
+        string:str
+
+    def gbps(value):
+        return ValStr(value*1024*1024*1024/8, f'{value} Gbps')
+
+    def mbps(value):
+        return ValStr(value*1024*1024/8, f'{value} Mbps')
+
+    def dt(millis):
+        if millis is not None:
+            return ValStr(millis, f'{millis}ms')
+        else:
+            return ValStr(math.inf, 'unlim')
+
+    def eta(millis):
+        if millis is not None:
+            seconds = int(math.ceil(millis/1000.0))
+            minutes, seconds = divmod(seconds, 60)
+            s = ''
+            if minutes > 0:
+                s += f'{minutes}m'
+            if len(s) > 0:
+                s += f'{seconds:02d}s'
+            else:
+                s += f'{seconds}s'
+        else:
+            s = '-'
+        return ValStr(millis, s)
+
+
+    bandwidths = [mbps(100), gbps(1), gbps(2), gbps(2.5), gbps(5), gbps(10),
+                  gbps(25), gbps(40)]
+    downtimes = [dt(125), dt(250), dt(500), dt(1000), dt(5000), dt(None)]
+
+    out = ''
+    out += 'Downtime> |'
+    for downtime in downtimes:
+        out += f'  {downtime.string:>7} |'
+    print(out)
+
+    print('-'*len(out))
+
+    for bandwidth in bandwidths:
+        print(f'{bandwidth.string:>9} | ', '', end='')
+        for downtime in downtimes:
+            millis = predict_migration_time(model,
+                                            bandwidth.value,
+                                            downtime.value)
+            print(f'{eta(millis).string:>7} | ', '', end='')
+        print()
+
+def main():
+    parser = argparse.ArgumentParser()
+    subparsers = parser.add_subparsers(dest='command', required=True)
+
+    parser_cdr = subparsers.add_parser('calc-dirty-rate',
+            help='Collect and print dirty page statistics from live VM')
+    parser_cdr.add_argument('--calc-time', type=int, default=60,
+                            help='Calculation time in seconds')
+    parser_cdr.add_argument('--sample-pages', type=int, default=512,
+            help='Number of sampled pages per one gigabyte of RAM')
+    parser_cdr.add_argument('host', metavar='host', type=str, help='QMP host')
+    parser_cdr.add_argument('port', metavar='port', type=int, help='QMP port')
+
+    subparsers.add_parser('predict', help='Predict migration time')
+
+    args = parser.parse_args()
+
+    if args.command == 'calc-dirty-rate':
+        data = asyncio.run(calc_dirty_rate(host=args.host,
+                                           port=args.port,
+                                           calc_time=args.calc_time,
+                                           sample_pages=args.sample_pages))
+        print(json.dumps(data))
+    elif args.command == 'predict':
+        data = json.load(sys.stdin)
+        model = MemoryModel(data)
+        run_predict_cmd(model)
+
+if __name__ == '__main__':
+    main()
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
  2023-02-28 13:16 [PATCH 0/2] Migration time prediction using calc-dirty-rate Andrei Gudkov via
  2023-02-28 13:16 ` [PATCH 1/2] migration/calc-dirty-rate: new metrics in sampling mode Andrei Gudkov via
  2023-02-28 13:16 ` [PATCH 2/2] migration/calc-dirty-rate: tool to predict migration time Andrei Gudkov via
@ 2023-03-17 13:29 ` Gudkov Andrei via
  2023-03-27 14:08 ` Gudkov Andrei via
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Gudkov Andrei via @ 2023-03-17 13:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert

ping

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Tuesday, February 28, 2023 16:16
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com; dgilbert@redhat.com; Gudkov Andrei <gudkov.andrei@huawei.com>
Subject: [PATCH 0/2] Migration time prediction using calc-dirty-rate




^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
  2023-02-28 13:16 [PATCH 0/2] Migration time prediction using calc-dirty-rate Andrei Gudkov via
                   ` (2 preceding siblings ...)
  2023-03-17 13:29 ` [PATCH 0/2] Migration time prediction using calc-dirty-rate Gudkov Andrei via
@ 2023-03-27 14:08 ` Gudkov Andrei via
  2023-04-03 14:41 ` Gudkov Andrei via
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Gudkov Andrei via @ 2023-03-27 14:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert

ping2

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Friday, March 17, 2023 16:29
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com; dgilbert@redhat.com
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Tuesday, February 28, 2023 16:16
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com; dgilbert@redhat.com; Gudkov Andrei <gudkov.andrei@huawei.com>
Subject: [PATCH 0/2] Migration time prediction using calc-dirty-rate




^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
  2023-02-28 13:16 [PATCH 0/2] Migration time prediction using calc-dirty-rate Andrei Gudkov via
                   ` (3 preceding siblings ...)
  2023-03-27 14:08 ` Gudkov Andrei via
@ 2023-04-03 14:41 ` Gudkov Andrei via
  2023-04-10 15:19 ` Gudkov Andrei via
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Gudkov Andrei via @ 2023-04-03 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert

ping3

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Monday, March 27, 2023 17:09
To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
Cc: 'quintela@redhat.com' <quintela@redhat.com>; 'dgilbert@redhat.com' <dgilbert@redhat.com>
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping2

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Friday, March 17, 2023 16:29
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com; dgilbert@redhat.com
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Tuesday, February 28, 2023 16:16
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com; dgilbert@redhat.com; Gudkov Andrei <gudkov.andrei@huawei.com>
Subject: [PATCH 0/2] Migration time prediction using calc-dirty-rate




^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
  2023-02-28 13:16 [PATCH 0/2] Migration time prediction using calc-dirty-rate Andrei Gudkov via
                   ` (4 preceding siblings ...)
  2023-04-03 14:41 ` Gudkov Andrei via
@ 2023-04-10 15:19 ` Gudkov Andrei via
  2023-04-18 13:25 ` Gudkov Andrei via
  2023-04-18 17:17 ` Daniel P. Berrangé
  7 siblings, 0 replies; 12+ messages in thread
From: Gudkov Andrei via @ 2023-04-10 15:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert, jsnow, eblake

ping4

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Monday, April 3, 2023 17:42
To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
Cc: 'quintela@redhat.com' <quintela@redhat.com>; 'dgilbert@redhat.com' <dgilbert@redhat.com>
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping3

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Monday, March 27, 2023 17:09
To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
Cc: 'quintela@redhat.com' <quintela@redhat.com>; 'dgilbert@redhat.com' <dgilbert@redhat.com>
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping2

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Friday, March 17, 2023 16:29
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com; dgilbert@redhat.com
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Tuesday, February 28, 2023 16:16
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com; dgilbert@redhat.com; Gudkov Andrei <gudkov.andrei@huawei.com>
Subject: [PATCH 0/2] Migration time prediction using calc-dirty-rate




^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
  2023-02-28 13:16 [PATCH 0/2] Migration time prediction using calc-dirty-rate Andrei Gudkov via
                   ` (5 preceding siblings ...)
  2023-04-10 15:19 ` Gudkov Andrei via
@ 2023-04-18 13:25 ` Gudkov Andrei via
  2023-04-18 17:21   ` Daniel P. Berrangé
  2023-04-18 17:17 ` Daniel P. Berrangé
  7 siblings, 1 reply; 12+ messages in thread
From: Gudkov Andrei via @ 2023-04-18 13:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert, jsnow, eblake

ping5

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Monday, April 10, 2023 18:19
To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
Cc: 'quintela@redhat.com' <quintela@redhat.com>; 'dgilbert@redhat.com' <dgilbert@redhat.com>; 'jsnow@redhat.com' <jsnow@redhat.com>; 'eblake@redhat.com' <eblake@redhat.com>
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping4

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Monday, April 3, 2023 17:42
To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
Cc: 'quintela@redhat.com' <quintela@redhat.com>; 'dgilbert@redhat.com' <dgilbert@redhat.com>
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping3

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Monday, March 27, 2023 17:09
To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
Cc: 'quintela@redhat.com' <quintela@redhat.com>; 'dgilbert@redhat.com' <dgilbert@redhat.com>
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping2

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Friday, March 17, 2023 16:29
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com; dgilbert@redhat.com
Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate

ping

https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/

-----Original Message-----
From: Gudkov Andrei 
Sent: Tuesday, February 28, 2023 16:16
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com; dgilbert@redhat.com; Gudkov Andrei <gudkov.andrei@huawei.com>
Subject: [PATCH 0/2] Migration time prediction using calc-dirty-rate




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] migration/calc-dirty-rate: new metrics in sampling mode
  2023-02-28 13:16 ` [PATCH 1/2] migration/calc-dirty-rate: new metrics in sampling mode Andrei Gudkov via
@ 2023-04-18 17:11   ` Daniel P. Berrangé
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel P. Berrangé @ 2023-04-18 17:11 UTC (permalink / raw)
  To: Andrei Gudkov; +Cc: qemu-devel, quintela, dgilbert

On Tue, Feb 28, 2023 at 04:16:02PM +0300, Andrei Gudkov via wrote:
> * Collect number of all-zero pages
> * Collect vector of number of dirty pages for different time periods
> * Report total number of pages, number of sampled pages and page size
> * Replaced CRC32 with xxHash for performance reasons

I'd suggest that the CRC32 -> xxHash change should be a separate
commit from the newly reported statistics, since they're independent
functional changes.

> 
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> ---
>  migration/dirtyrate.c | 219 +++++++++++++++++++++++++++++++++---------
>  migration/dirtyrate.h |  26 ++++-
>  qapi/migration.json   |  25 +++++
>  3 files changed, 218 insertions(+), 52 deletions(-)

> diff --git a/qapi/migration.json b/qapi/migration.json
> index c84fa10e86..1a1d7cb30a 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1830,6 +1830,25 @@
>  # @mode: mode containing method of calculate dirtyrate includes
>  #        'page-sampling' and 'dirty-ring' (Since 6.2)
>  #
> +# @page-size: page size in bytes
> +#
> +# @n-total-pages: total number of VM pages
> +#
> +# @n-sampled-pages: number of sampled pages
> +#
> +# @n-zero-pages: number of observed zero pages among all sampled pages.
> +#                Normally all pages are zero when VM starts, but
> +#                their number progressively goes down as VM fills more
> +#                and more memory with useful data.
> +#                Migration of zero pages is optimized: only their headers
> +#                are copied but not the (zero) data.
> +#
> +# @periods: array of time periods expressed in milliseconds for which
> +#           dirty-sample measurements are collected
> +#
> +# @n-dirty-pages: number of pages among all sampled pages that were observed
> +#                 as changed after respective time period
> +#

Each field addition needs a "(Since ....)" tag with QEMU version

The docs probably ought to be explicit that the size of @periods
array is the same as @n-dirty-pages array.

>  # @vcpu-dirty-rate: dirtyrate for each vcpu if dirty-ring
>  #                   mode specified (Since 6.2)
>  #
> @@ -1842,6 +1861,12 @@
>             'calc-time': 'int64',
>             'sample-pages': 'uint64',
>             'mode': 'DirtyRateMeasureMode',
> +           'page-size': 'int64',
> +           '*n-total-pages': 'int64',
> +           '*n-sampled-pages': 'int64',
> +           '*n-zero-pages': 'int64',
> +           '*periods': ['int64'],
> +           '*n-dirty-pages': ['int64'],
>             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
>  
>  ##
> -- 
> 2.30.2
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Migration time prediction using calc-dirty-rate
  2023-02-28 13:16 [PATCH 0/2] Migration time prediction using calc-dirty-rate Andrei Gudkov via
                   ` (6 preceding siblings ...)
  2023-04-18 13:25 ` Gudkov Andrei via
@ 2023-04-18 17:17 ` Daniel P. Berrangé
  2023-04-27 13:51   ` Gudkov Andrei via
  7 siblings, 1 reply; 12+ messages in thread
From: Daniel P. Berrangé @ 2023-04-18 17:17 UTC (permalink / raw)
  To: Andrei Gudkov; +Cc: qemu-devel, quintela, dgilbert

On Tue, Feb 28, 2023 at 04:16:01PM +0300, Andrei Gudkov via wrote:
> Summary of calc-dirty-rate changes:
> 
> 1. The most important change is that now calc-dirty-rate produces
>    a *vector* of dirty page measurements for progressively increasing time
>    periods: 125ms, 250, 500, 750, 1000, 1500, .., up to specified calc-time.
>    The motivation behind such change is that number of dirtied pages as
>    a function of time starting from "clean state" (new migration iteration)
>    is far from linear. Shape of this function depends on the workload type
>    and intensity. Measuring number of dirty pages at progressively
>    increasing periods allows to reconstruct this function using piece-wise
>    interpolation.
> 
> 2. New metric added -- number of all-zero pages.
>    Predictor needs to distinguish between number of zero and non-zero pages
>    because during migration only 8 byte header is placed on the wire for
>    all-zero page.
> 
> 3. Hashing function was changed from CRC32 to xxHash.
>    This reduces overhead of sampling by ~10 times, which is important since
>    now some of the measurement periods are sub-second.

Very good !

> 
> 4. Other trivial metrics were added for convenience: total number
>    of VM pages, number of sampled pages, page size.
> 
> 
> After these changes output from calc-dirty-rate looks like this:
> 
> {
>   "page-size": 4096,
>   "periods": [125, 250, 375, 500, 750, 1000, 1500,
>               2000, 3000, 4001, 6000, 8000, 10000,
>               15000, 20000, 25000, 30000, 35000,
>               40000, 45000, 50000, 60000],
>   "status": "measured",
>   "sample-pages": 512,
>   "dirty-rate": 98,
>   "mode": "page-sampling",
>   "n-dirty-pages": [33, 78, 119, 151, 217, 236, 293, 336,
>                     425, 505, 620, 756, 898, 1204, 1457,
>                     1723, 1934, 2141, 2328, 2522, 2675, 2958],
>   "n-sampled-pages": 16392,
>   "n-zero-pages": 10060,
>   "n-total-pages": 8392704,
>   "start-time": 2916750,
>   "calc-time": 60
> }

Ok, so "periods" and "n-dirty-pages" pages arrays correlate with
each other.

> 
> Passing this data into prediction script, we get the following estimations:
> 
> Downtime> |    125ms |    250ms |    500ms |   1000ms |   5000ms |    unlim
> ---------------------------------------------------------------------------
>  100 Mbps |        - |        - |        - |        - |        - |   16m59s  
>    1 Gbps |        - |        - |        - |        - |        - |    1m40s
>    2 Gbps |        - |        - |        - |        - |    1m41s |      50s  
>  2.5 Gbps |        - |        - |        - |        - |    1m07s |      40s
>    5 Gbps |      48s |      46s |      31s |      28s |      25s |      20s
>   10 Gbps |      13s |      12s |      12s |      12s |      12s |      10s
>   25 Gbps |       5s |       5s |       5s |       5s |       4s |       4s
>   40 Gbps |       3s |       3s |       3s |       3s |       3s |       3s

This is fascinating and really helpful as an idea. It so nicely
shows when it is not even worth bothering to try to start the
migration unless you're willing to put up with a large (5 sec) downtime,
or use autoconverge/post-copy.

I wonder if the calc-dirty-rate measurements also give enough info
to predict the likely number/duration of async page fetches needed
during the post-copy phase? Or does this give enough info to predict
how far down auto-converge should throttle the guest to enable
convergence.

> Quality of prediction was tested with YCSB benchmark. Memcached instance
> was installed into 32GiB VM, and a client generated a stream of requests.
> Between experiments we varied request size distribution, number of threads,
> and location of the client (inside or outside the VM).
> After short preheat phase, we measured calc-dirty-rate:
> 1. {"execute": "calc-dirty-rate", "arguments":{"calc-time":60}}
> 2. Wait 60 seconds
> 3. Collect results with {"execute": "query-dirty-rate"}
> 
> Afterwards we tried to migrate VM after randomly selecting max downtime
> and bandwidth limit. Typical prediction error is 6-7%, with only 180 out
> of 5779 experiments failing badly: prediction error >=25% or incorrectly
> predicting migration success when in fact it didn't converge.

Nice results


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Migration time prediction using calc-dirty-rate
  2023-04-18 13:25 ` Gudkov Andrei via
@ 2023-04-18 17:21   ` Daniel P. Berrangé
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel P. Berrangé @ 2023-04-18 17:21 UTC (permalink / raw)
  To: Gudkov Andrei; +Cc: qemu-devel, quintela, dgilbert, jsnow, eblake

Juan,

This series could use some feedback from the migration maintainer
POV. I think it looks like a valuable idea to take which could
significantly help mgmt apps plan migration.

Daniel

On Tue, Apr 18, 2023 at 01:25:08PM +0000, Gudkov Andrei via wrote:
> ping5
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/
> 
> -----Original Message-----
> From: Gudkov Andrei 
> Sent: Monday, April 10, 2023 18:19
> To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
> Cc: 'quintela@redhat.com' <quintela@redhat.com>; 'dgilbert@redhat.com' <dgilbert@redhat.com>; 'jsnow@redhat.com' <jsnow@redhat.com>; 'eblake@redhat.com' <eblake@redhat.com>
> Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> ping4
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/
> 
> -----Original Message-----
> From: Gudkov Andrei 
> Sent: Monday, April 3, 2023 17:42
> To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
> Cc: 'quintela@redhat.com' <quintela@redhat.com>; 'dgilbert@redhat.com' <dgilbert@redhat.com>
> Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> ping3
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/
> 
> -----Original Message-----
> From: Gudkov Andrei 
> Sent: Monday, March 27, 2023 17:09
> To: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>
> Cc: 'quintela@redhat.com' <quintela@redhat.com>; 'dgilbert@redhat.com' <dgilbert@redhat.com>
> Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> ping2
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/
> 
> -----Original Message-----
> From: Gudkov Andrei 
> Sent: Friday, March 17, 2023 16:29
> To: qemu-devel@nongnu.org
> Cc: quintela@redhat.com; dgilbert@redhat.com
> Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> ping
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.andrei@huawei.com/
> 
> -----Original Message-----
> From: Gudkov Andrei 
> Sent: Tuesday, February 28, 2023 16:16
> To: qemu-devel@nongnu.org
> Cc: quintela@redhat.com; dgilbert@redhat.com; Gudkov Andrei <gudkov.andrei@huawei.com>
> Subject: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
  2023-04-18 17:17 ` Daniel P. Berrangé
@ 2023-04-27 13:51   ` Gudkov Andrei via
  0 siblings, 0 replies; 12+ messages in thread
From: Gudkov Andrei via @ 2023-04-27 13:51 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, quintela, dgilbert

Thank you for the review. I have submitted a new version of the patch:
https://patchew.org/QEMU/cover.1682598010.git.gudkov.andrei@huawei.com/

> -----Original Message-----
> From: Daniel P. Berrangé [mailto:berrange@redhat.com]
> Sent: Tuesday, April 18, 2023 20:18
> To: Gudkov Andrei <gudkov.andrei@huawei.com>
> Cc: qemu-devel@nongnu.org; quintela@redhat.com; dgilbert@redhat.com
> Subject: Re: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> On Tue, Feb 28, 2023 at 04:16:01PM +0300, Andrei Gudkov via wrote:
> > Summary of calc-dirty-rate changes:
> >
> > 1. The most important change is that now calc-dirty-rate produces
> >    a *vector* of dirty page measurements for progressively increasing time
> >    periods: 125ms, 250, 500, 750, 1000, 1500, .., up to specified calc-time.
> >    The motivation behind such change is that number of dirtied pages as
> >    a function of time starting from "clean state" (new migration iteration)
> >    is far from linear. Shape of this function depends on the workload type
> >    and intensity. Measuring number of dirty pages at progressively
> >    increasing periods allows to reconstruct this function using piece-wise
> >    interpolation.
> >
> > 2. New metric added -- number of all-zero pages.
> >    Predictor needs to distinguish between number of zero and non-zero pages
> >    because during migration only 8 byte header is placed on the wire for
> >    all-zero page.
> >
> > 3. Hashing function was changed from CRC32 to xxHash.
> >    This reduces overhead of sampling by ~10 times, which is important since
> >    now some of the measurement periods are sub-second.
> 
> Very good !
> 
> >
> > 4. Other trivial metrics were added for convenience: total number
> >    of VM pages, number of sampled pages, page size.
> >
> >
> > After these changes output from calc-dirty-rate looks like this:
> >
> > {
> >   "page-size": 4096,
> >   "periods": [125, 250, 375, 500, 750, 1000, 1500,
> >               2000, 3000, 4001, 6000, 8000, 10000,
> >               15000, 20000, 25000, 30000, 35000,
> >               40000, 45000, 50000, 60000],
> >   "status": "measured",
> >   "sample-pages": 512,
> >   "dirty-rate": 98,
> >   "mode": "page-sampling",
> >   "n-dirty-pages": [33, 78, 119, 151, 217, 236, 293, 336,
> >                     425, 505, 620, 756, 898, 1204, 1457,
> >                     1723, 1934, 2141, 2328, 2522, 2675, 2958],
> >   "n-sampled-pages": 16392,
> >   "n-zero-pages": 10060,
> >   "n-total-pages": 8392704,
> >   "start-time": 2916750,
> >   "calc-time": 60
> > }
> 
> Ok, so "periods" and "n-dirty-pages" pages arrays correlate with
> each other.
> 
> >
> > Passing this data into prediction script, we get the following estimations:
> >
> > Downtime> |    125ms |    250ms |    500ms |   1000ms |   5000ms |    unlim
> > ---------------------------------------------------------------------------
> >  100 Mbps |        - |        - |        - |        - |        - |   16m59s
> >    1 Gbps |        - |        - |        - |        - |        - |    1m40s
> >    2 Gbps |        - |        - |        - |        - |    1m41s |      50s
> >  2.5 Gbps |        - |        - |        - |        - |    1m07s |      40s
> >    5 Gbps |      48s |      46s |      31s |      28s |      25s |      20s
> >   10 Gbps |      13s |      12s |      12s |      12s |      12s |      10s
> >   25 Gbps |       5s |       5s |       5s |       5s |       4s |       4s
> >   40 Gbps |       3s |       3s |       3s |       3s |       3s |       3s
> 
> This is fascinating and really helpful as an idea. It so nicely
> shows when it is not even worth bothering to try to start the
> migration unless you're willing to put up with a large (5 sec) downtime
> or use auto-converge/post-copy.
> 
> I wonder if the calc-dirty-rate measurements also give enough info
> to predict the likely number/duration of async page fetches needed
> during the post-copy phase? Or does this give enough info to predict
> how far down auto-converge should throttle the guest to enable
> convergence?

I was also thinking about supporting more migration features.
Currently my understanding is the following:

1. It *should* be possible to support throttling directly inside the
   prediction script without any changes to calc-dirty-rate. Maybe we can
   even suggest the level of throttling required to achieve the target
   downtime (a rough sketch of this idea follows after point 3 below).

2. Support for compression would be harder, because we would have to know
   the average compression ratio and compression speed. This would require
   more changes to calc-dirty-rate.

3. To support post-copy, we would need to know network characteristics,
   namely latency and jitter. Both can be quite unstable unless the source
   and target hosts are located very close to each other in the network
   topology.
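
As an illustration of point 1, here is a back-of-the-envelope sketch.
It assumes, simplistically, that the dirty rate scales linearly with the
share of CPU time the guest is allowed to use and that roughly 80% of the
link bandwidth is usable for page transfer; these are assumptions for
illustration, not the algorithm QEMU itself implements.

def required_throttle(dirty_rate_mb_s, bandwidth_gbps, headroom=0.8):
    """Return a suggested CPU throttle percentage (0 means none needed)."""
    usable_mb_s = bandwidth_gbps * 1000 / 8 * headroom   # usable link MB/s
    if dirty_rate_mb_s <= usable_mb_s:
        return 0                     # already converges without throttling
    throttle = 1.0 - usable_mb_s / dirty_rate_mb_s
    return min(99, round(throttle * 100))

# e.g. a VM dirtying 900 MB/s over a 2.5 Gbps link
print(required_throttle(900, 2.5))   # -> 72 (% CPU throttle suggested)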

> 
> > Quality of prediction was tested with the YCSB benchmark. A memcached
> > instance was installed into a 32GiB VM, and a client generated a stream of
> > requests. Between experiments we varied the request size distribution,
> > number of threads, and location of the client (inside or outside the VM).
> > After a short preheat phase, we measured calc-dirty-rate:
> > 1. {"execute": "calc-dirty-rate", "arguments":{"calc-time":60}}
> > 2. Wait 60 seconds
> > 3. Collect results with {"execute": "query-dirty-rate"}
> >
> > Afterwards we tried to migrate the VM after randomly selecting a max
> > downtime and bandwidth limit. The typical prediction error is 6-7%, with
> > only 180 out of 5779 experiments failing badly: prediction error >=25%, or
> > incorrectly predicting migration success when in fact it did not converge.
> 
> Nice results
> 
> 
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-04-27 13:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-28 13:16 [PATCH 0/2] Migration time prediction using calc-dirty-rate Andrei Gudkov via
2023-02-28 13:16 ` [PATCH 1/2] migration/calc-dirty-rate: new metrics in sampling mode Andrei Gudkov via
2023-04-18 17:11   ` Daniel P. Berrangé
2023-02-28 13:16 ` [PATCH 2/2] migration/calc-dirty-rate: tool to predict migration time Andrei Gudkov via
2023-03-17 13:29 ` [PATCH 0/2] Migration time prediction using calc-dirty-rate Gudkov Andrei via
2023-03-27 14:08 ` Gudkov Andrei via
2023-04-03 14:41 ` Gudkov Andrei via
2023-04-10 15:19 ` Gudkov Andrei via
2023-04-18 13:25 ` Gudkov Andrei via
2023-04-18 17:21   ` Daniel P. Berrangé
2023-04-18 17:17 ` Daniel P. Berrangé
2023-04-27 13:51   ` Gudkov Andrei via

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).