* [Qemu-devel] [PATCH v2 0/1] migration: calculate expected_downtime with ram_bytes_remaining()
@ 2018-04-17 13:23 Balamuruhan S
  2018-04-17 13:23 ` [Qemu-devel] [PATCH v2 1/1] " Balamuruhan S
  0 siblings, 1 reply; 17+ messages in thread
From: Balamuruhan S @ 2018-04-17 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert, dgibson, amit.shah, Balamuruhan S

Hi,

v2:

There is some difference in the expected_downtime value for the
following reasons:

1. bandwidth and expected_downtime are calculated in
migration_update_counters() during each iteration of
migration_thread()

2. remaining ram is calculated in qmp_query_migrate() only when we
actually call "info migrate"

In this v2 patch, bandwidth, expected_downtime and remaining ram are
all calculated in migration_update_counters(), and "info migrate"
retrieves those same values. With this approach the reported value
comes close to the calculated one:

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off
release-ram: off block: off return-path: off pause-before-switchover:
off x-multifd: off dirty-bitmaps: off 
Migration status: active
total time: 319737 milliseconds
expected downtime: 1054 milliseconds
setup: 16 milliseconds
transferred ram: 3669862 kbytes
throughput: 108.92 mbps
remaining ram: 14016 kbytes
total ram: 8388864 kbytes
duplicate: 2296276 pages
skipped: 0 pages
normal: 910639 pages
normal bytes: 3642556 kbytes
dirty sync count: 249
page size: 4 kbytes
dirty pages rate: 4626 pages

Calculation:
calculated value = (14016 kbytes * 8) / 108.92 mbps = 1029.452809401 milliseconds
actual value = 1054 milliseconds
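
The figure above is just the formula the patch uses, applied to the
"info migrate" numbers; as a rough unit check (ignoring the decimal vs
binary prefix mismatch between the kbytes and mbps fields, as the
arithmetic above does):

\[
\text{expected\_downtime} \approx \frac{\text{remaining ram}}{\text{throughput}}
  = \frac{14016\ \text{kbytes} \times 8}{108.92\ \text{mbps}}
  = \frac{112128\ \text{kbit}}{108.92\ \text{kbit/ms}}
  \approx 1029.45\ \text{ms}
\]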

since v1:

use ram_bytes_remaining() instead of dirty_pages_rate * page_size to
calculate expected_downtime, for a more accurate estimate.

Regards,
Bala

Balamuruhan S (1):
  migration: calculate expected_downtime with ram_bytes_remaining()

 migration/migration.c | 6 +++---
 migration/migration.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-17 13:23 [Qemu-devel] [PATCH v2 0/1] migration: calculate expected_downtime with ram_bytes_remaining() Balamuruhan S
@ 2018-04-17 13:23 ` Balamuruhan S
  2018-04-18  0:55   ` David Gibson
  0 siblings, 1 reply; 17+ messages in thread
From: Balamuruhan S @ 2018-04-17 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, dgilbert, dgibson, amit.shah, Balamuruhan S

expected_downtime value is not accurate with dirty_pages_rate * page_size;
using ram_bytes_remaining() would yield a more accurate value.

Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
---
 migration/migration.c | 6 +++---
 migration/migration.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 52a5092add..4d866bb920 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     }
 
     if (s->state != MIGRATION_STATUS_COMPLETED) {
-        info->ram->remaining = ram_bytes_remaining();
+        info->ram->remaining = s->ram_bytes_remaining;
         info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
     }
 }
@@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
     transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
     time_spent = current_time - s->iteration_start_time;
     bandwidth = (double)transferred / time_spent;
+    s->ram_bytes_remaining = ram_bytes_remaining();
     s->threshold_size = bandwidth * s->parameters.downtime_limit;
 
     s->mbps = (((double) transferred * 8.0) /
@@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
      * recalculate. 10000 is a small enough number for our purposes
      */
     if (ram_counters.dirty_pages_rate && transferred > 10000) {
-        s->expected_downtime = ram_counters.dirty_pages_rate *
-            qemu_target_page_size() / bandwidth;
+        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
     }
 
     qemu_file_reset_rate_limit(s->to_dst_file);
diff --git a/migration/migration.h b/migration/migration.h
index 8d2f320c48..8584f8e22e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -128,6 +128,7 @@ struct MigrationState
     int64_t downtime_start;
     int64_t downtime;
     int64_t expected_downtime;
+    int64_t ram_bytes_remaining;
     bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
     int64_t setup_time;
     /*
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-17 13:23 ` [Qemu-devel] [PATCH v2 1/1] " Balamuruhan S
@ 2018-04-18  0:55   ` David Gibson
  2018-04-18  0:57     ` David Gibson
  2018-04-18  6:52     ` Balamuruhan S
  0 siblings, 2 replies; 17+ messages in thread
From: David Gibson @ 2018-04-18  0:55 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: qemu-devel, quintela, dgilbert, dgibson, amit.shah

[-- Attachment #1: Type: text/plain, Size: 2689 bytes --]

On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> expected_downtime value is not accurate with dirty_pages_rate * page_size,
> using ram_bytes_remaining would yeild it correct.

This commit message hasn't been changed since v1, but the patch is
doing something completely different.  I think most of the info from
your cover letter needs to be in here.

> 
> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> ---
>  migration/migration.c | 6 +++---
>  migration/migration.h | 1 +
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 52a5092add..4d866bb920 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>      }
>  
>      if (s->state != MIGRATION_STATUS_COMPLETED) {
> -        info->ram->remaining = ram_bytes_remaining();
> +        info->ram->remaining = s->ram_bytes_remaining;
>          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
>      }
>  }
> @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
>      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
>      time_spent = current_time - s->iteration_start_time;
>      bandwidth = (double)transferred / time_spent;
> +    s->ram_bytes_remaining = ram_bytes_remaining();
>      s->threshold_size = bandwidth * s->parameters.downtime_limit;
>  
>      s->mbps = (((double) transferred * 8.0) /
> @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
>       * recalculate. 10000 is a small enough number for our purposes
>       */
>      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> -        s->expected_downtime = ram_counters.dirty_pages_rate *
> -            qemu_target_page_size() / bandwidth;
> +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
>      }
>  
>      qemu_file_reset_rate_limit(s->to_dst_file);
> diff --git a/migration/migration.h b/migration/migration.h
> index 8d2f320c48..8584f8e22e 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -128,6 +128,7 @@ struct MigrationState
>      int64_t downtime_start;
>      int64_t downtime;
>      int64_t expected_downtime;
> +    int64_t ram_bytes_remaining;
>      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
>      int64_t setup_time;
>      /*

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-18  0:55   ` David Gibson
@ 2018-04-18  0:57     ` David Gibson
  2018-04-18  6:46       ` Balamuruhan S
  2018-04-18  6:52     ` Balamuruhan S
  1 sibling, 1 reply; 17+ messages in thread
From: David Gibson @ 2018-04-18  0:57 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: qemu-devel, quintela, dgilbert, dgibson, amit.shah

[-- Attachment #1: Type: text/plain, Size: 3013 bytes --]

On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > using ram_bytes_remaining would yeild it correct.
> 
> This commit message hasn't been changed since v1, but the patch is
> doing something completely different.  I think most of the info from
> your cover letter needs to be in here.
> 
> > 
> > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > ---
> >  migration/migration.c | 6 +++---
> >  migration/migration.h | 1 +
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 52a5092add..4d866bb920 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> >      }
> >  
> >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > -        info->ram->remaining = ram_bytes_remaining();
> > +        info->ram->remaining = s->ram_bytes_remaining;
> >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> >      }
> >  }
> > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> >      time_spent = current_time - s->iteration_start_time;
> >      bandwidth = (double)transferred / time_spent;
> > +    s->ram_bytes_remaining = ram_bytes_remaining();
> >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> >  
> >      s->mbps = (((double) transferred * 8.0) /
> > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> >       * recalculate. 10000 is a small enough number for our purposes
> >       */
> >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > -            qemu_target_page_size() / bandwidth;
> > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> >      }

..but more importantly, I still think this change is bogus.  expected
downtime is not the same thing as remaining ram / bandwidth.

> >  
> >      qemu_file_reset_rate_limit(s->to_dst_file);
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 8d2f320c48..8584f8e22e 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -128,6 +128,7 @@ struct MigrationState
> >      int64_t downtime_start;
> >      int64_t downtime;
> >      int64_t expected_downtime;
> > +    int64_t ram_bytes_remaining;
> >      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> >      int64_t setup_time;
> >      /*
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-18  0:57     ` David Gibson
@ 2018-04-18  6:46       ` Balamuruhan S
  2018-04-18  8:36         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 17+ messages in thread
From: Balamuruhan S @ 2018-04-18  6:46 UTC (permalink / raw)
  To: David Gibson; +Cc: dgilbert, amit.shah, mdroth, quintela, qemu-devel

On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > using ram_bytes_remaining would yeild it correct.
> > 
> > This commit message hasn't been changed since v1, but the patch is
> > doing something completely different.  I think most of the info from
> > your cover letter needs to be in here.
> > 
> > > 
> > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > ---
> > >  migration/migration.c | 6 +++---
> > >  migration/migration.h | 1 +
> > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 52a5092add..4d866bb920 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > >      }
> > >  
> > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > -        info->ram->remaining = ram_bytes_remaining();
> > > +        info->ram->remaining = s->ram_bytes_remaining;
> > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > >      }
> > >  }
> > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > >      time_spent = current_time - s->iteration_start_time;
> > >      bandwidth = (double)transferred / time_spent;
> > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > >  
> > >      s->mbps = (((double) transferred * 8.0) /
> > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > >       * recalculate. 10000 is a small enough number for our purposes
> > >       */
> > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > -            qemu_target_page_size() / bandwidth;
> > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > >      }
> 
> ..but more importantly, I still think this change is bogus.  expected
> downtime is not the same thing as remaining ram / bandwidth.

I tested precopy migration of a 16M hugepage backed P8 guest from a P8 host
to a P9 host with 1G hugepages, and observed that precopy migration ran
indefinitely when downtime-limit was set to the reported expected_downtime.

During the discussion for Bug RH1560562, Michael Roth quoted that

One thing to note: in my testing I found that the "expected downtime" value
seems inaccurate in this scenario. To find a max downtime that allowed
migration to complete I had to divide "remaining ram" by "throughput" from
"info migrate" (after the initial pre-copy pass through ram, i.e. once
"dirty pages" value starts getting reported and we're just sending dirtied
pages).

Later, by trying this approach, precopy migration was able to complete.

adding Michael Roth in cc.

Regards,
Bala

> 
> > >  
> > >      qemu_file_reset_rate_limit(s->to_dst_file);
> > > diff --git a/migration/migration.h b/migration/migration.h
> > > index 8d2f320c48..8584f8e22e 100644
> > > --- a/migration/migration.h
> > > +++ b/migration/migration.h
> > > @@ -128,6 +128,7 @@ struct MigrationState
> > >      int64_t downtime_start;
> > >      int64_t downtime;
> > >      int64_t expected_downtime;
> > > +    int64_t ram_bytes_remaining;
> > >      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> > >      int64_t setup_time;
> > >      /*
> > 
> 
> 
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-18  0:55   ` David Gibson
  2018-04-18  0:57     ` David Gibson
@ 2018-04-18  6:52     ` Balamuruhan S
  1 sibling, 0 replies; 17+ messages in thread
From: Balamuruhan S @ 2018-04-18  6:52 UTC (permalink / raw)
  To: David Gibson; +Cc: dgilbert, amit.shah, quintela, qemu-devel

On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > using ram_bytes_remaining would yeild it correct.
> 
> This commit message hasn't been changed since v1, but the patch is
> doing something completely different.  I think most of the info from
> your cover letter needs to be in here.

Sure, I will make the change as suggested.

> 
> > 
> > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > ---
> >  migration/migration.c | 6 +++---
> >  migration/migration.h | 1 +
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 52a5092add..4d866bb920 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> >      }
> >  
> >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > -        info->ram->remaining = ram_bytes_remaining();
> > +        info->ram->remaining = s->ram_bytes_remaining;
> >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> >      }
> >  }
> > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> >      time_spent = current_time - s->iteration_start_time;
> >      bandwidth = (double)transferred / time_spent;
> > +    s->ram_bytes_remaining = ram_bytes_remaining();
> >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> >  
> >      s->mbps = (((double) transferred * 8.0) /
> > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> >       * recalculate. 10000 is a small enough number for our purposes
> >       */
> >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > -            qemu_target_page_size() / bandwidth;
> > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> >      }
> >  
> >      qemu_file_reset_rate_limit(s->to_dst_file);
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 8d2f320c48..8584f8e22e 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -128,6 +128,7 @@ struct MigrationState
> >      int64_t downtime_start;
> >      int64_t downtime;
> >      int64_t expected_downtime;
> > +    int64_t ram_bytes_remaining;
> >      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> >      int64_t setup_time;
> >      /*
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-18  6:46       ` Balamuruhan S
@ 2018-04-18  8:36         ` Dr. David Alan Gilbert
  2018-04-19  4:44           ` Balamuruhan S
  0 siblings, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-18  8:36 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: David Gibson, amit.shah, mdroth, quintela, qemu-devel

* Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > using ram_bytes_remaining would yeild it correct.
> > > 
> > > This commit message hasn't been changed since v1, but the patch is
> > > doing something completely different.  I think most of the info from
> > > your cover letter needs to be in here.
> > > 
> > > > 
> > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > ---
> > > >  migration/migration.c | 6 +++---
> > > >  migration/migration.h | 1 +
> > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index 52a5092add..4d866bb920 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > >      }
> > > >  
> > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > >      }
> > > >  }
> > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > >      time_spent = current_time - s->iteration_start_time;
> > > >      bandwidth = (double)transferred / time_spent;
> > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > >  
> > > >      s->mbps = (((double) transferred * 8.0) /
> > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > >       * recalculate. 10000 is a small enough number for our purposes
> > > >       */
> > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > -            qemu_target_page_size() / bandwidth;
> > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > >      }
> > 
> > ..but more importantly, I still think this change is bogus.  expected
> > downtime is not the same thing as remaining ram / bandwidth.
> 
> I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> and observed precopy migration was infinite with expected_downtime set as
> downtime-limit.

Did you debug why it was infinite? Which component of the calculation
had gone wrong and why?

> During the discussion for Bug RH1560562, Michael Roth quoted that
> 
> One thing to note: in my testing I found that the "expected downtime" value
> seems inaccurate in this scenario. To find a max downtime that allowed
> migration to complete I had to divide "remaining ram" by "throughput" from
> "info migrate" (after the initial pre-copy pass through ram, i.e. once
> "dirty pages" value starts getting reported and we're just sending dirtied
> pages).
> 
> Later by trying it precopy migration could able to complete with this
> approach.
> 
> adding Michael Roth in cc.

We should try and _understand_ the rationale for the change, not just go
with it.  Now, remember that whatever we do is just an estimate and
there will be lots of cases where it's bad - so be careful what you're
using it for - you definitely should NOT use the value in any automated
system.
My problem with just using ram_bytes_remaining is that it doesn't take
into account the rate at which the guest is changing RAM - which feels
like it's the important measure for expected downtime.

Dave

> Regards,
> Bala
> 
> > 
> > > >  
> > > >      qemu_file_reset_rate_limit(s->to_dst_file);
> > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > index 8d2f320c48..8584f8e22e 100644
> > > > --- a/migration/migration.h
> > > > +++ b/migration/migration.h
> > > > @@ -128,6 +128,7 @@ struct MigrationState
> > > >      int64_t downtime_start;
> > > >      int64_t downtime;
> > > >      int64_t expected_downtime;
> > > > +    int64_t ram_bytes_remaining;
> > > >      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> > > >      int64_t setup_time;
> > > >      /*
> > > 
> > 
> > 
> > 
> > -- 
> > David Gibson			| I'll have my music baroque, and my code
> > david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> > 				| _way_ _around_!
> > http://www.ozlabs.org/~dgibson
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-18  8:36         ` Dr. David Alan Gilbert
@ 2018-04-19  4:44           ` Balamuruhan S
  2018-04-19 11:24             ` Dr. David Alan Gilbert
  2018-04-19 11:48             ` David Gibson
  0 siblings, 2 replies; 17+ messages in thread
From: Balamuruhan S @ 2018-04-19  4:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: amit.shah, quintela, qemu-devel, david

On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > > using ram_bytes_remaining would yeild it correct.
> > > > 
> > > > This commit message hasn't been changed since v1, but the patch is
> > > > doing something completely different.  I think most of the info from
> > > > your cover letter needs to be in here.
> > > > 
> > > > > 
> > > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > > ---
> > > > >  migration/migration.c | 6 +++---
> > > > >  migration/migration.h | 1 +
> > > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index 52a5092add..4d866bb920 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > >      }
> > > > >  
> > > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > >      }
> > > > >  }
> > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > >      time_spent = current_time - s->iteration_start_time;
> > > > >      bandwidth = (double)transferred / time_spent;
> > > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > >  
> > > > >      s->mbps = (((double) transferred * 8.0) /
> > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > >       * recalculate. 10000 is a small enough number for our purposes
> > > > >       */
> > > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > -            qemu_target_page_size() / bandwidth;
> > > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > >      }
> > > 
> > > ..but more importantly, I still think this change is bogus.  expected
> > > downtime is not the same thing as remaining ram / bandwidth.
> > 
> > I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> > and observed precopy migration was infinite with expected_downtime set as
> > downtime-limit.
> 
> Did you debug why it was infinite? Which component of the calculation
> had gone wrong and why?
> 
> > During the discussion for Bug RH1560562, Michael Roth quoted that
> > 
> > One thing to note: in my testing I found that the "expected downtime" value
> > seems inaccurate in this scenario. To find a max downtime that allowed
> > migration to complete I had to divide "remaining ram" by "throughput" from
> > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > "dirty pages" value starts getting reported and we're just sending dirtied
> > pages).
> > 
> > Later by trying it precopy migration could able to complete with this
> > approach.
> > 
> > adding Michael Roth in cc.
> 
> We should try and _understand_ the rational for the change, not just go
> with it.  Now, remember that whatever we do is just an estimate and

I have made the change based on my understanding:

Currently the calculation is

expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth

dirty_pages_rate = number of dirty pages / time => its unit is (pages / second)
qemu_target_page_size => its unit is (bytes / page)

dirty_pages_rate * qemu_target_page_size => bytes / second

bandwidth = bytes transferred / time => bytes / second

dividing one by the other is therefore dimensionless, not a measurement of time.
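
Spelling the units out (just restating the argument above):

\[
\frac{\text{dirty\_pages\_rate} \times \text{qemu\_target\_page\_size}}{\text{bandwidth}}
  = \frac{(\text{pages/s}) \cdot (\text{bytes/page})}{\text{bytes/s}}
  = \frac{\text{bytes/s}}{\text{bytes/s}}
  \quad (\text{dimensionless})
\]

\[
\frac{\text{ram\_bytes\_remaining}}{\text{bandwidth}}
  = \frac{\text{bytes}}{\text{bytes/s}}
  = \text{seconds}
\]

so only the second expression has the units of a downtime.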

> there will be lots of cases where it's bad - so be careful what you're
> using it for - you definitely should NOT use the value in any automated
> system.

I agree with it and I would not use it in automated system.

> My problem with just using ram_bytes_remaining is that it doesn't take
> into account the rate at which the guest is changing RAM - which feels
> like it's the important measure for expected downtime.

ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE

This means ram_bytes_remaining is proportional to the RAM the guest is
still changing, so this change should yield a reasonable expected_downtime.

Regards,
Bala
> 
> Dave
> 
> > Regards,
> > Bala
> > 
> > > 
> > > > >  
> > > > >      qemu_file_reset_rate_limit(s->to_dst_file);
> > > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > > index 8d2f320c48..8584f8e22e 100644
> > > > > --- a/migration/migration.h
> > > > > +++ b/migration/migration.h
> > > > > @@ -128,6 +128,7 @@ struct MigrationState
> > > > >      int64_t downtime_start;
> > > > >      int64_t downtime;
> > > > >      int64_t expected_downtime;
> > > > > +    int64_t ram_bytes_remaining;
> > > > >      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> > > > >      int64_t setup_time;
> > > > >      /*
> > > > 
> > > 
> > > 
> > > 
> > > -- 
> > > David Gibson			| I'll have my music baroque, and my code
> > > david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> > > 				| _way_ _around_!
> > > http://www.ozlabs.org/~dgibson
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-19  4:44           ` Balamuruhan S
@ 2018-04-19 11:24             ` Dr. David Alan Gilbert
  2018-04-20  5:47               ` David Gibson
  2018-04-19 11:48             ` David Gibson
  1 sibling, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-19 11:24 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: quintela, qemu-devel, david

* Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > > > using ram_bytes_remaining would yeild it correct.
> > > > > 
> > > > > This commit message hasn't been changed since v1, but the patch is
> > > > > doing something completely different.  I think most of the info from
> > > > > your cover letter needs to be in here.
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > > > ---
> > > > > >  migration/migration.c | 6 +++---
> > > > > >  migration/migration.h | 1 +
> > > > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > index 52a5092add..4d866bb920 100644
> > > > > > --- a/migration/migration.c
> > > > > > +++ b/migration/migration.c
> > > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > > >      }
> > > > > >  
> > > > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > > >      }
> > > > > >  }
> > > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > > >      time_spent = current_time - s->iteration_start_time;
> > > > > >      bandwidth = (double)transferred / time_spent;
> > > > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > > >  
> > > > > >      s->mbps = (((double) transferred * 8.0) /
> > > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > >       * recalculate. 10000 is a small enough number for our purposes
> > > > > >       */
> > > > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > > -            qemu_target_page_size() / bandwidth;
> > > > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > > >      }
> > > > 
> > > > ..but more importantly, I still think this change is bogus.  expected
> > > > downtime is not the same thing as remaining ram / bandwidth.
> > > 
> > > I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> > > and observed precopy migration was infinite with expected_downtime set as
> > > downtime-limit.
> > 
> > Did you debug why it was infinite? Which component of the calculation
> > had gone wrong and why?
> > 
> > > During the discussion for Bug RH1560562, Michael Roth quoted that
> > > 
> > > One thing to note: in my testing I found that the "expected downtime" value
> > > seems inaccurate in this scenario. To find a max downtime that allowed
> > > migration to complete I had to divide "remaining ram" by "throughput" from
> > > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > > "dirty pages" value starts getting reported and we're just sending dirtied
> > > pages).
> > > 
> > > Later by trying it precopy migration could able to complete with this
> > > approach.
> > > 
> > > adding Michael Roth in cc.
> > 
> > We should try and _understand_ the rational for the change, not just go
> > with it.  Now, remember that whatever we do is just an estimate and
> 
> I have made the change based on my understanding,
> 
> Currently the calculation is,
> 
> expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth
> 
> dirty_pages_rate = No of dirty pages / time =>  its unit (1 / seconds)
> qemu_target_page_size => its unit (bytes)
> 
> dirty_pages_rate * qemu_target_page_size => bytes/seconds
> 
> bandwidth = bytes transferred / time => bytes/seconds
> 
> dividing this would not be a measurement of time.

OK, that argument makes sense to me about why it feels broken; but see
below.

> > there will be lots of cases where it's bad - so be careful what you're
> > using it for - you definitely should NOT use the value in any automated
> > system.
> 
> I agree with it and I would not use it in automated system.
> 
> > My problem with just using ram_bytes_remaining is that it doesn't take
> > into account the rate at which the guest is changing RAM - which feels
> > like it's the important measure for expected downtime.
> 
> ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE
> 
> This means ram_bytes_remaining is proportional to guest changing RAM, so
> we can consider this change would yield expected_downtime

ram_bytes_remaining comes from the *current* number of dirty pages, so it
tells you how much you have to transmit, but if the guest wasn't
changing RAM, then that just tells you how much longer you have to keep
going - not the amount of downtime required.  e.g. right at the start of
migration you might have 16G of dirty-pages, but you don't need downtime
to transmit them all.

It's actually slightly different, because migration_update_counters is
called in the main iteration loop after an iteration and I think that
means it only ends up there either at the end of migration OR when
qemu_file_rate_limit(f) causes ram_save_iterate to return to the main
loop; so you've got the number of dirty pages when it's interrupted by
rate limiting.

So I don't think the use of ram_bytes_remaining is right either.

What is the right answer?
I'm not sure; but:

   a) If the bandwidth is lower then you can see the downtime should be
longer; so  having x/bandwidth  makes sense
   b) If the guest is dirtying RAM faster then you can see the downtime
should be longer;  so having  dirty_pages_rate on the top seems right.

So you can kind of see where the calculation above comes from.

I can't convince myself of any calculation that actually works!

Lets imagine a setup with a guest dirtying memory at 'Dr' Bytes/s
with the bandwidth (Bw), and we enter an iteration with
'Db' bytes dirty:

  The time for that iteration is:
     It   = Db / Bw

  during that time we've dirtied 'Dr' more RAM, so at the end of
it we have:
     Db' = Dr * It
         = Dr * Db
           -------
              Bw

But then if you follow that, in any case where Dr < Bw it iterates
down to Db' being ~0 irrespective of what that ratio is - but that
makes no sense.
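
Spelling that recursion out (still assuming a constant dirty rate Dr
and bandwidth Bw, and nothing else changing):

\[
Db_{n+1} = \frac{Dr}{Bw}\,Db_n
  \quad\Longrightarrow\quad
  Db_n = \left(\frac{Dr}{Bw}\right)^{n} Db_0
\]

so whenever Dr < Bw the model says the dirty set shrinks geometrically
towards zero, however close the ratio is to 1 - which is the behaviour
described above.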

Dave

> Regards,
> Bala
> > 
> > Dave
> > 
> > > Regards,
> > > Bala
> > > 
> > > > 
> > > > > >  
> > > > > >      qemu_file_reset_rate_limit(s->to_dst_file);
> > > > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > > > index 8d2f320c48..8584f8e22e 100644
> > > > > > --- a/migration/migration.h
> > > > > > +++ b/migration/migration.h
> > > > > > @@ -128,6 +128,7 @@ struct MigrationState
> > > > > >      int64_t downtime_start;
> > > > > >      int64_t downtime;
> > > > > >      int64_t expected_downtime;
> > > > > > +    int64_t ram_bytes_remaining;
> > > > > >      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> > > > > >      int64_t setup_time;
> > > > > >      /*
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > > David Gibson			| I'll have my music baroque, and my code
> > > > david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> > > > 				| _way_ _around_!
> > > > http://www.ozlabs.org/~dgibson
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-19  4:44           ` Balamuruhan S
  2018-04-19 11:24             ` Dr. David Alan Gilbert
@ 2018-04-19 11:48             ` David Gibson
  2018-04-20 18:57               ` Dr. David Alan Gilbert
  2018-04-21 19:12               ` Balamuruhan S
  1 sibling, 2 replies; 17+ messages in thread
From: David Gibson @ 2018-04-19 11:48 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: Dr. David Alan Gilbert, amit.shah, quintela, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 8313 bytes --]

On Thu, Apr 19, 2018 at 10:14:52AM +0530, Balamuruhan S wrote:
> On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > > > using ram_bytes_remaining would yeild it correct.
> > > > > 
> > > > > This commit message hasn't been changed since v1, but the patch is
> > > > > doing something completely different.  I think most of the info from
> > > > > your cover letter needs to be in here.
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > > > ---
> > > > > >  migration/migration.c | 6 +++---
> > > > > >  migration/migration.h | 1 +
> > > > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > index 52a5092add..4d866bb920 100644
> > > > > > --- a/migration/migration.c
> > > > > > +++ b/migration/migration.c
> > > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > > >      }
> > > > > >  
> > > > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > > >      }
> > > > > >  }
> > > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > > >      time_spent = current_time - s->iteration_start_time;
> > > > > >      bandwidth = (double)transferred / time_spent;
> > > > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > > >  
> > > > > >      s->mbps = (((double) transferred * 8.0) /
> > > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > >       * recalculate. 10000 is a small enough number for our purposes
> > > > > >       */
> > > > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > > -            qemu_target_page_size() / bandwidth;
> > > > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > > >      }
> > > > 
> > > > ..but more importantly, I still think this change is bogus.  expected
> > > > downtime is not the same thing as remaining ram / bandwidth.
> > > 
> > > I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> > > and observed precopy migration was infinite with expected_downtime set as
> > > downtime-limit.
> > 
> > Did you debug why it was infinite? Which component of the calculation
> > had gone wrong and why?
> > 
> > > During the discussion for Bug RH1560562, Michael Roth quoted that
> > > 
> > > One thing to note: in my testing I found that the "expected downtime" value
> > > seems inaccurate in this scenario. To find a max downtime that allowed
> > > migration to complete I had to divide "remaining ram" by "throughput" from
> > > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > > "dirty pages" value starts getting reported and we're just sending dirtied
> > > pages).
> > > 
> > > Later by trying it precopy migration could able to complete with this
> > > approach.
> > > 
> > > adding Michael Roth in cc.
> > 
> > We should try and _understand_ the rational for the change, not just go
> > with it.  Now, remember that whatever we do is just an estimate and
> 
> I have made the change based on my understanding,
> 
> Currently the calculation is,
> 
> expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth
> 
> dirty_pages_rate = No of dirty pages / time =>  its unit (1 / seconds)
> qemu_target_page_size => its unit (bytes)
> 
> dirty_pages_rate * qemu_target_page_size => bytes/seconds
> 
> bandwidth = bytes transferred / time => bytes/seconds
> 
> dividing this would not be a measurement of time.

Hm, that's a good point, the units are not right here.  And thinking
about it more, it doesn't really make sense for it to be linear
either.  After all if the page dirty rate exceeds the bandwidth then
the expected downtime is infinite... well size of ram over bandwidth,
at least.

> > there will be lots of cases where it's bad - so be careful what you're
> > using it for - you definitely should NOT use the value in any automated
> > system.
> 
> I agree with it and I would not use it in automated system.
> 
> > My problem with just using ram_bytes_remaining is that it doesn't take
> > into account the rate at which the guest is changing RAM - which feels
> > like it's the important measure for expected downtime.
> 
> ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE
> 
> This means ram_bytes_remaining is proportional to guest changing RAM, so
> we can consider this change would yield expected_downtime

Well, just because the existing estimate is wrong doesn't mean this
one is right.  Having the right units is a necessary but not
sufficient condition.

That said, I thought about this a bunch, and I think there is
a case to be made for it - although it's a lot more subtle than what's
been suggested so far.

So.  AFAICT the estimate of page dirty rate is based on the assumption
that page dirties are independent of each other - one page is as
likely to be dirtied as any other.  If we don't make that assumption,
I don't see how we can really have an estimate as a single number.

But if that's the assumption, then predicting downtime based on it is
futile: if the dirty rate is less than bandwidth, we can wait long
enough and make the downtime as small as we want.  If the dirty rate
is higher than bandwidth, then we don't converge and no downtime short
of (ram size / bandwidth) will be sufficient.

The only way a predicted downtime makes any sense is if we assume that
although the "instantaneous" dirty rate is high, the pages being
dirtied are within a working set that's substantially smaller than the
full RAM size.  In that case the expected down time becomes (working
set size / bandwidth).

Predicting downtime as (ram_bytes_remaining / bandwidth) is
essentially always wrong early in the migration, although it will be a
poor upper bound - it will basically give you the time to transfer all
RAM.

For a nicely converging migration it will also be wrong (but an upper
bound) until it isn't: it will gradually decrease until it dips below
the requested downtime threshold, at which point the migration
completes.

For a diverging migration with a working set, as discussed above,
ram_bytes_remaining will eventually converge on (roughly) the size of
that working set - it won't dip (much) below that, because we can't
keep up with the dirties within that working set.  At that point this
does become a reasonable estimate of the necessary downtime in order
to get the migration to complete, which I believe is the point of the
value.

So the question is: for the purposes of this value, is a gross
overestimate that gradually approaches a reasonable value good enough?

An estimate that would get closer, quicker would be (ram dirtied in
interval) / bandwidth.  Where (ram dirtied in interval) is a measure
of total ram dirtied over some measurement interval - only counting a
page once if its dirtied multiple times during the interval.  And
obviously you'd want some sort of averaging on that.  I think that
would be a bit of a pain to measure, though.
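
A minimal sketch of what that kind of estimator could look like
(hypothetical, not QEMU code: all of the names below are made up, the
smoothing factor is arbitrary, and the hard part - counting each page
only once per interval - is assumed to happen elsewhere):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical sketch, not QEMU code: estimate expected downtime from
 * the amount of RAM dirtied per measurement interval (each page counted
 * once per interval), smoothed with an exponential moving average,
 * instead of from the instantaneous dirty-page rate.
 */
typedef struct {
    double dirtied_avg;   /* EMA of bytes dirtied per interval */
    double alpha;         /* smoothing factor, 0 < alpha <= 1 */
} DowntimeEstimator;

static void estimator_update(DowntimeEstimator *e, uint64_t dirtied_bytes)
{
    e->dirtied_avg = e->alpha * (double)dirtied_bytes
                   + (1.0 - e->alpha) * e->dirtied_avg;
}

/* bandwidth in bytes/ms; returns an estimate in ms */
static double estimator_downtime_ms(const DowntimeEstimator *e, double bandwidth)
{
    return bandwidth > 0.0 ? e->dirtied_avg / bandwidth : 0.0;
}

int main(void)
{
    DowntimeEstimator e = { .dirtied_avg = 0.0, .alpha = 0.3 };
    /* Pretend the working set is ~16 MiB, re-dirtied every interval. */
    uint64_t samples[] = { 18u << 20, 15u << 20, 16u << 20, 16u << 20 };
    double bandwidth = 108.92e6 / 8.0 / 1000.0;   /* ~108.92 mbps in bytes/ms */

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        estimator_update(&e, samples[i]);
        printf("interval %zu: expected downtime ~%.0f ms\n",
               i, estimator_downtime_ms(&e, bandwidth));
    }
    return 0;
}

The EMA is only there to damp the jitter between intervals; any
reasonable averaging would do.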

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-19 11:24             ` Dr. David Alan Gilbert
@ 2018-04-20  5:47               ` David Gibson
  2018-04-20 10:28                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2018-04-20  5:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Balamuruhan S, quintela, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 8096 bytes --]

On Thu, Apr 19, 2018 at 12:24:04PM +0100, Dr. David Alan Gilbert wrote:
> * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> > > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > > > > using ram_bytes_remaining would yeild it correct.
> > > > > > 
> > > > > > This commit message hasn't been changed since v1, but the patch is
> > > > > > doing something completely different.  I think most of the info from
> > > > > > your cover letter needs to be in here.
> > > > > > 
> > > > > > > 
> > > > > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > > > > ---
> > > > > > >  migration/migration.c | 6 +++---
> > > > > > >  migration/migration.h | 1 +
> > > > > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > index 52a5092add..4d866bb920 100644
> > > > > > > --- a/migration/migration.c
> > > > > > > +++ b/migration/migration.c
> > > > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > > > >      }
> > > > > > >  
> > > > > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > > > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > > > >      }
> > > > > > >  }
> > > > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > > > >      time_spent = current_time - s->iteration_start_time;
> > > > > > >      bandwidth = (double)transferred / time_spent;
> > > > > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > > > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > > > >  
> > > > > > >      s->mbps = (((double) transferred * 8.0) /
> > > > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > >       * recalculate. 10000 is a small enough number for our purposes
> > > > > > >       */
> > > > > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > > > -            qemu_target_page_size() / bandwidth;
> > > > > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > > > >      }
> > > > > 
> > > > > ..but more importantly, I still think this change is bogus.  expected
> > > > > downtime is not the same thing as remaining ram / bandwidth.
> > > > 
> > > > I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> > > > and observed precopy migration was infinite with expected_downtime set as
> > > > downtime-limit.
> > > 
> > > Did you debug why it was infinite? Which component of the calculation
> > > had gone wrong and why?
> > > 
> > > > During the discussion for Bug RH1560562, Michael Roth quoted that
> > > > 
> > > > One thing to note: in my testing I found that the "expected downtime" value
> > > > seems inaccurate in this scenario. To find a max downtime that allowed
> > > > migration to complete I had to divide "remaining ram" by "throughput" from
> > > > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > > > "dirty pages" value starts getting reported and we're just sending dirtied
> > > > pages).
> > > > 
> > > > Later by trying it precopy migration could able to complete with this
> > > > approach.
> > > > 
> > > > adding Michael Roth in cc.
> > > 
> > > We should try and _understand_ the rational for the change, not just go
> > > with it.  Now, remember that whatever we do is just an estimate and
> > 
> > I have made the change based on my understanding,
> > 
> > Currently the calculation is,
> > 
> > expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth
> > 
> > dirty_pages_rate = No of dirty pages / time =>  its unit (1 / seconds)
> > qemu_target_page_size => its unit (bytes)
> > 
> > dirty_pages_rate * qemu_target_page_size => bytes/seconds
> > 
> > bandwidth = bytes transferred / time => bytes/seconds
> > 
> > dividing this would not be a measurement of time.
> 
> OK, that argument makes sense to me about why it feels broken; but see
> below.
> 
> > > there will be lots of cases where it's bad - so be careful what you're
> > > using it for - you definitely should NOT use the value in any automated
> > > system.
> > 
> > I agree with it and I would not use it in automated system.
> > 
> > > My problem with just using ram_bytes_remaining is that it doesn't take
> > > into account the rate at which the guest is changing RAM - which feels
> > > like it's the important measure for expected downtime.
> > 
> > ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE
> > 
> > This means ram_bytes_remaining is proportional to guest changing RAM, so
> > we can consider this change would yield expected_downtime
> 
> ram_bytes_remaining comes from the *current* number of dirty pages, so it
> tells you how much you have to transmit, but if the guest wasn't
> changing RAM, then that just tells you how much longer you have to keep
> going - not the amount of downtime required.  e.g. right at the start of
> migration you might have 16G of dirty-pages, but you don't need downtime
> to transmit them all.
> 
> It's actually slightly different, because migration_update_counters is
> called in the main iteration loop after an iteration and I think that
> means it only ends up there either at the end of migration OR when
> qemu_file_rate_limit(f) causes ram_save_iterate to return to the main
> loop; so you've got the number of dirty pages when it's interrupted by
> rate limiting.
> 
> So I don't think the use of ram_bytes_remaining is right either.
> 
> What is the right answer?
> I'm not sure; but:
> 
>    a) If the bandwidth is lower then you can see the downtime should be
> longer; so  having x/bandwidth  makes sense
>    b) If the guest is dirtying RAM faster then you can see the downtime
> should be longer;  so having  dirty_pages_rate on the top seems right.
> 
> So you can kind of see where the calculation above comes from.
> 
> I can't convince myself of any calculation that actually works!
> 
> Lets imagine a setup with a guest dirtying memory at 'Dr' Bytes/s
> with the bandwidth (Bw), and we enter an iteration with
> 'Db' bytes dirty:
> 
>   The time for that iteration is:
>      It   = Db / Bw
> 
>   during that time we've dirtied 'Dr' more RAM, so at the end of
> it we have:
>      Db' = Dr * It
>          = Dr * Db
>            -------
>               Bw
> 
> But then if you follow that, in any case where Dr < Bw that iterates
> down to Db' being ~0  irrespective of what that ration is - but that
> makes no sense.

So, as per our IRC discussion, this is pretty hard.

That said, I think Bala's proposed patch is better than what we have
now.  It will initially be a gross over-estimate, but for
non-converging migrations it should approach a reasonable estimate
later on.  What we have now can never really be right.

So while it would be nice to have some better modelling of this long
term, in the short term I think it makes sense to apply Bala's patch.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-20  5:47               ` David Gibson
@ 2018-04-20 10:28                 ` Dr. David Alan Gilbert
  2018-04-21 19:24                   ` Balamuruhan S
  0 siblings, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-20 10:28 UTC (permalink / raw)
  To: David Gibson; +Cc: Balamuruhan S, quintela, qemu-devel

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Thu, Apr 19, 2018 at 12:24:04PM +0100, Dr. David Alan Gilbert wrote:
> > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > > > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > > > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > > > > > using ram_bytes_remaining would yield it correct.
> > > > > > > 
> > > > > > > This commit message hasn't been changed since v1, but the patch is
> > > > > > > doing something completely different.  I think most of the info from
> > > > > > > your cover letter needs to be in here.
> > > > > > > 
> > > > > > > > 
> > > > > > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > > > > > ---
> > > > > > > >  migration/migration.c | 6 +++---
> > > > > > > >  migration/migration.h | 1 +
> > > > > > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > > index 52a5092add..4d866bb920 100644
> > > > > > > > --- a/migration/migration.c
> > > > > > > > +++ b/migration/migration.c
> > > > > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > > > > >      }
> > > > > > > >  
> > > > > > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > > > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > > > > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > > > > >      }
> > > > > > > >  }
> > > > > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > > > > >      time_spent = current_time - s->iteration_start_time;
> > > > > > > >      bandwidth = (double)transferred / time_spent;
> > > > > > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > > > > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > > > > >  
> > > > > > > >      s->mbps = (((double) transferred * 8.0) /
> > > > > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > > >       * recalculate. 10000 is a small enough number for our purposes
> > > > > > > >       */
> > > > > > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > > > > -            qemu_target_page_size() / bandwidth;
> > > > > > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > > > > >      }
> > > > > > 
> > > > > > ..but more importantly, I still think this change is bogus.  expected
> > > > > > downtime is not the same thing as remaining ram / bandwidth.
> > > > > 
> > > > > I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> > > > > and observed precopy migration was infinite with expected_downtime set as
> > > > > downtime-limit.
> > > > 
> > > > Did you debug why it was infinite? Which component of the calculation
> > > > had gone wrong and why?
> > > > 
> > > > > During the discussion for Bug RH1560562, Michael Roth quoted that
> > > > > 
> > > > > One thing to note: in my testing I found that the "expected downtime" value
> > > > > seems inaccurate in this scenario. To find a max downtime that allowed
> > > > > migration to complete I had to divide "remaining ram" by "throughput" from
> > > > > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > > > > "dirty pages" value starts getting reported and we're just sending dirtied
> > > > > pages).
> > > > > 
> > > > > Later by trying it precopy migration was able to complete with this
> > > > > approach.
> > > > > 
> > > > > adding Michael Roth in cc.
> > > > 
> > > > We should try and _understand_ the rationale for the change, not just go
> > > > with it.  Now, remember that whatever we do is just an estimate and
> > > 
> > > I have made the change based on my understanding,
> > > 
> > > Currently the calculation is,
> > > 
> > > expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth
> > > 
> > > dirty_pages_rate = No of dirty pages / time =>  its unit (1 / seconds)
> > > qemu_target_page_size => its unit (bytes)
> > > 
> > > dirty_pages_rate * qemu_target_page_size => bytes/seconds
> > > 
> > > bandwidth = bytes transferred / time => bytes/seconds
> > > 
> > > dividing this would not be a measurement of time.
> > 
> > OK, that argument makes sense to me about why it feels broken; but see
> > below.
> > 
> > > > there will be lots of cases where it's bad - so be careful what you're
> > > > using it for - you definitely should NOT use the value in any automated
> > > > system.
> > > 
> > > I agree with it and I would not use it in automated system.
> > > 
> > > > My problem with just using ram_bytes_remaining is that it doesn't take
> > > > into account the rate at which the guest is changing RAM - which feels
> > > > like it's the important measure for expected downtime.
> > > 
> > > ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE
> > > 
> > > This means ram_bytes_remaining is proportional to guest changing RAM, so
> > > we can consider this change would yield expected_downtime
> > 
> > ram_bytes_remaining comes from the *current* number of dirty pages, so it
> > tells you how much you have to transmit, but if the guest wasn't
> > changing RAM, then that just tells you how much longer you have to keep
> > going - not the amount of downtime required.  e.g. right at the start of
> > migration you might have 16G of dirty-pages, but you don't need downtime
> > to transmit them all.
> > 
> > It's actually slightly different, because migration_update_counters is
> > called in the main iteration loop after an iteration and I think that
> > means it only ends up there either at the end of migration OR when
> > qemu_file_rate_limit(f) causes ram_save_iterate to return to the main
> > loop; so you've got the number of dirty pages when it's interrupted by
> > rate limiting.
> > 
> > So I don't think the use of ram_bytes_remaining is right either.
> > 
> > What is the right answer?
> > I'm not sure; but:
> > 
> >    a) If the bandwidth is lower then you can see the downtime should be
> > longer; so  having x/bandwidth  makes sense
> >    b) If the guest is dirtying RAM faster then you can see the downtime
> > should be longer;  so having  dirty_pages_rate on the top seems right.
> > 
> > So you can kind of see where the calculation above comes from.
> > 
> > I can't convince myself of any calculation that actually works!
> > 
> > Lets imagine a setup with a guest dirtying memory at 'Dr' Bytes/s
> > with the bandwidth (Bw), and we enter an iteration with
> > 'Db' bytes dirty:
> > 
> >   The time for that iteration is:
> >      It   = Db / Bw
> > 
> >   during that time we've dirtied 'Dr' more RAM, so at the end of
> > it we have:
> >      Db' = Dr * It
> >          = Dr * Db
> >            -------
> >               Bw
> > 
> > But then if you follow that, in any case where Dr < Bw that iterates
> > down to Db' being ~0  irrespective of what that ratio is - but that
> > makes no sense.
> 
> So, as per our IRC discussion, this is pretty hard.
> 
> That said, I think Bala's proposed patch is better than what we have
> now.  It will initially be a gross over-estimate, but for
> non-converging migrations it should approach a reasonable estimate
> later on.  What we have now can never really be right.
> 
> So while it would be nice to have some better modelling of this long
> term, in the short term I think it makes sense to apply Bala's patch.

I'd like to see where the original one was going wrong for Bala; my
problem is that for me, the old code (which logically is wrong) is
giving sensible results here, within a factor of 2 of the actual
downtime I needed to set.  The code may be wrong, but the results are
reasonably right.

Dave

> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-19 11:48             ` David Gibson
@ 2018-04-20 18:57               ` Dr. David Alan Gilbert
  2018-05-03  2:08                 ` David Gibson
  2018-04-21 19:12               ` Balamuruhan S
  1 sibling, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-20 18:57 UTC (permalink / raw)
  To: David Gibson; +Cc: Balamuruhan S, quintela, qemu-devel

* David Gibson (david@gibson.dropbear.id.au) wrote:

<snip>

> So.  AFAICT the estimate of page dirty rate is based on the assumption
> that page dirties are independent of each other - one page is as
> likely to be dirtied as any other.  If we don't make that assumption,
> I don't see how we can really have an estimate as a single number.

I don't think that's entirely true; at the moment we're calculating
it by looking at the number of bits that become set during a sync
operation, and the time since the last time we did the same calculation.
Multiple writes to that page in that period will only count it once.
Since it only counts it once I don't think it quite meets that
statement.  Except see the bit at the bottom.

> But if that's the assumption, then predicting downtime based on it is
> futile: if the dirty rate is less than bandwidth, we can wait long
> enough and make the downtime as small as we want.  If the dirty rate
> is higher than bandwidth, then we don't converge and no downtime short
> of (ram size / bandwidth) will be sufficient.
> 
> The only way a predicted downtime makes any sense is if we assume that
> although the "instantaneous" dirty rate is high, the pages being
> dirtied are within a working set that's substantially smaller than the
> full RAM size.  In that case the expected down time becomes (working
> set size / bandwidth).

I don't think it needs to be a working set - it can be gently scribbling
all over ram at a low rate and still satisfy the termination; but yes
if what you're trying to do is estimate the working set it makes sense.

> Predicting downtime as (ram_bytes_remaining / bandwidth) is
> essentially always wrong early in the migration, although it will be a
> poor upper bound - it will basically give you the time to transfer all
> RAM.
> 
> For a nicely converging migration it will also be wrong (but an upper
> bound) until it isn't: it will gradually decrease until it dips below
> the requested downtime threshold, at which point the migration
> completes.
> 
> For a diverging migration with a working set, as discussed above,
> ram_bytes_remaining will eventually converge on (roughly) the size of
> that working set - it won't dip (much) below that, because we can't
> keep up with the dirties within that working set.  At that point this
> does become a reasonable estimate of the necessary downtime in order
> to get the migration to complete, which I believe is the point of the
> value.
> 
> So the question is: for the purposes of this value, is a gross
> overestimate that gradually approaches a reasonable value good enough?

It's complicated a bit by the fact we redo the calculations when we
limit the bandwidth, so it's not always calculated at the end of a full
dirty sync set.
But I do wonder about whether using this value after a few iterations
makes sense - when as you say it's settling into a working set.

> An estimate that would get closer, quicker would be (ram dirtied in
> interval) / bandwidth.  Where (ram dirtied in interval) is a measure
> of total ram dirtied over some measurement interval - only counting a
> page once if its dirtied multiple times during the interval.  And
> obviously you'd want some sort of averaging on that.  I think that
> would be a bit of a pain to measure, though.

If you look at the code in ram.c it has:

    /* more than 1 second = 1000 millisecons */
    if (end_time > rs->time_last_bitmap_sync + 1000) {
        /* calculate period counters */
        ram_counters.dirty_pages_rate = rs->num_dirty_pages_period * 1000
            / (end_time - rs->time_last_bitmap_sync);


  what I think that means is that, when we get stuck near the end with
lots of iterations, we do get some averaging over short iterations.
But for the iterations that are long, is any averaging needed?  That
depends on whether you think 'one second' covers the period you want to
average over.

Dave
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-19 11:48             ` David Gibson
  2018-04-20 18:57               ` Dr. David Alan Gilbert
@ 2018-04-21 19:12               ` Balamuruhan S
  2018-05-03  2:14                 ` David Gibson
  1 sibling, 1 reply; 17+ messages in thread
From: Balamuruhan S @ 2018-04-21 19:12 UTC (permalink / raw)
  To: David Gibson, Dr. David Alan Gilbert; +Cc: quintela, qemu-devel

On Thu, Apr 19, 2018 at 09:48:17PM +1000, David Gibson wrote:
> On Thu, Apr 19, 2018 at 10:14:52AM +0530, Balamuruhan S wrote:
> > On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> > > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > > > > using ram_bytes_remaining would yield it correct.
> > > > > > 
> > > > > > This commit message hasn't been changed since v1, but the patch is
> > > > > > doing something completely different.  I think most of the info from
> > > > > > your cover letter needs to be in here.
> > > > > > 
> > > > > > > 
> > > > > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > > > > ---
> > > > > > >  migration/migration.c | 6 +++---
> > > > > > >  migration/migration.h | 1 +
> > > > > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > index 52a5092add..4d866bb920 100644
> > > > > > > --- a/migration/migration.c
> > > > > > > +++ b/migration/migration.c
> > > > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > > > >      }
> > > > > > >  
> > > > > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > > > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > > > >      }
> > > > > > >  }
> > > > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > > > >      time_spent = current_time - s->iteration_start_time;
> > > > > > >      bandwidth = (double)transferred / time_spent;
> > > > > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > > > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > > > >  
> > > > > > >      s->mbps = (((double) transferred * 8.0) /
> > > > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > >       * recalculate. 10000 is a small enough number for our purposes
> > > > > > >       */
> > > > > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > > > -            qemu_target_page_size() / bandwidth;
> > > > > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > > > >      }
> > > > > 
> > > > > ..but more importantly, I still think this change is bogus.  expected
> > > > > downtime is not the same thing as remaining ram / bandwidth.
> > > > 
> > > > I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> > > > and observed precopy migration was infinite with expected_downtime set as
> > > > downtime-limit.
> > > 
> > > Did you debug why it was infinite? Which component of the calculation
> > > had gone wrong and why?
> > > 
> > > > During the discussion for Bug RH1560562, Michael Roth quoted that
> > > > 
> > > > One thing to note: in my testing I found that the "expected downtime" value
> > > > seems inaccurate in this scenario. To find a max downtime that allowed
> > > > migration to complete I had to divide "remaining ram" by "throughput" from
> > > > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > > > "dirty pages" value starts getting reported and we're just sending dirtied
> > > > pages).
> > > > 
> > > > Later by trying it precopy migration was able to complete with this
> > > > approach.
> > > > 
> > > > adding Michael Roth in cc.
> > > 
> > > We should try and _understand_ the rationale for the change, not just go
> > > with it.  Now, remember that whatever we do is just an estimate and
> > 
> > I have made the change based on my understanding,
> > 
> > Currently the calculation is,
> > 
> > expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth
> > 
> > dirty_pages_rate = No of dirty pages / time =>  its unit (1 / seconds)
> > qemu_target_page_size => its unit (bytes)
> > 
> > dirty_pages_rate * qemu_target_page_size => bytes/seconds
> > 
> > bandwidth = bytes transferred / time => bytes/seconds
> > 
> > dividing this would not be a measurement of time.
> 
> Hm, that's a good point, the units are not right here.  And thinking
> about it more, it doesn't really make sense for it to be linear
you are right.

> either.  After all if the page dirty rate exceeds the bandwidth then
> the expected downtime is infinite... well size of ram over bandwidth,
> at least.
> 
> > > there will be lots of cases where it's bad - so be careful what you're
> > > using it for - you definitely should NOT use the value in any automated
> > > system.
> > 
> > I agree with it and I would not use it in automated system.
> > 
> > > My problem with just using ram_bytes_remaining is that it doesn't take
> > > into account the rate at which the guest is changing RAM - which feels
> > > like it's the important measure for expected downtime.
> > 
> > ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE
> > 
> > This means ram_bytes_remaining is proportional to guest changing RAM, so
> > we can consider this change would yield expected_downtime
> 
> Well, just because the existing estimate is wrong doesn't mean this
> one is right.  Having the right units is a necessary but not
> sufficient condition.

I Agree it.

> 
> That said, I thought a bunch about this a bunch, and I think there is
> a case to be made for it - although it's a lot more subtle than what's
> been suggested so far.
> 
> So.  AFAICT the estimate of page dirty rate is based on the assumption
> that page dirties are independent of each other - one page is as
> likely to be dirtied as any other.  If we don't make that assumption,
> I don't see how we can really have an estimate as a single number.
> 
> But if that's the assumption, then predicting downtime based on it is
> futile: if the dirty rate is less than bandwidth, we can wait long
> enough and make the downtime as small as we want.  If the dirty rate
> is higher than bandwidth, then we don't converge and no downtime short
> of (ram size / bandwidth) will be sufficient.
> 
> The only way a predicted downtime makes any sense is if we assume that
> although the "instantaneous" dirty rate is high, the pages being
> dirtied are within a working set that's substantially smaller than the
> full RAM size.  In that case the expected down time becomes (working
> set size / bandwidth).

Thank you Dave and David for such a nice explanation and for your time.

I thought about it after the explanation given by you and Dave: in
expected downtime we are trying to predict downtime based on some
values at that instant, so we need to use those values and integrate them.

1. we are currently using bandwidth, but I think we actually have to use
the rate of change of bandwidth, because bandwidth is not always constant.

2. we are using dirty_pages_rate and as Dave suggested,

when we enter an iteration with 'Db' bytes dirty we should be
considering ['Db' + Dr * iteration time of previous one], where for the first
iteration, iteration time of previous would be 0.

3. As you have said, ram_bytes_remaining / bandwidth is the time to
transfer all RAM, so this should be the limit for our integration. When
we calculate for any instant it would be 0 to ram_bytes_remaining /
bandwidth at that instant.

Regards,
Bala

> 
> Predicting downtime as (ram_bytes_remaining / bandwidth) is
> essentially always wrong early in the migration, although it will be a
> poor upper bound - it will basically give you the time to transfer all
> RAM.
> 
> For a nicely converging migration it will also be wrong (but an upper
> bound) until it isn't: it will gradually decrease until it dips below
> the requested downtime threshold, at which point the migration
> completes.
> 
> For a diverging migration with a working set, as discussed above,
> ram_bytes_remaining will eventually converge on (roughly) the size of
> that working set - it won't dip (much) below that, because we can't
> keep up with the dirties within that working set.  At that point this
> does become a reasonable estimate of the necessary downtime in order
> to get the migration to complete, which I believe is the point of the
> value.
> 
> So the question is: for the purposes of this value, is a gross
> overestimate that gradually approaches a reasonable value good enough?
> 
> An estimate that would get closer, quicker would be (ram dirtied in
> interval) / bandwidth.  Where (ram dirtied in interval) is a measure
> of total ram dirtied over some measurement interval - only counting a
> page once if its dirtied multiple times during the interval.  And
> obviously you'd want some sort of averaging on that.  I think that
> would be a bit of a pain to measure, though.
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-20 10:28                 ` Dr. David Alan Gilbert
@ 2018-04-21 19:24                   ` Balamuruhan S
  0 siblings, 0 replies; 17+ messages in thread
From: Balamuruhan S @ 2018-04-21 19:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, David Gibson; +Cc: quintela, qemu-devel

On Fri, Apr 20, 2018 at 11:28:04AM +0100, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Thu, Apr 19, 2018 at 12:24:04PM +0100, Dr. David Alan Gilbert wrote:
> > > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > > On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > > > > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > > > > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > > > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > > > > > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > > > > > > > > using ram_bytes_remaining would yield it correct.
> > > > > > > > 
> > > > > > > > This commit message hasn't been changed since v1, but the patch is
> > > > > > > > doing something completely different.  I think most of the info from
> > > > > > > > your cover letter needs to be in here.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > > > > > > ---
> > > > > > > > >  migration/migration.c | 6 +++---
> > > > > > > > >  migration/migration.h | 1 +
> > > > > > > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > > > index 52a5092add..4d866bb920 100644
> > > > > > > > > --- a/migration/migration.c
> > > > > > > > > +++ b/migration/migration.c
> > > > > > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > > > > > >      }
> > > > > > > > >  
> > > > > > > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > > > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > > > > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > > > > > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > > > > > >      }
> > > > > > > > >  }
> > > > > > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > > > > > >      time_spent = current_time - s->iteration_start_time;
> > > > > > > > >      bandwidth = (double)transferred / time_spent;
> > > > > > > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > > > > > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > > > > > >  
> > > > > > > > >      s->mbps = (((double) transferred * 8.0) /
> > > > > > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > > > > > >       * recalculate. 10000 is a small enough number for our purposes
> > > > > > > > >       */
> > > > > > > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > > > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > > > > > -            qemu_target_page_size() / bandwidth;
> > > > > > > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > > > > > >      }
> > > > > > > 
> > > > > > > ..but more importantly, I still think this change is bogus.  expected
> > > > > > > downtime is not the same thing as remaining ram / bandwidth.
> > > > > > 
> > > > > > I tested precopy migration of 16M HP backed P8 guest from P8 to 1G P9 host
> > > > > > and observed precopy migration was infinite with expected_downtime set as
> > > > > > downtime-limit.
> > > > > 
> > > > > Did you debug why it was infinite? Which component of the calculation
> > > > > had gone wrong and why?
> > > > > 
> > > > > > During the discussion for Bug RH1560562, Michael Roth quoted that
> > > > > > 
> > > > > > One thing to note: in my testing I found that the "expected downtime" value
> > > > > > seems inaccurate in this scenario. To find a max downtime that allowed
> > > > > > migration to complete I had to divide "remaining ram" by "throughput" from
> > > > > > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > > > > > "dirty pages" value starts getting reported and we're just sending dirtied
> > > > > > pages).
> > > > > > 
> > > > > > Later by trying it precopy migration was able to complete with this
> > > > > > approach.
> > > > > > 
> > > > > > adding Michael Roth in cc.
> > > > > 
> > > > > We should try and _understand_ the rationale for the change, not just go
> > > > > with it.  Now, remember that whatever we do is just an estimate and
> > > > 
> > > > I have made the change based on my understanding,
> > > > 
> > > > Currently the calculation is,
> > > > 
> > > > expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth
> > > > 
> > > > dirty_pages_rate = No of dirty pages / time =>  its unit (1 / seconds)
> > > > qemu_target_page_size => its unit (bytes)
> > > > 
> > > > dirty_pages_rate * qemu_target_page_size => bytes/seconds
> > > > 
> > > > bandwidth = bytes transferred / time => bytes/seconds
> > > > 
> > > > dividing this would not be a measurement of time.
> > > 
> > > OK, that argument makes sense to me about why it feels broken; but see
> > > below.
> > > 
> > > > > there will be lots of cases where it's bad - so be careful what you're
> > > > > using it for - you definitely should NOT use the value in any automated
> > > > > system.
> > > > 
> > > > I agree with it and I would not use it in automated system.
> > > > 
> > > > > My problem with just using ram_bytes_remaining is that it doesn't take
> > > > > into account the rate at which the guest is changing RAM - which feels
> > > > > like it's the important measure for expected downtime.
> > > > 
> > > > ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE
> > > > 
> > > > This means ram_bytes_remaining is proportional to guest changing RAM, so
> > > > we can consider this change would yield expected_downtime
> > > 
> > > ram_bytes_remaining comes from the *current* number of dirty pages, so it
> > > tells you how much you have to transmit, but if the guest wasn't
> > > changing RAM, then that just tells you how much longer you have to keep
> > > going - not the amount of downtime required.  e.g. right at the start of
> > > migration you might have 16G of dirty-pages, but you don't need downtime
> > > to transmit them all.
> > > 
> > > It's actually slightly different, because migration_update_counters is
> > > called in the main iteration loop after an iteration and I think that
> > > means it only ends up there either at the end of migration OR when
> > > qemu_file_rate_limit(f) causes ram_save_iterate to return to the main
> > > loop; so you've got the number of dirty pages when it's interrupted by
> > > rate limiting.
> > > 
> > > So I don't think the use of ram_bytes_remaining is right either.
> > > 
> > > What is the right answer?
> > > I'm not sure; but:
> > > 
> > >    a) If the bandwidth is lower then you can see the downtime should be
> > > longer; so  having x/bandwidth  makes sense
> > >    b) If the guest is dirtying RAM faster then you can see the downtime
> > > should be longer;  so having  dirty_pages_rate on the top seems right.
> > > 
> > > So you can kind of see where the calculation above comes from.
> > > 
> > > I can't convince myself of any calculation that actually works!
> > > 
> > > Lets imagine a setup with a guest dirtying memory at 'Dr' Bytes/s
> > > with the bandwidth (Bw), and we enter an iteration with
> > > 'Db' bytes dirty:
> > > 
> > >   The time for that iteration is:
> > >      It   = Db / Bw
> > > 
> > >   during that time we've dirtied 'Dr' more RAM, so at the end of
> > > it we have:
> > >      Db' = Dr * It
> > >          = Dr * Db
> > >            -------
> > >               Bw
> > > 
> > > But then if you follow that, in any case where Dr < Bw that iterates
> > > down to Db' being ~0  irrespective of what that ratio is - but that
> > > makes no sense.
> > 
> > So, as per our IRC discussion, this is pretty hard.
> > 
> > That said, I think Bala's proposed patch is better than what we have
> > now.  It will initially be a gross over-estimate, but for
> > non-converging migrations it should approach a reasonable estimate
> > later on.  What we have now can never really be right.
> > 
> > So while it would be nice to have some better modelling of this long
> > term, in the short term I think it makes sense to apply Bala's patch.
> 
> I'd like to see where the original one was going wrong for Bala; my
> problem is that for me, the old code (which logically is wrong) is
> giving sensible results here, within a factor of 2 of the actual
> downtime I needed to set.  The code may be wrong, but the results are
> reasonably right.
> 

Thanks for considering the patch. Can I send another version with the
changes David asked for, fixing the commit message to explain the new
changes properly?

-- Bala

> Dave
> 
> > 
> > -- 
> > David Gibson			| I'll have my music baroque, and my code
> > david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> > 				| _way_ _around_!
> > http://www.ozlabs.org/~dgibson
> 
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-20 18:57               ` Dr. David Alan Gilbert
@ 2018-05-03  2:08                 ` David Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2018-05-03  2:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Balamuruhan S, quintela, qemu-devel

On Fri, Apr 20, 2018 at 07:57:34PM +0100, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> 
> <snip>
> 
> > So.  AFAICT the estimate of page dirty rate is based on the assumption
> > that page dirties are independent of each other - one page is as
> > likely to be dirtied as any other.  If we don't make that assumption,
> > I don't see how we can really have an estimate as a single number.
> 
> I don't think that's entirely true; at the moment we're calculating
> it by looking at the number of bits that become set during a sync
> operation, and the time since the last time we did the same calculation.
> Multiple writes to that page in that period will only count it once.
> Since it only counts it once I don't think it quite meets that
> statement.  Except see the bit at the bottom.

Ah, good point.  I was assuming the dirty rate variable here
represented the "instantaneous" rate, but of course it doesn't.  Rather
it's a "deduplicated" rate over one iteration.  Of course that
iteration time could vary which might make one iterations value not
quite comparable to another's.

> > But if that's the assumption, then predicting downtime based on it is
> > futile: if the dirty rate is less than bandwidth, we can wait long
> > enough and make the downtime as small as we want.  If the dirty rate
> > is higher than bandwidth, then we don't converge and no downtime short
> > of (ram size / bandwidth) will be sufficient.
> > 
> > The only way a predicted downtime makes any sense is if we assume that
> > although the "instantaneous" dirty rate is high, the pages being
> > dirtied are within a working set that's substantially smaller than the
> > full RAM size.  In that case the expected down time becomes (working
> > set size / bandwidth).
> 
> I don't think it needs to be a working set - it can be gently scribbling
> all over ram at a low rate and still satisfy the termination; but
> yes

Not really.  If we assume we're scribbling all ram then if dirty-rate
< bandwidth, we'll eventually catch up to being a single page behind -
in which case we can make the downtime as small as we want down to
pagesize/bandwidth.  If dirty-rate > bandwidth, then we
lose ground on every iteration and we won't be able to migrate with
any downtime less than ramsize/bandwidth.

If dirty-rate == bandwidth, to within the noise on both those
parameters, then the downtime we need to migrate will essentially
random walk based on that noise, so I don't think we can make any
meaningful estimate of it.

I think the only case where "expected downtime" is a meaningful and
useful value is where we have some sort of working set.  The working
set in this case being defined by the fact that the dirty rate within
it is > bandwidth, but the dirty rate outside it is < bandwidth.

> if what you're trying to do is estimate the working set it makes sense.
>
> > Predicting downtime as (ram_bytes_remaining / bandwidth) is
> > essentially always wrong early in the migration, although it will be a
> > poor upper bound - it will basically give you the time to transfer all
> > RAM.
> > 
> > For a nicely converging migration it will also be wrong (but an upper
> > bound) until it isn't: it will gradually decrease until it dips below
> > the requested downtime threshold, at which point the migration
> > completes.
> > 
> > For a diverging migration with a working set, as discussed above,
> > ram_bytes_remaining will eventually converge on (roughly) the size of
> > that working set - it won't dip (much) below that, because we can't
> > keep up with the dirties within that working set.  At that point this
> > does become a reasonable estimate of the necessary downtime in order
> > to get the migration to complete, which I believe is the point of the
> > value.
> > 
> > So the question is: for the purposes of this value, is a gross
> > overestimate that gradually approaches a reasonable value good enough?
> 
> It's complicated a bit by the fact we redo the calculations when we
> limit the bandwidth, so it's not always calculated at the end of a full
> dirty sync set.
> But I do wonder about whether using this value after a few iterations
> makes sense - when as you say it's settling into a working set.
> 
> > An estimate that would get closer, quicker would be (ram dirtied in
> > interval) / bandwidth.  Where (ram dirtied in interval) is a measure
> > of total ram dirtied over some measurement interval - only counting a
> > page once if its dirtied multiple times during the interval.  And
> > obviously you'd want some sort of averaging on that.  I think that
> > would be a bit of a pain to measure, though.
> 
> If you look at the code in ram.c it has:
> 
>     /* more than 1 second = 1000 millisecons */
>     if (end_time > rs->time_last_bitmap_sync + 1000) {
>         /* calculate period counters */
>         ram_counters.dirty_pages_rate = rs->num_dirty_pages_period * 1000
>             / (end_time - rs->time_last_bitmap_sync);
> 
> 
>   what I think that means is that, when we get stuck near the end with
> lots of iterations, we do get some averaging over short iterations.
> But for the iterations that are long, is any averaging needed?  That
> depends on whether you think 'one second' covers the period you want to
> average over.

So, thinking over this further, I don't think having the same (wall
clock time) duration for each interval is as important as I previously
thought.  I think working on the basis of iterations is ok - at least
apart from the wrinkle you mention of bandwidth alterations causing a
short iteration.

Let's define an "interval" as being a single pass over all of guest
RAM, transmitting the pages dirtied since the last interval.  IIUC
that will correspond to the current iteration, with the exception of
that wrinkle.

In the very first interval, all RAM is dirty, and I think we should
just decline to provide any downtime estimate at all.

On subsequent iterations, I think the total RAM dirtied in the last
interval is a fair to middling estimate of that working set size.
Consider: if the guest is in a steady state with a really strict
working set - e.g. it is cyclically dirtying everything in that set
and *nothing* outside it, then that ram-dirtied-in-interval will
accurately measure that working set on just the second interval.
Realistically, of course, there will be some accesses outside that
working set, so the second-interval value will be an overestimate
because we've had all the time of sending the first pass of RAM to
dirty pages outside the working set.  Still, I think it's probably
about the best we'll be able to do easily.  It should approach a
decent estimate of the working set size reasonably quickly on
subsequent iterations - assuming there is one, anyway.  If it's a true
converging migration it will just shrink gradually down to a single
page, and we're likely to hit our downtime target and stop caring long
before that.

In practice, to handle spikes & noise a bit better, I'd suggest that
we have our "working set" estimate as a weighted average of the
previous estimate and the ram dirtied in the last interval.  That's
equivalent to a weighted average of all the interval observations.  Or
we can use the Jacobson/Karels algorithm which makes an estimate of
variance as well, which might be nice.
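
Something like this, perhaps - a rough sketch only, nothing that exists
in migration/ today; the 1/8 and 1/4 gains are just the usual TCP
RTT-estimator values, picked here as an assumption:

    #include <math.h>
    #include <stdbool.h>

    /*
     * Input per interval: bytes dirtied during the last pass over RAM
     * (each page counted once).  Output: a smoothed working-set estimate
     * plus a mean-deviation term, Jacobson/Karels style.
     */
    typedef struct {
        double est;    /* smoothed working-set size, bytes */
        double dev;    /* smoothed mean deviation, bytes */
        bool primed;
    } WorkingSetEstimator;

    static void working_set_update(WorkingSetEstimator *w, double dirtied)
    {
        if (!w->primed) {
            w->est = dirtied;
            w->dev = dirtied / 2;
            w->primed = true;
            return;
        }
        double err = dirtied - w->est;
        w->est += err / 8;                   /* gain 1/8 */
        w->dev += (fabs(err) - w->dev) / 4;  /* gain 1/4 */
    }

    /*
     * expected_downtime would then be roughly
     *     (w->est + k * w->dev) / bandwidth
     * for some safety factor k (k = 4 in the TCP RTO analogy).
     */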


Thinking about this yet more, and a bit more formally, it occurs to me
the model I'm using for how the guest dirties RAM is this:
    * Assume RAM is partitioned into a bunch of pieces P1..Pn (not
      necessarily contiguous, and not known to the host)
    * For each piece we have a different (instantaneous) dirty rate
      r1..rn
    * We assume that dirties _within a single piece_ are randomly /
      evenly scattered

What we're looking for is essentially the total size of the pieces we
need to *exclude* to get the total dirty rate below the bandwidth.

It shouldn't be too hard, and might be interesting, to make a test
exerciser based on that model.
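
Something along those lines might look like the sketch below - entirely
made-up piece sizes and rates, and not an existing test, just the model
written out:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NPAGES   (1 << 20)    /* model 4G of RAM as 4K pages */
    #define INTERVAL 1.0          /* seconds per simulated sync */

    struct piece { long start, len; double rate; };  /* dirties per second */

    int main(void)
    {
        static unsigned char dirty[NPAGES];
        /* three made-up pieces: hot working set, warm area, cold remainder */
        struct piece pieces[] = {
            { 0,     4096,            200000.0 },
            { 4096,  65536,            20000.0 },
            { 69632, NPAGES - 69632,     500.0 },
        };
        int it, p;
        long i;

        for (it = 0; it < 10; it++) {
            long unique = 0;
            memset(dirty, 0, sizeof(dirty));
            for (p = 0; p < 3; p++) {
                long n = (long)(pieces[p].rate * INTERVAL);
                while (n--) {
                    /* dirties scattered evenly within the piece */
                    dirty[pieces[p].start + rand() % pieces[p].len] = 1;
                }
            }
            /* what a bitmap sync would see: each page counted once */
            for (i = 0; i < NPAGES; i++) {
                unique += dirty[i];
            }
            printf("interval %d: %ld distinct pages dirtied\n", it, unique);
        }
        return 0;
    }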

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-21 19:12               ` Balamuruhan S
@ 2018-05-03  2:14                 ` David Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2018-05-03  2:14 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: Dr. David Alan Gilbert, quintela, qemu-devel

On Sun, Apr 22, 2018 at 12:42:49AM +0530, Balamuruhan S wrote:
> On Thu, Apr 19, 2018 at 09:48:17PM +1000, David Gibson wrote:
> > On Thu, Apr 19, 2018 at 10:14:52AM +0530, Balamuruhan S wrote:
> > > On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
[snip]
> > That said, I thought a bunch about this a bunch, and I think there is
> > a case to be made for it - although it's a lot more subtle than what's
> > been suggested so far.
> > 
> > So.  AFAICT the estimate of page dirty rate is based on the assumption
> > that page dirties are independent of each other - one page is as
> > likely to be dirtied as any other.  If we don't make that assumption,
> > I don't see how we can really have an estimate as a single number.
> > 
> > But if that's the assumption, then predicting downtime based on it is
> > futile: if the dirty rate is less than bandwidth, we can wait long
> > enough and make the downtime as small as we want.  If the dirty rate
> > is higher than bandwidth, then we don't converge and no downtime short
> > of (ram size / bandwidth) will be sufficient.
> > 
> > The only way a predicted downtime makes any sense is if we assume that
> > although the "instantaneous" dirty rate is high, the pages being
> > dirtied are within a working set that's substantially smaller than the
> > full RAM size.  In that case the expected down time becomes (working
> > set size / bandwidth).
> 
> Thank you Dave and David for such a nice explanation and for your time.
> 
> I thought about it after the explanation given by you and Dave, so in
> expected downtime we are trying to predict downtime based on some
> values at that instant, so we need to use that value and integrate it.

No, not really.  The problem is that as you accumulate dirties over a
longer interval, you'll get more duplicate dirties, which means you'll
get a lower effective value than simply integrating the results over
shorter intervals.
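
A tiny illustration of that, with a made-up hot set and dirty rate (not
measured from anything real): the count of distinct pages dirtied in one
long window comes out lower than the sum over the shorter windows it
contains.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define WSET            8192      /* pages in the hot set (assumed) */
    #define DIRTIES_PER_SEC 100000    /* dirties per second (assumed) */

    static long unique_dirties(int seconds)
    {
        static unsigned char dirty[WSET];
        long n = (long)DIRTIES_PER_SEC * seconds, count = 0;
        int i;

        memset(dirty, 0, sizeof(dirty));
        while (n--) {
            dirty[rand() % WSET] = 1;    /* duplicates counted only once */
        }
        for (i = 0; i < WSET; i++) {
            count += dirty[i];
        }
        return count;
    }

    int main(void)
    {
        printf("1s window: %ld unique pages\n", unique_dirties(1));
        printf("2s window: %ld unique pages (vs 2 x 1s = %ld)\n",
               unique_dirties(2), 2 * unique_dirties(1));
        return 0;
    }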

> 1. we are currently using bandwidth but actually I think we have to use
> rate of change of bandwidth, because bandwidth is not constant always.

Again, not really.  It's true that bandwidth isn't necessarily
constant, but in most cases it will be pretty close.  The real noise
here is coming in the dirty rate.

> 2. we are using dirty_pages_rate and as Dave suggested,
> 
> when we enter an iteration with 'Db' bytes dirty we should be
> considering ['Db' + Dr * iteration time of previous one], where for the first
> iteration, iteration time of previous would be 0.
> 
> 3. As you have said, that ram_bytes_remaining / bandwidth is the time to
> transfer all RAM, so this should be the limit for our integration. when
> we calculate for any instant it would be 0 to ram_bytes_remaining /
> bandwidth at that instant.
> 
> Regards,
> Bala
> 
> > 
> > Predicting downtime as (ram_bytes_remaining / bandwidth) is
> > essentially always wrong early in the migration, although it will be a
> > poor upper bound - it will basically give you the time to transfer all
> > RAM.
> > 
> > For a nicely converging migration it will also be wrong (but an upper
> > bound) until it isn't: it will gradually decrease until it dips below
> > the requested downtime threshold, at which point the migration
> > completes.
> > 
> > For a diverging migration with a working set, as discussed above,
> > ram_bytes_remaining will eventually converge on (roughly) the size of
> > that working set - it won't dip (much) below that, because we can't
> > keep up with the dirties within that working set.  At that point this
> > does become a reasonable estimate of the necessary downtime in order
> > to get the migration to complete, which I believe is the point of the
> > value.
> > 
> > So the question is: for the purposes of this value, is a gross
> > overestimate that gradually approaches a reasonable value good enough?
> > 
> > An estimate that would get closer, quicker would be (ram dirtied in
> > interval) / bandwidth.  Where (ram dirtied in interval) is a measure
> > of total ram dirtied over some measurement interval - only counting a
> > page once if its dirtied multiple times during the interval.  And
> > obviously you'd want some sort of averaging on that.  I think that
> > would be a bit of a pain to measure, though.
> > 
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-05-03  2:15 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-17 13:23 [Qemu-devel] [PATCH v2 0/1] migration: calculate expected_downtime with ram_bytes_remaining() Balamuruhan S
2018-04-17 13:23 ` [Qemu-devel] [PATCH v2 1/1] " Balamuruhan S
2018-04-18  0:55   ` David Gibson
2018-04-18  0:57     ` David Gibson
2018-04-18  6:46       ` Balamuruhan S
2018-04-18  8:36         ` Dr. David Alan Gilbert
2018-04-19  4:44           ` Balamuruhan S
2018-04-19 11:24             ` Dr. David Alan Gilbert
2018-04-20  5:47               ` David Gibson
2018-04-20 10:28                 ` Dr. David Alan Gilbert
2018-04-21 19:24                   ` Balamuruhan S
2018-04-19 11:48             ` David Gibson
2018-04-20 18:57               ` Dr. David Alan Gilbert
2018-05-03  2:08                 ` David Gibson
2018-04-21 19:12               ` Balamuruhan S
2018-05-03  2:14                 ` David Gibson
2018-04-18  6:52     ` Balamuruhan S
