* [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
@ 2018-03-31 18:55 Balamuruhan S
  2018-04-03  6:10 ` Peter Xu
  2018-04-04  9:04 ` Juan Quintela
  0 siblings, 2 replies; 15+ messages in thread
From: Balamuruhan S @ 2018-03-31 18:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, amit.shah, Balamuruhan S

expected_downtime value is not accurate with dirty_pages_rate * page_size;
using ram_bytes_remaining() would yield a correct value.

Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
---
 migration/migration.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 58bd382730..4e43dc4f92 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2245,8 +2245,7 @@ static void migration_update_counters(MigrationState *s,
      * recalculate. 10000 is a small enough number for our purposes
      */
     if (ram_counters.dirty_pages_rate && transferred > 10000) {
-        s->expected_downtime = ram_counters.dirty_pages_rate *
-            qemu_target_page_size() / bandwidth;
+        s->expected_downtime = ram_bytes_remaining() / bandwidth;
     }
 
     qemu_file_reset_rate_limit(s->to_dst_file);
-- 
2.14.3
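
A rough sketch of what the two expressions estimate (units kept symbolic;
this is a reading aid, not a statement about the exact units used in
migration.c):

    old:  (pages/time * bytes/page) / (bytes/time)  ->  dirtying rate vs. transfer rate (a ratio)
    new:   bytes remaining          / (bytes/time)  ->  time needed to send what is left

i.e. ram_bytes_remaining() / bandwidth directly answers "how long would
the final stop-and-copy take if the VM were paused right now".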


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-03-31 18:55 [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining() Balamuruhan S
@ 2018-04-03  6:10 ` Peter Xu
  2018-04-03 17:30   ` bala24
  2018-04-04  9:02   ` Juan Quintela
  2018-04-04  9:04 ` Juan Quintela
  1 sibling, 2 replies; 15+ messages in thread
From: Peter Xu @ 2018-04-03  6:10 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: qemu-devel, amit.shah, quintela

On Sun, Apr 01, 2018 at 12:25:36AM +0530, Balamuruhan S wrote:
> expected_downtime value is not accurate with dirty_pages_rate * page_size,
> using ram_bytes_remaining would yeild it correct.
> 
> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> ---
>  migration/migration.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 58bd382730..4e43dc4f92 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2245,8 +2245,7 @@ static void migration_update_counters(MigrationState *s,
>       * recalculate. 10000 is a small enough number for our purposes
>       */
>      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> -        s->expected_downtime = ram_counters.dirty_pages_rate *
> -            qemu_target_page_size() / bandwidth;
> +        s->expected_downtime = ram_bytes_remaining() / bandwidth;

This field was removed in e4ed1541ac ("savevm: New save live migration
method: pending", 2012-12-20), in which remaining RAM was used.

And it was added back in 90f8ae724a ("migration: calculate
expected_downtime", 2013-02-22), in which dirty rate was used.

However I didn't find a clue on why we changed from using remaining
RAM to using dirty rate...  So I'll leave this question to Juan.

Besides, I'm a bit confused about when we'll want such a value.  AFAIU
precopy is mostly used by setting up the target downtime beforehand,
so we should already know the downtime in advance.  Then why would we
want to observe such a value?

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-03  6:10 ` Peter Xu
@ 2018-04-03 17:30   ` bala24
  2018-04-04  1:59     ` Peter Xu
  2018-04-04  9:02   ` Juan Quintela
  1 sibling, 1 reply; 15+ messages in thread
From: bala24 @ 2018-04-03 17:30 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, amit.shah, quintela

On 2018-04-03 11:40, Peter Xu wrote:
> On Sun, Apr 01, 2018 at 12:25:36AM +0530, Balamuruhan S wrote:
>> expected_downtime value is not accurate with dirty_pages_rate * 
>> page_size,
>> using ram_bytes_remaining would yeild it correct.
>> 
>> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
>> ---
>>  migration/migration.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 58bd382730..4e43dc4f92 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2245,8 +2245,7 @@ static void 
>> migration_update_counters(MigrationState *s,
>>       * recalculate. 10000 is a small enough number for our purposes
>>       */
>>      if (ram_counters.dirty_pages_rate && transferred > 10000) {
>> -        s->expected_downtime = ram_counters.dirty_pages_rate *
>> -            qemu_target_page_size() / bandwidth;
>> +        s->expected_downtime = ram_bytes_remaining() / bandwidth;
> 
> This field was removed in e4ed1541ac ("savevm: New save live migration
> method: pending", 2012-12-20), in which remaing RAM was used.
> 
> And it was added back in 90f8ae724a ("migration: calculate
> expected_downtime", 2013-02-22), in which dirty rate was used.
> 
> However I didn't find a clue on why we changed from using remaining
> RAM to using dirty rate...  So I'll leave this question to Juan.
> 
> Besides, I'm a bit confused on when we'll want such a value.  AFAIU
> precopy is mostly used by setting up the target downtime before hand,
> so we should already know the downtime before hand.  Then why we want
> to observe such a thing?

Thanks Peter Xu for reviewing,

I tested precopy migration with a 16M hugepage backed ppc guest. The
granularity of page size in migration is 4K, so any page dirtied results
in 4096 pages being transmitted again; this caused the migration to
continue endlessly.

default migrate_parameters:
downtime-limit: 300 milliseconds

info migrate:
expected downtime: 1475 milliseconds

Migration status: active
total time: 130874 milliseconds
expected downtime: 1475 milliseconds
setup: 3475 milliseconds
transferred ram: 18197383 kbytes
throughput: 866.83 mbps
remaining ram: 376892 kbytes
total ram: 8388864 kbytes
duplicate: 1678265 pages
skipped: 0 pages
normal: 4536795 pages
normal bytes: 18147180 kbytes
dirty sync count: 6
page size: 4 kbytes
dirty pages rate: 39044 pages

In order to complete the migration I configured downtime-limit to 1475
milliseconds, but the migration was still endless. I then calculated the
expected downtime from the remaining ram: 376892 kbytes / 866.83 mbps
yielded 3478.34 milliseconds, and configuring that as the downtime-limit
allowed the migration to complete. This led to the conclusion that the
expected downtime is not accurate.
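
For reference, a minimal C sketch of that cross-check (it assumes "kbytes"
above means 10^3 bytes and "mbps" means 10^6 bits/s; with 1024-byte kbytes
the figure comes out around 3562 ms instead):

#include <stdio.h>

int main(void)
{
    /* figures taken from the "info migrate" output above */
    double remaining_bits = 376892.0 * 1000.0 * 8.0; /* remaining ram, in bits */
    double bandwidth_bps  = 866.83e6;                /* throughput, in bits/s  */

    /* time to send what is left if the VM were stopped right now */
    printf("expected downtime: %.2f ms\n",
           remaining_bits / bandwidth_bps * 1000.0); /* ~3478.3 ms */
    return 0;
}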

Regards,
Balamuruhan S

> 
> Thanks,


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-03 17:30   ` bala24
@ 2018-04-04  1:59     ` Peter Xu
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2018-04-04  1:59 UTC (permalink / raw)
  To: bala24; +Cc: qemu-devel, amit.shah, quintela

On Tue, Apr 03, 2018 at 11:00:00PM +0530, bala24 wrote:
> On 2018-04-03 11:40, Peter Xu wrote:
> > On Sun, Apr 01, 2018 at 12:25:36AM +0530, Balamuruhan S wrote:
> > > expected_downtime value is not accurate with dirty_pages_rate *
> > > page_size,
> > > using ram_bytes_remaining would yeild it correct.
> > > 
> > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > ---
> > >  migration/migration.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 58bd382730..4e43dc4f92 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -2245,8 +2245,7 @@ static void
> > > migration_update_counters(MigrationState *s,
> > >       * recalculate. 10000 is a small enough number for our purposes
> > >       */
> > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > -            qemu_target_page_size() / bandwidth;
> > > +        s->expected_downtime = ram_bytes_remaining() / bandwidth;
> > 
> > This field was removed in e4ed1541ac ("savevm: New save live migration
> > method: pending", 2012-12-20), in which remaing RAM was used.
> > 
> > And it was added back in 90f8ae724a ("migration: calculate
> > expected_downtime", 2013-02-22), in which dirty rate was used.
> > 
> > However I didn't find a clue on why we changed from using remaining
> > RAM to using dirty rate...  So I'll leave this question to Juan.
> > 
> > Besides, I'm a bit confused on when we'll want such a value.  AFAIU
> > precopy is mostly used by setting up the target downtime before hand,
> > so we should already know the downtime before hand.  Then why we want
> > to observe such a thing?
> 
> Thanks Peter Xu for reviewing,
> 
> I tested precopy migration with 16M hugepage backed ppc guest and
> granularity
> of page size in migration is 4K so any page dirtied would result in 4096
> pages
> to be transmitted again, this led for migration to continue endless,
> 
> default migrate_parameters:
> downtime-limit: 300 milliseconds
> 
> info migrate:
> expected downtime: 1475 milliseconds
> 
> Migration status: active
> total time: 130874 milliseconds
> expected downtime: 1475 milliseconds
> setup: 3475 milliseconds
> transferred ram: 18197383 kbytes
> throughput: 866.83 mbps
> remaining ram: 376892 kbytes
> total ram: 8388864 kbytes
> duplicate: 1678265 pages
> skipped: 0 pages
> normal: 4536795 pages
> normal bytes: 18147180 kbytes
> dirty sync count: 6
> page size: 4 kbytes
> dirty pages rate: 39044 pages
> 
> In order to complete migration I configured downtime-limit to 1475
> milliseconds but still migration was endless. Later calculated expected
> downtime by remaining ram 376892 Kbytes / 866.83 mbps yeilded 3478.34
> milliseconds and configuring it as downtime-limit succeeds the migration
> to complete. This led to the conclusion that expected downtime is not
> accurate.

Hmm, thanks for the information.  I'd say your calculation seems
reasonable to me: it shows how long it would take if we stopped the
VM on the source immediately and migrated the rest.  However, Juan might
have an explanation for the existing algorithm which I would like to know
too.  So for now I'll put aside the "which one is better" question.

For your use case, you can have a look at either of the ways below to
get a converged migration:

- auto-converge: that's a migration capability that throttles CPU
  usage of guests

- postcopy: that'll let you start the destination VM even without
  transferring all the RAM beforehand

Either technique can be configured via the "migrate_set_capability"
HMP command or the "migrate-set-capabilities" QMP command (some googling
would show detailed steps; example commands below).  Either of the above
should help you migrate successfully in this hard-to-converge scenario,
instead of your current approach (observe the downtime, then set the
downtime).
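
A sketch of the relevant commands (capability and command names as I
recall them for this QEMU generation, so please double-check against
your build):

(HMP, on the source monitor)
migrate_set_capability auto-converge on

(HMP, for postcopy: enable on both source and destination, then switch over)
migrate_set_capability postcopy-ram on
migrate -d tcp:<dest-host>:<port>
migrate_start_postcopy

(QMP equivalent for setting a capability)
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
      { "capability": "postcopy-ram", "state": true } ] } }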

Meanwhile, I'm wondering whether, instead of observing the downtime in
real time, we should introduce a command to stop the VM immediately and
migrate the rest when we want, or a new parameter to the current
"migrate" command.

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-03  6:10 ` Peter Xu
  2018-04-03 17:30   ` bala24
@ 2018-04-04  9:02   ` Juan Quintela
  1 sibling, 0 replies; 15+ messages in thread
From: Juan Quintela @ 2018-04-04  9:02 UTC (permalink / raw)
  To: Peter Xu; +Cc: Balamuruhan S, qemu-devel, amit.shah

Peter Xu <peterx@redhat.com> wrote:
> On Sun, Apr 01, 2018 at 12:25:36AM +0530, Balamuruhan S wrote:
>> expected_downtime value is not accurate with dirty_pages_rate * page_size,
>> using ram_bytes_remaining would yeild it correct.
>> 
>> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
>> ---
>>  migration/migration.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 58bd382730..4e43dc4f92 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2245,8 +2245,7 @@ static void migration_update_counters(MigrationState *s,
>>       * recalculate. 10000 is a small enough number for our purposes
>>       */
>>      if (ram_counters.dirty_pages_rate && transferred > 10000) {
>> -        s->expected_downtime = ram_counters.dirty_pages_rate *
>> -            qemu_target_page_size() / bandwidth;
>> +        s->expected_downtime = ram_bytes_remaining() / bandwidth;
>
> This field was removed in e4ed1541ac ("savevm: New save live migration
> method: pending", 2012-12-20), in which remaing RAM was used.

Unrelated O:-)

> And it was added back in 90f8ae724a ("migration: calculate
> expected_downtime", 2013-02-22), in which dirty rate was used.

We didn't want to update the field if there hasn't been enough activity.

> However I didn't find a clue on why we changed from using remaining
> RAM to using dirty rate...  So I'll leave this question to Juan.
>
> Besides, I'm a bit confused on when we'll want such a value.  AFAIU
> precopy is mostly used by setting up the target downtime before hand,
> so we should already know the downtime before hand.  Then why we want
> to observe such a thing?

What that field means is how much time the system needs to send
everything that is pending.

I.e. if expected_downtime = 2 seconds, it means that, at the current
dirty rate, setting a downtime of 2 seconds or bigger will let the
migration finish.

It is a help for upper layers to decide, e.g.:
- they want a 1 second downtime
- the system calculates that, with the current load, they need a
  2 second downtime

So they can decide to either:
- change the downtime to 2 seconds (easy), or
- change the apps running on the guest to dirty less memory (it depends
  on the guest, app, etc).
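
In HMP terms, acting on that value amounts to something like the following
(a sketch; "downtime-limit" is the parameter already mentioned earlier in
this thread, in milliseconds):

migrate_set_parameter downtime-limit 2000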

I don't know if anyone is using it at all.

Later, Juan.


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-03-31 18:55 [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining() Balamuruhan S
  2018-04-03  6:10 ` Peter Xu
@ 2018-04-04  9:04 ` Juan Quintela
  2018-04-10  9:52   ` Balamuruhan S
  1 sibling, 1 reply; 15+ messages in thread
From: Juan Quintela @ 2018-04-04  9:04 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: qemu-devel, amit.shah

Balamuruhan S <bala24@linux.vnet.ibm.com> wrote:
> expected_downtime value is not accurate with dirty_pages_rate * page_size,
> using ram_bytes_remaining would yeild it correct.
>
> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

See my other mail on the thread; my understanding is that your change is
correct (TM).

Thanks, Juan.

> ---
>  migration/migration.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 58bd382730..4e43dc4f92 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2245,8 +2245,7 @@ static void migration_update_counters(MigrationState *s,
>       * recalculate. 10000 is a small enough number for our purposes
>       */
>      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> -        s->expected_downtime = ram_counters.dirty_pages_rate *
> -            qemu_target_page_size() / bandwidth;
> +        s->expected_downtime = ram_bytes_remaining() / bandwidth;
>      }
>  
>      qemu_file_reset_rate_limit(s->to_dst_file);


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-04  9:04 ` Juan Quintela
@ 2018-04-10  9:52   ` Balamuruhan S
  2018-04-10 10:52     ` Balamuruhan S
  0 siblings, 1 reply; 15+ messages in thread
From: Balamuruhan S @ 2018-04-10  9:52 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

On Wed, Apr 04, 2018 at 11:04:59AM +0200, Juan Quintela wrote:
> Balamuruhan S <bala24@linux.vnet.ibm.com> wrote:
> > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > using ram_bytes_remaining would yeild it correct.
> >
> > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> See my other mail on the thread, my understanding is that your change is
> corret (TM).

Juan, Please help to merge it.

Regards,
Bala

> 
> Thanks, Juan.
> 
> > ---
> >  migration/migration.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 58bd382730..4e43dc4f92 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2245,8 +2245,7 @@ static void migration_update_counters(MigrationState *s,
> >       * recalculate. 10000 is a small enough number for our purposes
> >       */
> >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > -            qemu_target_page_size() / bandwidth;
> > +        s->expected_downtime = ram_bytes_remaining() / bandwidth;
> >      }
> >  
> >      qemu_file_reset_rate_limit(s->to_dst_file);
> 


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-10  9:52   ` Balamuruhan S
@ 2018-04-10 10:52     ` Balamuruhan S
  0 siblings, 0 replies; 15+ messages in thread
From: Balamuruhan S @ 2018-04-10 10:52 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel, Qemu-devel

On 2018-04-10 15:22, Balamuruhan S wrote:
> On Wed, Apr 04, 2018 at 11:04:59AM +0200, Juan Quintela wrote:
>> Balamuruhan S <bala24@linux.vnet.ibm.com> wrote:
>> > expected_downtime value is not accurate with dirty_pages_rate * page_size,
>> > using ram_bytes_remaining would yeild it correct.
>> >
>> > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
>> 
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> 
>> See my other mail on the thread, my understanding is that your change 
>> is
>> corret (TM).
> 
> Juan, Please help to merge it.

Sorry for asking while the discussion is still going on, but the reason
is that postcopy migration of a hugepage-backed P8 guest from P8 -> P9 is
currently broken, and to use precopy with an appropriate downtime value
we need this patch to be backported to distros that are to be released
soon.

> 
> Regards,
> Bala
> 
>> 
>> Thanks, Juan.
>> 
>> > ---
>> >  migration/migration.c | 3 +--
>> >  1 file changed, 1 insertion(+), 2 deletions(-)
>> >
>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index 58bd382730..4e43dc4f92 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -2245,8 +2245,7 @@ static void migration_update_counters(MigrationState *s,
>> >       * recalculate. 10000 is a small enough number for our purposes
>> >       */
>> >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
>> > -        s->expected_downtime = ram_counters.dirty_pages_rate *
>> > -            qemu_target_page_size() / bandwidth;
>> > +        s->expected_downtime = ram_bytes_remaining() / bandwidth;
>> >      }
>> >
>> >      qemu_file_reset_rate_limit(s->to_dst_file);
>> 


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-10 10:02         ` Dr. David Alan Gilbert
@ 2018-04-11  1:28           ` David Gibson
  0 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2018-04-11  1:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Balamuruhan S, Peter Xu, qemu-devel, quintela


On Tue, 10 Apr 2018 11:02:36 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * David Gibson (dgibson@redhat.com) wrote:
> > On Mon, 9 Apr 2018 19:57:47 +0100
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:  
> > > > On 2018-04-04 13:36, Peter Xu wrote:    
> > > > > On Wed, Apr 04, 2018 at 11:55:14AM +0530, Balamuruhan S wrote:  
> > [snip]  
> > > > > > > - postcopy: that'll let you start the destination VM even without
> > > > > > >   transferring all the RAMs before hand    
> > > > > > 
> > > > > > I am seeing issue in postcopy migration between POWER8(16M) ->
> > > > > > POWER9(1G)
> > > > > > where the hugepage size is different. I am trying to enable it but
> > > > > > host
> > > > > > start
> > > > > > address have to be aligned with 1G page size in
> > > > > > ram_block_discard_range(),
> > > > > > which I am debugging further to fix it.    
> > > > > 
> > > > > I thought the huge page size needs to be matched on both side
> > > > > currently for postcopy but I'm not sure.    
> > > > 
> > > > you are right! it should be matched, but we need to support
> > > > POWER8(16M) -> POWER9(1G)
> > > >     
> > > > > CC Dave (though I think Dave's still on PTO).    
> > > 
> > > There's two problems there:
> > >   a) Postcopy with really big huge pages is a problem, because it takes
> > >      a long time to send the whole 1G page over the network and the vCPU
> > >      is paused during that time;  for example on a 10Gbps link, it takes
> > >      about 1 second to send a 1G page, so that's a silly time to keep
> > >      the vCPU paused.
> > > 
> > >   b) Mismatched pagesizes are a problem on postcopy; we require that the
> > >      whole of a hostpage is sent continuously, so that it can be
> > >      atomically placed in memory, the source knows to do this based on
> > >      the page sizes that it sees.  There are some other cases as well 
> > >      (e.g. discards have to be page aligned.)  
> > 
> > I'm not entirely clear on what mismatched means here.  Mismatched
> > between where and where?  I *think* the relevant thing is a mismatch
> > between host backing page size on source and destination, but I'm not
> > certain.  
> 
> Right.  As I understand it, we make no requirements on (an x86) guest
> as to what page sizes it uses given any particular host page sizes.

Right - AIUI there are basically separate gva->gpa and gpa->hpa page
tables and the pagesizes in each are unrelated.  That's also how it
works on POWER9 radix mode, so it doesn't suffer this restriction
either. In hash mode, though, there's just a single va->hpa hashed page
table which is owned by the host and updated by the guest via hcall.

>  [...]  
> > 
> > Sounds feasible, but like something that will take some thought and
> > time upstream.  
> 
> Yes; it's not too bad.
> 
> > > (a) is a much much harder problem; one *idea* would be a major
> > > reorganisation of the kernels hugepage + userfault code to somehow
> > > allow them to temporarily present as normal pages rather than a
> > > hugepage.  
> > 
> > Yeah... for Power specifically, I think doing that would be really
> > hard, verging on impossible, because of the way the MMU is
> > virtualized.  Well.. it's probably not too bad for a native POWER9
> > guest (using the radix MMU), but the issue here is for POWER8 compat
> > guests which use the hash MMU.  
> 
> My idea was to fill the pagetables for that hugepage using small page
> entries but using the physical hugepages memory; so that once we're
> done we'd flip it back to being a single hugepage entry.
> (But my understanding is that doesn't fit at all into the way the kernel
> hugepage code works).

I think it should be possible with the hugepage code, although we might
end up using only the physical allocation side of the existing hugepage
code, not the parts that actually put it in the page tables.  Which is
not to say there couldn't be some curly edge cases.

The bigger problem for us is that it really doesn't fit with the way HPT
virtualization works.  The way the hcalls are designed assumes a 1-to-1
correspondence between PTEs in the guest view and real hardware PTEs.
It's technically possible, I guess, that we could set up a shadow hash
table beside the guest view of the hash table and populate the former
based on the latter, but it would be a complete PITA.

> > > Does P9 really not have a hugepage that's smaller than 1G?  
> > 
> > It does (2M), but we can't use it in this situation.  As hinted above,
> > POWER9 has two very different MMU modes, hash and radix.  In hash mode
> > (which is similar to POWER8 and earlier CPUs) the hugepage sizes are
> > 16M and 16G, in radix mode (more like x86) they are 2M and 1G.
> > 
> > POWER9 hosts always run in radix mode.  Or at least, we only support
> > running them in radix mode.  We support both radix mode and hash mode
> > guests, the latter including all POWER8 compat mode guests.
> > 
> > The next complication is because the way the hash virtualization works,
> > any page used by the guest must be HPA-contiguous, not just
> > GPA-contiguous.  Which means that any pagesize used by the guest must
> > be smaller or equal than the host pagesizes used to back the guest.
> > We (sort of) cope with that by only advertising the 16M pagesize to the
> > guest if all guest RAM is backed by >= 16M pages.
> > 
> > But that advertisement only happens at guest boot.  So if we migrate a
> > guest from POWER8, backed by 16M pages to POWER9 backed by 2M pages,
> > the guest still thinks it can use 16M pages and jams up.  (I'm in the
> > middle of upstream work to make the failure mode less horrible).
> > 
> > So, the only way to run a POWER8 compat mode guest with access to 16M
> > pages on a POWER9 radix mode host is using 1G hugepages on the host
> > side.  
> 
> Ah ok;  I'm not seeing an easy answer here.
> The only vague thing I can think of is if you gave P9 a fake 16M
> hugepage mode, that did all HPA and mappings in 16M chunks (using 8 x 2M
> page entries).

Huh.. that's a really interesting idea.  Basically use the physical
allocation side of the hugepage stuff to allow allocation of 16M
contiguous chunks, even though they'd actually be mapped with 8 2M PTEs
when in radix mode.  I'll talk to some people and see if this might be
feasible.

Otherwise I think we basically just have to say "No, won't work" to
migrations of HPT hugepage backed guests to a radix host.

-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat



* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-10  1:22       ` David Gibson
@ 2018-04-10 10:02         ` Dr. David Alan Gilbert
  2018-04-11  1:28           ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-10 10:02 UTC (permalink / raw)
  To: David Gibson; +Cc: Balamuruhan S, Peter Xu, qemu-devel, quintela

* David Gibson (dgibson@redhat.com) wrote:
> On Mon, 9 Apr 2018 19:57:47 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > > On 2018-04-04 13:36, Peter Xu wrote:  
> > > > On Wed, Apr 04, 2018 at 11:55:14AM +0530, Balamuruhan S wrote:
> [snip]
> > > > > > - postcopy: that'll let you start the destination VM even without
> > > > > >   transferring all the RAMs before hand  
> > > > > 
> > > > > I am seeing issue in postcopy migration between POWER8(16M) ->
> > > > > POWER9(1G)
> > > > > where the hugepage size is different. I am trying to enable it but
> > > > > host
> > > > > start
> > > > > address have to be aligned with 1G page size in
> > > > > ram_block_discard_range(),
> > > > > which I am debugging further to fix it.  
> > > > 
> > > > I thought the huge page size needs to be matched on both side
> > > > currently for postcopy but I'm not sure.  
> > > 
> > > you are right! it should be matched, but we need to support
> > > POWER8(16M) -> POWER9(1G)
> > >   
> > > > CC Dave (though I think Dave's still on PTO).  
> > 
> > There's two problems there:
> >   a) Postcopy with really big huge pages is a problem, because it takes
> >      a long time to send the whole 1G page over the network and the vCPU
> >      is paused during that time;  for example on a 10Gbps link, it takes
> >      about 1 second to send a 1G page, so that's a silly time to keep
> >      the vCPU paused.
> > 
> >   b) Mismatched pagesizes are a problem on postcopy; we require that the
> >      whole of a hostpage is sent continuously, so that it can be
> >      atomically placed in memory, the source knows to do this based on
> >      the page sizes that it sees.  There are some other cases as well 
> >      (e.g. discards have to be page aligned.)
> 
> I'm not entirely clear on what mismatched means here.  Mismatched
> between where and where?  I *think* the relevant thing is a mismatch
> between host backing page size on source and destination, but I'm not
> certain.

Right.  As I understand it, we make no requirements on (an x86) guest
as to what page sizes it uses given any particular host page sizes.

> > Both of the problems are theoretically fixable; but neither case is
> > easy.
> > (b) could be fixed by sending the hugepage size back to the source,
> > so that it knows to perform alignments on a larger boundary to it's
> > own RAM blocks.
> 
> Sounds feasible, but like something that will take some thought and
> time upstream.

Yes; it's not too bad.

> > (a) is a much much harder problem; one *idea* would be a major
> > reorganisation of the kernels hugepage + userfault code to somehow
> > allow them to temporarily present as normal pages rather than a
> > hugepage.
> 
> Yeah... for Power specifically, I think doing that would be really
> hard, verging on impossible, because of the way the MMU is
> virtualized.  Well.. it's probably not too bad for a native POWER9
> guest (using the radix MMU), but the issue here is for POWER8 compat
> guests which use the hash MMU.

My idea was to fill the pagetables for that hugepage using small page
entries but using the physical hugepages memory; so that once we're
done we'd flip it back to being a single hugepage entry.
(But my understanding is that doesn't fit at all into the way the kernel
hugepage code works).

> > Does P9 really not have a hugepage that's smaller than 1G?
> 
> It does (2M), but we can't use it in this situation.  As hinted above,
> POWER9 has two very different MMU modes, hash and radix.  In hash mode
> (which is similar to POWER8 and earlier CPUs) the hugepage sizes are
> 16M and 16G, in radix mode (more like x86) they are 2M and 1G.
> 
> POWER9 hosts always run in radix mode.  Or at least, we only support
> running them in radix mode.  We support both radix mode and hash mode
> guests, the latter including all POWER8 compat mode guests.
> 
> The next complication is because the way the hash virtualization works,
> any page used by the guest must be HPA-contiguous, not just
> GPA-contiguous.  Which means that any pagesize used by the guest must
> be smaller or equal than the host pagesizes used to back the guest.
> We (sort of) cope with that by only advertising the 16M pagesize to the
> guest if all guest RAM is backed by >= 16M pages.
> 
> But that advertisement only happens at guest boot.  So if we migrate a
> guest from POWER8, backed by 16M pages to POWER9 backed by 2M pages,
> the guest still thinks it can use 16M pages and jams up.  (I'm in the
> middle of upstream work to make the failure mode less horrible).
> 
> So, the only way to run a POWER8 compat mode guest with access to 16M
> pages on a POWER9 radix mode host is using 1G hugepages on the host
> side.

Ah ok;  I'm not seeing an easy answer here.
The only vague thing I can think of is if you gave P9 a fake 16M
hugepage mode, that did all HPA and mappings in 16M chunks (using 8 x 2M
page entries).

Dave

> -- 
> David Gibson <dgibson@redhat.com>
> Principal Software Engineer, Virtualization, Red Hat


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-09 18:57     ` Dr. David Alan Gilbert
@ 2018-04-10  1:22       ` David Gibson
  2018-04-10 10:02         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2018-04-10  1:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Balamuruhan S, Peter Xu, qemu-devel, quintela


On Mon, 9 Apr 2018 19:57:47 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> > On 2018-04-04 13:36, Peter Xu wrote:  
> > > On Wed, Apr 04, 2018 at 11:55:14AM +0530, Balamuruhan S wrote:
[snip]
> > > > > - postcopy: that'll let you start the destination VM even without
> > > > >   transferring all the RAMs before hand  
> > > > 
> > > > I am seeing issue in postcopy migration between POWER8(16M) ->
> > > > POWER9(1G)
> > > > where the hugepage size is different. I am trying to enable it but
> > > > host
> > > > start
> > > > address have to be aligned with 1G page size in
> > > > ram_block_discard_range(),
> > > > which I am debugging further to fix it.  
> > > 
> > > I thought the huge page size needs to be matched on both side
> > > currently for postcopy but I'm not sure.  
> > 
> > you are right! it should be matched, but we need to support
> > POWER8(16M) -> POWER9(1G)
> >   
> > > CC Dave (though I think Dave's still on PTO).  
> 
> There's two problems there:
>   a) Postcopy with really big huge pages is a problem, because it takes
>      a long time to send the whole 1G page over the network and the vCPU
>      is paused during that time;  for example on a 10Gbps link, it takes
>      about 1 second to send a 1G page, so that's a silly time to keep
>      the vCPU paused.
> 
>   b) Mismatched pagesizes are a problem on postcopy; we require that the
>      whole of a hostpage is sent continuously, so that it can be
>      atomically placed in memory, the source knows to do this based on
>      the page sizes that it sees.  There are some other cases as well 
>      (e.g. discards have to be page aligned.)

I'm not entirely clear on what mismatched means here.  Mismatched
between where and where?  I *think* the relevant thing is a mismatch
between host backing page size on source and destination, but I'm not
certain.

> Both of the problems are theoretically fixable; but neither case is
> easy.
> (b) could be fixed by sending the hugepage size back to the source,
> so that it knows to perform alignments on a larger boundary to it's
> own RAM blocks.

Sounds feasible, but like something that will take some thought and
time upstream.

> (a) is a much much harder problem; one *idea* would be a major
> reorganisation of the kernels hugepage + userfault code to somehow
> allow them to temporarily present as normal pages rather than a
> hugepage.

Yeah... for Power specifically, I think doing that would be really
hard, verging on impossible, because of the way the MMU is
virtualized.  Well.. it's probably not too bad for a native POWER9
guest (using the radix MMU), but the issue here is for POWER8 compat
guests which use the hash MMU.

> Does P9 really not have a hugepage that's smaller than 1G?

It does (2M), but we can't use it in this situation.  As hinted above,
POWER9 has two very different MMU modes, hash and radix.  In hash mode
(which is similar to POWER8 and earlier CPUs) the hugepage sizes are
16M and 16G, in radix mode (more like x86) they are 2M and 1G.

POWER9 hosts always run in radix mode.  Or at least, we only support
running them in radix mode.  We support both radix mode and hash mode
guests, the latter including all POWER8 compat mode guests.

The next complication is that, because of the way hash virtualization
works, any page used by the guest must be HPA-contiguous, not just
GPA-contiguous.  Which means that any pagesize used by the guest must
be smaller than or equal to the host pagesizes used to back the guest.
We (sort of) cope with that by only advertising the 16M pagesize to the
guest if all guest RAM is backed by >= 16M pages.

But that advertisement only happens at guest boot.  So if we migrate a
guest from POWER8, backed by 16M pages to POWER9 backed by 2M pages,
the guest still thinks it can use 16M pages and jams up.  (I'm in the
middle of upstream work to make the failure mode less horrible).

So, the only way to run a POWER8 compat mode guest with access to 16M
pages on a POWER9 radix mode host is using 1G hugepages on the host
side.

-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat



* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-04  8:49   ` Balamuruhan S
@ 2018-04-09 18:57     ` Dr. David Alan Gilbert
  2018-04-10  1:22       ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-09 18:57 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: Peter Xu, qemu-devel, quintela, dgibson

* Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> On 2018-04-04 13:36, Peter Xu wrote:
> > On Wed, Apr 04, 2018 at 11:55:14AM +0530, Balamuruhan S wrote:
> > 
> > [...]
> > 
> > > > too. So still I'll put aside the "which one is better" question.
> > > >
> > > > For your use case, you can have a look on either of below way to
> > > > have a converged migration:
> > > >
> > > > - auto-converge: that's a migration capability that throttles CPU
> > > >   usage of guests
> > > 
> > > I used auto-converge option before hand and still it doesn't help
> > > for migration to complete
> > 
> > Have you digged about why?  AFAIK auto-convergence will at last absort
> > merely the whole vcpu resource (99% of them maximum).  Maybe you are
> > not with the best throttle values?  Or do you think that could be a
> > auto-convergence bug too?
> 
> I am not sure, I will work on it to find why.

> > 
> > > 
> > > >
> > > > - postcopy: that'll let you start the destination VM even without
> > > >   transferring all the RAMs before hand
> > > 
> > > I am seeing issue in postcopy migration between POWER8(16M) ->
> > > POWER9(1G)
> > > where the hugepage size is different. I am trying to enable it but
> > > host
> > > start
> > > address have to be aligned with 1G page size in
> > > ram_block_discard_range(),
> > > which I am debugging further to fix it.
> > 
> > I thought the huge page size needs to be matched on both side
> > currently for postcopy but I'm not sure.
> 
> you are right! it should be matched, but we need to support
> POWER8(16M) -> POWER9(1G)
> 
> > CC Dave (though I think Dave's still on PTO).

There's two problems there:
  a) Postcopy with really big huge pages is a problem, because it takes
     a long time to send the whole 1G page over the network and the vCPU
     is paused during that time;  for example on a 10Gbps link, it takes
     about 1 second to send a 1G page, so that's a silly time to keep
     the vCPU paused.

  b) Mismatched pagesizes are a problem on postcopy; we require that the
     whole of a hostpage is sent continuously, so that it can be
     atomically placed in memory, the source knows to do this based on
     the page sizes that it sees.  There are some other cases as well 
     (e.g. discards have to be page aligned.)
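
As a quick sanity check of the ~1 second figure in (a):

    1 GiB = 1073741824 bytes * 8 bits/byte  = ~8.6e9 bits
    ~8.6e9 bits / 10e9 bits/s               = ~0.86 s  (before protocol overhead)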

Both of the problems are theoretically fixable; but neither case is
easy.
(b) could be fixed by sending the hugepage size back to the source,
so that it knows to perform alignments on a larger boundary to its
own RAM blocks.

(a) is a much much harder problem; one *idea* would be a major
reorganisation of the kernel's hugepage + userfault code to somehow
allow them to temporarily present as normal pages rather than a
hugepage.

Does P9 really not have a hugepage that's smaller than 1G?

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-04  8:06 ` Peter Xu
@ 2018-04-04  8:49   ` Balamuruhan S
  2018-04-09 18:57     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 15+ messages in thread
From: Balamuruhan S @ 2018-04-04  8:49 UTC (permalink / raw)
  To: Peter Xu; +Cc: Dr. David Alan Gilbert, qemu-devel, amit.shah, quintela

On 2018-04-04 13:36, Peter Xu wrote:
> On Wed, Apr 04, 2018 at 11:55:14AM +0530, Balamuruhan S wrote:
> 
> [...]
> 
>> > too. So still I'll put aside the "which one is better" question.
>> >
>> > For your use case, you can have a look on either of below way to
>> > have a converged migration:
>> >
>> > - auto-converge: that's a migration capability that throttles CPU
>> >   usage of guests
>> 
>> I used auto-converge option before hand and still it doesn't help
>> for migration to complete
> 
> Have you digged about why?  AFAIK auto-convergence will at last absort
> merely the whole vcpu resource (99% of them maximum).  Maybe you are
> not with the best throttle values?  Or do you think that could be a
> auto-convergence bug too?

I am not sure; I will work on it to find out why.

> 
>> 
>> >
>> > - postcopy: that'll let you start the destination VM even without
>> >   transferring all the RAMs before hand
>> 
>> I am seeing issue in postcopy migration between POWER8(16M) -> 
>> POWER9(1G)
>> where the hugepage size is different. I am trying to enable it but 
>> host
>> start
>> address have to be aligned with 1G page size in 
>> ram_block_discard_range(),
>> which I am debugging further to fix it.
> 
> I thought the huge page size needs to be matched on both side
> currently for postcopy but I'm not sure.

you are right! it should be matched, but we need to support
POWER8(16M) -> POWER9(1G)

> CC Dave (though I think Dave's still on PTO).


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
  2018-04-04  6:25 Balamuruhan S
@ 2018-04-04  8:06 ` Peter Xu
  2018-04-04  8:49   ` Balamuruhan S
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2018-04-04  8:06 UTC (permalink / raw)
  To: Balamuruhan S, Dr. David Alan Gilbert; +Cc: qemu-devel, amit.shah, quintela

On Wed, Apr 04, 2018 at 11:55:14AM +0530, Balamuruhan S wrote:

[...]

> > too. So still I'll put aside the "which one is better" question.
> > 
> > For your use case, you can have a look on either of below way to
> > have a converged migration:
> > 
> > - auto-converge: that's a migration capability that throttles CPU
> >   usage of guests
> 
> I used auto-converge option before hand and still it doesn't help
> for migration to complete

Have you dug into why?  AFAIK auto-convergence will eventually absorb
nearly the whole vcpu resource (99% of it maximum).  Maybe you are
not using the best throttle values?  Or do you think that could be an
auto-convergence bug too?
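
For reference, the throttling is driven by migration parameters; a sketch
of how to raise them (parameter names as I recall them, so please verify
with "info migrate_parameters" on your build):

migrate_set_parameter cpu-throttle-initial 40
migrate_set_parameter cpu-throttle-increment 20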

> 
> > 
> > - postcopy: that'll let you start the destination VM even without
> >   transferring all the RAMs before hand
> 
> I am seeing issue in postcopy migration between POWER8(16M) -> POWER9(1G)
> where the hugepage size is different. I am trying to enable it but host
> start
> address have to be aligned with 1G page size in ram_block_discard_range(),
> which I am debugging further to fix it.

I thought the huge page size needs to be matched on both sides
currently for postcopy, but I'm not sure.  CC Dave (though I think
Dave's still on PTO).

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH] migration: calculate expected_downtime with ram_bytes_remaining()
@ 2018-04-04  6:25 Balamuruhan S
  2018-04-04  8:06 ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Balamuruhan S @ 2018-04-04  6:25 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, amit.shah, quintela

On 2018-04-04 07:29, Peter Xu wrote:
> On Tue, Apr 03, 2018 at 11:00:00PM +0530, bala24 wrote:
>> On 2018-04-03 11:40, Peter Xu wrote:
>> > On Sun, Apr 01, 2018 at 12:25:36AM +0530, Balamuruhan S wrote:
>> > > expected_downtime value is not accurate with dirty_pages_rate *
>> > > page_size,
>> > > using ram_bytes_remaining would yeild it correct.
>> > >
>> > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
>> > > ---
>> > >  migration/migration.c | 3 +--
>> > >  1 file changed, 1 insertion(+), 2 deletions(-)
>> > >
>> > > diff --git a/migration/migration.c b/migration/migration.c
>> > > index 58bd382730..4e43dc4f92 100644
>> > > --- a/migration/migration.c
>> > > +++ b/migration/migration.c
>> > > @@ -2245,8 +2245,7 @@ static void
>> > > migration_update_counters(MigrationState *s,
>> > >       * recalculate. 10000 is a small enough number for our purposes
>> > >       */
>> > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
>> > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
>> > > -            qemu_target_page_size() / bandwidth;
>> > > +        s->expected_downtime = ram_bytes_remaining() / bandwidth;
>> >
>> > This field was removed in e4ed1541ac ("savevm: New save live migration
>> > method: pending", 2012-12-20), in which remaing RAM was used.
>> >
>> > And it was added back in 90f8ae724a ("migration: calculate
>> > expected_downtime", 2013-02-22), in which dirty rate was used.
>> >
>> > However I didn't find a clue on why we changed from using remaining
>> > RAM to using dirty rate...  So I'll leave this question to Juan.
>> >
>> > Besides, I'm a bit confused on when we'll want such a value.  AFAIU
>> > precopy is mostly used by setting up the target downtime before hand,
>> > so we should already know the downtime before hand.  Then why we want
>> > to observe such a thing?
>> 
>> Thanks Peter Xu for reviewing,
>> 
>> I tested precopy migration with 16M hugepage backed ppc guest and
>> granularity
>> of page size in migration is 4K so any page dirtied would result in 
>> 4096
>> pages
>> to be transmitted again, this led for migration to continue endless,
>> 
>> default migrate_parameters:
>> downtime-limit: 300 milliseconds
>> 
>> info migrate:
>> expected downtime: 1475 milliseconds
>> 
>> Migration status: active
>> total time: 130874 milliseconds
>> expected downtime: 1475 milliseconds
>> setup: 3475 milliseconds
>> transferred ram: 18197383 kbytes
>> throughput: 866.83 mbps
>> remaining ram: 376892 kbytes
>> total ram: 8388864 kbytes
>> duplicate: 1678265 pages
>> skipped: 0 pages
>> normal: 4536795 pages
>> normal bytes: 18147180 kbytes
>> dirty sync count: 6
>> page size: 4 kbytes
>> dirty pages rate: 39044 pages
>> 
>> In order to complete migration I configured downtime-limit to 1475
>> milliseconds but still migration was endless. Later calculated 
>> expected
>> downtime by remaining ram 376892 Kbytes / 866.83 mbps yeilded 3478.34
>> milliseconds and configuring it as downtime-limit succeeds the 
>> migration
>> to complete. This led to the conclusion that expected downtime is not
>> accurate.
> 
> Hmm, thanks for the information.  I'd say your calculation seems
> reasonable to me: it shows how long time will it need if we stop the
> VM now on source immediately and migrate the rest. However Juan might
> have an explanation on existing algorithm which I would like to know

Sure, I agree

> too. So still I'll put aside the "which one is better" question.
> 
> For your use case, you can have a look on either of below way to
> have a converged migration:
> 
> - auto-converge: that's a migration capability that throttles CPU
>   usage of guests

I used the auto-converge option beforehand and it still doesn't help
the migration to complete

> 
> - postcopy: that'll let you start the destination VM even without
>   transferring all the RAMs before hand

I am seeing an issue in postcopy migration between POWER8(16M) ->
POWER9(1G), where the hugepage size is different. I am trying to enable
it, but the host start address has to be aligned with the 1G page size
in ram_block_discard_range(), which I am debugging further to fix.

Regards,
Balamuruhan S

> 
> Either of the technique can be configured via "migrate_set_capability"
> HMP command or "migrate-set-capabilities" QMP command (some googling
> would show detailed steps). And, either of above should help you to
> migrate successfully in this hard-to-converge scenario, instead of
> your current way (observing downtime, set downtime).
> 
> Meanwhile, I'm thinking whether instead of observing the downtime in
> real time, whether we should introduce a command to stop the VM
> immediately to migrate the rest when we want, or, a new parameter to
> current "migrate" command.

