qemu-devel.nongnu.org archive mirror
* [question] The source cannot recover, if the destination fails in the last round of live migration
@ 2021-05-06 13:02 Kunkun Jiang
  2021-05-06 13:05 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 5+ messages in thread
From: Kunkun Jiang @ 2021-05-06 13:02 UTC (permalink / raw)
  To: Juan Quintela, Dr. David Alan Gilbert, open list:All patches CC here
  Cc: Zenghui Yu, wanghaibin.wang, Keqian Zhu, Peter Xu, David Edmondson

Hi all,

I have recently been studying the live migration code, and
I have a question about the last round.

When pending_size drops below the threshold, migration enters
the last round and calls migration_completion(). This stops the
source and sends the remaining dirty pages and the devices' state
information to the destination. The destination loads this
information and starts the VM.

If an error occurs on the destination at this point, the destination
exits directly, and the source is unable to detect the error
and recover, because the source does not call
migration_detect_error().
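
As a toy model of the flow described above (this is not QEMU code; the
names Destination and final_round are made up for illustration, and
return_path stands for some channel back from the destination that the
source could wait on), the situation can be sketched as:

```python
from dataclasses import dataclass


@dataclass
class Destination:
    """Hypothetical stand-in for the destination side."""
    load_ok: bool  # whether loading the final state succeeds

    def load_final_state(self) -> bool:
        return self.load_ok


def final_round(pending_size: int, threshold: int,
                dest: Destination, return_path: bool) -> str:
    """Return the migration state as the *source* sees it."""
    if pending_size >= threshold:
        # Not yet in the last round: keep iterating over dirty pages.
        return "active"
    # Last round: the source is stopped and the remaining state is sent.
    loaded = dest.load_final_state()
    if not return_path:
        # The source hears nothing back, so a failed load goes unnoticed.
        return "completed"
    # With a channel back, the source waits for the destination's verdict.
    return "completed" if loaded else "failed"


print(final_round(10, 100, Destination(load_ok=False), return_path=False))
print(final_round(10, 100, Destination(load_ok=False), return_path=True))
```

In this model the source only learns about a failed load on the
destination when it has something to wait on, which is the gap the
question is about.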

Is my understanding correct?
Should the source wait for the result of the last round on the destination?

Thanks,
Kunkun Jiang





* Re: [question] The source cannot recover, if the destination fails in the last round of live migration
  2021-05-06 13:02 [question] The source cannot recover, if the destination fails in the last round of live migration Kunkun Jiang
@ 2021-05-06 13:05 ` Dr. David Alan Gilbert
  2021-05-07  9:46   ` Kunkun Jiang
  0 siblings, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2021-05-06 13:05 UTC (permalink / raw)
  To: Kunkun Jiang
  Cc: David Edmondson, Juan Quintela, open list:All patches CC here,
	Peter Xu, Zenghui Yu, wanghaibin.wang, Keqian Zhu

* Kunkun Jiang (jiangkunkun@huawei.com) wrote:
> Hi all,

Hi,

> Recently I am learning about the part of live migration.
> I have a question about the last round.
> 
> When the pending_size is less than the threshold, it will enter
> the last round and call migration_completion(). It will stop the
> source and sent the remaining dirty pages and devices' status
> information to the destination. The destination will load these
> information and start the VM.
> 
> If there is an error at the destination at this time, it will exit
> directly, and the source will not be able to detect the error
> and recover. Because the source will not call
> migration_detect_error().
> 
> Is my understanding correct?
> Should the source wait the result of the last round of destination ?

Try setting the 'return-path' migration capability on both the source
and destination; I think that option will cause the destination to
send an OK/error at the end and the source to wait for it.
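
A minimal sketch of what setting the capability could look like over
QMP, assuming the migrate-set-capabilities command is used. The helper
name set_return_path_cmd is made up, and how the JSON reaches each
monitor (socket path, handshake) is left out:

```python
import json


def set_return_path_cmd(enabled: bool = True) -> str:
    """Build the QMP command that toggles the 'return-path' capability."""
    cmd = {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [
                {"capability": "return-path", "state": enabled},
            ],
        },
    }
    return json.dumps(cmd)


# The same command would be sent to the QMP monitor of both the source
# and the destination before the migration is started.
print(set_return_path_cmd())
```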

Dave

> Thanks,
> Kunkun Jiang
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [question] The source cannot recover, if the destination fails in the last round of live migration
  2021-05-06 13:05 ` Dr. David Alan Gilbert
@ 2021-05-07  9:46   ` Kunkun Jiang
  2021-05-07 14:57     ` Peter Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Kunkun Jiang @ 2021-05-07  9:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: David Edmondson, Juan Quintela, open list:All patches CC here,
	Peter Xu, Zenghui Yu, wanghaibin.wang, Keqian Zhu

Hi Dave,

On 2021/5/6 21:05, Dr. David Alan Gilbert wrote:
> Try setting the 'return-path' migration capability on both the source
> and destination;  I think it's that option will cause the destination to
> send an OK/error at the end and the source to wait for it.
Thank you for your reply!
The 'return-path' migration capability solved my question. 😁

But why is it not enabled by default? In my opinion, it is a basic
feature of live migration. With precopy, we need it to determine
whether the last round succeeded on the destination.

Looking forward to your reply.

Thanks,
Kunkun Jiang




* Re: [question] The source cannot recover, if the destination fails in the last round of live migration
  2021-05-07  9:46   ` Kunkun Jiang
@ 2021-05-07 14:57     ` Peter Xu
  2021-05-10  8:46       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Xu @ 2021-05-07 14:57 UTC (permalink / raw)
  To: Kunkun Jiang
  Cc: Juan Quintela, David Edmondson, open list:All patches CC here,
	Dr. David Alan Gilbert, Zenghui Yu, wanghaibin.wang, Keqian Zhu

On Fri, May 07, 2021 at 05:46:44PM +0800, Kunkun Jiang wrote:
> Thank you for your reply!
> The 'return-path' migration capability solved my question. 😁
> 
> But why not set it as the default? In my opinion, it is a basic ability
> of live migration. We need it to judge whether the last round of the
> destination is successful in the way of 'precopy'.

I think it should be enabled as long as both sides support it; though it may
not be suitable as the default (at least in QEMU), since old binaries have to
be considered.
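
A rough sketch of the "both sides support it" check, assuming each
side's query-migrate-capabilities reply has already been fetched as a
list of {"capability", "state"} entries. The helper names and the
sample replies below are made up for illustration:

```python
def supports(caps_reply: list, name: str = "return-path") -> bool:
    """True if the capability is listed at all, whatever its state."""
    return any(entry.get("capability") == name for entry in caps_reply)


def should_enable(src_caps: list, dst_caps: list) -> bool:
    """Enable return-path only when both binaries know about it."""
    return supports(src_caps) and supports(dst_caps)


# Hypothetical replies: an old binary simply does not list the capability.
new_side = [{"capability": "xbzrle", "state": False},
            {"capability": "return-path", "state": False}]
old_side = [{"capability": "xbzrle", "state": False}]

print(should_enable(new_side, new_side))
print(should_enable(new_side, old_side))
```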

Thanks,

-- 
Peter Xu




* Re: [question] The source cannot recover, if the destination fails in the last round of live migration
  2021-05-07 14:57     ` Peter Xu
@ 2021-05-10  8:46       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2021-05-10  8:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: David Edmondson, Juan Quintela, Kunkun Jiang,
	open list:All patches CC here, Zenghui Yu, wanghaibin.wang,
	Keqian Zhu

* Peter Xu (peterx@redhat.com) wrote:
> I think it should be enabled as long as both sides support it; though may not
> be suitable as default (at least in QEMU) so as to consider old binaries.

Right, we can't break migration from old QEMU versions (I think libvirt
will use it by default if available).

(Maybe some day we should enable it by default and deprecate turning it
off)

Dave

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



