* Re: [Qemu-devel] Migration To-do list
@ 2012-11-13 17:46 Hudzia, Benoit
  2012-11-14  2:23 ` Isaku Yamahata
  0 siblings, 1 reply; 4+ messages in thread
From: Hudzia, Benoit @ 2012-11-13 17:46 UTC (permalink / raw)
  To: quintela, qemu-devel qemu-devel, Orit Wasserman, chegu_vinod,
	Isaku Yamahata, Michael Roth

Hi,

One concept we have been playing around with in the context of hybrid and post copy, and which might make sense if you are orienting your effort toward RDMA / post copy, is to move most of the logic to the destination side.

This is one thing you might want to consider, as it can solve some of the issues you currently have and allows you to maintain almost a single API / protocol once you integrate the post copy approach.

The idea is to drive the migration from the destination side, i.e. the pages are pulled by the destination and not pushed by the source.

Ex: current pre-copy:

	* extract the dirty bitmap (extraction can be scheduled or triggered by the destination)
	* send it to the destination side
	* have the destination iterate over the bitmap (page prioritization can be done here)
	* depending on the protocol:
		_ with standard sockets (or RDS):
			. destination: requests page(s) <- can be batched
			. source: receives the request and sends back the page(s)
			. destination: processes the page(s)
		_ with RDMA:
			. destination reads the page from the source into a local page (the pages have been mapped for RDMA at bitmap extraction) (RDMA supports scatter/gather)
		_ with post copy:
			. pretty much the same, but the dirty bitmap reset is done in the kernel during the post copy operation (this provides better dirty bit tracking granularity)
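
[Editor's sketch] The standard-socket variant of the steps above can be modelled roughly as follows. This is only an illustration in Python, not QEMU code: the `Source` class, `extract_dirty_bitmap`, `read_pages`, `pull_round` and the 4096-byte `PAGE_SIZE` are all hypothetical names and values chosen for the example, and the "network" is a plain method call.

```python
from typing import Dict, Iterator, List

PAGE_SIZE = 4096  # assumed page size for this sketch


class Source:
    """Hypothetical source side: it only answers requests from the destination."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        # Before the first pass every page counts as dirty.
        self.dirty = [True] * (len(memory) // PAGE_SIZE)

    def write(self, page: int, data: bytes) -> None:
        """Simulate a guest write: update the page and mark it dirty."""
        start = page * PAGE_SIZE
        self.memory[start:start + PAGE_SIZE] = data
        self.dirty[page] = True

    def extract_dirty_bitmap(self) -> List[bool]:
        """Hand out the current dirty bitmap and reset it."""
        bitmap, self.dirty = self.dirty, [False] * len(self.dirty)
        return bitmap

    def read_pages(self, pages: List[int]) -> Dict[int, bytes]:
        """Answer one batched page request."""
        return {p: bytes(self.memory[p * PAGE_SIZE:(p + 1) * PAGE_SIZE]) for p in pages}


def batches(bitmap: List[bool], batch_size: int) -> Iterator[List[int]]:
    """Yield dirty page numbers in groups of at most batch_size."""
    dirty = [i for i, bit in enumerate(bitmap) if bit]
    for i in range(0, len(dirty), batch_size):
        yield dirty[i:i + batch_size]


def pull_round(source: Source, dest_memory: bytearray, batch_size: int = 64) -> int:
    """One destination-driven pass; returns the number of request round trips."""
    round_trips = 0
    for batch in batches(source.extract_dirty_bitmap(), batch_size):
        for page, data in source.read_pages(batch).items():
            dest_memory[page * PAGE_SIZE:(page + 1) * PAGE_SIZE] = data
        round_trips += 1
    return round_trips
```

In this model the destination would call `pull_round` repeatedly until the number of dirty pages is small enough to stop the guest. The batch size is the knob that trades request latency against the extra round trip.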


Disadvantage:
	* adds a round trip, which can be compensated for with batched operations (only with standard sockets)

Advantages:
	* most of the heavy lifting is done on the destination side, leaving the source to respond to requests in an event-driven fashion
	* resolves a lot of the issues you have with threading on the sender side (accounting etc.)
	* extremely friendly to optimised solutions
	* if bitmap generation is expensive, we can overlap generation with transfer, creating a semi-continuous delivery of bitmaps that guarantees an uninterrupted and optimised flow. => we decouple bitmap generation from the send/receive operation.
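
[Editor's sketch] The last point, decoupling bitmap generation from the send/receive path, could look roughly like the producer/consumer pipeline below. It is a minimal Python illustration using a bounded queue. `pipelined_migration`, `generate_bitmap`, `transfer` and `depth` are hypothetical names and do not correspond to any existing QEMU interface.

```python
import queue
import threading
from typing import Callable, List, Optional


def pipelined_migration(
    generate_bitmap: Callable[[int], Optional[List[int]]],
    transfer: Callable[[List[int]], None],
    depth: int = 2,
) -> int:
    """Overlap bitmap generation with page transfer.

    generate_bitmap(n) returns the dirty page list for round n, or None
    when there is nothing left to generate. Returns the number of bitmaps
    that were transferred.
    """
    pending: "queue.Queue[Optional[List[int]]]" = queue.Queue(maxsize=depth)

    def producer() -> None:
        n = 0
        while True:
            bitmap = generate_bitmap(n)
            pending.put(bitmap)  # blocks while `depth` bitmaps are already waiting
            if bitmap is None:
                return
            n += 1

    worker = threading.Thread(target=producer)
    worker.start()

    transferred = 0
    while True:
        bitmap = pending.get()  # the next bitmap is usually ready already
        if bitmap is None:
            break
        transfer(bitmap)
        transferred += 1
    worker.join()
    return transferred
```

The bounded queue keeps the generator at most `depth` bitmaps ahead of the transfer loop, so an expensive bitmap extraction runs while the previous bitmap's pages are still being moved.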



Anyway, I will notify you as soon as I have the patch / library available for RDMA / postcopy.

Note on the fault tolerance part: this requires a lot more heavy code optimisation and poking around to guarantee efficient checkpointing. Most of the solutions we have tested so far (Remus and an old version of Kemari) scale poorly. Again, an RDMA / post copy solution is kind of necessary when you talk about checkpointing enterprise-class applications.


Regards
Benoit





> -----Original Message-----
> From: Juan Quintela [mailto:quintela@redhat.com]
> Sent: 13 November 2012 16:19
> To: qemu-devel qemu-devel; Orit Wasserman; chegu_vinod@hp.com;
> Hudzia, Benoit; Isaku Yamahata; Michael Roth
> Subject: Migration ToDo list
> 
> 
> Hi
> 
> If you have anything else to put, please add.
> 
> Migration Thread
> * Plan is integrate it as one of first thing in December (me)
> * Remove copies with buffered file (me)
> 
> Bitmap Optimization
> * Finish moving to individual bitmaps for migration/vga/code
> * Make sure we don't copy things around
> * Shared memory bitmap with kvm?
> * Move to 2MB pages bitmap and then fine grain?
> 
> QIDL
> * Review the patches (me)
> 
> PostCopy
> * Review patches?
> * See what we can already integrate?
>   I remember for last year that we could integrate the 1st third or so
> 
> RDMA
> * Send RDMA/tcp/.... library they already have (Benoit)
> * This is required for postcopy
> * This can be used for precopy
> 
> General
> * Change protocol to:
>   a) being always 16byte aligned (paolo said that is faster)
>   b) do scatter/gather of the pages?
> 
> Fault Tolerance
> * That is built on top of migration code, but I have nothing to add.
> 
> Any more ideas?
> 
> Later, Juan.


* Re: [Qemu-devel] Migration To-do list
  2012-11-13 17:46 [Qemu-devel] Migration To-do list Hudzia, Benoit
@ 2012-11-14  2:23 ` Isaku Yamahata
  2012-11-14 10:07   ` Hudzia, Benoit
  2012-11-14 11:38   ` Orit Wasserman
  0 siblings, 2 replies; 4+ messages in thread
From: Isaku Yamahata @ 2012-11-14  2:23 UTC (permalink / raw)
  To: Hudzia, Benoit
  Cc: Michael Roth, Orit Wasserman, chegu_vinod, qemu-devel qemu-devel,
	quintela

On Tue, Nov 13, 2012 at 05:46:13PM +0000, Hudzia, Benoit wrote:
> Hi,
> 
> One concept we have been playing around in the context of  and hybrid and post copy and might make sense if you are orienting your effort toward RDMA / Post copy is to move most of the logic in the destination side. 
> 
> This is one thing you might want to consider as it  can solve some of the issue you currently have and allow you to maintain almost a single API / Protocol once integrating with post copy approach. 
> 
> The idea is to drive the migration from the destination side. I.e. The page are pulled from the destination and not pushed from the source side. 
> 
> Ex: current pre-copy :
> 
> 	*extract dirty bitmap ( dirty bitmap extraction can be scheduled or triggered by destination) 
> 	* send it to the destination side
> 	* have the destination iterating over the bitmap ( can do page prioritization here)  

IIRC last year, you mentioned page prioritization, but didn't this year.
Is it still supported?
Where is it implemented? in qemu or kernel?


> 	* depending of protocol :
> 		_  with standard socket ( or RDS) : 
> 			. Destination : request page(s)<- can be batched
> 			.  source receive request send back the page
> 			. destination process  
> 		_ with RDMA : 
> 			. Destination Read Page from source to local page ( the page have been mapped to RDMA at the bitmap extraction) ( RDMA support scatter gather) 

Although I'm not familiar with RDMA, I understand that it requires exchanging DMA addresses between
sender and receiver in advance, and pinning down pages.
Is that correct?


> 		_ with post copy 
> 			. pretty much the same but the dirty bitmap reset is done in kernel during the post copy operation ( provide a better dirty bit tracking granularity)
> 
> 
> Disadvantage: 
> 	* add a round trip that can be compensate with batch operation ( only with standard socket)  
> 
> Advantage :
> 	* most of the heavy lifting is done at the destination side leaving the source to respond to request in an event based format  
> 	* resolve a lot of issue you have with your threading form the sender side ( accounting etc.. )
> 	* extremely friendly to optimised solution 
> 	* if the bitmap generation is expensive we can overlap their generation creating a semi continuous delivery of them guaranteeing an uninterrupted and optimised  flow. => we decouple the bitmap generation from the send/ receive operation. 
> 
> 
> 
> Anyway , I will notify you as soon as I have the patch / library available for RDMA / postcopy. 
> 
> Note On the fault tolerance part: this require a lot more heavy code optimisation and poking around to guarantee efficient checkpointing. Most of the solution we tested so far ( Remus and an old version of kemari) scale poorly . Again, an RDMA / post copy solution is kind of necessary when you talk about check pointing enterprise class applications. 

IIRC the Kemari guys evaluated the IB case. I'm not sure whether it was with RDMA or IPoIB.

thanks,
> 
> 
> Regards
> Benoit
> 
> 
> 
> 
> 
> > -----Original Message-----
> > From: Juan Quintela [mailto:quintela@redhat.com]
> > Sent: 13 November 2012 16:19
> > To: qemu-devel qemu-devel; Orit Wasserman; chegu_vinod@hp.com;
> > Hudzia, Benoit; Isaku Yamahata; Michael Roth
> > Subject: Migration ToDo list
> > 
> > 
> > Hi
> > 
> > If you have anything else to put, please add.
> > 
> > Migration Thread
> > * Plan is integrate it as one of first thing in December (me)
> > * Remove copies with buffered file (me)
> > 
> > Bitmap Optimization
> > * Finish moving to individual bitmaps for migration/vga/code
> > * Make sure we don't copy things around
> > * Shared memory bitmap with kvm?
> > * Move to 2MB pages bitmap and then fine grain?
> > 
> > QIDL
> > * Review the patches (me)
> > 
> > PostCopy
> > * Review patches?
> > * See what we can already integrate?
> >   I remember for last year that we could integrate the 1st third or so
> > 
> > RDMA
> > * Send RDMA/tcp/.... library they already have (Benoit)
> > * This is required for postcopy
> > * This can be used for precopy
> > 
> > General
> > * Change protocol to:
> >   a) being always 16byte aligned (paolo said that is faster)
> >   b) do scatter/gather of the pages?
> > 
> > Fault Tolerance
> > * That is built on top of migration code, but I have nothing to add.
> > 
> > Any more ideas?
> > 
> > Later, Juan.
> 

-- 
yamahata


* Re: [Qemu-devel] Migration To-do list
  2012-11-14  2:23 ` Isaku Yamahata
@ 2012-11-14 10:07   ` Hudzia, Benoit
  2012-11-14 11:38   ` Orit Wasserman
  1 sibling, 0 replies; 4+ messages in thread
From: Hudzia, Benoit @ 2012-11-14 10:07 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Michael Roth, Orit Wasserman, chegu_vinod, qemu-devel qemu-devel,
	quintela

Inline 

> -----Original Message-----
> From: Isaku Yamahata [mailto:yamahata@valinux.co.jp]
> Sent: 14 November 2012 02:23
> To: Hudzia, Benoit
> Cc: quintela@redhat.com; qemu-devel qemu-devel; Orit Wasserman;
> chegu_vinod@hp.com; Michael Roth
> Subject: Re: Migration To-do list
> 
> On Tue, Nov 13, 2012 at 05:46:13PM +0000, Hudzia, Benoit wrote:
> > Hi,
> >
> > One concept we have been playing around in the context of  and hybrid
> and post copy and might make sense if you are orienting your effort toward
> RDMA / Post copy is to move most of the logic in the destination side.
> >
> > This is one thing you might want to consider as it  can solve some of the
> issue you currently have and allow you to maintain almost a single API /
> Protocol once integrating with post copy approach.
> >
> > The idea is to drive the migration from the destination side. I.e. The page
> are pulled from the destination and not pushed from the source side.
> >
> > Ex: current pre-copy :
> >
> > 	*extract dirty bitmap ( dirty bitmap extraction can be scheduled or
> triggered by destination)
> > 	* send it to the destination side
> > 	* have the destination iterating over the bitmap ( can do page
> prioritization here)
> 
> IIRC last year, you mentioned page prioritization, but didn't this year.
> Is it still supported?
> Where is it implemented? in qemu or kernel?

It is in QEMU; it is too expensive and specialised to do within the kernel.
I think Orit did some work on this aspect, however I am not 100% sure it is in the stable branch yet.

> 
> 
> > 	* depending of protocol :
> > 		_  with standard socket ( or RDS) :
> > 			. Destination : request page(s)<- can be batched
> > 			.  source receive request send back the page
> > 			. destination process
> > 		_ with RDMA :
> > 			. Destination Read Page from source to local page (
> the page have been mapped to RDMA at the bitmap extraction) ( RDMA
> support scatter gather)
> 
> Although I'm not familiar with RDMA, RDMA requires the exchange of DMA-
> address between
> sender and receiver in advance and pinning down pages.
> It it correct?

Yes, that is correct. This is why you would register the memory only when the page is dirtied, avoiding pinning large amounts of memory for too long (and unpinning upon RDMA read confirmation).
The address is the same as the one within the virtual memory. What you exchange is a combination of the RDMA key (which uniquely identifies the memory region you are sharing) and the start address (offset) of the MR. Then you can read and write at will within it. That is why it is a little bit tricky: RDMA writes and reads typically do not trigger any notification (CPU, OS etc. are all bypassed), so your page content can change without the process/OS knowing it.
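
[Editor's sketch] The key/offset exchange described above can be illustrated with a toy model. This is plain Python and deliberately not a real verbs API: `ToyRdmaNic`, `register`, `deregister`, `remote_read` and `remote_write` are hypothetical names, and "pinning" is only simulated by holding a view onto the buffer.

```python
import itertools
from typing import Dict, Tuple


class ToyRdmaNic:
    """Toy stand-in for an RDMA-capable NIC (illustration only)."""

    def __init__(self) -> None:
        self._regions: Dict[int, memoryview] = {}
        self._next_key = itertools.count(1)

    def register(self, buf: bytearray, start: int, length: int) -> Tuple[int, int]:
        """Register ("pin") part of buf; return the (rkey, start) pair sent to the peer."""
        rkey = next(self._next_key)
        self._regions[rkey] = memoryview(buf)[start:start + length]
        return rkey, start

    def deregister(self, rkey: int) -> None:
        """Drop the registration, e.g. once the peer has confirmed its read."""
        self._regions.pop(rkey).release()

    def remote_read(self, rkey: int, offset: int, length: int) -> bytes:
        """Peer-initiated read; the owning process is not notified."""
        region = self._regions[rkey]
        if offset < 0 or offset + length > len(region):
            raise ValueError("access outside the registered region")
        return bytes(region[offset:offset + length])

    def remote_write(self, rkey: int, offset: int, data: bytes) -> None:
        """Peer-initiated write; memory changes with no callback to the owner."""
        region = self._regions[rkey]
        if offset < 0 or offset + len(data) > len(region):
            raise ValueError("access outside the registered region")
        region[offset:offset + len(data)] = data
```

Registering only the dirtied range and deregistering after the read keeps the pinned window small. `remote_write` shows the tricky part: the owner's buffer changes with no notification at all.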

> 
> 
> > 		_ with post copy
> > 			. pretty much the same but the dirty bitmap reset is
> done in kernel during the post copy operation ( provide a better dirty bit
> tracking granularity)
> >
> >
> > Disadvantage:
> > 	* add a round trip that can be compensate with batch operation (
> only with standard socket)
> >
> > Advantage :
> > 	* most of the heavy lifting is done at the destination side leaving the
> source to respond to request in an event based format
> > 	* resolve a lot of issue you have with your threading form the sender
> side ( accounting etc.. )
> > 	* extremely friendly to optimised solution
> > 	* if the bitmap generation is expensive we can overlap their
> generation creating a semi continuous delivery of them guaranteeing an
> uninterrupted and optimised  flow. => we decouple the bitmap generation
> from the send/ receive operation.
> >
> >
> >
> > Anyway , I will notify you as soon as I have the patch / library available for
> RDMA / postcopy.
> >
> > Note On the fault tolerance part: this require a lot more heavy code
> optimisation and poking around to guarantee efficient checkpointing. Most
> of the solution we tested so far ( Remus and an old version of kemari) scale
> poorly . Again, an RDMA / post copy solution is kind of necessary when you
> talk about check pointing enterprise class applications.
> 
> IIRC Kemari guys evaluated IB case. I'm not sure that it was with RDMA or
> IPoIB.
> 
> thanks,
> >
> >
> > Regards
> > Benoit
> >
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: Juan Quintela [mailto:quintela@redhat.com]
> > > Sent: 13 November 2012 16:19
> > > To: qemu-devel qemu-devel; Orit Wasserman; chegu_vinod@hp.com;
> > > Hudzia, Benoit; Isaku Yamahata; Michael Roth
> > > Subject: Migration ToDo list
> > >
> > >
> > > Hi
> > >
> > > If you have anything else to put, please add.
> > >
> > > Migration Thread
> > > * Plan is integrate it as one of first thing in December (me)
> > > * Remove copies with buffered file (me)
> > >
> > > Bitmap Optimization
> > > * Finish moving to individual bitmaps for migration/vga/code
> > > * Make sure we don't copy things around
> > > * Shared memory bitmap with kvm?
> > > * Move to 2MB pages bitmap and then fine grain?
> > >
> > > QIDL
> > > * Review the patches (me)
> > >
> > > PostCopy
> > > * Review patches?
> > > * See what we can already integrate?
> > >   I remember for last year that we could integrate the 1st third or so
> > >
> > > RDMA
> > > * Send RDMA/tcp/.... library they already have (Benoit)
> > > * This is required for postcopy
> > > * This can be used for precopy
> > >
> > > General
> > > * Change protocol to:
> > >   a) being always 16byte aligned (paolo said that is faster)
> > >   b) do scatter/gather of the pages?
> > >
> > > Fault Tolerance
> > > * That is built on top of migration code, but I have nothing to add.
> > >
> > > Any more ideas?
> > >
> > > Later, Juan.
> >
> 
> --
> yamahata


* Re: [Qemu-devel] Migration To-do list
  2012-11-14  2:23 ` Isaku Yamahata
  2012-11-14 10:07   ` Hudzia, Benoit
@ 2012-11-14 11:38   ` Orit Wasserman
  1 sibling, 0 replies; 4+ messages in thread
From: Orit Wasserman @ 2012-11-14 11:38 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Michael Roth, chegu_vinod, Hudzia, Benoit, qemu-devel qemu-devel,
	quintela

On 11/14/2012 04:23 AM, Isaku Yamahata wrote:
> On Tue, Nov 13, 2012 at 05:46:13PM +0000, Hudzia, Benoit wrote:
>> Hi,
>>
>> One concept we have been playing around in the context of  and hybrid and post copy and might make sense if you are orienting your effort toward RDMA / Post copy is to move most of the logic in the destination side. 
>>
>> This is one thing you might want to consider as it  can solve some of the issue you currently have and allow you to maintain almost a single API / Protocol once integrating with post copy approach. 
>>
>> The idea is to drive the migration from the destination side. I.e. The page are pulled from the destination and not pushed from the source side. 
>>
>> Ex: current pre-copy :
>>
>> 	*extract dirty bitmap ( dirty bitmap extraction can be scheduled or triggered by destination) 
>> 	* send it to the destination side
>> 	* have the destination iterating over the bitmap ( can do page prioritization here)  
> 
> IIRC last year, you mentioned page prioritization, but didn't this year.
> Is it still supported?
> Where is it implemented? in qemu or kernel?
I did a prototype and couldn't find a workload that benefits from it, so
it was moved down in priority. Maybe I will return to it in the future.

Regards,
Orit
> 
> 
>> 	* depending of protocol :
>> 		_  with standard socket ( or RDS) : 
>> 			. Destination : request page(s)<- can be batched
>> 			.  source receive request send back the page
>> 			. destination process  
>> 		_ with RDMA : 
>> 			. Destination Read Page from source to local page ( the page have been mapped to RDMA at the bitmap extraction) ( RDMA support scatter gather) 
> 
> Although I'm not familiar with RDMA, RDMA requires the exchange of DMA-address between
> sender and receiver in advance and pinning down pages.
> It it correct?
> 
> 
>> 		_ with post copy 
>> 			. pretty much the same but the dirty bitmap reset is done in kernel during the post copy operation ( provide a better dirty bit tracking granularity)
>>
>>
>> Disadvantage: 
>> 	* add a round trip that can be compensate with batch operation ( only with standard socket)  
>>
>> Advantage :
>> 	* most of the heavy lifting is done at the destination side leaving the source to respond to request in an event based format  
>> 	* resolve a lot of issue you have with your threading form the sender side ( accounting etc.. )
>> 	* extremely friendly to optimised solution 
>> 	* if the bitmap generation is expensive we can overlap their generation creating a semi continuous delivery of them guaranteeing an uninterrupted and optimised  flow. => we decouple the bitmap generation from the send/ receive operation. 
>>
>>
>>
>> Anyway , I will notify you as soon as I have the patch / library available for RDMA / postcopy. 
>>
>> Note On the fault tolerance part: this require a lot more heavy code optimisation and poking around to guarantee efficient checkpointing. Most of the solution we tested so far ( Remus and an old version of kemari) scale poorly . Again, an RDMA / post copy solution is kind of necessary when you talk about check pointing enterprise class applications. 
> 
> IIRC Kemari guys evaluated IB case. I'm not sure that it was with RDMA or IPoIB.
> 
> thanks,
>>
>>
>> Regards
>> Benoit
>>
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: Juan Quintela [mailto:quintela@redhat.com]
>>> Sent: 13 November 2012 16:19
>>> To: qemu-devel qemu-devel; Orit Wasserman; chegu_vinod@hp.com;
>>> Hudzia, Benoit; Isaku Yamahata; Michael Roth
>>> Subject: Migration ToDo list
>>>
>>>
>>> Hi
>>>
>>> If you have anything else to put, please add.
>>>
>>> Migration Thread
>>> * Plan is integrate it as one of first thing in December (me)
>>> * Remove copies with buffered file (me)
>>>
>>> Bitmap Optimization
>>> * Finish moving to individual bitmaps for migration/vga/code
>>> * Make sure we don't copy things around
>>> * Shared memory bitmap with kvm?
>>> * Move to 2MB pages bitmap and then fine grain?
>>>
>>> QIDL
>>> * Review the patches (me)
>>>
>>> PostCopy
>>> * Review patches?
>>> * See what we can already integrate?
>>>   I remember for last year that we could integrate the 1st third or so
>>>
>>> RDMA
>>> * Send RDMA/tcp/.... library they already have (Benoit)
>>> * This is required for postcopy
>>> * This can be used for precopy
>>>
>>> General
>>> * Change protocol to:
>>>   a) being always 16byte aligned (paolo said that is faster)
>>>   b) do scatter/gather of the pages?
>>>
>>> Fault Tolerance
>>> * That is built on top of migration code, but I have nothing to add.
>>>
>>> Any more ideas?
>>>
>>> Later, Juan.
>>
> 
