* [Qemu-devel] [RFC] qmp interface for save vmstate to image
@ 2013-03-15  7:24 Wenchao Xia
  2013-03-15 14:51 ` Stefan Hajnoczi
  2013-03-18 13:28 ` Pavel Hrdina
  0 siblings, 2 replies; 19+ messages in thread
From: Wenchao Xia @ 2013-03-15  7:24 UTC (permalink / raw)
  To: Juan Quintela, Eric Blake, Dietmar Maurer, Stefan Hajnoczi,
	Paolo Bonzini, Kevin Wolf, qemu-devel

Hi Juan and all,
  I'd like to add a new way to save vmstate. It will be based on the
migration thread, but will write its contents to a block image instead
of streaming them to an fd. There are two ways to add the API:

1 Add parameters to the migrate interface, plus a new type of URI:
image:[VMSTATE_SAVE_IMAGE]

##
# @MigrateImageOptions:
#
# Options for migration to image.
#
# @path: the full path to the image to be used.
# @use-existing: #optional, whether to use an existing image at path. If
#                not specified, qemu will try to create a new image.
# @create-size: #optional, the image's virtual size at creation. Only
#               valid when use-existing is false or absent, unit is M.
# @fmt: #optional, the format of the image. If not specified and
#       use-existing is true, qemu will try to detect the image format;
#       when use-existing is false or absent, the qcow2 format will be
#       used.
# @stream: #optional, whether to save vmstate as a stream, which reduces
#          small writes but lets the size keep growing. If not
#          specified, vmstate will be saved with a fixed size.
#
# Since: 1.5
##
{ 'type': 'MigrateImageOptions',
  'data': { 'path': 'str', '*use-existing': 'bool',
            '*create-size': 'int', '*fmt': 'str',
            '*stream': 'bool' } }

##
# @migrate
#
# Migrates the current running guest to another Virtual Machine.
#
# @uri: the Uniform Resource Identifier of the destination VM
#
# @blk: #optional do block migration (full disk copy)
#
# @inc: #optional incremental disk copy migration
#
# @detach: this argument exists only for compatibility reasons and
#          is ignored by QEMU
#
# @image-options: #optional, the options used in migration to image.
#                 Only valid in migration to image.
#
# Returns: nothing on success
#
# Since: 0.14.0
##
{ 'command': 'migrate',
  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
           '*detach': 'bool', '*image-options': 'MigrateImageOptions'} }

  This way, query-migrate and migrate incoming could naturally be used
for querying and restoring, but it introduces some options that apply
only to image migration.

2 Add a new command, vmstate-save, with the options above. query-migrate
and migrate incoming would then be used to query/restore the state, which
seems odd.

  I can't decide which is better. Could you take a look and comment on
this?
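
For illustration, option 1 could look like this on the wire. This is a
minimal sketch that only builds the JSON: the "image:" URI and the
image-options argument are the proposal above, not an existing QMP
interface, and the helper name, the example path, and the choice to
repeat the path in the URI are assumptions.

```python
import json


def build_migrate_to_image(path, fmt=None, use_existing=None,
                           create_size=None, stream=None):
    """Build the proposed 'migrate' command (hypothetical helper)."""
    options = {"path": path}
    optional = {
        "use-existing": use_existing,
        "create-size": create_size,   # unit is M, per the proposal
        "fmt": fmt,
        "stream": stream,
    }
    # Optional members are simply left out when not specified.
    options.update({k: v for k, v in optional.items() if v is not None})
    return {
        "execute": "migrate",
        "arguments": {
            "uri": "image:" + path,       # assumed URI layout
            "image-options": options,
        },
    }


# Example path is made up for the sketch.
command = build_migrate_to_image("/tmp/vmstate.qcow2", fmt="qcow2",
                                 create_size=4096)
wire = json.dumps(command)
```

With option 2 only the "execute" name would change to vmstate-save; the
arguments would stay the same.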
-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-15  7:24 [Qemu-devel] [RFC] qmp interface for save vmstate to image Wenchao Xia
@ 2013-03-15 14:51 ` Stefan Hajnoczi
  2013-03-18  6:40   ` Wenchao Xia
  2013-03-18 13:28 ` Pavel Hrdina
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2013-03-15 14:51 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Kevin Wolf, Juan Quintela, qemu-devel, Paolo Bonzini, Dietmar Maurer

On Fri, Mar 15, 2013 at 03:24:38PM +0800, Wenchao Xia wrote:
>   I'd like to add a new way to save vmstate, which will based on the
> migration thread, but will write contents to block images, instead
> of fd as stream. Following is the method to add API:

Hi Wenchao,
What use cases are there besides saving vmstate to a raw image?

I'm curious if you're proposing this since there is no "file:" URI or
because you really want to do things like saving vmstate into a qcow2
file or over NBD.

Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-15 14:51 ` Stefan Hajnoczi
@ 2013-03-18  6:40   ` Wenchao Xia
  2013-03-18  9:04     ` Kevin Wolf
  2013-03-18 10:09     ` Stefan Hajnoczi
  0 siblings, 2 replies; 19+ messages in thread
From: Wenchao Xia @ 2013-03-18  6:40 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Juan Quintela, qemu-devel, Paolo Bonzini, Dietmar Maurer

On 2013-3-15 22:51, Stefan Hajnoczi wrote:
> On Fri, Mar 15, 2013 at 03:24:38PM +0800, Wenchao Xia wrote:
>>    I'd like to add a new way to save vmstate, which will based on the
>> migration thread, but will write contents to block images, instead
>> of fd as stream. Following is the method to add API:
> 
> Hi Wenchao,
> What use cases are there besides saving vmstate to a raw image?
> 
> I'm curious if you're proposing this since there is no "file:" URI or
> because you really want to do things like saving vmstate into a qcow2
> file or over NBD.
> 
> Stefan
> 
Hi Stefan,
  The most common cases would be "raw" and "qcow2"; the format is
flexible and can be chosen by the user. This way, existing block layer
features in qemu can be used, such as zero tagging. I haven't checked
whether the qemu block layer does buffering/caching, but if it does,
this can benefit from it too.
  The user can specify "raw" or "qcow2" according to the host
configuration. If there is a dedicated storage component underneath, he
can use "raw" to skip qemu's block layer.

-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-18  6:40   ` Wenchao Xia
@ 2013-03-18  9:04     ` Kevin Wolf
  2013-03-18 10:08       ` Paolo Bonzini
  2013-03-18 10:47       ` Wenchao Xia
  2013-03-18 10:09     ` Stefan Hajnoczi
  1 sibling, 2 replies; 19+ messages in thread
From: Kevin Wolf @ 2013-03-18  9:04 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Juan Quintela, Stefan Hajnoczi, qemu-devel, Paolo Bonzini,
	Dietmar Maurer

On 18.03.2013 at 07:40, Wenchao Xia wrote:
> On 2013-3-15 22:51, Stefan Hajnoczi wrote:
> > On Fri, Mar 15, 2013 at 03:24:38PM +0800, Wenchao Xia wrote:
> >>    I'd like to add a new way to save vmstate, which will based on the
> >> migration thread, but will write contents to block images, instead
> >> of fd as stream. Following is the method to add API:
> > 
> > Hi Wenchao,
> > What use cases are there besides saving vmstate to a raw image?
> > 
> > I'm curious if you're proposing this since there is no "file:" URI or
> > because you really want to do things like saving vmstate into a qcow2
> > file or over NBD.
> > 
> > Stefan
> > 
> Hi, Stefan
>   Most used cases would be "raw" and "qcow2", which is flex and can be
> chosen by user. In this way, existing block layer feature in qemu can
> be used, such as tagging zeros. I haven't check the buffer/cache status
> in qemu block layer, but if there is, it can also benefit.
>   User can specify "raw" or "qcow2" according to host configuration, If
> there is dedicated storage components underlining, he can use "raw" to
> skip qemu's block layer.

Oh, seems I misread this then. I thought this was about internal live
snapshots, which is a feature that I consider really useful. I'm not so
sure if saving the VM state as the disk contents of a qcow2 image is
really helpful.

If zero clusters help a lot, then there's clearly something to improve
in the migration protocol, because it shouldn't send so many zeros in
the first place.

Kevin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-18  9:04     ` Kevin Wolf
@ 2013-03-18 10:08       ` Paolo Bonzini
  2013-03-18 10:50         ` Wenchao Xia
  2013-03-18 10:47       ` Wenchao Xia
  1 sibling, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2013-03-18 10:08 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Juan Quintela, Stefan Hajnoczi, qemu-devel, Dietmar Maurer, Wenchao Xia

On 18/03/2013 10:04, Kevin Wolf wrote:
> Oh, seems I misread this then. I thought this was about internal live
> snapshots, which is a feature that I consider really useful. I'm not so
> sure if saving the VM state as the disk contents of a qcow2 image is
> really helpful.
> 
> If zero clusters help a lot, then there's clearly something to improve
> in the migration protocol, because it shouldn't send so many zeros in
> the first place.

Zero pages are sent as a single 9-byte entry (64 bits for the address
and flags, 8 for the zero).

I don't expect the migration stream to have a single zero cluster, since
every page is prefixed by the 64 bits for the address and flags.
Furthermore, the RAM data would be horribly unaligned because of this.
15-20% sectors or so would be read twice, since reading each page (4104
bytes including the address and flags) would span 10 sectors (5120 bytes).
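
The record sizes above can be checked with a little arithmetic. This is
a rough sketch, assuming 512-byte sectors and 4096-byte pages; the
names are made up for the example. Depending on where a record starts,
it touches 9 or 10 sectors, and back-to-back records usually share
their boundary sector.

```python
HEADER_BYTES = 8        # 64 bits for the address and flags
PAGE_SIZE = 4096        # assumed guest page size
SECTOR_SIZE = 512       # assumed sector size

ZERO_PAGE_RECORD = HEADER_BYTES + 1          # header plus one zero byte
FULL_PAGE_RECORD = HEADER_BYTES + PAGE_SIZE  # header plus page data


def sectors_touched(offset, length=FULL_PAGE_RECORD):
    """Number of sectors a record at byte `offset` overlaps."""
    first = offset // SECTOR_SIZE
    last = (offset + length - 1) // SECTOR_SIZE
    return last - first + 1


def shares_sector_with_next(offset, length=FULL_PAGE_RECORD):
    """True if the next back-to-back record starts in this record's last sector."""
    return (offset + length) % SECTOR_SIZE != 0
```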

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-18  6:40   ` Wenchao Xia
  2013-03-18  9:04     ` Kevin Wolf
@ 2013-03-18 10:09     ` Stefan Hajnoczi
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2013-03-18 10:09 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Kevin Wolf, Juan Quintela, qemu-devel, Paolo Bonzini, Dietmar Maurer

On Mon, Mar 18, 2013 at 02:40:50PM +0800, Wenchao Xia wrote:
> On 2013-3-15 22:51, Stefan Hajnoczi wrote:
> > On Fri, Mar 15, 2013 at 03:24:38PM +0800, Wenchao Xia wrote:
> >>    I'd like to add a new way to save vmstate, which will based on the
> >> migration thread, but will write contents to block images, instead
> >> of fd as stream. Following is the method to add API:
> > 
> > Hi Wenchao,
> > What use cases are there besides saving vmstate to a raw image?
> > 
> > I'm curious if you're proposing this since there is no "file:" URI or
> > because you really want to do things like saving vmstate into a qcow2
> > file or over NBD.
> > 
> > Stefan
> > 
> Hi, Stefan
>   Most used cases would be "raw" and "qcow2", which is flex and can be
> chosen by user. In this way, existing block layer feature in qemu can
> be used, such as tagging zeros. I haven't check the buffer/cache status
> in qemu block layer, but if there is, it can also benefit.

Okay, thanks for explaining.

You can use caching with the BDRV_O_CACHE_WB option.  Then you need to
call bdrv_co_flush() to ensure data reaches the disk.  The advantage of
caching is that I/O patterns with many small unaligned writes may be
much faster when going through the host's page cache - and reads can
also be faster.

You can bypass the host page cache with BDRV_O_CACHE_WB | BDRV_O_NOCACHE.
Here bdrv_co_flush() calls are still necessary to ensure data reaches
the disk.
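
As a loose analogy at the POSIX level (plain Python file I/O, not
QEMU's block layer API), cached writes still need an explicit flush
before the data can be considered on disk. The function and file names
below are made up for the sketch.

```python
import os
import tempfile


def write_then_flush(path, chunks):
    """Write chunks through the page cache, then force them to disk."""
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)        # buffered; may not have reached the disk yet
        f.flush()                 # push Python's own buffer to the OS
        os.fsync(f.fileno())      # ask the OS to write it to stable storage


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "vmstate.bin")   # hypothetical file name
    write_then_flush(target, [b"\x00" * 9, b"\x01" * 4104])
    written = os.path.getsize(target)
```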

Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-18  9:04     ` Kevin Wolf
  2013-03-18 10:08       ` Paolo Bonzini
@ 2013-03-18 10:47       ` Wenchao Xia
  1 sibling, 0 replies; 19+ messages in thread
From: Wenchao Xia @ 2013-03-18 10:47 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, Paolo Bonzini, qemu-devel, Dietmar Maurer,
	Juan Quintela

On 2013-3-18 17:04, Kevin Wolf wrote:
> On 18.03.2013 at 07:40, Wenchao Xia wrote:
>> On 2013-3-15 22:51, Stefan Hajnoczi wrote:
>>> On Fri, Mar 15, 2013 at 03:24:38PM +0800, Wenchao Xia wrote:
>>>>     I'd like to add a new way to save vmstate, which will based on the
>>>> migration thread, but will write contents to block images, instead
>>>> of fd as stream. Following is the method to add API:
>>>
>>> Hi Wenchao,
>>> What use cases are there besides saving vmstate to a raw image?
>>>
>>> I'm curious if you're proposing this since there is no "file:" URI or
>>> because you really want to do things like saving vmstate into a qcow2
>>> file or over NBD.
>>>
>>> Stefan
>>>
>> Hi, Stefan
>>    Most used cases would be "raw" and "qcow2", which is flex and can be
>> chosen by user. In this way, existing block layer feature in qemu can
>> be used, such as tagging zeros. I haven't check the buffer/cache status
>> in qemu block layer, but if there is, it can also benefit.
>>    User can specify "raw" or "qcow2" according to host configuration, If
>> there is dedicated storage components underlining, he can use "raw" to
>> skip qemu's block layer.
>
> Oh, seems I misread this then. I thought this was about internal live
> snapshots, which is a feature that I consider really useful. I'm not so
> sure if saving the VM state as the disk contents of a qcow2 image is
> really helpful.
>
   Actually I am leaving internal live snapshots as a second step, since
there is a bit more work to do when using the migration thread: SPICE
is handled in migration but not in internal snapshots.
   The main purpose is to get a standalone vmstate file with a limited
size, since internal snapshots currently lack an API to drop the vmstate
at any time. (It would be better to have an API to export vmstate/delta
block data.)

> If zero clusters help a lot, then there's clearly something to improve
> in the migration protocol, because it shouldn't send so many zeros in
> the first place.
>
  In the streaming case, zeros are already encoded well, I think, but
when fseek() is used there may be some zeros inside, along with small
writes. Handling those is likely the block layer's job. By using an
image I can directly use qemu's block layer with the qcow2 format, or
use raw if there is an underlying component, which keeps it flexible.
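
The difference between the two modes can be sketched as follows. This
is only an illustration under assumptions: the on-disk layout (one slot
per page at page_index * PAGE_SIZE for the fixed case, an 8-byte index
plus data for the stream case) and all names are invented here and are
not taken from the patch.

```python
import io

PAGE_SIZE = 4096  # assumed guest page size


def save_fixed(image, page_index, data):
    """Fixed-size mode: seek to the page's slot and overwrite it in place."""
    image.seek(page_index * PAGE_SIZE)
    image.write(data)


def save_stream(stream, page_index, data):
    """Stream mode: append an (index, data) record every time."""
    stream.seek(0, io.SEEK_END)
    stream.write(page_index.to_bytes(8, "big"))
    stream.write(data)


fixed, stream = io.BytesIO(), io.BytesIO()
for index in (0, 2, 0, 2, 0):   # pages 0 and 2 are dirtied and saved again
    page = bytes([index + 1]) * PAGE_SIZE
    save_fixed(fixed, index, page)
    save_stream(stream, index, page)

fixed_size = len(fixed.getvalue())     # bounded by the highest page slot
stream_size = len(stream.getvalue())   # grows with every save
hole = fixed.getvalue()[PAGE_SIZE:2 * PAGE_SIZE]   # page 1 was never written
```

The never-written slot for page 1 reads back as zeros, which is the kind
of hole a format such as qcow2 would not need to allocate.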

> Kevin
>


-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-18 10:08       ` Paolo Bonzini
@ 2013-03-18 10:50         ` Wenchao Xia
  0 siblings, 0 replies; 19+ messages in thread
From: Wenchao Xia @ 2013-03-18 10:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Juan Quintela, Stefan Hajnoczi, qemu-devel, Dietmar Maurer

On 2013-3-18 18:08, Paolo Bonzini wrote:
> Il 18/03/2013 10:04, Kevin Wolf ha scritto:
>> Oh, seems I misread this then. I thought this was about internal live
>> snapshots, which is a feature that I consider really useful. I'm not so
>> sure if saving the VM state as the disk contents of a qcow2 image is
>> really helpful.
>>
>> If zero clusters help a lot, then there's clearly something to improve
>> in the migration protocol, because it shouldn't send so many zeros in
>> the first place.
>
> Zero pages are sent as a single 9-byte entry (64 bits for the address
> and flags, 8 for the zero).
>
> I don't expect the migration stream to have a single zero cluster, since
> every page is prefixed by the 64 bits for the address and flags.
> Furthermore, the RAM data would be horribly unaligned because of this.
> 15-20% sectors or so would be read twice, since reading each page (4104
> bytes including the address and flags) would span 10 sectors (5120 bytes).
>
> Paolo
>
   I think zero pages will be handled well in the streaming case. I use
qcow2 mainly for the fseek() case, which may leave some zero holes inside.

-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-15  7:24 [Qemu-devel] [RFC] qmp interface for save vmstate to image Wenchao Xia
  2013-03-15 14:51 ` Stefan Hajnoczi
@ 2013-03-18 13:28 ` Pavel Hrdina
  2013-03-21  6:43   ` Wenchao Xia
  1 sibling, 1 reply; 19+ messages in thread
From: Pavel Hrdina @ 2013-03-18 13:28 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Kevin Wolf, Juan Quintela, Stefan Hajnoczi, qemu-devel,
	Paolo Bonzini, Dietmar Maurer

Hi Wenchao,

It seems that we are working on the same thing. You are trying to reduce
the size of the vmstate when saving it to a file or as an internal
snapshot.

I'm also working on that issue and I think that my solution could be
also used for savevm to external file or for live backup.

Here is my proposal for how to do it:

We will not have a fixed vmstate size; instead we will have the minimal
possible vmstate size. I will also use the migration code to save
the vmstate.

In qemu_savevm_state_begin we will create a bitmap covering all RAM
pages. We set every page in the bitmap to "1", meaning we need to save
them all. Then we check all RAM pages for duplicates and unset all
duplicated pages in "savevm_bitmap".

In qemu_savevm_state_iterate we will start saving the remaining RAM
pages according to "savevm_bitmap". Because the guest is running, it
could change the data in RAM pages that have not been saved yet. For
this case we also have to create a priority queue. Into this priority
queue we will copy every RAM page before it is changed, and also remove
that page from savevm_bitmap. In each iterate cycle we will first
handle the priority queue and then continue with the other RAM pages
from savevm_bitmap.

In the qemu_savevm_state_complete we will save only non-live data.

This should reduce the vmstate size and also speed up saving the
vmstate, with minimal memory usage.
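
The begin/iterate steps above can be modelled in a few lines. This is a
toy sketch, not QEMU code: RAM is a Python list of pages, the class and
method names are made up, guest writes are intercepted by an explicit
call, and a FIFO queue stands in for the priority queue.

```python
from collections import deque


class LiveSaver:
    """Toy model of the proposed savevm flow; all names are hypothetical."""

    def __init__(self, ram):
        self.ram = ram          # list of bytes objects, one per page
        self.saved = {}         # page index -> content at snapshot time
        self.dup_of = {}        # duplicate page -> first page with same content
        self.queue = deque()    # copies of pages the guest was about to change
        # "begin": mark every page as needing a save, then drop duplicates.
        self.bitmap = [True] * len(ram)
        first_seen = {}
        for idx, page in enumerate(ram):
            if page in first_seen:
                self.dup_of[idx] = first_seen[page]
                self.bitmap[idx] = False
            else:
                first_seen[page] = idx

    def guest_write(self, idx, data):
        """Intercept a guest write: keep the old content if still unsaved."""
        if self.bitmap[idx]:
            self.queue.append((idx, self.ram[idx]))
            self.bitmap[idx] = False
        self.ram[idx] = data

    def iterate(self):
        """Save one page, queued copies first. Returns False when done."""
        if self.queue:
            idx, old = self.queue.popleft()
            self.saved[idx] = old
            return True
        for idx, pending in enumerate(self.bitmap):
            if pending:
                self.saved[idx] = self.ram[idx]
                self.bitmap[idx] = False
                return True
        return False

    def snapshot(self):
        """Rebuild RAM as it was at begin; call after iterate() is done."""
        return [self.saved[self.dup_of.get(i, i)]
                for i in range(len(self.ram))]


ram = [b"a" * 4096, b"b" * 4096, b"a" * 4096, b"c" * 4096]
original = list(ram)
saver = LiveSaver(ram)
saver.iterate()                        # saves page 0
saver.guest_write(1, b"x" * 4096)      # unsaved page: old copy is queued
saver.guest_write(2, b"y" * 4096)      # duplicate of page 0: nothing to copy
while saver.iterate():
    pass
restored = saver.snapshot()
```

Each page is stored at most once, so in this model the saved data never
exceeds the guest RAM size.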

Pavel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-18 13:28 ` Pavel Hrdina
@ 2013-03-21  6:43   ` Wenchao Xia
  2013-03-21 11:48     ` Pavel Hrdina
  0 siblings, 1 reply; 19+ messages in thread
From: Wenchao Xia @ 2013-03-21  6:43 UTC (permalink / raw)
  To: Pavel Hrdina
  Cc: Kevin Wolf, Juan Quintela, Stefan Hajnoczi, qemu-devel,
	Paolo Bonzini, Dietmar Maurer

Hi, Pavel
  Sorry for late response.
> Hi Wenchao,
> 
> It seems the we are working on the same thing. You are trying to improve
> the size of vmstate if you want to save it to file or as an internal
> snapshot.
> 
> I'm also working on that issue and I think that my solution could be
> also used for savevm to external file or for live backup.
> 
> Here is my proposal how to do it:
> 
> We will not have the fixed size of vmstate, we will have the possible
> minimal size of the vmstate. I will also use the migration code to save
> the vmstate.
> 
  It is good if speed and size can be improved, but IMHO the size will
be a problem. A predictable or fixed size lets the management stack
assess the cost, make a decision, and reserve resources ahead of time.
Personally, I do not like a process that keeps taking resources without
limit; in most cases I would turn it off. With qcow2, the vmstate will
have a fixed maximum size, which is ideal for use as backup data.
  The above is my personal opinion, and I do want to know the
maintainers' opinion to decide whether to continue.

> In the qemu_savevm_state_begin we will create bitmap for all ram pages.
> Then we set all pages in bitmap to "1" and it means we need to save them
> all. Then we check all ram pages for duplicated pages and we will unset
> all duplicated pages from "savevm_bitmap".
> 
> In the qemu_savevm_state_iterate we will start saving remaining ram
> pages according to "savevm_bitmap". Because the guest is running, it
> could change the data in ram pages which is still not saved. For this
> case we also have to create a priority queue. Into this priority queue
> we will copy every ram page before it will be changed and also remove
> this ram page from savevm_bitmap. In the iterate cycle we will at first
> handle the priority queue and then continue to other ram pages from the
> savevm_bitmap.
> 
  OK, I get your idea: intercept the page write before the page changes.
I think this could reduce the time spent in savevm. But some issues need
to be confirmed:
1 Is it workable when KVM is used? In my understanding, KVM changes the
RAM page directly, before qemu can step in.
2 The performance cost for the running guest; this needs a test.
3 The total buffer size in the queue. If you plan to use it for any
migration, then in the TCP case the buffer may grow large for speed
reasons. If you use it only for local devices, I suggest framing it as
an improvement for migration to a block device, in contrast to
migration to a stream. Then performance-optimizing infrastructure such
as buffers/caches can be used much more easily to reduce the
performance lost when pages change.
  I feel this is more of an algorithm improvement for block
migration, which can work together with my patch. My patch
actually introduces migrating vmstate to a block device instead of a
stream.

> In the qemu_savevm_state_complete we will save only non-live data.
> 
> This should reduce the vmstate size and also speedup the saving of
> vmstate with minimal memory usage.
> 
> Pavel
> 


-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-21  6:43   ` Wenchao Xia
@ 2013-03-21 11:48     ` Pavel Hrdina
  2013-03-21 13:38       ` Stefan Hajnoczi
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Hrdina @ 2013-03-21 11:48 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Kevin Wolf, Juan Quintela, Stefan Hajnoczi, qemu-devel,
	Paolo Bonzini, Dietmar Maurer

On 03/21/2013 07:43 AM, Wenchao Xia wrote:
> Hi, Pavel
>    Sorry for late response.

np :)

>> Hi Wenchao,
>>
>> It seems the we are working on the same thing. You are trying to improve
>> the size of vmstate if you want to save it to file or as an internal
>> snapshot.
>>
>> I'm also working on that issue and I think that my solution could be
>> also used for savevm to external file or for live backup.
>>
>> Here is my proposal how to do it:
>>
>> We will not have the fixed size of vmstate, we will have the possible
>> minimal size of the vmstate. I will also use the migration code to save
>> the vmstate.
>>
>    It is good if speed and size can be improved, but IMHO the size will
> be a problem. Predictable or fixed size ensure management stack to
> give assess and decision, preserve resource ahead, personally I
> does not like a process continue to take resource without limit,
> in most case I'll turn it off.... By using qcow2m vmstate will have a
> fixed MAX size, ideal to be used to take it as a backup data.
>    Above is my personal opinion, and I do want to know the maintainer's
> opinion to decide whether to continue.

I mean that the vmstate size would be at most the same as the guest RAM
size, but could be smaller. I also dislike that the vmstate can
currently be much larger than the guest RAM size.

> 
>> In the qemu_savevm_state_begin we will create bitmap for all ram pages.
>> Then we set all pages in bitmap to "1" and it means we need to save them
>> all. Then we check all ram pages for duplicated pages and we will unset
>> all duplicated pages from "savevm_bitmap".
>>
>> In the qemu_savevm_state_iterate we will start saving remaining ram
>> pages according to "savevm_bitmap". Because the guest is running, it
>> could change the data in ram pages which is still not saved. For this
>> case we also have to create a priority queue. Into this priority queue
>> we will copy every ram page before it will be changed and also remove
>> this ram page from savevm_bitmap. In the iterate cycle we will at first
>> handle the priority queue and then continue to other ram pages from the
>> savevm_bitmap.
>>
>    OK, I got your idea: intercept the page writing before it changes.
> I think this could reduce time in savevm. But some problems need to be
> confirmed:
> 1 is it workable when KVM is used? In my understanding KVM will directly
> change the ram page before qemu can take over.

Yes, this is true. I'm now investigating how this could be done, but I'm
afraid it cannot be done without support from the kvm kernel module.

> 2 the performance sacrifice of running guest, need a test.

Surely I'll test it, if there turns out to be a way to copy the page
before it is changed when kvm is used.

> 3 the total buffer size in the queue. If you plan to make it used for
> any migration then in TCP case the buffer may grow to a large size
> for speed reason. If you use it only for local device, I suggest
> conclude it as a improvement for migrate to block device, in contrast
> to migrate to stream, then the performance optimizing infra such
> as buffer/cache can be used much easier, to reduce the performance
> lost in page changing.
>    I feel this is more likely as an algorithm improvement for block
> migration, which can work with my patch together. My patch
> is actually introducing migrate vmstate to block instead of stream.

Yes, this proposal is to improve migration to block device.

> 
>> In the qemu_savevm_state_complete we will save only non-live data.
>>
>> This should reduce the vmstate size and also speedup the saving of
>> vmstate with minimal memory usage.
>>
>> Pavel
>>

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-21 11:48     ` Pavel Hrdina
@ 2013-03-21 13:38       ` Stefan Hajnoczi
  2013-03-21 13:42         ` Paolo Bonzini
  2013-03-21 13:43         ` Pavel Hrdina
  0 siblings, 2 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2013-03-21 13:38 UTC (permalink / raw)
  To: Pavel Hrdina
  Cc: Kevin Wolf, Juan Quintela, qemu-devel, Dietmar Maurer,
	Paolo Bonzini, Wenchao Xia

On Thu, Mar 21, 2013 at 12:48:35PM +0100, Pavel Hrdina wrote:
> On 03/21/2013 07:43 AM, Wenchao Xia wrote:
> > Hi, Pavel
> >    Sorry for late response.
> 
> np :)
> 
> >> Hi Wenchao,
> >>
> >> It seems the we are working on the same thing. You are trying to improve
> >> the size of vmstate if you want to save it to file or as an internal
> >> snapshot.
> >>
> >> I'm also working on that issue and I think that my solution could be
> >> also used for savevm to external file or for live backup.
> >>
> >> Here is my proposal how to do it:
> >>
> >> We will not have the fixed size of vmstate, we will have the possible
> >> minimal size of the vmstate. I will also use the migration code to save
> >> the vmstate.
> >>
> >    It is good if speed and size can be improved, but IMHO the size will
> > be a problem. Predictable or fixed size ensure management stack to
> > give assess and decision, preserve resource ahead, personally I
> > does not like a process continue to take resource without limit,
> > in most case I'll turn it off.... By using qcow2m vmstate will have a
> > fixed MAX size, ideal to be used to take it as a backup data.
> >    Above is my personal opinion, and I do want to know the maintainer's
> > opinion to decide whether to continue.
> 
> I mean that the vmstate size would by at max the same as the guest ram
> size, but could be smaller. I also dislike that actually the vmstate
> could be much more larger then the guest ram size.
> 
> > 
> >> In the qemu_savevm_state_begin we will create bitmap for all ram pages.
> >> Then we set all pages in bitmap to "1" and it means we need to save them
> >> all. Then we check all ram pages for duplicated pages and we will unset
> >> all duplicated pages from "savevm_bitmap".
> >>
> >> In the qemu_savevm_state_iterate we will start saving remaining ram
> >> pages according to "savevm_bitmap". Because the guest is running, it
> >> could change the data in ram pages which is still not saved. For this
> >> case we also have to create a priority queue. Into this priority queue
> >> we will copy every ram page before it will be changed and also remove
> >> this ram page from savevm_bitmap. In the iterate cycle we will at first
> >> handle the priority queue and then continue to other ram pages from the
> >> savevm_bitmap.
> >>
> >    OK, I got your idea: intercept the page writing before it changes.
> > I think this could reduce time in savevm. But some problems need to be
> > confirmed:
> > 1 is it workable when KVM is used? In my understanding KVM will directly
> > change the ram page before qemu can take over.
> 
> Yes, this is true. I'm now investigating any way how to do this, but I'm
> afraid that without support from kvm kernel module it cannot be done.
> 
> > 2 the performance sacrifice of running guest, need a test.
> 
> Surly I'll test it if there will be some solution how to copy the page
> before it is changed when kvm is used.

There already is a guest RAM cloning mechanism: fork the QEMU process.
Then you have a copy-on-write guest RAM.

In a little more detail:

1. save non-RAM device state
2. quiesce QEMU to a state that is safe for forking
3. create an EventNotifier for live savevm completion signal
4. fork and pass completion EventNotifier to child
5. parent continues running VM
6. child performs vmsave of copy-on-write guest RAM
7. child signals completion EventNotifier and terminates
8. parent raises live savevm completion QMP event
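
A minimal sketch of steps 3-7, assuming a plain pipe in place of the
EventNotifier and a checksum in place of the real vmsave. The function
name is hypothetical and none of this is QEMU code; it only shows that
the child keeps seeing RAM as it was at fork time while the parent
keeps writing to it:

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code. "ram" must be private (non-shared)
 * memory so that fork() gives the child a copy-on-write view of it.
 */
int snapshot_checksum_via_fork(uint8_t *ram, size_t len, uint32_t *sum_out)
{
    int done[2];        /* pipe standing in for the completion EventNotifier */
    uint32_t sum = 0;
    int status;

    if (pipe(done) < 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(done[0]);
        close(done[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: reads its frozen view of RAM, then signals completion. */
        close(done[0]);
        for (size_t i = 0; i < len; i++) {
            sum += ram[i];
        }
        ssize_t n = write(done[1], &sum, sizeof(sum));
        _exit(n == (ssize_t)sizeof(sum) ? 0 : 1);
    }

    /* Parent: the "guest" keeps running and dirties RAM right away. */
    close(done[1]);
    memset(ram, 0xff, len);

    /* Wait for the completion signal and reap the child. */
    ssize_t n = read(done[0], &sum, sizeof(sum));
    close(done[0]);
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (n != (ssize_t)sizeof(sum) || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        return -1;
    }
    *sum_out = sum;
    return 0;
}
```

The checksum returned is that of the buffer before the parent's
memset(), whichever process happens to run first after the fork.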

Stefan

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-21 13:38       ` Stefan Hajnoczi
@ 2013-03-21 13:42         ` Paolo Bonzini
  2013-03-21 13:53           ` Pavel Hrdina
  2013-03-21 14:56           ` Stefan Hajnoczi
  2013-03-21 13:43         ` Pavel Hrdina
  1 sibling, 2 replies; 19+ messages in thread
From: Paolo Bonzini @ 2013-03-21 13:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Pavel Hrdina, Juan Quintela, qemu-devel,
	Dietmar Maurer, Wenchao Xia

On 21/03/2013 14:38, Stefan Hajnoczi wrote:
> There already is a guest RAM cloning mechanism: fork the QEMU process.
> Then you have a copy-on-write guest RAM.
> 
> In a little more detail:
> 
> 1. save non-RAM device state
> 2. quiesce QEMU to a state that is safe for forking
> 3. create an EventNotifier for live savevm completion signal
> 4. fork and pass completion EventNotifier to child
> 5. parent continues running VM
> 6. child performs vmsave of copy-on-write guest RAM
> 7. child signals completion EventNotifier and terminates
> 8. parent raises live savevm completion QMP event

Forking a threaded program is not so easy, but it could be done if the
child is very simple and only uses syscalls to communicate back with the
parent:

1. save non-RAM device state
2. quiesce QEMU to a state that is safe for forking
3. create a memory map and a pipe
4. fork and pass the write end of the pipe to the child
5. parent continues running VM
6. child reads the memory map and writes data to the pipe
7. parent copies data from the pipe to the migration stream
8. child exits, parent raises live savevm completion QMP event
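
A rough sketch of steps 3-8, assuming an ordinary private buffer in
place of the guest memory map and an in-memory buffer in place of the
migration stream. The function name is hypothetical. The child only
calls write() and _exit(), and the parent dirties the RAM after the
fork to show that the stream still receives the pre-fork contents:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code. "stream" must hold at least
 * "len" bytes; "ram" must be private memory so fork() makes it COW.
 */
int save_ram_via_pipe(uint8_t *ram, size_t len, uint8_t *stream)
{
    int fds[2];
    int status;
    size_t got = 0;

    if (pipe(fds) < 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: only syscalls, pushes its frozen RAM view into the pipe. */
        size_t off = 0;
        close(fds[0]);
        while (off < len) {
            ssize_t n = write(fds[1], ram + off, len - off);
            if (n < 0) {
                if (errno == EINTR) {
                    continue;
                }
                _exit(1);
            }
            off += (size_t)n;
        }
        _exit(0);
    }

    /* Parent: the "guest" keeps running and overwrites RAM. */
    close(fds[1]);
    memset(ram, 0, len);

    /* Parent: copy data from the pipe to the "migration stream". */
    while (got < len) {
        ssize_t n = read(fds[0], stream + got, len - got);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            break;
        }
        if (n == 0) {
            break;      /* child closed its end early */
        }
        got += (size_t)n;
    }
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (got != len || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }
    return 0;
}
```

Keeping all image and stream handling in the parent means the child
never touches QEMU's own data structures or locks.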

Paolo

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-21 13:38       ` Stefan Hajnoczi
  2013-03-21 13:42         ` Paolo Bonzini
@ 2013-03-21 13:43         ` Pavel Hrdina
  1 sibling, 0 replies; 19+ messages in thread
From: Pavel Hrdina @ 2013-03-21 13:43 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Juan Quintela, qemu-devel, Dietmar Maurer,
	Paolo Bonzini, Wenchao Xia

On 03/21/2013 02:38 PM, Stefan Hajnoczi wrote:
> On Thu, Mar 21, 2013 at 12:48:35PM +0100, Pavel Hrdina wrote:
>> On 03/21/2013 07:43 AM, Wenchao Xia wrote:
>>> Hi, Pavel
>>>     Sorry for late response.
>>
>> np :)
>>
>>>> Hi Wenchao,
>>>>
>>>> It seems the we are working on the same thing. You are trying to improve
>>>> the size of vmstate if you want to save it to file or as an internal
>>>> snapshot.
>>>>
>>>> I'm also working on that issue and I think that my solution could be
>>>> also used for savevm to external file or for live backup.
>>>>
>>>> Here is my proposal how to do it:
>>>>
>>>> We will not have the fixed size of vmstate, we will have the possible
>>>> minimal size of the vmstate. I will also use the migration code to save
>>>> the vmstate.
>>>>
>>>     It is good if speed and size can be improved, but IMHO the size will
>>> be a problem. Predictable or fixed size ensure management stack to
>>> give assess and decision, preserve resource ahead, personally I
>>> does not like a process continue to take resource without limit,
>>> in most case I'll turn it off.... By using qcow2m vmstate will have a
>>> fixed MAX size, ideal to be used to take it as a backup data.
>>>     Above is my personal opinion, and I do want to know the maintainer's
>>> opinion to decide whether to continue.
>>
>> I mean that the vmstate size would by at max the same as the guest ram
>> size, but could be smaller. I also dislike that actually the vmstate
>> could be much more larger then the guest ram size.
>>
>>>
>>>> In the qemu_savevm_state_begin we will create bitmap for all ram pages.
>>>> Then we set all pages in bitmap to "1" and it means we need to save them
>>>> all. Then we check all ram pages for duplicated pages and we will unset
>>>> all duplicated pages from "savevm_bitmap".
>>>>
>>>> In the qemu_savevm_state_iterate we will start saving remaining ram
>>>> pages according to "savevm_bitmap". Because the guest is running, it
>>>> could change the data in ram pages which is still not saved. For this
>>>> case we also have to create a priority queue. Into this priority queue
>>>> we will copy every ram page before it will be changed and also remove
>>>> this ram page from savevm_bitmap. In the iterate cycle we will at first
>>>> handle the priority queue and then continue to other ram pages from the
>>>> savevm_bitmap.
>>>>
>>>     OK, I got your idea: intercept the page writing before it changes.
>>> I think this could reduce time in savevm. But some problems need to be
>>> confirmed:
>>> 1 is it workable when KVM is used? In my understanding KVM will directly
>>> change the ram page before qemu can take over.
>>
>> Yes, this is true. I'm now investigating any way how to do this, but I'm
>> afraid that without support from kvm kernel module it cannot be done.
>>
>>> 2 the performance sacrifice of running guest, need a test.
>>
>> Surly I'll test it if there will be some solution how to copy the page
>> before it is changed when kvm is used.
>
> There already is a guest RAM cloning mechanism: fork the QEMU process.
> Then you have a copy-on-write guest RAM.
>
> In a little more detail:
>
> 1. save non-RAM device state
> 2. quiesce QEMU to a state that is safe for forking
> 3. create an EventNotifier for live savevm completion signal
Also, for internal live savevm, create a pipe to pass the vmstate data
to the parent. It wouldn't be a good idea to touch the qcow2 file from the child.
> 4. fork and pass completion EventNotifier to child
> 5. parent continues running VM
> 6. child performs vmsave of copy-on-write guest RAM
> 7. child signals completion EventNotifier and terminates
> 8. parent raises live savevm completion QMP event
>
> Stefan
>

Yes, this is another way to do it. I had already considered it. Now that
someone else has suggested it too, I think it could be the right way.

Thanks Stefan,

Pavel

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-21 13:42         ` Paolo Bonzini
@ 2013-03-21 13:53           ` Pavel Hrdina
  2013-03-21 14:56           ` Stefan Hajnoczi
  1 sibling, 0 replies; 19+ messages in thread
From: Pavel Hrdina @ 2013-03-21 13:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Juan Quintela, Stefan Hajnoczi, qemu-devel,
	Dietmar Maurer, Wenchao Xia

On 03/21/2013 02:42 PM, Paolo Bonzini wrote:
> On 21/03/2013 14:38, Stefan Hajnoczi wrote:
>> There already is a guest RAM cloning mechanism: fork the QEMU process.
>> Then you have a copy-on-write guest RAM.
>>
>> In a little more detail:
>>
>> 1. save non-RAM device state
>> 2. quiesce QEMU to a state that is safe for forking
>> 3. create an EventNotifier for live savevm completion signal
>> 4. fork and pass completion EventNotifier to child
>> 5. parent continues running VM
>> 6. child performs vmsave of copy-on-write guest RAM
>> 7. child signals completion EventNotifier and terminates
>> 8. parent raises live savevm completion QMP event
>
> Forking a threaded program is not so easy, but it could be done if the
> child is very simple and only uses syscalls to communicate back with the
> parent:
>
> 1. save non-RAM device state
> 2. quiesce QEMU to a state that is safe for forking
> 3. create a memory map and a pipe
> 4. fork and pass the write end of the pipe to the child
> 5. parent continues running VM
> 6. child reads the memory map and writes data to the pipe
> 7. parent copies data from the pipe to the migration stream
> 8. child exits, parent raises live savevm completion QMP event
>
> Paolo
>

As I just wrote to Stefan, I had already heard of the fork idea.
I was trying to do it without forking QEMU, but that would need support
from KVM and could be harder to do than forking QEMU.

I'll start working on this.

Thanks Paolo

Pavel

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-21 13:42         ` Paolo Bonzini
  2013-03-21 13:53           ` Pavel Hrdina
@ 2013-03-21 14:56           ` Stefan Hajnoczi
  2013-03-21 15:08             ` Eric Blake
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2013-03-21 14:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Pavel Hrdina, Juan Quintela, qemu-devel,
	Dietmar Maurer, Wenchao Xia

On Thu, Mar 21, 2013 at 02:42:23PM +0100, Paolo Bonzini wrote:
> On 21/03/2013 14:38, Stefan Hajnoczi wrote:
> > There already is a guest RAM cloning mechanism: fork the QEMU process.
> > Then you have a copy-on-write guest RAM.
> > 
> > In a little more detail:
> > 
> > 1. save non-RAM device state
> > 2. quiesce QEMU to a state that is safe for forking
> > 3. create an EventNotifier for live savevm completion signal
> > 4. fork and pass completion EventNotifier to child
> > 5. parent continues running VM
> > 6. child performs vmsave of copy-on-write guest RAM
> > 7. child signals completion EventNotifier and terminates
> > 8. parent raises live savevm completion QMP event
> 
> Forking a threaded program is not so easy, but it could be done if the
> child is very simple and only uses syscalls to communicate back with the
> parent:

On Linux you should be able to use clone(2) to spawn a thread with
copy-on-write memory.  Too bad it's not portable because it gets around
the messy fork issues.

Stefan

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-21 14:56           ` Stefan Hajnoczi
@ 2013-03-21 15:08             ` Eric Blake
  2013-03-23  4:36               ` Wenchao Xia
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Blake @ 2013-03-21 15:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Pavel Hrdina, Juan Quintela, qemu-devel,
	Dietmar Maurer, Paolo Bonzini, Wenchao Xia


On 03/21/2013 08:56 AM, Stefan Hajnoczi wrote:
> On Thu, Mar 21, 2013 at 02:42:23PM +0100, Paolo Bonzini wrote:
>> On 21/03/2013 14:38, Stefan Hajnoczi wrote:
>>> There already is a guest RAM cloning mechanism: fork the QEMU process.
>>> Then you have a copy-on-write guest RAM.
>>>
>>> In a little more detail:
>>>
>>> 1. save non-RAM device state
>>> 2. quiesce QEMU to a state that is safe for forking
>>> 3. create an EventNotifier for live savevm completion signal
>>> 4. fork and pass completion EventNotifier to child
>>> 5. parent continues running VM
>>> 6. child performs vmsave of copy-on-write guest RAM
>>> 7. child signals completion EventNotifier and terminates
>>> 8. parent raises live savevm completion QMP event
>>
>> Forking a threaded program is not so easy, but it could be done if the
>> child is very simple and only uses syscalls to communicate back with the
>> parent:
> 
> On Linux you should be able to use clone(2) to spawn a thread with
> copy-on-write memory.  Too bad it's not portable because it gets around
> the messy fork issues.

And introduces its own messy issues - once you clone() using different
flags than what fork() does, you have invalidated the use of a LOT of
libc interfaces in that child; in particular, any use of pthread is
liable to break.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-21 15:08             ` Eric Blake
@ 2013-03-23  4:36               ` Wenchao Xia
  2013-03-27  3:35                 ` Wenchao Xia
  0 siblings, 1 reply; 19+ messages in thread
From: Wenchao Xia @ 2013-03-23  4:36 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Pavel Hrdina, Juan Quintela, Stefan Hajnoczi,
	qemu-devel, Paolo Bonzini, Dietmar Maurer

On 2013-3-21 23:08, Eric Blake wrote:
> On 03/21/2013 08:56 AM, Stefan Hajnoczi wrote:
>> On Thu, Mar 21, 2013 at 02:42:23PM +0100, Paolo Bonzini wrote:
>>> On 21/03/2013 14:38, Stefan Hajnoczi wrote:
>>>> There already is a guest RAM cloning mechanism: fork the QEMU process.
>>>> Then you have a copy-on-write guest RAM.
>>>>
>>>> In a little more detail:
>>>>
>>>> 1. save non-RAM device state
>>>> 2. quiesce QEMU to a state that is safe for forking
>>>> 3. create an EventNotifier for live savevm completion signal
>>>> 4. fork and pass completion EventNotifier to child
>>>> 5. parent continues running VM
>>>> 6. child performs vmsave of copy-on-write guest RAM
>>>> 7. child signals completion EventNotifier and terminates
>>>> 8. parent raises live savevm completion QMP event
>>>
>>> Forking a threaded program is not so easy, but it could be done if the
>>> child is very simple and only uses syscalls to communicate back with the
>>> parent:
>>
>> On Linux you should be able to use clone(2) to spawn a thread with
>> copy-on-write memory.  Too bad it's not portable because it gets around
>> the messy fork issues.
> 
> And introduces its own messy issues - once you clone() using different
> flags than what fork() does, you have invalidated the use of a LOT of
> libc interfaces in that child; in particular, any use of pthread is
> liable to break.
> 
  I think the core of fork() is snapshotting RAM pages in RAM, just like
LVM2's block snapshot, very cool idea :).
  The problem is the implementation; an API like the following is needed:
void *mem_snapshot(void *addr, uint64_t len);
  From a brief search I haven't found one on Linux, and I am not sure
whether it is available in the upstream Linux kernel or C library. Making
this API available and then using it in qemu would be much nicer.
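
  To make the intended contract concrete, here is a user-space
emulation, assuming the snapshot only has to stay unchanged when the
original region is written later. It is not the proposed kernel API:
it copies eagerly with memcpy(), whereas the whole point of a real
mem_snapshot() would be to let the kernel COW the pages lazily. The
mem_snapshot_free() helper is an assumption added for this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Emulation of the hypothetical mem_snapshot() with an eager copy.
 * Returns a private copy of [addr, addr + len), or NULL on failure.
 */
void *mem_snapshot(void *addr, uint64_t len)
{
    void *copy;

    if (len == 0 || len > SIZE_MAX) {
        return NULL;
    }
    copy = malloc((size_t)len);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, addr, (size_t)len);
    return copy;
}

/* Hypothetical counterpart that releases a snapshot. */
void mem_snapshot_free(void *snap)
{
    free(snap);
}
```

  A caller would take the snapshot, let the guest continue, and write
the snapshot out at leisure; the eager copy costs a full extra copy of
the region up front, which a COW implementation would avoid.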
  It is very challenging to use the fork()/clone() approach in qemu. I guess
there will be a lot of scattered code preparing for fork(), some
resource handling code after fork(), code to query progress, exception
handling, a child/parent communication mechanism, ah... it seems complex.
But I am looking forward to seeing how good it is.
  Compared with that, migration to image uses less memory but
more I/O, and is much easier to implement and more portable. Maybe
it can be used as a simple improvement for "migrate to fd" until
an underlying memory snapshot API is available.
-- 
Best Regards

Wenchao Xia

* Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
  2013-03-23  4:36               ` Wenchao Xia
@ 2013-03-27  3:35                 ` Wenchao Xia
  0 siblings, 0 replies; 19+ messages in thread
From: Wenchao Xia @ 2013-03-27  3:35 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Carsten Otte, Anthony Liguori, Pavel Hrdina,
	Heiko Carstens, Juan Quintela, Stefan Hajnoczi, Marcelo Tosatti,
	Sebastian Ott, qemu-devel, Alexander Graf, Christian Borntraeger,
	Cornelia Huck, Paolo Bonzini, Dietmar Maurer, Martin Schwidefsky

  After thinking about this more deeply, I'd like to share some more
analysis. Saving vmstate amounts to taking a memory snapshot. In theory
the methods can be summarized as:
1 get a mirror of the region at the moment the "snapshot" request is
sent; the kernel COWs that region.
2 get a mirror of the region by gradually copying it out, completing
when the clone is in sync with the original region; basically similar
to migration.

  Taking a closer look:
1 COW the memory region:
Saves: block I/O and CPU, since no step is duplicated.
Sacrifices: memory.
Industry improvement solution: NUMA; price: expensive.
Implementation: hard, needs quite some work.
Qemu code maintenance: easy.
Detail:
  This method is the closest one to the meaning of "snapshot", but it
has a hidden requirement: reserved memory. On a server in real use
today, it is unlikely that a large amount of memory is kept in reserve:
for example, a server with 4G of memory may well run a guest with 3.5G,
to get the benefits of easier deployment, hardware independence, and
whole-machine backup/restore. In this case there is not enough memory
to do it. Let's take another, more likely example: a server with 4G of
memory runs two 1.5G guests. In this case one guest needs to be migrated
out, which is obviously bad. So a much better solution is to add memory
at the time of the snapshot. To do that economically and without
hot-plugging hardware, it needs NUMA plus memory sharing:

Host1    Host2    Host3
|  |     |  |     |  |
|  mem   |  mem   |  mem
|        |        |
|-----------------|
         |
     shared mem

  Several hosts share a memory pool for snapshots: a host takes memory
from it when doing a snapshot and returns it to the cluster manager when
complete. This is possible on expensive architectures, but hard to do
on the x86 architecture, which positions itself as cheap.
  One unrelated thought: does qemu support migrating
to a host device? If not, it should support migrating to a block device
of fixed size (unlike a snapshot, the two mirrors need to sync); when
shared memory is present, guests could be migrated to a RAM block device
quickly.

Implementation detail:
  It should be done by adding an API to the kernel, mem_snapshot(),
through which the kernel can COW a region and write the snapshotted pages
to the far slower shared memory (if this logic is added as an optimization).
fork() can do it, but brings a lot of trouble and would not benefit
from the NUMA architecture by moving snapshotted pages to slower memory.

2 gradually copy out and sync the memory region; there are two ways to do it:
2.1 migrate to a block device (migrate to fd, or migrate to image):
Saves: memory.
Sacrifices: CPU, block I/O.
Industry improvement solution: flash disk, cheap.
Implementation: easy, based on migration.
Qemu code maintenance: easy.
Detail:
  This is the relatively easier case; we just need to make the size fixed.
And flash disks are available on the x86 architecture.
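
  A toy simulation of method 2.1, assuming a simple per-page dirty
array and an in-memory buffer standing in for the image. All names are
hypothetical and this is not QEMU's migration code. Each page has a
fixed offset in the image, so re-saving a dirtied page overwrites the
old copy in place and the image size stays fixed:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SNAP_PAGE_SIZE 4096

/* Called between passes so the simulated guest can run and dirty pages. */
typedef void (*guest_step_fn)(uint8_t *ram, bool *dirty, size_t npages,
                              int round);

/*
 * Copy dirty pages into "image" round by round. The last pass runs
 * without calling guest_step (guest paused), so on return the image
 * matches RAM. Returns the number of passes made.
 */
int precopy_save(uint8_t *ram, uint8_t *image, bool *dirty, size_t npages,
                 guest_step_fn guest_step, size_t max_final_dirty,
                 int max_rounds)
{
    int round = 0;

    for (size_t i = 0; i < npages; i++) {
        dirty[i] = true;            /* first pass saves everything */
    }

    for (;;) {
        size_t ndirty = 0;
        for (size_t i = 0; i < npages; i++) {
            if (dirty[i]) {
                ndirty++;
            }
        }
        /* Few enough dirty pages (or too many rounds): do the last pass. */
        bool final = ndirty <= max_final_dirty || round >= max_rounds;

        for (size_t i = 0; i < npages; i++) {
            if (!dirty[i]) {
                continue;
            }
            /* Fixed offset per page: rewrites never grow the image. */
            memcpy(image + i * SNAP_PAGE_SIZE, ram + i * SNAP_PAGE_SIZE,
                   SNAP_PAGE_SIZE);
            dirty[i] = false;
        }
        round++;
        if (final) {
            break;
        }
        guest_step(ram, dirty, npages, round);
    }
    return round;
}

/* Example guest: touches one page per round for the first three rounds. */
void demo_guest_step(uint8_t *ram, bool *dirty, size_t npages, int round)
{
    if (round <= 3) {
        size_t page = (size_t)round % npages;
        ram[page * SNAP_PAGE_SIZE] = (uint8_t)round;
        dirty[page] = true;
    }
}
```

  The cost shows up as repeated copies of pages the guest keeps
dirtying, which is the CPU and block I/O sacrifice listed above.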

2.2 migrate to a stream, and use another process to receive and rearrange
the data.
Saves: memory.
Sacrifices: CPU (very high), block I/O (unless a big buffer is used).
Industry improvement solution: have another host or CPU do it.
Implementation: hard, needs a new qemu tool.
Qemu code maintenance: hard; data needs to be encoded in qemu, then decoded
and rearranged in another process, and every change or newly added device
needs changes on both sides.
Detail:
  It invokes a process to receive the data, or invokes a fake qemu
to receive and save it (which needs a lot of memory). Since the code is
hard to maintain, personally I think it is worse than 2.1.


Summary:
  I suggest:
  1) supporting both method 1 and 2.1, treating 2.1 as an improvement
for migrate to fd, and adding a new qmp interface, "vmstate snapshot",
for method 1 to declare it as a true snapshot. This allows it
to work on different architectures.
  2) pushing an API to Linux to do method 1, instead of using fork().
I'd like to send an RFC to the Linux memory management mailing list to
get feedback.



-- 
Best Regards

Wenchao Xia

end of thread, other threads:[~2013-03-27  3:36 UTC | newest]

Thread overview: 19+ messages
2013-03-15  7:24 [Qemu-devel] [RFC] qmp interface for save vmstate to image Wenchao Xia
2013-03-15 14:51 ` Stefan Hajnoczi
2013-03-18  6:40   ` Wenchao Xia
2013-03-18  9:04     ` Kevin Wolf
2013-03-18 10:08       ` Paolo Bonzini
2013-03-18 10:50         ` Wenchao Xia
2013-03-18 10:47       ` Wenchao Xia
2013-03-18 10:09     ` Stefan Hajnoczi
2013-03-18 13:28 ` Pavel Hrdina
2013-03-21  6:43   ` Wenchao Xia
2013-03-21 11:48     ` Pavel Hrdina
2013-03-21 13:38       ` Stefan Hajnoczi
2013-03-21 13:42         ` Paolo Bonzini
2013-03-21 13:53           ` Pavel Hrdina
2013-03-21 14:56           ` Stefan Hajnoczi
2013-03-21 15:08             ` Eric Blake
2013-03-23  4:36               ` Wenchao Xia
2013-03-27  3:35                 ` Wenchao Xia
2013-03-21 13:43         ` Pavel Hrdina
