From: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Den Lunev <den@openvz.org>,
	Eric Blake <eblake@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [RFC PATCH 0/9] migration/snap-tool: External snapshot utility
Date: Fri, 16 Apr 2021 15:27:07 +0300
Message-ID: <7a9f8bbd-01f9-f7fe-76ee-12a17b5861e0@virtuozzo.com>
In-Reply-To: <20210415235032.GS4440@xz-x1>

On 16.04.2021 02:50, Peter Xu wrote:
> On Wed, Mar 17, 2021 at 07:32:13PM +0300, Andrey Gruzdev wrote:
>> This series is a kind of PoC for asynchronous snapshot reverting. This is
>> about external snapshots only and doesn't involve block devices. Thus, it's
>> mainly intended to be used with the new 'background-snapshot' migration
>> capability and otherwise standard QEMU migration mechanism.
>>
>> The major ideas behind this first version were:
>>    * Make it compatible with 'exec:'-style migration - the options are to create
>>      a separate tool or to integrate into qemu-system.
>>    * Support asynchronous revert stage by using unaltered postcopy logic
>>      at destination. To do this, we should be capable of saving RAM pages
>>      so that any particular page can be directly addressed by its block ID
>>      and page offset. Possible solutions here seem to be:
>>        use separate index (and storing it somewhere)
>>        create sparse file on host FS and address pages with file offset
>>        use QCOW2 (or other) image container with inherent sparsity support
>>    * Make snapshot image file dense on the host FS so we don't depend on
>>      copy/backup tools and how they deal with sparse files. Of course,
>>      there's some performance cost for this choice.
>>    * Keep the code that parses the unstructured migration stream format
>>      reasonably simple. Also, try to have minimal dependencies
>>      on QEMU migration code, both RAM and device.
>>    * Try to keep page save latencies small while not degrading migration
>>      bandwidth too much.
>>
>> For this first version I decided not to integrate into main QEMU code but
>> create a separate tool. The main reason is that there's not too much migration
>> code that is target-specific and can be used in its unmodified form. Also,
>> it's still not very clear how to make 'qemu-system' integration in terms of
>> command-line (or monitor/QMP?) interface extension.
>>
>> For the storage format, QCOW2 as a container and large (1MB) cluster size seem
>> to be an optimal choice. Larger cluster is beneficial for performance particularly
>> in the case when image preallocation is disabled. Such cluster size does not result
>> in an excessive level of internal fragmentation (~10% of space wasted in most cases) yet
>> significantly reduces the number of expensive cluster allocations.
>>
>> A somewhat tricky part is dispatching the QEMU migration stream, because it is mostly
>> unstructured and depends on configuration parameters like 'send-configuration'
>> and 'send-section-footer'. But, for the case with default values in migration
>> globals it seems that implemented dispatching code works well and won't have
>> compatibility issues in a reasonably long time frame.
>>
>> I decided to keep the RAM save path synchronous; in any case, it's better to use writeback
>> cache mode for live snapshots because of their interleaved page address pattern.
>> Page coalescing buffer is used to merge contiguous pages to optimize block layer
>> writes.
>>
>> Since opening the image file in cached mode would not do any good for snapshot loading,
>> Linux native AIO and O_DIRECT mode are expected to be used in the common scenario.
>> AIO support in the RAM loading path is implemented by using a ring of preallocated
>> fixed-size buffers in such a way that there is always a number of outstanding block
>> requests. It also ensures in-order request completion.
>>
>> How to use:
>>
>> **Save:**
>> * qemu> migrate_set_capability background-snapshot on
>> * qemu> migrate "exec:<qemu-bin-path>/qemu-snap -s <virtual-size>
>>             --cache=writeback --aio=threads save <image-file.qcow2>"
>>
>> **Load:**
>> * Use 'qemu-system-* -incoming defer'
>> * qemu> migrate_incoming "exec:<qemu-bin-path>/qemu-snap
>>            --cache=none --aio=native load <image-file.qcow2>"
>>
>> **Load with postcopy:**
>> * Use 'qemu-system-* -incoming defer'
>> * qemu> migrate_set_capability postcopy-ram on
>> * qemu> migrate_incoming "exec:<qemu-bin-path>/qemu-snap --postcopy=60
>>            --cache=none --aio=native load <image-file.qcow2>"
>>
>> And yes, asynchronous revert works well only with SSDs, not with rotational disks.
>>
>> Some performance stats:
>> * SATA SSD drive with ~500/450 MB/s sequential read/write and ~60K IOPS max.
>> * 220 MB/s average save rate (depends on workload)
>> * 440 MB/s average load rate in precopy
>> * 260 MB/s average load rate in postcopy
> Andrey,
>
> Before I try to read it (since I'm probably not the best person to review
> it..).. Would you remind me of the major difference between external snapshots
> and the existing ones, and of the problems to solve?
>
> Thanks,
>
Hi Peter,

For external snapshots, the difference (compared to internal ones) is that snapshot
data goes to storage objects which are not part of the VM config. I mean that for internal
snapshots we use the configured storage of the VM instance to store both vmstate and blockdev
snapshot data. External snapshots are the opposite: we save vmstate and blockdev
snapshots to separate files on the host. Also, external snapshots are not managed by QEMU.

One of the problems is that the vmstate part of an external snapshot is essentially the
migration stream, which is schema-less and whose structure depends on the QEMU target.
That means that currently we can do a revert-to-snapshot operation with a sequence of
QMP commands, but only in a synchronous way, i.e. vCPUs can't be started
until all of the vmstate data has been transferred. The reason for this synchronous
behavior is that we cannot locate an arbitrary RAM page in the raw migration stream if we start
vCPUs early and get faults for pages that are missing on the destination VM.

So the major goal of this PoC is to demonstrate asynchronous snapshot reverting in QEMU
while keeping the migration code mostly unchanged. To do that we need to split the migration
stream into two parts: RAM pages and the rest of the vmstate. Once that is done,
RAM pages can be dispatched directly to a block device with block offsets
deduced from page GPAs.
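
To illustrate the addressing idea, here is a minimal sketch in Python. It is not the
qemu-snap code: the SnapshotImage class, the back-to-back RAM block layout, the 4 KiB
page size and the block names are all assumptions made for illustration. It only shows
that once every RAM block has a fixed base offset in the image, a page can be stored or
fetched by (block ID, page offset) without scanning a stream.

```python
import io

PAGE_SIZE = 4096  # assumed target page size, for illustration only


class SnapshotImage:
    """Toy stand-in for a snapshot image: RAM blocks laid out back to back."""

    def __init__(self, blocks):
        # blocks: list of (block_id, length_in_bytes); a hypothetical layout
        self.layout = {}
        offset = 0
        for block_id, length in blocks:
            if length % PAGE_SIZE:
                raise ValueError("block length must be page-aligned")
            self.layout[block_id] = (offset, length)
            offset += length
        # In-memory buffer standing in for the block device / image file
        self.data = io.BytesIO(bytes(offset))

    def page_offset(self, block_id, offset_in_block):
        """Deduce the image offset of a page from its block ID and page offset."""
        base, length = self.layout[block_id]
        if offset_in_block % PAGE_SIZE or not 0 <= offset_in_block < length:
            raise ValueError("bad page offset")
        return base + offset_in_block

    def save_page(self, block_id, offset_in_block, page):
        """Save path: pages may arrive in any order, each lands at its own offset."""
        if len(page) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        self.data.seek(self.page_offset(block_id, offset_in_block))
        self.data.write(page)

    def load_page(self, block_id, offset_in_block):
        """Load path: a fault on any page is served with a single seek and read."""
        self.data.seek(self.page_offset(block_id, offset_in_block))
        return self.data.read(PAGE_SIZE)


# Usage: two hypothetical RAM blocks, one page written out of order.
image = SnapshotImage([("pc.ram", 4 * PAGE_SIZE), ("vga.vram", 2 * PAGE_SIZE)])
image.save_page("vga.vram", PAGE_SIZE, b"\xab" * PAGE_SIZE)
faulted_page = image.load_page("vga.vram", PAGE_SIZE)
```

With this kind of layout the load side can answer a page request in any order, which a
raw sequential stream cannot do without reading everything that precedes the page.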


-- 
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
                 virtuzzo.com

