Ping

On 17.03.2021 19:32, Andrey Gruzdev wrote:
> This series is a kind of PoC for asynchronous snapshot reverting. It covers
> external snapshots only and doesn't involve block devices. Thus, it's mainly
> intended to be used with the new 'background-snapshot' migration capability
> and the otherwise standard QEMU migration mechanism.
>
> The major ideas behind this first version were:
> * Make it compatible with 'exec:'-style migration - the options were to
>   create a separate tool or to integrate into qemu-system.
> * Support an asynchronous revert stage by using unaltered postcopy logic
>   at the destination. To do this, we should be capable of saving RAM pages
>   so that any particular page can be directly addressed by its block ID
>   and page offset. Possible solutions here seem to be:
>     use a separate index (and store it somewhere)
>     create a sparse file on the host FS and address pages by file offset
>     use a QCOW2 (or other) image container with inherent sparsity support
> * Make the snapshot image file dense on the host FS, so we don't depend on
>   copy/backup tools and how they deal with sparse files. Of course, there's
>   some performance cost to this choice.
> * Keep the code that parses the unstructured migration stream format
>   reasonably simple. Also, try to have minimal dependencies on QEMU
>   migration code, both RAM and device.
> * Try to keep page save latencies small while not degrading migration
>   bandwidth too much.
>
> For this first version I decided not to integrate into the main QEMU code
> but to create a separate tool. The main reason is that there's not much
> migration code that is target-specific and could be reused in unmodified
> form. Also, it's still not very clear how to extend 'qemu-system' in terms
> of command-line (or monitor/QMP?) interface for such an integration.
>
> For the storage format, QCOW2 as a container with a large (1MB) cluster
> size seems to be the optimal choice. A larger cluster is beneficial for
> performance, particularly when image preallocation is disabled. Such a
> cluster size does not cause too much internal fragmentation (~10% of space
> wasted in most cases), yet it significantly reduces the number of expensive
> cluster allocations.
>
> A slightly tricky part is dispatching the QEMU migration stream, because it
> is mostly unstructured and depends on configuration parameters like
> 'send-configuration' and 'send-section-footer'. But for the default values
> of the migration globals, the implemented dispatching code works well and
> shouldn't have compatibility issues within a reasonably long time frame.
>
> I decided to keep the RAM save path synchronous; in any case it's better to
> use writeback cache mode for live snapshots because of their interleaved
> page address pattern. A page coalescing buffer is used to merge contiguous
> pages and so optimize block layer writes.
>
> Since opening the image file in cached mode would not do any good for
> snapshot loading, Linux native AIO with O_DIRECT is used in the common
> scenario. AIO support in the RAM loading path is implemented with a ring of
> preallocated fixed-size buffers, in such a way that a number of block
> requests is always outstanding. The ring also ensures in-order request
> completion.
>
> (Condensed sketches of the stream dispatching, the page coalescing buffer,
> and the AIO buffer ring follow below.)
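> First, a sketch of the stream dispatching idea: the section-type byte
> values below mirror migration/savevm.c, while the helper and its name are
> invented here purely for illustration and are not the tool's actual code.
>
>     #include <stdbool.h>
>
>     /* Section type bytes, as defined in migration/savevm.c. */
>     enum {
>         QEMU_VM_EOF            = 0x00,
>         QEMU_VM_SECTION_START  = 0x01,
>         QEMU_VM_SECTION_PART   = 0x02,
>         QEMU_VM_SECTION_END    = 0x03,
>         QEMU_VM_SECTION_FULL   = 0x04,
>         QEMU_VM_CONFIGURATION  = 0x07, /* only with 'send-configuration' on */
>         QEMU_VM_SECTION_FOOTER = 0x7e, /* only with 'send-section-footer' on */
>     };
>
>     /* A dispatcher reads one type byte, then decides how much header to
>      * parse before the opaque per-device payload. Only START and FULL
>      * sections carry the full header (section id, idstr, instance and
>      * version ids); PART and END carry just the section id. */
>     static bool section_has_full_header(int type)
>     {
>         return type == QEMU_VM_SECTION_START || type == QEMU_VM_SECTION_FULL;
>     }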
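> Next, a minimal sketch of the page coalescing buffer, using invented names
> and a plain file descriptor in place of the QEMU block layer: contiguous
> pages accumulate in one buffer and are flushed as a single large write
> whenever a discontiguous page arrives or the buffer fills up.
>
>     #include <stdint.h>
>     #include <string.h>
>     #include <unistd.h>
>
>     #define PAGE_SIZE      4096
>     #define COALESCE_PAGES 128
>
>     typedef struct CoalesceBuf {
>         int     fd;     /* destination file */
>         int64_t base;   /* file offset of the first buffered page */
>         size_t  used;   /* bytes currently buffered */
>         uint8_t data[COALESCE_PAGES * PAGE_SIZE];
>     } CoalesceBuf;
>
>     static void coalesce_flush(CoalesceBuf *cb)
>     {
>         if (cb->used) {
>             if (pwrite(cb->fd, cb->data, cb->used, cb->base) < 0) {
>                 /* error handling omitted in this sketch */
>             }
>             cb->used = 0;
>         }
>     }
>
>     static void coalesce_put_page(CoalesceBuf *cb, int64_t offset,
>                                   const void *page)
>     {
>         /* Flush on a gap in the address pattern or when full. */
>         if (cb->used &&
>             (offset != cb->base + (int64_t)cb->used ||
>              cb->used + PAGE_SIZE > sizeof(cb->data))) {
>             coalesce_flush(cb);
>         }
>         if (!cb->used) {
>             cb->base = offset;
>         }
>         memcpy(cb->data + cb->used, page, PAGE_SIZE);
>         cb->used += PAGE_SIZE;
>     }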
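> And a structural sketch of the in-order AIO buffer ring, again with
> invented names; in the real tool submission and completion go through the
> QEMU block layer (e.g. waiting in aio_poll()) rather than the busy-wait
> shown here. The consumer always takes the oldest slot first, which yields
> in-order completion even though individual requests may finish out of order.
>
>     #include <stdbool.h>
>     #include <stddef.h>
>     #include <stdint.h>
>
>     #define AIO_RING_SLOTS 16               /* requests kept in flight */
>     #define AIO_BUF_SIZE   (1024 * 1024)    /* fixed per-slot buffer size */
>
>     typedef struct AioSlot {
>         uint8_t *buf;        /* preallocated, O_DIRECT-aligned buffer */
>         int64_t  offset;     /* file offset of the pending read */
>         size_t   len;
>         bool     in_flight;  /* set on submit, cleared on completion */
>     } AioSlot;
>
>     typedef struct AioRing {
>         AioSlot  slots[AIO_RING_SLOTS];
>         unsigned head;       /* next slot to (re)submit */
>         unsigned tail;       /* oldest outstanding slot, consumed first */
>     } AioRing;
>
>     /* Consumer side: wait for the oldest request, hand its buffer to the
>      * RAM loading code; the caller then resubmits the slot for the next
>      * chunk so the ring stays full. */
>     static AioSlot *aio_ring_wait_oldest(AioRing *ring)
>     {
>         AioSlot *s = &ring->slots[ring->tail % AIO_RING_SLOTS];
>
>         while (s->in_flight) {
>             /* a real implementation would run the event loop here */
>         }
>         ring->tail++;
>         return s;
>     }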
>
> How to use:
>
> **Save:**
> * qemu> migrate_set_capability background-snapshot on
> * qemu> migrate "exec:<path-to>/qemu-snap -s <image-size>
>   --cache=writeback --aio=threads save <image-file>"
>
> **Load:**
> * Use 'qemu-system-* -incoming defer'
> * qemu> migrate_incoming "exec:<path-to>/qemu-snap
>   --cache=none --aio=native load <image-file>"
>
> **Load with postcopy:**
> * Use 'qemu-system-* -incoming defer'
> * qemu> migrate_set_capability postcopy-ram on
> * qemu> migrate_incoming "exec:<path-to>/qemu-snap --postcopy=60
>   --cache=none --aio=native load <image-file>"
>
> And yes, asynchronous revert works well only with an SSD, not with a
> rotational disk.
>
> Some performance stats:
> * SATA SSD drive with ~500/450 MB/s sequential read/write and ~60K IOPS max
> * 220 MB/s average save rate (depends on workload)
> * 440 MB/s average load rate in precopy
> * 260 MB/s average load rate in postcopy
>
> Andrey Gruzdev (9):
>   migration/snap-tool: Introduce qemu-snap tool
>   migration/snap-tool: Snapshot image create/open routines for qemu-snap
>     tool
>   migration/snap-tool: Preparations to run code in main loop context
>   migration/snap-tool: Introduce qemu_ftell2() routine to qemu-file.c
>   migration/snap-tool: Block layer AIO support and file utility routines
>   migration/snap-tool: Move RAM_SAVE_FLAG_xxx defines to migration/ram.h
>   migration/snap-tool: Complete implementation of snapshot saving
>   migration/snap-tool: Implementation of snapshot loading in precopy
>   migration/snap-tool: Implementation of snapshot loading in postcopy
>
>  include/qemu-snap.h   |  163 ++++
>  meson.build           |    2 +
>  migration/qemu-file.c |    6 +
>  migration/qemu-file.h |    1 +
>  migration/ram.c       |   16 -
>  migration/ram.h       |   16 +
>  qemu-snap-handlers.c  | 1801 +++++++++++++++++++++++++++++++++++++++++
>  qemu-snap-io.c        |  325 ++++++++
>  qemu-snap.c           |  673 +++++++++++++++
>  9 files changed, 2987 insertions(+), 16 deletions(-)
>  create mode 100644 include/qemu-snap.h
>  create mode 100644 qemu-snap-handlers.c
>  create mode 100644 qemu-snap-io.c
>  create mode 100644 qemu-snap.c

--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
virtuzzo.com