* [PATCH v8 0/6] block: seriously improve savevm/loadvm performance
@ 2020-07-09 13:26 Denis V. Lunev
2020-07-09 13:26 ` [PATCH 1/6] block/aio_task: allow start/wait task from any coroutine Denis V. Lunev
` (7 more replies)
0 siblings, 8 replies; 15+ messages in thread
From: Denis V. Lunev @ 2020-07-09 13:26 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
Denis Plotnikov, Stefan Hajnoczi, Denis V. Lunev
This series does two basic things:
- it creates an intermediate buffer for all writes from the QEMU migration
code to the QCOW2 image,
- this buffer is sent to disk asynchronously, allowing several writes to
run in parallel.
In general, migration code is fantastically inefficient (by observation):
buffers are not aligned and are sent in arbitrary pieces, often less than
100 bytes at a chunk, which results in read-modify-write operations when
running non-cached. It should also be noted that all operations are
performed into unallocated image blocks, which additionally suffer from
partial writes to such new clusters.
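A back-of-envelope model of why the batching helps (purely illustrative
numbers and names, not taken from the series):

```c
#include <assert.h>
#include <stddef.h>

/* Back-of-envelope model: each tiny migration chunk either triggers its
 * own backend operation (a read-modify-write when unaligned and
 * non-cached), or is first coalesced into a cluster-sized buffer.
 * CLUSTER_SIZE is an assumption for illustration only. */
enum { CLUSTER_SIZE = 65536 };

/* Without batching: one backend operation per chunk. */
static size_t ops_unbuffered(size_t nchunks)
{
    return nchunks;
}

/* With batching: one operation per full cluster, plus one for the
 * final partial cluster. */
static size_t ops_buffered(size_t nchunks, size_t chunk_bytes)
{
    size_t total = nchunks * chunk_bytes;
    return total / CLUSTER_SIZE + (total % CLUSTER_SIZE ? 1 : 0);
}
```

For 10,000 chunks of 100 bytes this is 10,000 backend operations versus 16,
which is consistent with the order-of-magnitude gains in the tables below.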
This patch series is an implementation of the idea discussed in the RFC
posted by Denis Plotnikov:
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg01925.html
Results with this series over NVMe are better than with the original code:
              original   rfc     this
cached:        1.79s     2.38s   1.27s
non-cached:    3.29s     1.31s   0.81s
Changes from v7:
- dropped lock from LoadVMState
- fixed assert in last patch
- dropped patch 1 as queued
Changes from v6:
- blk_load_vmstate kludges added (patchew problem fixed)
Changes from v5:
- loadvm optimizations added with Vladimir comments included
Changes from v4:
- added patch 4 with blk_save_vmstate() cleanup
- added R-By
- bdrv_flush_vmstate -> bdrv_finalize_vmstate
- fixed return code of bdrv_co_do_save_vmstate
- fixed typos in comments (Eric, thanks!)
- fixed patchew warnings
Changes from v3:
- rebased to master
- added patch 3 which removes aio_task_pool_wait_one()
- added R-By to patch 1
- patch 4 is rewritten via bdrv_run_co
- error path in blk_save_vmstate() is rewritten to call bdrv_flush_vmstate
unconditionally
- added some comments
- fixes initialization in bdrv_co_vmstate_save_task_entry as suggested
Changes from v2:
- code moved from QCOW2 level to generic block level
- created bdrv_flush_vmstate helper to fix 022, 029 tests
- added recursive for bs->file in bdrv_co_flush_vmstate (fix 267)
- fixed blk_save_vmstate helper
- fixed coroutine wait as Vladimir suggested with waiting fixes from me
Changes from v1:
- patchew warning fixed
- fixed validation that only 1 waiter is allowed in patch 1
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/6] block/aio_task: allow start/wait task from any coroutine
2020-07-09 13:26 [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
@ 2020-07-09 13:26 ` Denis V. Lunev
2020-07-09 13:26 ` [PATCH 2/6] block/aio_task: drop aio_task_pool_wait_one() helper Denis V. Lunev
` (6 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Denis V. Lunev @ 2020-07-09 13:26 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
Denis Plotnikov, Stefan Hajnoczi, Denis V. Lunev
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Currently, the aio task pool assumes that there is a main coroutine, which
creates tasks and waits for them. Let's remove that restriction by using
a CoQueue. The code becomes clearer and the interface more obvious.
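As an illustrative analogy (plain pthreads, not the QEMU coroutine API):
the change is equivalent to replacing a wake of one designated thread with
a broadcast to a wait queue, so any number of waiters may block on the pool:

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative pthread analogy (not the QEMU API): before the patch the
 * pool could wake only one designated "main" waiter; with a wait queue
 * (CoQueue in QEMU, a condition variable here) any waiter may block on
 * the pool, and task completion wakes them all. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t waiters;     /* plays the role of pool->waiters */
    int busy_tasks;
    int max_busy_tasks;
} MiniPool;

static void mini_pool_start_task(MiniPool *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->busy_tasks >= p->max_busy_tasks) {
        /* like qemu_co_queue_wait(&pool->waiters, NULL) */
        pthread_cond_wait(&p->waiters, &p->lock);
    }
    p->busy_tasks++;
    pthread_mutex_unlock(&p->lock);
}

static void mini_pool_task_done(MiniPool *p)
{
    pthread_mutex_lock(&p->lock);
    p->busy_tasks--;
    /* like qemu_co_queue_restart_all(&pool->waiters) */
    pthread_cond_broadcast(&p->waiters);
    pthread_mutex_unlock(&p->lock);
}
```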
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
block/aio_task.c | 21 ++++++---------------
1 file changed, 6 insertions(+), 15 deletions(-)
diff --git a/block/aio_task.c b/block/aio_task.c
index 88989fa248..cf62e5c58b 100644
--- a/block/aio_task.c
+++ b/block/aio_task.c
@@ -27,11 +27,10 @@
#include "block/aio_task.h"
struct AioTaskPool {
- Coroutine *main_co;
int status;
int max_busy_tasks;
int busy_tasks;
- bool waiting;
+ CoQueue waiters;
};
static void coroutine_fn aio_task_co(void *opaque)
@@ -52,31 +51,23 @@ static void coroutine_fn aio_task_co(void *opaque)
g_free(task);
- if (pool->waiting) {
- pool->waiting = false;
- aio_co_wake(pool->main_co);
- }
+ qemu_co_queue_restart_all(&pool->waiters);
}
void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool)
{
assert(pool->busy_tasks > 0);
- assert(qemu_coroutine_self() == pool->main_co);
- pool->waiting = true;
- qemu_coroutine_yield();
+ qemu_co_queue_wait(&pool->waiters, NULL);
- assert(!pool->waiting);
assert(pool->busy_tasks < pool->max_busy_tasks);
}
void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool)
{
- if (pool->busy_tasks < pool->max_busy_tasks) {
- return;
+ while (pool->busy_tasks >= pool->max_busy_tasks) {
+ aio_task_pool_wait_one(pool);
}
-
- aio_task_pool_wait_one(pool);
}
void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool)
@@ -98,8 +89,8 @@ AioTaskPool *coroutine_fn aio_task_pool_new(int max_busy_tasks)
{
AioTaskPool *pool = g_new0(AioTaskPool, 1);
- pool->main_co = qemu_coroutine_self();
pool->max_busy_tasks = max_busy_tasks;
+ qemu_co_queue_init(&pool->waiters);
return pool;
}
--
2.17.1
* [PATCH 2/6] block/aio_task: drop aio_task_pool_wait_one() helper
2020-07-09 13:26 [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
2020-07-09 13:26 ` [PATCH 1/6] block/aio_task: allow start/wait task from any coroutine Denis V. Lunev
@ 2020-07-09 13:26 ` Denis V. Lunev
2020-07-09 13:26 ` [PATCH 3/6] block/block-backend: remove always true check from blk_save_vmstate Denis V. Lunev
` (5 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Denis V. Lunev @ 2020-07-09 13:26 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
Max Reitz, Denis Plotnikov, Stefan Hajnoczi, Denis V. Lunev
It is not used outside the module.
Actually there are two kinds of waiters:
- for a free slot, and
- for all tasks to finish
This patch limits the external API to the listed types.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
block/aio_task.c | 13 ++-----------
include/block/aio_task.h | 1 -
2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/block/aio_task.c b/block/aio_task.c
index cf62e5c58b..7ba15ff41f 100644
--- a/block/aio_task.c
+++ b/block/aio_task.c
@@ -54,26 +54,17 @@ static void coroutine_fn aio_task_co(void *opaque)
qemu_co_queue_restart_all(&pool->waiters);
}
-void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool)
-{
- assert(pool->busy_tasks > 0);
-
- qemu_co_queue_wait(&pool->waiters, NULL);
-
- assert(pool->busy_tasks < pool->max_busy_tasks);
-}
-
void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool)
{
while (pool->busy_tasks >= pool->max_busy_tasks) {
- aio_task_pool_wait_one(pool);
+ qemu_co_queue_wait(&pool->waiters, NULL);
}
}
void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool)
{
while (pool->busy_tasks > 0) {
- aio_task_pool_wait_one(pool);
+ qemu_co_queue_wait(&pool->waiters, NULL);
}
}
diff --git a/include/block/aio_task.h b/include/block/aio_task.h
index 50bc1e1817..50b1c036c5 100644
--- a/include/block/aio_task.h
+++ b/include/block/aio_task.h
@@ -48,7 +48,6 @@ bool aio_task_pool_empty(AioTaskPool *pool);
void coroutine_fn aio_task_pool_start_task(AioTaskPool *pool, AioTask *task);
void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool);
-void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool);
void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool);
#endif /* BLOCK_AIO_TASK_H */
--
2.17.1
* [PATCH 3/6] block/block-backend: remove always true check from blk_save_vmstate
2020-07-09 13:26 [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
2020-07-09 13:26 ` [PATCH 1/6] block/aio_task: allow start/wait task from any coroutine Denis V. Lunev
2020-07-09 13:26 ` [PATCH 2/6] block/aio_task: drop aio_task_pool_wait_one() helper Denis V. Lunev
@ 2020-07-09 13:26 ` Denis V. Lunev
2020-07-09 13:26 ` [PATCH 4/6] block, migration: add bdrv_finalize_vmstate helper Denis V. Lunev
` (4 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Denis V. Lunev @ 2020-07-09 13:26 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
Max Reitz, Denis Plotnikov, Stefan Hajnoczi, Denis V. Lunev
bdrv_save_vmstate() returns either an error with a negative return value
or the size. Thus the `ret == size` check is always true and can be dropped.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Suggested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
block/block-backend.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index 6936b25c83..1c6e53bbde 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2188,7 +2188,7 @@ int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
return ret;
}
- if (ret == size && !blk->enable_write_cache) {
+ if (!blk->enable_write_cache) {
ret = bdrv_flush(blk_bs(blk));
}
--
2.17.1
* [PATCH 4/6] block, migration: add bdrv_finalize_vmstate helper
2020-07-09 13:26 [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
` (2 preceding siblings ...)
2020-07-09 13:26 ` [PATCH 3/6] block/block-backend: remove always true check from blk_save_vmstate Denis V. Lunev
@ 2020-07-09 13:26 ` Denis V. Lunev
2020-08-27 12:58 ` Daniel P. Berrangé
2020-07-09 13:26 ` [PATCH 5/6] block/io: improve savevm performance Denis V. Lunev
` (3 subsequent siblings)
7 siblings, 1 reply; 15+ messages in thread
From: Denis V. Lunev @ 2020-07-09 13:26 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
Max Reitz, Denis Plotnikov, Stefan Hajnoczi, Denis V. Lunev
Right now bdrv_fclose() just calls bdrv_flush().
The problem is that the migration code works inefficiently in block layer
terms: the block layer is frequently called for very small pieces of
unaligned data. The block layer is capable of working this way, but it is
very slow.
This patch is a preparation for the introduction of an intermediate
buffer in the block driver state. It is beneficial to separate the
conventional bdrv_flush() from the closing of the QEMU file in the
migration code.
The patch also forces a bdrv_finalize_vmstate() call inside the
synchronous blk_save_vmstate() helper, which is used only from qemu-io.
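A minimal sketch of the resulting contract (plain C with hypothetical
names, not the QEMU API): once writes are buffered, data may still sit in
memory after the last save call, so the caller must run an explicit
finalize step before flushing:

```c
#include <assert.h>
#include <string.h>

/* Minimal sketch of the finalize contract (hypothetical names, not the
 * QEMU API): save() only fills an in-memory buffer, so after the last
 * save() call data may still be pending; finalize() commits it to the
 * "disk" before any flush makes sense. */
enum { DISK_SIZE = 64, BUF_SIZE = 16 };

typedef struct {
    unsigned char disk[DISK_SIZE];   /* stands in for the image file */
    unsigned char buf[BUF_SIZE];     /* intermediate vmstate buffer */
    size_t buf_off;                  /* disk offset of buf[0] */
    size_t buf_len;                  /* bytes pending in buf */
} VMStateFile;

static void vmstate_commit(VMStateFile *s)
{
    memcpy(s->disk + s->buf_off, s->buf, s->buf_len);
    s->buf_off += s->buf_len;
    s->buf_len = 0;
}

/* Buffered write; assumes len <= BUF_SIZE for brevity. */
static void vmstate_save(VMStateFile *s, const void *data, size_t len)
{
    if (s->buf_len + len > BUF_SIZE) {
        vmstate_commit(s);
    }
    memcpy(s->buf + s->buf_len, data, len);
    s->buf_len += len;
}

/* The explicit step bdrv_finalize_vmstate() introduces: push the last
 * partial buffer (the real code also waits for in-flight tasks). */
static int vmstate_finalize(VMStateFile *s)
{
    vmstate_commit(s);
    return 0;
}
```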
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
block/block-backend.c | 6 +++++-
block/io.c | 15 +++++++++++++++
include/block/block.h | 5 +++++
migration/savevm.c | 4 ++++
4 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index 1c6e53bbde..5bb11c8e01 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2177,16 +2177,20 @@ int blk_truncate(BlockBackend *blk, int64_t offset, bool exact,
int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
int64_t pos, int size)
{
- int ret;
+ int ret, ret2;
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
ret = bdrv_save_vmstate(blk_bs(blk), buf, pos, size);
+ ret2 = bdrv_finalize_vmstate(blk_bs(blk));
if (ret < 0) {
return ret;
}
+ if (ret2 < 0) {
+ return ret2;
+ }
if (!blk->enable_write_cache) {
ret = bdrv_flush(blk_bs(blk));
diff --git a/block/io.c b/block/io.c
index b6564e34c5..be1fac0be8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2724,6 +2724,21 @@ int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
return bdrv_rw_vmstate(bs, qiov, pos, true);
}
+static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
+{
+ return 0;
+}
+
+static int coroutine_fn bdrv_finalize_vmstate_co_entry(void *opaque)
+{
+ return bdrv_co_finalize_vmstate(opaque);
+}
+
+int bdrv_finalize_vmstate(BlockDriverState *bs)
+{
+ return bdrv_run_co(bs, bdrv_finalize_vmstate_co_entry, bs);
+}
+
/**************************************************************/
/* async I/Os */
diff --git a/include/block/block.h b/include/block/block.h
index bca3bb831c..124eb8d422 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -567,6 +567,11 @@ int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
int64_t pos, int size);
+/*
+ * bdrv_finalize_vmstate() is mandatory to commit vmstate changes if
+ * bdrv_save_vmstate() was ever called.
+ */
+int bdrv_finalize_vmstate(BlockDriverState *bs);
void bdrv_img_create(const char *filename, const char *fmt,
const char *base_filename, const char *base_fmt,
diff --git a/migration/savevm.c b/migration/savevm.c
index 45c9dd9d8a..d8a94e312c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -150,6 +150,10 @@ static ssize_t block_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
static int bdrv_fclose(void *opaque, Error **errp)
{
+ int err = bdrv_finalize_vmstate(opaque);
+ if (err < 0) {
+ return err;
+ }
return bdrv_flush(opaque);
}
--
2.17.1
* [PATCH 5/6] block/io: improve savevm performance
2020-07-09 13:26 [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
` (3 preceding siblings ...)
2020-07-09 13:26 ` [PATCH 4/6] block, migration: add bdrv_finalize_vmstate helper Denis V. Lunev
@ 2020-07-09 13:26 ` Denis V. Lunev
2020-07-09 13:26 ` [PATCH 6/6] block/io: improve loadvm performance Denis V. Lunev
` (2 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Denis V. Lunev @ 2020-07-09 13:26 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
Max Reitz, Denis Plotnikov, Stefan Hajnoczi, Denis V. Lunev
This patch does two basic things:
- it creates an intermediate buffer for all writes from QEMU migration
code to the block driver,
- this buffer is sent to disk asynchronously, allowing several writes to
run in parallel.
Thus bdrv_vmstate_write() becomes asynchronous. Completion of all pending
operations is performed in the newly introduced bdrv_finalize_vmstate().
In general, migration code is fantastically inefficient (by observation):
buffers are not aligned and are sent in arbitrary pieces, often less than
100 bytes at a chunk, which results in read-modify-write operations if
the target file descriptor is opened with O_DIRECT. It should also be
noted that all operations are performed into unallocated image blocks,
which also suffer from partial writes to such new clusters even on cached
file descriptors.
Snapshot creation time (2 GB Fedora 31 VM running over NVMe storage):
              original   fixed
cached:        1.79s     1.27s
non-cached:    3.29s     0.81s
The difference over HDD would be more significant :)
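The accumulation loop at the heart of the patch can be sketched as follows
(hypothetical names and a tiny 8-byte buffer for illustration; the real
code uses MAX(cluster size, 1 MiB) buffers and submits them to an
AioTaskPool):

```c
#include <assert.h>
#include <string.h>

/* Simplified model of the copy loop in bdrv_co_do_save_vmstate(). */
enum { TASK_BUF = 8 };

typedef struct {
    unsigned char buf[TASK_BUF];
    size_t bytes;            /* bytes accumulated so far */
    long long offset;        /* vmstate offset of buf[0] */
} SaveTask;

static int submitted;        /* counts aio_task_pool_start_task() calls */

static void submit_task(SaveTask *t)
{
    (void)t;                 /* would write buf to disk asynchronously */
    submitted++;
}

static int save_chunk(SaveTask *t, const unsigned char *data, size_t size,
                      long long pos)
{
    size_t off = 0;

    if (t->offset + (long long)t->bytes != pos) {
        return -1;           /* non-sequential write: slow path in the patch */
    }
    while (off < size) {
        size_t to_copy = size - off;
        if (to_copy > TASK_BUF - t->bytes) {
            to_copy = TASK_BUF - t->bytes;
        }
        memcpy(t->buf + t->bytes, data + off, to_copy);
        t->bytes += to_copy;
        off += to_copy;
        if (t->bytes == TASK_BUF) {  /* buffer full: submit and restart */
            submit_task(t);
            t->offset += TASK_BUF;
            t->bytes = 0;    /* the real code allocates a fresh task */
        }
    }
    return 0;                /* partial buffer waits for more data */
}
```

Note how a partial buffer is never written: it waits for the next chunk or
for bdrv_finalize_vmstate(), which is exactly what avoids the
read-modify-write penalty on tiny migration writes.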
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
block/io.c | 126 +++++++++++++++++++++++++++++++++++++-
include/block/block_int.h | 8 +++
2 files changed, 132 insertions(+), 2 deletions(-)
diff --git a/block/io.c b/block/io.c
index be1fac0be8..061d3239b9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -26,6 +26,7 @@
#include "trace.h"
#include "sysemu/block-backend.h"
#include "block/aio-wait.h"
+#include "block/aio_task.h"
#include "block/blockjob.h"
#include "block/blockjob_int.h"
#include "block/block_int.h"
@@ -33,6 +34,7 @@
#include "qapi/error.h"
#include "qemu/error-report.h"
#include "qemu/main-loop.h"
+#include "qemu/units.h"
#include "sysemu/replay.h"
/* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
@@ -2640,6 +2642,103 @@ typedef struct BdrvVmstateCo {
bool is_read;
} BdrvVmstateCo;
+typedef struct BdrvVMStateTask {
+ AioTask task;
+
+ BlockDriverState *bs;
+ int64_t offset;
+ void *buf;
+ size_t bytes;
+} BdrvVMStateTask;
+
+typedef struct BdrvSaveVMState {
+ AioTaskPool *pool;
+ BdrvVMStateTask *t;
+} BdrvSaveVMState;
+
+
+static coroutine_fn int bdrv_co_vmstate_save_task_entry(AioTask *task)
+{
+ int err = 0;
+ BdrvVMStateTask *t = container_of(task, BdrvVMStateTask, task);
+
+ if (t->bytes != 0) {
+ QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, t->buf, t->bytes);
+
+ bdrv_inc_in_flight(t->bs);
+ err = t->bs->drv->bdrv_save_vmstate(t->bs, &qiov, t->offset);
+ bdrv_dec_in_flight(t->bs);
+ }
+
+ qemu_vfree(t->buf);
+ return err;
+}
+
+static BdrvVMStateTask *bdrv_vmstate_task_create(BlockDriverState *bs,
+ int64_t pos, size_t size)
+{
+ BdrvVMStateTask *t = g_new(BdrvVMStateTask, 1);
+
+ *t = (BdrvVMStateTask) {
+ .task.func = bdrv_co_vmstate_save_task_entry,
+ .buf = qemu_blockalign(bs, size),
+ .offset = pos,
+ .bs = bs,
+ };
+
+ return t;
+}
+
+static int bdrv_co_do_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+ int64_t pos)
+{
+ BdrvSaveVMState *state = bs->savevm_state;
+ BdrvVMStateTask *t;
+ size_t buf_size = MAX(bdrv_get_cluster_size(bs), 1 * MiB);
+ size_t to_copy, off;
+
+ if (state == NULL) {
+ state = g_new(BdrvSaveVMState, 1);
+ *state = (BdrvSaveVMState) {
+ .pool = aio_task_pool_new(BDRV_VMSTATE_WORKERS_MAX),
+ .t = bdrv_vmstate_task_create(bs, pos, buf_size),
+ };
+
+ bs->savevm_state = state;
+ }
+
+ if (aio_task_pool_status(state->pool) < 0) {
+ /*
+ * The operation as a whole is unsuccessful. Prohibit all further
+ * operations. If we clean up here, new useless ops will come again.
+ * Thus we rely on the caller for cleanup here.
+ */
+ return aio_task_pool_status(state->pool);
+ }
+
+ t = state->t;
+ if (t->offset + t->bytes != pos) {
+ /* Normally this branch is not reachable from migration */
+ return bs->drv->bdrv_save_vmstate(bs, qiov, pos);
+ }
+
+ off = 0;
+ while (1) {
+ to_copy = MIN(qiov->size - off, buf_size - t->bytes);
+ qemu_iovec_to_buf(qiov, off, t->buf + t->bytes, to_copy);
+ t->bytes += to_copy;
+ if (t->bytes < buf_size) {
+ return 0;
+ }
+
+ aio_task_pool_start_task(state->pool, &t->task);
+
+ pos += to_copy;
+ off += to_copy;
+ state->t = t = bdrv_vmstate_task_create(bs, pos, buf_size);
+ }
+}
+
static int coroutine_fn
bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
bool is_read)
@@ -2655,7 +2754,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
if (is_read) {
ret = drv->bdrv_load_vmstate(bs, qiov, pos);
} else {
- ret = drv->bdrv_save_vmstate(bs, qiov, pos);
+ ret = bdrv_co_do_save_vmstate(bs, qiov, pos);
}
} else if (bs->file) {
ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
@@ -2726,7 +2825,30 @@ int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
{
- return 0;
+ int err;
+ BdrvSaveVMState *state = bs->savevm_state;
+
+ if (bs->drv->bdrv_save_vmstate == NULL && bs->file != NULL) {
+ return bdrv_co_finalize_vmstate(bs->file->bs);
+ }
+ if (state == NULL) {
+ return 0;
+ }
+
+ if (aio_task_pool_status(state->pool) >= 0) {
+ /* We are on success path, commit last chunk if possible */
+ aio_task_pool_start_task(state->pool, &state->t->task);
+ }
+
+ aio_task_pool_wait_all(state->pool);
+ err = aio_task_pool_status(state->pool);
+
+ aio_task_pool_free(state->pool);
+ g_free(state);
+
+ bs->savevm_state = NULL;
+
+ return err;
}
static int coroutine_fn bdrv_finalize_vmstate_co_entry(void *opaque)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3d6cf88592..e975ac4c21 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -61,6 +61,8 @@
#define BLOCK_PROBE_BUF_SIZE 512
+#define BDRV_VMSTATE_WORKERS_MAX 8
+
enum BdrvTrackedRequestType {
BDRV_TRACKED_READ,
BDRV_TRACKED_WRITE,
@@ -808,6 +810,9 @@ struct BdrvChild {
QLIST_ENTRY(BdrvChild) next_parent;
};
+
+typedef struct BdrvSaveVMState BdrvSaveVMState;
+
/*
* Note: the function bdrv_append() copies and swaps contents of
* BlockDriverStates, so if you add new fields to this struct, please
@@ -971,6 +976,9 @@ struct BlockDriverState {
/* BdrvChild links to this node may never be frozen */
bool never_freeze;
+
+ /* Intermediate buffer for VM state saving from snapshot creation code */
+ BdrvSaveVMState *savevm_state;
};
struct BlockBackendRootState {
--
2.17.1
* [PATCH 6/6] block/io: improve loadvm performance
2020-07-09 13:26 [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
` (4 preceding siblings ...)
2020-07-09 13:26 ` [PATCH 5/6] block/io: improve savevm performance Denis V. Lunev
@ 2020-07-09 13:26 ` Denis V. Lunev
2020-07-10 14:40 ` Vladimir Sementsov-Ogievskiy
2020-08-20 7:42 ` [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
2020-09-14 13:13 ` Stefan Hajnoczi
7 siblings, 1 reply; 15+ messages in thread
From: Denis V. Lunev @ 2020-07-09 13:26 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
Juan Quintela, Max Reitz, Denis Plotnikov, Stefan Hajnoczi,
Denis V. Lunev
This patch creates an intermediate buffer for reading from the block
driver state and performs read-ahead into this buffer. Snapshot code
performs reads sequentially, and thus we know which offsets will be
required and when they will no longer be needed.
Results are fantastic. Switch-to-snapshot times of a 2 GB Fedora 31 VM
over NVMe storage are the following:
              original   fixed
cached:        1.84s     1.16s
non-cached:   12.74s     1.27s
The difference over HDD would be more significant :)
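The read-ahead bookkeeping can be sketched as follows (hypothetical names
and tiny sizes for illustration; the real code keeps
BDRV_VMSTATE_WORKERS_MAX in-flight chunks of at least 1 MiB each):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the loadvm read-ahead window (not the QEMU API). */
enum { WORKERS = 4, CHUNK = 8 };

typedef struct {
    long long offset;
    long long bytes;
} Chunk;

/* Schedule the initial read-ahead window, one chunk per worker slot,
 * as bdrv_co_loadvmstate_start() does. */
static void readahead_start(Chunk *c, long long *next_offset)
{
    int i;

    for (i = 0; i < WORKERS; i++) {
        c[i].offset = *next_offset;
        c[i].bytes = CHUNK;
        *next_offset += CHUNK;
    }
}

/* Mirror of bdrv_co_find_loadvmstate_chunk(): find the prefetched chunk
 * containing pos; NULL means the reader outran the prefetcher and must
 * take the slow path. */
static const Chunk *find_chunk(const Chunk *c, int n, long long pos)
{
    int i;

    for (i = 0; i < n; i++) {
        if (c[i].offset <= pos && pos < c[i].offset + c[i].bytes) {
            return &c[i];
        }
    }
    return NULL;
}
```

Each time the sequential reader drains a chunk, its buffer is recycled to
prefetch the next unread region, so the window keeps sliding ahead of the
consumer.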
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
block/block-backend.c | 12 +-
block/io.c | 239 +++++++++++++++++++++++++++++++++++++-
include/block/block_int.h | 3 +
3 files changed, 250 insertions(+), 4 deletions(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index 5bb11c8e01..09773b3e37 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2201,11 +2201,21 @@ int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
int blk_load_vmstate(BlockBackend *blk, uint8_t *buf, int64_t pos, int size)
{
+ int ret, ret2;
+
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
- return bdrv_load_vmstate(blk_bs(blk), buf, pos, size);
+ ret = bdrv_load_vmstate(blk_bs(blk), buf, pos, size);
+ ret2 = bdrv_finalize_vmstate(blk_bs(blk));
+ if (ret < 0) {
+ return ret;
+ }
+ if (ret2 < 0) {
+ return ret2;
+ }
+ return ret;
}
int blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz)
diff --git a/block/io.c b/block/io.c
index 061d3239b9..510122fbc4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2739,6 +2739,194 @@ static int bdrv_co_do_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
}
}
+
+typedef struct BdrvLoadVMChunk {
+ void *buf;
+ uint64_t offset;
+ ssize_t bytes;
+
+ QLIST_ENTRY(BdrvLoadVMChunk) list;
+} BdrvLoadVMChunk;
+
+typedef struct BdrvLoadVMState {
+ AioTaskPool *pool;
+
+ int64_t offset;
+ int64_t last_loaded;
+
+ int chunk_count;
+ QLIST_HEAD(, BdrvLoadVMChunk) chunks;
+ QLIST_HEAD(, BdrvLoadVMChunk) loading;
+ CoQueue waiters;
+} BdrvLoadVMState;
+
+typedef struct BdrvLoadVMStateTask {
+ AioTask task;
+
+ BlockDriverState *bs;
+ BdrvLoadVMChunk *chunk;
+} BdrvLoadVMStateTask;
+
+static BdrvLoadVMChunk *bdrv_co_find_loadvmstate_chunk(int64_t pos,
+ BdrvLoadVMChunk *c)
+{
+ for (; c != NULL; c = QLIST_NEXT(c, list)) {
+ if (c->offset <= pos && c->offset + c->bytes > pos) {
+ return c;
+ }
+ }
+
+ return NULL;
+}
+
+static void bdrv_free_loadvm_chunk(BdrvLoadVMChunk *c)
+{
+ qemu_vfree(c->buf);
+ g_free(c);
+}
+
+static coroutine_fn int bdrv_co_vmstate_load_task_entry(AioTask *task)
+{
+ int err = 0;
+ BdrvLoadVMStateTask *t = container_of(task, BdrvLoadVMStateTask, task);
+ BdrvLoadVMChunk *c = t->chunk;
+ BdrvLoadVMState *state = t->bs->loadvm_state;
+ QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, c->buf, c->bytes);
+
+ bdrv_inc_in_flight(t->bs);
+ err = t->bs->drv->bdrv_load_vmstate(t->bs, &qiov, c->offset);
+ bdrv_dec_in_flight(t->bs);
+
+ QLIST_REMOVE(c, list);
+ if (err == 0) {
+ QLIST_INSERT_HEAD(&state->chunks, c, list);
+ } else {
+ bdrv_free_loadvm_chunk(c);
+ }
+ qemu_co_queue_restart_all(&state->waiters);
+
+ return err;
+}
+
+
+static void bdrv_co_loadvmstate_next(BlockDriverState *bs, BdrvLoadVMChunk *c)
+{
+ BdrvLoadVMStateTask *t = g_new(BdrvLoadVMStateTask, 1);
+ BdrvLoadVMState *state = bs->loadvm_state;
+
+ c->offset = state->last_loaded;
+
+ *t = (BdrvLoadVMStateTask) {
+ .task.func = bdrv_co_vmstate_load_task_entry,
+ .bs = bs,
+ .chunk = c,
+ };
+
+ QLIST_INSERT_HEAD(&state->loading, t->chunk, list);
+ state->chunk_count++;
+ state->last_loaded += c->bytes;
+
+ aio_task_pool_start_task(state->pool, &t->task);
+}
+
+
+static void bdrv_co_loadvmstate_start(BlockDriverState *bs)
+{
+ int i;
+ size_t buf_size = MAX(bdrv_get_cluster_size(bs), 1 * MiB);
+
+ for (i = 0; i < BDRV_VMSTATE_WORKERS_MAX; i++) {
+ BdrvLoadVMChunk *c = g_new0(BdrvLoadVMChunk, 1);
+
+ c->buf = qemu_blockalign(bs, buf_size);
+ c->bytes = buf_size;
+
+ bdrv_co_loadvmstate_next(bs, c);
+ }
+}
+
+static int bdrv_co_do_load_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+ int64_t pos)
+{
+ BdrvLoadVMState *state = bs->loadvm_state;
+ BdrvLoadVMChunk *c;
+ size_t off;
+ int64_t start_pos = pos;
+
+ if (state == NULL) {
+ if (pos != 0) {
+ goto slow_path;
+ }
+
+ state = g_new(BdrvLoadVMState, 1);
+ *state = (BdrvLoadVMState) {
+ .pool = aio_task_pool_new(BDRV_VMSTATE_WORKERS_MAX),
+ .chunks = QLIST_HEAD_INITIALIZER(state->chunks),
+ .loading = QLIST_HEAD_INITIALIZER(state->loading),
+ };
+ qemu_co_queue_init(&state->waiters);
+
+ bs->loadvm_state = state;
+
+ bdrv_co_loadvmstate_start(bs);
+ }
+
+ if (state->offset != pos) {
+ goto slow_path;
+ }
+
+ off = 0;
+
+ while (off < qiov->size && aio_task_pool_status(state->pool) == 0) {
+ c = bdrv_co_find_loadvmstate_chunk(pos, QLIST_FIRST(&state->chunks));
+ if (c != NULL) {
+ ssize_t chunk_off = pos - c->offset;
+ ssize_t to_copy = MIN(qiov->size - off, c->bytes - chunk_off);
+
+ qemu_iovec_from_buf(qiov, off, c->buf + chunk_off, to_copy);
+
+ off += to_copy;
+ pos += to_copy;
+
+ if (pos == c->offset + c->bytes) {
+ state->chunk_count--;
+ /* End of buffer, discard it from the list */
+ QLIST_REMOVE(c, list);
+
+ /*
+ * Start loading next chunk. The slot in the pool should
+ * always be free for this purpose at the moment.
+ *
+ * There is a problem with the end of the stream. This code
+ * starts to read the data beyond the end of the saved state.
+ * The low-level code should be ready for such behavior, but
+ * we will have BDRV_VMSTATE_WORKERS_MAX unnecessary, fully
+ * zeroed chunks. This is not good, but acceptable.
+ */
+ bdrv_co_loadvmstate_next(bs, c);
+ }
+
+ state->offset += to_copy;
+ continue;
+ }
+
+ c = bdrv_co_find_loadvmstate_chunk(pos, QLIST_FIRST(&state->loading));
+ if (c != NULL) {
+ qemu_co_queue_wait(&state->waiters, NULL);
+ continue;
+ }
+
+slow_path:
+ /*
+ * This should not happen normally. This point could be reached only
+ * if we have had some parallel requests. Fallback to slow load.
+ */
+ return bs->drv->bdrv_load_vmstate(bs, qiov, start_pos);
+ }
+
+ return aio_task_pool_status(state->pool);
+}
+
static int coroutine_fn
bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
bool is_read)
@@ -2752,7 +2940,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
ret = -ENOMEDIUM;
} else if (drv->bdrv_load_vmstate) {
if (is_read) {
- ret = drv->bdrv_load_vmstate(bs, qiov, pos);
+ ret = bdrv_co_do_load_vmstate(bs, qiov, pos);
} else {
ret = bdrv_co_do_save_vmstate(bs, qiov, pos);
}
@@ -2823,13 +3011,13 @@ int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
return bdrv_rw_vmstate(bs, qiov, pos, true);
}
-static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
+static int coroutine_fn bdrv_co_finalize_save_vmstate(BlockDriverState *bs)
{
int err;
BdrvSaveVMState *state = bs->savevm_state;
if (bs->drv->bdrv_save_vmstate == NULL && bs->file != NULL) {
- return bdrv_co_finalize_vmstate(bs->file->bs);
+ return bdrv_co_finalize_save_vmstate(bs->file->bs);
}
if (state == NULL) {
return 0;
@@ -2851,6 +3039,51 @@ static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
return err;
}
+static int coroutine_fn bdrv_co_finalize_load_vmstate(BlockDriverState *bs)
+{
+ int err;
+ BdrvLoadVMState *state = bs->loadvm_state;
+ BdrvLoadVMChunk *c, *tmp;
+
+ if (bs->drv->bdrv_load_vmstate == NULL && bs->file != NULL) {
+ return bdrv_co_finalize_load_vmstate(bs->file->bs);
+ }
+ if (state == NULL) {
+ return 0;
+ }
+
+ aio_task_pool_wait_all(state->pool);
+ err = aio_task_pool_status(state->pool);
+ aio_task_pool_free(state->pool);
+
+ /* this list must be empty as all tasks are committed */
+ assert(QLIST_EMPTY(&state->loading));
+
+ QLIST_FOREACH_SAFE(c, &state->chunks, list, tmp) {
+ QLIST_REMOVE(c, list);
+ bdrv_free_loadvm_chunk(c);
+ }
+
+ g_free(state);
+
+ bs->loadvm_state = NULL;
+
+ return err;
+}
+
+static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
+{
+ int err1 = bdrv_co_finalize_save_vmstate(bs);
+ int err2 = bdrv_co_finalize_load_vmstate(bs);
+ if (err1 < 0) {
+ return err1;
+ }
+ if (err2 < 0) {
+ return err2;
+ }
+ return 0;
+}
+
static int coroutine_fn bdrv_finalize_vmstate_co_entry(void *opaque)
{
return bdrv_co_finalize_vmstate(opaque);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e975ac4c21..b98d44f61b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -812,6 +812,7 @@ struct BdrvChild {
typedef struct BdrvSaveVMState BdrvSaveVMState;
+typedef struct BdrvLoadVMState BdrvLoadVMState;
/*
* Note: the function bdrv_append() copies and swaps contents of
@@ -979,6 +980,8 @@ struct BlockDriverState {
/* Intermediate buffer for VM state saving from snapshot creation code */
BdrvSaveVMState *savevm_state;
+ /* Intermediate buffer for VM state loading */
+ BdrvLoadVMState *loadvm_state;
};
struct BlockBackendRootState {
--
2.17.1
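The error handling in bdrv_co_finalize_vmstate() above follows a common cleanup pattern: run every finalizer unconditionally, then report the first failure. A minimal standalone sketch of that rule (hypothetical name, not QEMU code):

```c
#include <assert.h>

/* Report the first failing status while still having run both cleanups.
 * Mirrors bdrv_co_finalize_vmstate(): the save-side error (err1) takes
 * precedence over the load-side error (err2). */
static int first_error(int err1, int err2)
{
    if (err1 < 0) {
        return err1;
    }
    if (err2 < 0) {
        return err2;
    }
    return 0;
}
```

The point of the pattern is that the load-side finalizer runs even when the save-side one failed, so no state is leaked on the error path.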
* Re: [PATCH 6/6] block/io: improve loadvm performance
2020-07-09 13:26 ` [PATCH 6/6] block/io: improve loadvm performance Denis V. Lunev
@ 2020-07-10 14:40 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 15+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-07-10 14:40 UTC (permalink / raw)
To: Denis V. Lunev, qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Max Reitz, Denis Plotnikov,
Stefan Hajnoczi
09.07.2020 16:26, Denis V. Lunev wrote:
> This patch creates an intermediate buffer for reading from the block
> driver state and performs read-ahead into this buffer. The snapshot code
> reads sequentially, so we know in advance which offsets will be required
> and when they are no longer needed.
>
> Results are fantastic. Switch-to-snapshot times for a 2 GB Fedora 31 VM
> over NVMe storage are the following:
>
>              original   fixed
> cached:      1.84s      1.16s
> non-cached:  12.74s     1.27s
>
> The difference over HDD would be even more significant :)
>
> Signed-off-by: Denis V. Lunev<den@openvz.org>
> CC: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
> CC: Kevin Wolf<kwolf@redhat.com>
> CC: Max Reitz<mreitz@redhat.com>
> CC: Stefan Hajnoczi<stefanha@redhat.com>
> CC: Fam Zheng<fam@euphon.net>
> CC: Juan Quintela<quintela@redhat.com>
> CC: Denis Plotnikov<dplotnikov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
--
Best regards,
Vladimir
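The read-ahead scheme described in the reviewed patch can be sketched as a small prefetch window that stays ahead of a sequential read cursor. This is a hypothetical standalone model (`ReadAhead`, `ra_*` are invented names), not the actual QEMU implementation, which prefetches chunks asynchronously through an AioTaskPool:

```c
#include <string.h>

#define CHUNK 8          /* prefetch granularity (bytes), tiny for the demo */
#define AHEAD 4          /* number of chunks kept ahead of the read cursor  */

typedef struct {
    const unsigned char *backing;   /* stands in for the block device */
    size_t backing_len;
    unsigned char cache[CHUNK * AHEAD];
    size_t cache_start;             /* offset of cache[0] in the backing store */
    size_t cache_len;               /* valid bytes currently prefetched */
} ReadAhead;

static void ra_init(ReadAhead *ra, const unsigned char *data, size_t len)
{
    ra->backing = data;
    ra->backing_len = len;
    ra->cache_start = 0;
    ra->cache_len = 0;
}

/* Ensure [pos, pos + n) is cached, prefetching whole chunks ahead. */
static void ra_fill(ReadAhead *ra, size_t pos, size_t n)
{
    if (pos >= ra->cache_start && pos + n <= ra->cache_start + ra->cache_len) {
        return;                              /* already prefetched */
    }
    ra->cache_start = pos - pos % CHUNK;     /* chunk-align the window */
    ra->cache_len = CHUNK * AHEAD;
    if (ra->cache_start + ra->cache_len > ra->backing_len) {
        ra->cache_len = ra->backing_len - ra->cache_start;
    }
    memcpy(ra->cache, ra->backing + ra->cache_start, ra->cache_len);
}

/* Sequential read served from the prefetch window. */
static void ra_read(ReadAhead *ra, unsigned char *buf, size_t pos, size_t n)
{
    ra_fill(ra, pos, n);
    memcpy(buf, ra->cache + (pos - ra->cache_start), n);
}

/* Self-check: small unaligned sequential reads return the backing data. */
static int ra_demo(void)
{
    unsigned char data[64], out[5];
    ReadAhead ra;
    size_t pos = 0;

    for (size_t i = 0; i < sizeof(data); i++) {
        data[i] = (unsigned char)i;
    }
    ra_init(&ra, data, sizeof(data));
    while (pos + 5 <= sizeof(data)) {
        ra_read(&ra, out, pos, 5);
        if (out[0] != (unsigned char)pos || out[4] != (unsigned char)(pos + 4)) {
            return -1;
        }
        pos += 5;    /* arbitrary unaligned stride, like migration reads */
    }
    return 0;
}
```

Because the snapshot stream is strictly sequential, the window only ever moves forward, which is what makes the prefetch safe to discard behind the cursor.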
* Re: [PATCH v8 0/6] block: seriously improve savevm/loadvm performance
2020-07-09 13:26 [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
` (5 preceding siblings ...)
2020-07-09 13:26 ` [PATCH 6/6] block/io: improve loadvm performance Denis V. Lunev
@ 2020-08-20 7:42 ` Denis V. Lunev
2020-08-27 12:41 ` Denis V. Lunev
2020-09-14 13:13 ` Stefan Hajnoczi
7 siblings, 1 reply; 15+ messages in thread
From: Denis V. Lunev @ 2020-08-20 7:42 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
Denis Plotnikov, Stefan Hajnoczi
On 7/9/20 4:26 PM, Denis V. Lunev wrote:
> This series does two standard things:
> - it creates an intermediate buffer for all writes from the QEMU
> migration code to the QCOW2 image,
> - this buffer is sent to disk asynchronously, allowing several writes to
> run in parallel.
>
> In general, the migration code is fantastically inefficient (by
> observation): buffers are not aligned and are sent in arbitrary pieces,
> often less than 100 bytes per chunk, which results in read-modify-write
> operations for non-cached I/O. It should also be noted that all
> operations go into unallocated image blocks, which suffer further from
> partial writes to such new clusters.
>
> This patch series is an implementation of the idea discussed in the RFC
> posted by Denis Plotnikov
> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg01925.html
> Results with this series over NVMe are better than the original code:
>
>              original   rfc     this
> cached:      1.79s      2.38s   1.27s
> non-cached:  3.29s      1.31s   0.81s
>
> Changes from v7:
> - dropped lock from LoadVMState
> - fixed assert in last patch
> - dropped patch 1 as queued
>
> Changes from v6:
> - blk_load_vmstate kludges added (patchew problem fixed)
>
> Changes from v5:
> - loadvm optimizations added with Vladimir comments included
>
> Changes from v4:
> - added patch 4 with blk_save_vmstate() cleanup
> - added R-By
> - bdrv_flush_vmstate -> bdrv_finalize_vmstate
> - fixed return code of bdrv_co_do_save_vmstate
> - fixed typos in comments (Eric, thanks!)
> - fixed patchew warnings
>
> Changes from v3:
> - rebased to master
> - added patch 3 which removes aio_task_pool_wait_one()
> - added R-By to patch 1
> - patch 4 is rewritten via bdrv_run_co
> - error path in blk_save_vmstate() is rewritten to call bdrv_flush_vmstate
> unconditionally
> - added some comments
> - fixes initialization in bdrv_co_vmstate_save_task_entry as suggested
>
> Changes from v2:
> - code moved from QCOW2 level to generic block level
> - created bdrv_flush_vmstate helper to fix 022, 029 tests
> - added recursive for bs->file in bdrv_co_flush_vmstate (fix 267)
> - fixed blk_save_vmstate helper
> - fixed coroutine wait as Vladimir suggested with waiting fixes from me
>
> Changes from v1:
> - patchew warning fixed
> - fixed validation that only 1 waiter is allowed in patch 1
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <fam@euphon.net>
> CC: Juan Quintela <quintela@redhat.com>
> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
>
>
ping
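The intermediate write buffer described in the quoted cover letter can be sketched as follows: small unaligned writes are appended to an aligned staging buffer, and only full, aligned chunks are submitted to the device. The names (`WriteBuf`, `wb_*`) are invented for illustration; the real series submits the chunks asynchronously via an AioTaskPool rather than the plain memcpy modeled here:

```c
#include <string.h>

#define ALIGN 16          /* device write granularity, tiny for the demo */

typedef struct {
    unsigned char staging[ALIGN * 4];   /* coalesces incoming small writes */
    size_t used;                        /* bytes buffered so far */
    unsigned char device[ALIGN * 4];    /* stands in for the QCOW2 image */
    size_t flushed;                     /* bytes actually written to "disk" */
} WriteBuf;

/* Append a small write; submit every full ALIGN-sized chunk (modeled
 * here as a memcpy into the device array, asynchronous in the series). */
static void wb_write(WriteBuf *wb, const void *data, size_t n)
{
    memcpy(wb->staging + wb->used, data, n);
    wb->used += n;
    while (wb->used - wb->flushed >= ALIGN) {   /* full chunk ready */
        memcpy(wb->device + wb->flushed, wb->staging + wb->flushed, ALIGN);
        wb->flushed += ALIGN;
    }
}

/* Finalize: write out the unaligned tail (at most one partial write). */
static void wb_finalize(WriteBuf *wb)
{
    memcpy(wb->device + wb->flushed, wb->staging + wb->flushed,
           wb->used - wb->flushed);
    wb->flushed = wb->used;
}

/* Self-check: many tiny unaligned writes, as the migration stream
 * produces, end up intact on the "device" with aligned submissions. */
static int wb_demo(void)
{
    WriteBuf wb = {0};
    for (int i = 0; i < 9; i++) {
        wb_write(&wb, "abcdefg", 7);    /* 9 * 7 = 63 bytes total */
    }
    wb_finalize(&wb);
    return (wb.flushed == 63 && wb.device[62] == 'g' &&
            memcmp(wb.device, "abcdefg", 7) == 0) ? 0 : -1;
}
```

This is exactly the trade the cover letter describes: the read-modify-write cost of sub-100-byte non-cached writes is paid once per aligned chunk instead of once per migration-stream write.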
* Re: [PATCH v8 0/6] block: seriously improve savevm/loadvm performance
2020-08-20 7:42 ` [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
@ 2020-08-27 12:41 ` Denis V. Lunev
0 siblings, 0 replies; 15+ messages in thread
From: Denis V. Lunev @ 2020-08-27 12:41 UTC (permalink / raw)
To: qemu-block, qemu-devel
Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
Juan Quintela, Dr. David Alan Gilbert, Max Reitz,
Denis Plotnikov, Stefan Hajnoczi
On 8/20/20 10:42 AM, Denis V. Lunev wrote:
> On 7/9/20 4:26 PM, Denis V. Lunev wrote:
>> This series does two standard things:
>> - it creates an intermediate buffer for all writes from the QEMU
>> migration code to the QCOW2 image,
>> - this buffer is sent to disk asynchronously, allowing several writes to
>> run in parallel.
>>
>> In general, the migration code is fantastically inefficient (by
>> observation): buffers are not aligned and are sent in arbitrary pieces,
>> often less than 100 bytes per chunk, which results in read-modify-write
>> operations for non-cached I/O. It should also be noted that all
>> operations go into unallocated image blocks, which suffer further from
>> partial writes to such new clusters.
>>
>> This patch series is an implementation of the idea discussed in the RFC
>> posted by Denis Plotnikov
>> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg01925.html
>> Results with this series over NVMe are better than the original code:
>>
>>              original   rfc     this
>> cached:      1.79s      2.38s   1.27s
>> non-cached:  3.29s      1.31s   0.81s
>>
>> Changes from v7:
>> - dropped lock from LoadVMState
>> - fixed assert in last patch
>> - dropped patch 1 as queued
>>
>> Changes from v6:
>> - blk_load_vmstate kludges added (patchew problem fixed)
>>
>> Changes from v5:
>> - loadvm optimizations added with Vladimir comments included
>>
>> Changes from v4:
>> - added patch 4 with blk_save_vmstate() cleanup
>> - added R-By
>> - bdrv_flush_vmstate -> bdrv_finalize_vmstate
>> - fixed return code of bdrv_co_do_save_vmstate
>> - fixed typos in comments (Eric, thanks!)
>> - fixed patchew warnings
>>
>> Changes from v3:
>> - rebased to master
>> - added patch 3 which removes aio_task_pool_wait_one()
>> - added R-By to patch 1
>> - patch 4 is rewritten via bdrv_run_co
>> - error path in blk_save_vmstate() is rewritten to call bdrv_flush_vmstate
>> unconditionally
>> - added some comments
>> - fixes initialization in bdrv_co_vmstate_save_task_entry as suggested
>>
>> Changes from v2:
>> - code moved from QCOW2 level to generic block level
>> - created bdrv_flush_vmstate helper to fix 022, 029 tests
>> - added recursive for bs->file in bdrv_co_flush_vmstate (fix 267)
>> - fixed blk_save_vmstate helper
>> - fixed coroutine wait as Vladimir suggested with waiting fixes from me
>>
>> Changes from v1:
>> - patchew warning fixed
>> - fixed validation that only 1 waiter is allowed in patch 1
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <fam@euphon.net>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
>>
>>
> ping
ping V2
* Re: [PATCH 4/6] block, migration: add bdrv_finalize_vmstate helper
2020-07-09 13:26 ` [PATCH 4/6] block, migration: add bdrv_finalize_vmstate helper Denis V. Lunev
@ 2020-08-27 12:58 ` Daniel P. Berrangé
2020-08-27 13:02 ` Denis V. Lunev
0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 12:58 UTC (permalink / raw)
To: Denis V. Lunev
Cc: Kevin Wolf, Fam Zheng, qemu-block, Juan Quintela,
Dr. David Alan Gilbert, qemu-devel, Denis Plotnikov,
Stefan Hajnoczi, Max Reitz
On Thu, Jul 09, 2020 at 04:26:42PM +0300, Denis V. Lunev wrote:
> Right now bdrv_fclose() just calls bdrv_flush().
>
> The problem is that the migration code works inefficiently in
> block-layer terms and frequently issues very small pieces of unaligned
> data. The block layer can handle this, but it is very slow.
>
> This patch is a preparation for the introduction of the intermediate
> buffer in the block driver state. It is beneficial to separate the
> conventional bdrv_flush() from closing the QEMU file in the migration
> code.
>
> The patch also forces the bdrv_finalize_vmstate() operation inside the
> synchronous blk_save_vmstate() operation. This helper is used from
> qemu-io only.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <fam@euphon.net>
> CC: Juan Quintela <quintela@redhat.com>
> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
> block/block-backend.c | 6 +++++-
> block/io.c | 15 +++++++++++++++
> include/block/block.h | 5 +++++
> migration/savevm.c | 4 ++++
> 4 files changed, 29 insertions(+), 1 deletion(-)
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 45c9dd9d8a..d8a94e312c 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -150,6 +150,10 @@ static ssize_t block_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
>
> static int bdrv_fclose(void *opaque, Error **errp)
> {
> + int err = bdrv_finalize_vmstate(opaque);
> + if (err < 0) {
> + return err;
This is returning an error without having populated 'errp', which means
the caller will be missing the error diagnosis
> + }
> return bdrv_flush(opaque);
> }
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 4/6] block, migration: add bdrv_finalize_vmstate helper
2020-08-27 12:58 ` Daniel P. Berrangé
@ 2020-08-27 13:02 ` Denis V. Lunev
2020-08-27 13:06 ` Daniel P. Berrangé
0 siblings, 1 reply; 15+ messages in thread
From: Denis V. Lunev @ 2020-08-27 13:02 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Kevin Wolf, Fam Zheng, qemu-block, Juan Quintela,
Dr. David Alan Gilbert, qemu-devel, Denis Plotnikov,
Stefan Hajnoczi, Max Reitz
On 8/27/20 3:58 PM, Daniel P. Berrangé wrote:
> On Thu, Jul 09, 2020 at 04:26:42PM +0300, Denis V. Lunev wrote:
>> Right now bdrv_fclose() just calls bdrv_flush().
>>
>> The problem is that the migration code works inefficiently in
>> block-layer terms and frequently issues very small pieces of unaligned
>> data. The block layer can handle this, but it is very slow.
>>
>> This patch is a preparation for the introduction of the intermediate
>> buffer in the block driver state. It is beneficial to separate the
>> conventional bdrv_flush() from closing the QEMU file in the migration
>> code.
>>
>> The patch also forces the bdrv_finalize_vmstate() operation inside the
>> synchronous blk_save_vmstate() operation. This helper is used from
>> qemu-io only.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <fam@euphon.net>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
>> ---
>> block/block-backend.c | 6 +++++-
>> block/io.c | 15 +++++++++++++++
>> include/block/block.h | 5 +++++
>> migration/savevm.c | 4 ++++
>> 4 files changed, 29 insertions(+), 1 deletion(-)
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 45c9dd9d8a..d8a94e312c 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -150,6 +150,10 @@ static ssize_t block_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
>>
>> static int bdrv_fclose(void *opaque, Error **errp)
>> {
>> + int err = bdrv_finalize_vmstate(opaque);
>> + if (err < 0) {
>> + return err;
> This is returning an error without having populated 'errp', which means
> the caller will be missing the error diagnosis
But this behaves exactly like the branch below:
bdrv_flush() can return an error too, and errp
is not filled there either.
Den
>> + }
>> return bdrv_flush(opaque);
>> }
> Regards,
> Daniel
* Re: [PATCH 4/6] block, migration: add bdrv_finalize_vmstate helper
2020-08-27 13:02 ` Denis V. Lunev
@ 2020-08-27 13:06 ` Daniel P. Berrangé
2020-08-28 6:13 ` Markus Armbruster
0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2020-08-27 13:06 UTC (permalink / raw)
To: Denis V. Lunev
Cc: Kevin Wolf, Fam Zheng, qemu-block, Juan Quintela,
Dr. David Alan Gilbert, qemu-devel, Denis Plotnikov,
Stefan Hajnoczi, Max Reitz
On Thu, Aug 27, 2020 at 04:02:38PM +0300, Denis V. Lunev wrote:
> On 8/27/20 3:58 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 09, 2020 at 04:26:42PM +0300, Denis V. Lunev wrote:
> >> Right now bdrv_fclose() just calls bdrv_flush().
> >>
> >> The problem is that the migration code works inefficiently in
> >> block-layer terms and frequently issues very small pieces of unaligned
> >> data. The block layer can handle this, but it is very slow.
> >>
> >> This patch is a preparation for the introduction of the intermediate
> >> buffer in the block driver state. It is beneficial to separate the
> >> conventional bdrv_flush() from closing the QEMU file in the migration
> >> code.
> >>
> >> The patch also forces the bdrv_finalize_vmstate() operation inside the
> >> synchronous blk_save_vmstate() operation. This helper is used from
> >> qemu-io only.
> >>
> >> Signed-off-by: Denis V. Lunev <den@openvz.org>
> >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >> CC: Kevin Wolf <kwolf@redhat.com>
> >> CC: Max Reitz <mreitz@redhat.com>
> >> CC: Stefan Hajnoczi <stefanha@redhat.com>
> >> CC: Fam Zheng <fam@euphon.net>
> >> CC: Juan Quintela <quintela@redhat.com>
> >> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
> >> ---
> >> block/block-backend.c | 6 +++++-
> >> block/io.c | 15 +++++++++++++++
> >> include/block/block.h | 5 +++++
> >> migration/savevm.c | 4 ++++
> >> 4 files changed, 29 insertions(+), 1 deletion(-)
> >> diff --git a/migration/savevm.c b/migration/savevm.c
> >> index 45c9dd9d8a..d8a94e312c 100644
> >> --- a/migration/savevm.c
> >> +++ b/migration/savevm.c
> >> @@ -150,6 +150,10 @@ static ssize_t block_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> >>
> >> static int bdrv_fclose(void *opaque, Error **errp)
> >> {
> >> + int err = bdrv_finalize_vmstate(opaque);
> >> + if (err < 0) {
> >> + return err;
> > This is returning an error without having populated 'errp', which means
> > the caller will be missing the error diagnosis
>
> But this behaves exactly like the branch below:
> bdrv_flush() can return an error too, and errp
> is not filled there either.
Doh, it seems the only caller passes NULL for the errp too,
so it is a redundant parameter. So nothing wrong with your
patch after all.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 4/6] block, migration: add bdrv_finalize_vmstate helper
2020-08-27 13:06 ` Daniel P. Berrangé
@ 2020-08-28 6:13 ` Markus Armbruster
0 siblings, 0 replies; 15+ messages in thread
From: Markus Armbruster @ 2020-08-28 6:13 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Kevin Wolf, Fam Zheng, qemu-block, Juan Quintela,
Dr. David Alan Gilbert, qemu-devel, Denis Plotnikov,
Stefan Hajnoczi, Denis V. Lunev, Max Reitz
Daniel P. Berrangé <berrange@redhat.com> writes:
> On Thu, Aug 27, 2020 at 04:02:38PM +0300, Denis V. Lunev wrote:
>> On 8/27/20 3:58 PM, Daniel P. Berrangé wrote:
>> > On Thu, Jul 09, 2020 at 04:26:42PM +0300, Denis V. Lunev wrote:
>> >> Right now bdrv_fclose() just calls bdrv_flush().
>> >>
>> >> The problem is that the migration code works inefficiently in
>> >> block-layer terms and frequently issues very small pieces of unaligned
>> >> data. The block layer can handle this, but it is very slow.
>> >>
>> >> This patch is a preparation for the introduction of the intermediate
>> >> buffer in the block driver state. It is beneficial to separate the
>> >> conventional bdrv_flush() from closing the QEMU file in the migration
>> >> code.
>> >>
>> >> The patch also forces the bdrv_finalize_vmstate() operation inside the
>> >> synchronous blk_save_vmstate() operation. This helper is used from
>> >> qemu-io only.
>> >>
>> >> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> >> CC: Kevin Wolf <kwolf@redhat.com>
>> >> CC: Max Reitz <mreitz@redhat.com>
>> >> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> >> CC: Fam Zheng <fam@euphon.net>
>> >> CC: Juan Quintela <quintela@redhat.com>
>> >> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
>> >> ---
>> >> block/block-backend.c | 6 +++++-
>> >> block/io.c | 15 +++++++++++++++
>> >> include/block/block.h | 5 +++++
>> >> migration/savevm.c | 4 ++++
>> >> 4 files changed, 29 insertions(+), 1 deletion(-)
>> >> diff --git a/migration/savevm.c b/migration/savevm.c
>> >> index 45c9dd9d8a..d8a94e312c 100644
>> >> --- a/migration/savevm.c
>> >> +++ b/migration/savevm.c
>> >> @@ -150,6 +150,10 @@ static ssize_t block_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
>> >>
>> >> static int bdrv_fclose(void *opaque, Error **errp)
>> >> {
>> >> + int err = bdrv_finalize_vmstate(opaque);
>> >> + if (err < 0) {
>> >> + return err;
>> > This is returning an error without having populated 'errp', which means
>> > the caller will be missing the error diagnosis
>>
>> But this behaves exactly like the branch below:
>> bdrv_flush() can return an error too, and errp
>> is not filled there either.
>
> Doh, it seems the only caller passes NULL for the errp too,
> so it is a redundant parameter. So nothing wrong with your
> patch after all.
Not setting an error on failure is plainly wrong.
If it works because all callers pass NULL, then the obvious fix is to
drop the @errp parameter.
I agree it's not this patch's fault. It needs fixing anyway. If you
have to respin for some other reason, including a fix would be nice.
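The convention under discussion is QEMU's Error API: a function that takes an Error **errp parameter is expected to set it on every failure path, otherwise the caller receives a bare negative code with no diagnosis. A standalone sketch of the rule, using a stand-in Error type and an invented `fclose_like()` helper rather than QEMU's actual qapi/error.h API:

```c
#include <stddef.h>

/* Minimal stand-in for QEMU's Error type -- illustration only. */
typedef struct Error { const char *msg; } Error;

static Error err_storage;

/* Stand-in for error_setg(): record a diagnosis if the caller asked
 * for one (errp may legitimately be NULL, meaning "don't care"). */
static void error_set(Error **errp, const char *msg)
{
    if (errp != NULL) {
        err_storage.msg = msg;
        *errp = &err_storage;
    }
}

/* The rule: every failing return also populates errp, so the caller
 * never gets a negative code without a human-readable diagnosis. */
static int fclose_like(int finalize_status, Error **errp)
{
    if (finalize_status < 0) {
        error_set(errp, "finalizing vmstate failed");
        return finalize_status;
    }
    return 0;
}

/* Self-check: the caller sees both the code and the message. */
static int errp_demo(void)
{
    Error *err = NULL;
    int ret = fclose_like(-5, &err);
    return (ret == -5 && err != NULL && err->msg != NULL) ? 0 : -1;
}
```

Markus's alternative, dropping the parameter entirely, is the honest fix when every caller passes NULL, since an errp that is never consulted documents a contract the function does not actually honor.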
* Re: [PATCH v8 0/6] block: seriously improve savevm/loadvm performance
2020-07-09 13:26 [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
` (6 preceding siblings ...)
2020-08-20 7:42 ` [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
@ 2020-09-14 13:13 ` Stefan Hajnoczi
7 siblings, 0 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2020-09-14 13:13 UTC (permalink / raw)
To: Kevin Wolf, Max Reitz
Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
Juan Quintela, qemu-devel, Max Reitz, Denis Plotnikov,
Denis V. Lunev, Dr. David Alan Gilbert
On Thu, Jul 09, 2020 at 04:26:38PM +0300, Denis V. Lunev wrote:
> This series does two standard things:
> - it creates an intermediate buffer for all writes from the QEMU
> migration code to the QCOW2 image,
> - this buffer is sent to disk asynchronously, allowing several writes to
> run in parallel.
>
> In general, the migration code is fantastically inefficient (by
> observation): buffers are not aligned and are sent in arbitrary pieces,
> often less than 100 bytes per chunk, which results in read-modify-write
> operations for non-cached I/O. It should also be noted that all
> operations go into unallocated image blocks, which suffer further from
> partial writes to such new clusters.
>
> This patch series is an implementation of the idea discussed in the RFC
> posted by Denis Plotnikov
> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg01925.html
> Results with this series over NVMe are better than the original code:
>
>              original   rfc     this
> cached:      1.79s      2.38s   1.27s
> non-cached:  3.29s      1.31s   0.81s
Kevin and Max: Is this going through one of your trees?
Stefan
end of thread, other threads:[~2020-09-14 16:03 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-09 13:26 [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
2020-07-09 13:26 ` [PATCH 1/6] block/aio_task: allow start/wait task from any coroutine Denis V. Lunev
2020-07-09 13:26 ` [PATCH 2/6] block/aio_task: drop aio_task_pool_wait_one() helper Denis V. Lunev
2020-07-09 13:26 ` [PATCH 3/6] block/block-backend: remove always true check from blk_save_vmstate Denis V. Lunev
2020-07-09 13:26 ` [PATCH 4/6] block, migration: add bdrv_finalize_vmstate helper Denis V. Lunev
2020-08-27 12:58 ` Daniel P. Berrangé
2020-08-27 13:02 ` Denis V. Lunev
2020-08-27 13:06 ` Daniel P. Berrangé
2020-08-28 6:13 ` Markus Armbruster
2020-07-09 13:26 ` [PATCH 5/6] block/io: improve savevm performance Denis V. Lunev
2020-07-09 13:26 ` [PATCH 6/6] block/io: improve loadvm performance Denis V. Lunev
2020-07-10 14:40 ` Vladimir Sementsov-Ogievskiy
2020-08-20 7:42 ` [PATCH v8 0/6] block: seriously improve savevm/loadvm performance Denis V. Lunev
2020-08-27 12:41 ` Denis V. Lunev
2020-09-14 13:13 ` Stefan Hajnoczi