* [PATCH 0/3] migration: Fixes to the 'background-snapshot' code
@ 2021-03-18 17:46 Andrey Gruzdev
2021-03-18 17:46 ` [PATCH 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Andrey Gruzdev @ 2021-03-18 17:46 UTC (permalink / raw)
To: qemu-devel
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
Dr . David Alan Gilbert, Markus Armbruster, Peter Xu,
David Hildenbrand, Andrey Gruzdev
This patch series contains:
* A fix for the issue of occasionally truncated non-iterable device state
* A solution to compatibility issues with the virtio-balloon device
* A fix for the issue of discarded or never-populated pages missing UFFD
write protection and entering the migration stream in a dirty state
Andrey Gruzdev (3):
migration: Fix missing qemu_fflush() on buffer file in
bg_migration_thread
migration: Inhibit virtio-balloon for the duration of background
snapshot
migration: Pre-fault memory before starting background snapshot
hw/virtio/virtio-balloon.c | 8 ++++--
include/migration/misc.h | 2 ++
migration/migration.c | 18 +++++++++++++-
migration/ram.c | 51 ++++++++++++++++++++++++++++++++++++++
migration/ram.h | 1 +
5 files changed, 77 insertions(+), 3 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
2021-03-18 17:46 [PATCH 0/3] migration: Fixes to the 'background-snapshot' code Andrey Gruzdev
@ 2021-03-18 17:46 ` Andrey Gruzdev
2021-03-19 12:39 ` David Hildenbrand
2021-03-18 17:46 ` [PATCH 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot Andrey Gruzdev
2021-03-18 17:46 ` [PATCH 3/3] migration: Pre-fault memory before starting background snapshot Andrey Gruzdev
2 siblings, 1 reply; 14+ messages in thread
From: Andrey Gruzdev @ 2021-03-18 17:46 UTC (permalink / raw)
To: qemu-devel
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
Dr . David Alan Gilbert, Markus Armbruster, Peter Xu,
David Hildenbrand, Andrey Gruzdev
Add the missing qemu_fflush() on the buffer file holding the precopy device state.
Also increase the initial QIOChannelBuffer allocation to 512 KB to avoid reallocations:
typical configurations often require more than 200 KB for device state and VMDESC.
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
migration/migration.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/migration/migration.c b/migration/migration.c
index 36768391b6..496cf6e17b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3857,7 +3857,7 @@ static void *bg_migration_thread(void *opaque)
* with vCPUs running and, finally, write stashed non-RAM part of
* the vmstate from the buffer to the migration stream.
*/
- s->bioc = qio_channel_buffer_new(128 * 1024);
+ s->bioc = qio_channel_buffer_new(512 * 1024);
qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
object_unref(OBJECT(s->bioc));
@@ -3911,6 +3911,8 @@ static void *bg_migration_thread(void *opaque)
if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
goto fail;
}
+ qemu_fflush(fb);
+
/* Now initialize UFFD context and start tracking RAM writes */
if (ram_write_tracking_start()) {
goto fail;
--
2.25.1
* [PATCH 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot
2021-03-18 17:46 [PATCH 0/3] migration: Fixes to the 'background-snapshot' code Andrey Gruzdev
2021-03-18 17:46 ` [PATCH 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
@ 2021-03-18 17:46 ` Andrey Gruzdev
2021-03-18 18:16 ` David Hildenbrand
2021-03-18 17:46 ` [PATCH 3/3] migration: Pre-fault memory before starting background snapshot Andrey Gruzdev
2 siblings, 1 reply; 14+ messages in thread
From: Andrey Gruzdev @ 2021-03-18 17:46 UTC (permalink / raw)
To: qemu-devel
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
Dr . David Alan Gilbert, Markus Armbruster, Peter Xu,
David Hildenbrand, Andrey Gruzdev
As with incoming postcopy, we cannot deal with concurrent RAM discards
when using the background snapshot feature in outgoing migration.
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
hw/virtio/virtio-balloon.c | 8 ++++++--
include/migration/misc.h | 2 ++
migration/migration.c | 8 ++++++++
3 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e770955176..d120bf8f43 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -66,8 +66,12 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
static bool virtio_balloon_inhibited(void)
{
- /* Postcopy cannot deal with concurrent discards, so it's special. */
- return ram_block_discard_is_disabled() || migration_in_incoming_postcopy();
+ /*
+ * Postcopy cannot deal with concurrent discards,
+ * so it's special, as well as background snapshots.
+ */
+ return ram_block_discard_is_disabled() || migration_in_incoming_postcopy() ||
+ migration_in_bg_snapshot();
}
static void balloon_inflate_page(VirtIOBalloon *balloon,
diff --git a/include/migration/misc.h b/include/migration/misc.h
index bccc1b6b44..738675ef52 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -70,6 +70,8 @@ bool migration_in_postcopy_after_devices(MigrationState *);
void migration_global_dump(Monitor *mon);
/* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */
bool migration_in_incoming_postcopy(void);
+/* True if background snapshot is active */
+bool migration_in_bg_snapshot(void);
/* migration/block-dirty-bitmap.c */
void dirty_bitmap_mig_init(void);
diff --git a/migration/migration.c b/migration/migration.c
index 496cf6e17b..656d6249a6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1976,6 +1976,14 @@ bool migration_in_incoming_postcopy(void)
return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END;
}
+bool migration_in_bg_snapshot(void)
+{
+ MigrationState *s = migrate_get_current();
+
+ return migrate_background_snapshot() &&
+ migration_is_setup_or_active(s->state);
+}
+
bool migration_is_idle(void)
{
MigrationState *s = current_migration;
--
2.25.1
* [PATCH 3/3] migration: Pre-fault memory before starting background snapshot
2021-03-18 17:46 [PATCH 0/3] migration: Fixes to the 'background-snapshot' code Andrey Gruzdev
2021-03-18 17:46 ` [PATCH 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
2021-03-18 17:46 ` [PATCH 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot Andrey Gruzdev
@ 2021-03-18 17:46 ` Andrey Gruzdev
2021-03-19 9:28 ` David Hildenbrand
2 siblings, 1 reply; 14+ messages in thread
From: Andrey Gruzdev @ 2021-03-18 17:46 UTC (permalink / raw)
To: qemu-devel
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
Dr . David Alan Gilbert, Markus Armbruster, Peter Xu,
David Hildenbrand, Andrey Gruzdev
This commit fixes an issue with the userfault_fd write-protect (WP) feature
that background snapshots are based on. For any never-populated or discarded
memory page, the UFFDIO_WRITEPROTECT ioctl() would skip updating the
PTE for that page, thereby losing the WP setting for it.
So we need to pre-fault pages for each RAM block to be protected
before issuing the userfault_fd write-protect ioctl().
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
migration/migration.c | 6 +++++
migration/ram.c | 51 +++++++++++++++++++++++++++++++++++++++++++
migration/ram.h | 1 +
3 files changed, 58 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index 656d6249a6..496e88cbda 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3872,6 +3872,12 @@ static void *bg_migration_thread(void *opaque)
update_iteration_initial_status(s);
+ /*
+ * Prepare for tracking memory writes with UFFD-WP - populate
+ * RAM pages before protecting.
+ */
+ ram_write_tracking_prepare();
+
qemu_savevm_state_header(s->to_dst_file);
qemu_savevm_state_setup(s->to_dst_file);
diff --git a/migration/ram.c b/migration/ram.c
index 52537f14ac..825eb80030 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1560,6 +1560,57 @@ out:
return ret;
}
+/*
+ * ram_block_populate_pages: populate memory in the RAM block by reading
+ * an integer from the beginning of each page.
+ *
+ * Since it's solely used for userfault_fd WP feature, here we just
+ * hardcode page size to TARGET_PAGE_SIZE.
+ *
+ * @bs: RAM block to populate
+ */
+volatile int ram_block_populate_pages__tmp;
+static void ram_block_populate_pages(RAMBlock *bs)
+{
+ ram_addr_t offset = 0;
+ int tmp = 0;
+
+ for (char *ptr = (char *) bs->host; offset < bs->used_length;
+ ptr += TARGET_PAGE_SIZE, offset += TARGET_PAGE_SIZE) {
+ /* Try to do it without memory writes */
+ tmp += *(volatile int *) ptr;
+ }
+ /* Create dependency on 'extern volatile int' to avoid optimizing out */
+ ram_block_populate_pages__tmp += tmp;
+}
+
+/*
+ * ram_write_tracking_prepare: prepare for UFFD-WP memory tracking
+ */
+void ram_write_tracking_prepare(void)
+{
+ RAMBlock *bs;
+
+ RCU_READ_LOCK_GUARD();
+
+ RAMBLOCK_FOREACH_NOT_IGNORED(bs) {
+ /* Nothing to do with read-only and MMIO-writable regions */
+ if (bs->mr->readonly || bs->mr->rom_device) {
+ continue;
+ }
+
+ /*
+ * Populate pages of the RAM block before enabling userfault_fd
+ * write protection.
+ *
+ * This stage is required since ioctl(UFFDIO_WRITEPROTECT) with
+ * UFFDIO_WRITEPROTECT_MODE_WP mode setting would silently skip
+ * pages with pte_none() entries in page table.
+ */
+ ram_block_populate_pages(bs);
+ }
+}
+
/*
* ram_write_tracking_start: start UFFD-WP memory tracking
*
diff --git a/migration/ram.h b/migration/ram.h
index 6378bb3ebc..4833e9fd5b 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -82,6 +82,7 @@ void colo_incoming_start_dirty_log(void);
/* Background snapshot */
bool ram_write_tracking_available(void);
bool ram_write_tracking_compatible(void);
+void ram_write_tracking_prepare(void);
int ram_write_tracking_start(void);
void ram_write_tracking_stop(void);
--
2.25.1
* Re: [PATCH 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot
2021-03-18 17:46 ` [PATCH 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot Andrey Gruzdev
@ 2021-03-18 18:16 ` David Hildenbrand
2021-03-19 8:27 ` Andrey Gruzdev
0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2021-03-18 18:16 UTC (permalink / raw)
To: Andrey Gruzdev, qemu-devel
Cc: Juan Quintela, Dr . David Alan Gilbert, Peter Xu,
Markus Armbruster, Paolo Bonzini, Den Lunev
On 18.03.21 18:46, Andrey Gruzdev wrote:
> The same thing as for incoming postcopy - we cannot deal with concurrent
> RAM discards when using background snapshot feature in outgoing migration.
>
> Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
> ---
> hw/virtio/virtio-balloon.c | 8 ++++++--
> include/migration/misc.h | 2 ++
> migration/migration.c | 8 ++++++++
> 3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index e770955176..d120bf8f43 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -66,8 +66,12 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
>
> static bool virtio_balloon_inhibited(void)
> {
> - /* Postcopy cannot deal with concurrent discards, so it's special. */
> - return ram_block_discard_is_disabled() || migration_in_incoming_postcopy();
> + /*
> + * Postcopy cannot deal with concurrent discards,
> + * so it's special, as well as background snapshots.
> + */
> + return ram_block_discard_is_disabled() || migration_in_incoming_postcopy() ||
> + migration_in_bg_snapshot();
> }
>
> static void balloon_inflate_page(VirtIOBalloon *balloon,
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index bccc1b6b44..738675ef52 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -70,6 +70,8 @@ bool migration_in_postcopy_after_devices(MigrationState *);
> void migration_global_dump(Monitor *mon);
> /* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */
> bool migration_in_incoming_postcopy(void);
> +/* True if background snapshot is active */
> +bool migration_in_bg_snapshot(void);
>
> /* migration/block-dirty-bitmap.c */
> void dirty_bitmap_mig_init(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index 496cf6e17b..656d6249a6 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1976,6 +1976,14 @@ bool migration_in_incoming_postcopy(void)
> return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END;
> }
>
> +bool migration_in_bg_snapshot(void)
> +{
> + MigrationState *s = migrate_get_current();
> +
> + return migrate_background_snapshot() &&
> + migration_is_setup_or_active(s->state);
> +}
> +
> bool migration_is_idle(void)
> {
> MigrationState *s = current_migration;
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
* Re: [PATCH 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot
2021-03-18 18:16 ` David Hildenbrand
@ 2021-03-19 8:27 ` Andrey Gruzdev
0 siblings, 0 replies; 14+ messages in thread
From: Andrey Gruzdev @ 2021-03-19 8:27 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
Dr . David Alan Gilbert, Markus Armbruster, Peter Xu
On 18.03.2021 21:16, David Hildenbrand wrote:
> On 18.03.21 18:46, Andrey Gruzdev wrote:
>> The same thing as for incoming postcopy - we cannot deal with concurrent
>> RAM discards when using background snapshot feature in outgoing
>> migration.
>>
>> Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
>> ---
>> hw/virtio/virtio-balloon.c | 8 ++++++--
>> include/migration/misc.h | 2 ++
>> migration/migration.c | 8 ++++++++
>> 3 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>> index e770955176..d120bf8f43 100644
>> --- a/hw/virtio/virtio-balloon.c
>> +++ b/hw/virtio/virtio-balloon.c
>> @@ -66,8 +66,12 @@ static bool
>> virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
>> static bool virtio_balloon_inhibited(void)
>> {
>> - /* Postcopy cannot deal with concurrent discards, so it's
>> special. */
>> - return ram_block_discard_is_disabled() ||
>> migration_in_incoming_postcopy();
>> + /*
>> + * Postcopy cannot deal with concurrent discards,
>> + * so it's special, as well as background snapshots.
>> + */
>> + return ram_block_discard_is_disabled() ||
>> migration_in_incoming_postcopy() ||
>> + migration_in_bg_snapshot();
>> }
>> static void balloon_inflate_page(VirtIOBalloon *balloon,
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index bccc1b6b44..738675ef52 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -70,6 +70,8 @@ bool
>> migration_in_postcopy_after_devices(MigrationState *);
>> void migration_global_dump(Monitor *mon);
>> /* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */
>> bool migration_in_incoming_postcopy(void);
>> +/* True if background snapshot is active */
>> +bool migration_in_bg_snapshot(void);
>> /* migration/block-dirty-bitmap.c */
>> void dirty_bitmap_mig_init(void);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 496cf6e17b..656d6249a6 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1976,6 +1976,14 @@ bool migration_in_incoming_postcopy(void)
>> return ps >= POSTCOPY_INCOMING_DISCARD && ps <
>> POSTCOPY_INCOMING_END;
>> }
>> +bool migration_in_bg_snapshot(void)
>> +{
>> + MigrationState *s = migrate_get_current();
>> +
>> + return migrate_background_snapshot() &&
>> + migration_is_setup_or_active(s->state);
>> +}
>> +
>> bool migration_is_idle(void)
>> {
>> MigrationState *s = current_migration;
>>
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
>
Thanks!
--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH +7-903-247-6397
virtuzzo.com
* Re: [PATCH 3/3] migration: Pre-fault memory before starting background snapshot
2021-03-18 17:46 ` [PATCH 3/3] migration: Pre-fault memory before starting background snapshot Andrey Gruzdev
@ 2021-03-19 9:28 ` David Hildenbrand
2021-03-19 9:32 ` David Hildenbrand
2021-03-19 11:05 ` Andrey Gruzdev
0 siblings, 2 replies; 14+ messages in thread
From: David Hildenbrand @ 2021-03-19 9:28 UTC (permalink / raw)
To: Andrey Gruzdev, qemu-devel
Cc: Juan Quintela, Dr . David Alan Gilbert, Peter Xu,
Markus Armbruster, Paolo Bonzini, Den Lunev
> +/*
> + * ram_block_populate_pages: populate memory in the RAM block by reading
> + * an integer from the beginning of each page.
> + *
> + * Since it's solely used for userfault_fd WP feature, here we just
> + * hardcode page size to TARGET_PAGE_SIZE.
> + *
> + * @bs: RAM block to populate
> + */
> +volatile int ram_block_populate_pages__tmp;
> +static void ram_block_populate_pages(RAMBlock *bs)
> +{
> + ram_addr_t offset = 0;
> + int tmp = 0;
> +
> + for (char *ptr = (char *) bs->host; offset < bs->used_length;
> + ptr += TARGET_PAGE_SIZE, offset += TARGET_PAGE_SIZE) {
You'll want qemu_real_host_page_size instead of TARGET_PAGE_SIZE
> + /* Try to do it without memory writes */
> + tmp += *(volatile int *) ptr;
> + }
The following is slightly simpler and doesn't rely on volatile semantics [1].
Should work on any arch I guess.
static void ram_block_populate_pages(RAMBlock *bs)
{
    char *ptr = (char *) bs->host;
    ram_addr_t offset;

    for (offset = 0; offset < bs->used_length;
         offset += qemu_real_host_page_size) {
        char tmp = *(volatile char *)(ptr + offset);

        /* Don't optimize the read out. */
        asm volatile ("" : "+r" (tmp));
    }
}
Compiles to
for (offset = 0; offset < bs->used_length;
316d: 48 8b 4b 30 mov 0x30(%rbx),%rcx
char *ptr = (char *) bs->host;
3171: 48 8b 73 18 mov 0x18(%rbx),%rsi
for (offset = 0; offset < bs->used_length;
3175: 48 85 c9 test %rcx,%rcx
3178: 74 ce je 3148 <ram_write_tracking_prepare+0x58>
offset += qemu_real_host_page_size) {
317a: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 3181 <ram_write_tracking_prepare+0x91>
3181: 48 8b 38 mov (%rax),%rdi
3184: 0f 1f 40 00 nopl 0x0(%rax)
char tmp = *(volatile char *)(ptr + offset);
3188: 48 8d 04 16 lea (%rsi,%rdx,1),%rax
318c: 0f b6 00 movzbl (%rax),%eax
offset += qemu_real_host_page_size) {
318f: 48 01 fa add %rdi,%rdx
for (offset = 0; offset < bs->used_length;
3192: 48 39 ca cmp %rcx,%rdx
3195: 72 f1 jb 3188 <ram_write_tracking_prepare+0x98>
[1] https://programfan.github.io/blog/2015/04/27/prevent-gcc-optimize-away-code/
I'll send patches soon to take care of virtio-mem via RamDiscardManager -
to skip populating the parts that are supposed to remain discarded and not migrated.
Unfortunately, the RamDiscardManager patches are still stuck waiting for
acks ... and now we're in soft-freeze.
--
Thanks,
David / dhildenb
* Re: [PATCH 3/3] migration: Pre-fault memory before starting background snapshot
2021-03-19 9:28 ` David Hildenbrand
@ 2021-03-19 9:32 ` David Hildenbrand
2021-03-19 11:09 ` Andrey Gruzdev
2021-03-19 11:05 ` Andrey Gruzdev
1 sibling, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2021-03-19 9:32 UTC (permalink / raw)
To: Andrey Gruzdev, qemu-devel
Cc: Juan Quintela, Dr . David Alan Gilbert, Peter Xu,
Markus Armbruster, Paolo Bonzini, Den Lunev
On 19.03.21 10:28, David Hildenbrand wrote:
>> +/*
>> + * ram_block_populate_pages: populate memory in the RAM block by reading
>> + * an integer from the beginning of each page.
>> + *
>> + * Since it's solely used for userfault_fd WP feature, here we just
>> + * hardcode page size to TARGET_PAGE_SIZE.
>> + *
>> + * @bs: RAM block to populate
>> + */
>> +volatile int ram_block_populate_pages__tmp;
>> +static void ram_block_populate_pages(RAMBlock *bs)
>> +{
>> + ram_addr_t offset = 0;
>> + int tmp = 0;
>> +
>> + for (char *ptr = (char *) bs->host; offset < bs->used_length;
>> + ptr += TARGET_PAGE_SIZE, offset += TARGET_PAGE_SIZE) {
>
> You'll want qemu_real_host_page_size instead of TARGET_PAGE_SIZE
>
>> + /* Try to do it without memory writes */
>> + tmp += *(volatile int *) ptr;
>> + }
>
>
> The following is slightly simpler and doesn't rely on volatile semantics [1].
> Should work on any arch I guess.
>
> static void ram_block_populate_pages(RAMBlock *bs)
> {
> char *ptr = (char *) bs->host;
> ram_addr_t offset;
>
> for (offset = 0; offset < bs->used_length;
> offset += qemu_real_host_page_size) {
> char tmp = *(volatile char *)(ptr + offset)
I wanted to do a "= *(ptr + offset)" here.
>
> /* Don't optimize the read out. */
> asm volatile ("" : "+r" (tmp));
So this is the only volatile thing that the compiler must guarantee to
not optimize away.
--
Thanks,
David / dhildenb
* Re: [PATCH 3/3] migration: Pre-fault memory before starting background snapshot
2021-03-19 9:28 ` David Hildenbrand
2021-03-19 9:32 ` David Hildenbrand
@ 2021-03-19 11:05 ` Andrey Gruzdev
2021-03-19 11:27 ` David Hildenbrand
1 sibling, 1 reply; 14+ messages in thread
From: Andrey Gruzdev @ 2021-03-19 11:05 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
Dr . David Alan Gilbert, Markus Armbruster, Peter Xu
On 19.03.2021 12:28, David Hildenbrand wrote:
>> +/*
>> + * ram_block_populate_pages: populate memory in the RAM block by
>> reading
>> + * an integer from the beginning of each page.
>> + *
>> + * Since it's solely used for userfault_fd WP feature, here we just
>> + * hardcode page size to TARGET_PAGE_SIZE.
>> + *
>> + * @bs: RAM block to populate
>> + */
>> +volatile int ram_block_populate_pages__tmp;
>> +static void ram_block_populate_pages(RAMBlock *bs)
>> +{
>> + ram_addr_t offset = 0;
>> + int tmp = 0;
>> +
>> + for (char *ptr = (char *) bs->host; offset < bs->used_length;
>> + ptr += TARGET_PAGE_SIZE, offset += TARGET_PAGE_SIZE) {
>
> You'll want qemu_real_host_page_size instead of TARGET_PAGE_SIZE
>
Ok.
>> + /* Try to do it without memory writes */
>> + tmp += *(volatile int *) ptr;
>> + }
>
>
> The following is slightly simpler and doesn't rely on volatile
> semantics [1].
> Should work on any arch I guess.
>
> static void ram_block_populate_pages(RAMBlock *bs)
> {
> char *ptr = (char *) bs->host;
> ram_addr_t offset;
>
> for (offset = 0; offset < bs->used_length;
> offset += qemu_real_host_page_size) {
> char tmp = *(volatile char *)(ptr + offset)
>
> /* Don't optimize the read out. */
> asm volatile ("" : "+r" (tmp));
> }
>
Thanks, good option, I'll change the code.
> Compiles to
>
> for (offset = 0; offset < bs->used_length;
> 316d: 48 8b 4b 30 mov 0x30(%rbx),%rcx
> char *ptr = (char *) bs->host;
> 3171: 48 8b 73 18 mov 0x18(%rbx),%rsi
> for (offset = 0; offset < bs->used_length;
> 3175: 48 85 c9 test %rcx,%rcx
> 3178: 74 ce je 3148
> <ram_write_tracking_prepare+0x58>
> offset += qemu_real_host_page_size) {
> 317a: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax #
> 3181 <ram_write_tracking_prepare+0x91>
> 3181: 48 8b 38 mov (%rax),%rdi
> 3184: 0f 1f 40 00 nopl 0x0(%rax)
> char tmp = *(volatile char *)(ptr + offset);
> 3188: 48 8d 04 16 lea (%rsi,%rdx,1),%rax
> 318c: 0f b6 00 movzbl (%rax),%eax
> offset += qemu_real_host_page_size) {
> 318f: 48 01 fa add %rdi,%rdx
> for (offset = 0; offset < bs->used_length;
> 3192: 48 39 ca cmp %rcx,%rdx
> 3195: 72 f1 jb 3188
> <ram_write_tracking_prepare+0x98>
>
>
> [1]
> https://programfan.github.io/blog/2015/04/27/prevent-gcc-optimize-away-code/
>
>
> I'll send patches soon to take care of virtio-mem via RamDiscardManager -
> to skip populating the parts that are supposed to remain discarded and
> not migrated.
> Unfortunately, the RamDiscardManager patches are still stuck waiting for
> acks ... and now we're in soft-freeze.
>
RamDiscardManager patches - do they also modify migration code?
I mean, which part is responsible for not migrating discarded ranges?
--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH +7-903-247-6397
virtuzzo.com
* Re: [PATCH 3/3] migration: Pre-fault memory before starting background snapshot
2021-03-19 9:32 ` David Hildenbrand
@ 2021-03-19 11:09 ` Andrey Gruzdev
0 siblings, 0 replies; 14+ messages in thread
From: Andrey Gruzdev @ 2021-03-19 11:09 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
Dr . David Alan Gilbert, Markus Armbruster, Peter Xu
On 19.03.2021 12:32, David Hildenbrand wrote:
> On 19.03.21 10:28, David Hildenbrand wrote:
>>> +/*
>>> + * ram_block_populate_pages: populate memory in the RAM block by
>>> reading
>>> + * an integer from the beginning of each page.
>>> + *
>>> + * Since it's solely used for userfault_fd WP feature, here we just
>>> + * hardcode page size to TARGET_PAGE_SIZE.
>>> + *
>>> + * @bs: RAM block to populate
>>> + */
>>> +volatile int ram_block_populate_pages__tmp;
>>> +static void ram_block_populate_pages(RAMBlock *bs)
>>> +{
>>> + ram_addr_t offset = 0;
>>> + int tmp = 0;
>>> +
>>> + for (char *ptr = (char *) bs->host; offset < bs->used_length;
>>> + ptr += TARGET_PAGE_SIZE, offset += TARGET_PAGE_SIZE) {
>>
>> You'll want qemu_real_host_page_size instead of TARGET_PAGE_SIZE
>>
>>> + /* Try to do it without memory writes */
>>> + tmp += *(volatile int *) ptr;
>>> + }
>>
>>
>> The following is slightly simpler and doesn't rely on volatile
>> semantics [1].
>> Should work on any arch I guess.
>>
>> static void ram_block_populate_pages(RAMBlock *bs)
>> {
>> char *ptr = (char *) bs->host;
>> ram_addr_t offset;
>>
>> for (offset = 0; offset < bs->used_length;
>> offset += qemu_real_host_page_size) {
>> char tmp = *(volatile char *)(ptr + offset)
>
> I wanted to do a "= *(ptr + offset)" here.
>
Yep
>>
>> /* Don't optimize the read out. */
>> asm volatile ("" : "+r" (tmp));
>
> So this is the only volatile thing that the compiler must guarantee to
> not optimize away.
>
>
--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH +7-903-247-6397
virtuzzo.com
* Re: [PATCH 3/3] migration: Pre-fault memory before starting background snapshot
2021-03-19 11:05 ` Andrey Gruzdev
@ 2021-03-19 11:27 ` David Hildenbrand
2021-03-19 12:37 ` Andrey Gruzdev
0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2021-03-19 11:27 UTC (permalink / raw)
To: Andrey Gruzdev, qemu-devel
Cc: Juan Quintela, Dr . David Alan Gilbert, Peter Xu,
Markus Armbruster, Paolo Bonzini, Den Lunev
On 19.03.21 12:05, Andrey Gruzdev wrote:
> On 19.03.2021 12:28, David Hildenbrand wrote:
>>> +/*
>>> + * ram_block_populate_pages: populate memory in the RAM block by
>>> reading
>>> + * an integer from the beginning of each page.
>>> + *
>>> + * Since it's solely used for userfault_fd WP feature, here we just
>>> + * hardcode page size to TARGET_PAGE_SIZE.
>>> + *
>>> + * @bs: RAM block to populate
>>> + */
>>> +volatile int ram_block_populate_pages__tmp;
>>> +static void ram_block_populate_pages(RAMBlock *bs)
>>> +{
>>> + ram_addr_t offset = 0;
>>> + int tmp = 0;
>>> +
>>> + for (char *ptr = (char *) bs->host; offset < bs->used_length;
>>> + ptr += TARGET_PAGE_SIZE, offset += TARGET_PAGE_SIZE) {
>>
>> You'll want qemu_real_host_page_size instead of TARGET_PAGE_SIZE
>>
> Ok.
>>> + /* Try to do it without memory writes */
>>> + tmp += *(volatile int *) ptr;
>>> + }
>>
>>
>> The following is slightly simpler and doesn't rely on volatile
>> semantics [1].
>> Should work on any arch I guess.
>>
>> static void ram_block_populate_pages(RAMBlock *bs)
>> {
>> char *ptr = (char *) bs->host;
>> ram_addr_t offset;
>>
>> for (offset = 0; offset < bs->used_length;
>> offset += qemu_real_host_page_size) {
>> char tmp = *(volatile char *)(ptr + offset)
>>
>> /* Don't optimize the read out. */
>> asm volatile ("" : "+r" (tmp));
>> }
>>
> Thanks, good option, I'll change the code.
>
>> Compiles to
>>
>> for (offset = 0; offset < bs->used_length;
>> 316d: 48 8b 4b 30 mov 0x30(%rbx),%rcx
>> char *ptr = (char *) bs->host;
>> 3171: 48 8b 73 18 mov 0x18(%rbx),%rsi
>> for (offset = 0; offset < bs->used_length;
>> 3175: 48 85 c9 test %rcx,%rcx
>> 3178: 74 ce je 3148
>> <ram_write_tracking_prepare+0x58>
>> offset += qemu_real_host_page_size) {
>> 317a: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax #
>> 3181 <ram_write_tracking_prepare+0x91>
>> 3181: 48 8b 38 mov (%rax),%rdi
>> 3184: 0f 1f 40 00 nopl 0x0(%rax)
>> char tmp = *(volatile char *)(ptr + offset);
>> 3188: 48 8d 04 16 lea (%rsi,%rdx,1),%rax
>> 318c: 0f b6 00 movzbl (%rax),%eax
>> offset += qemu_real_host_page_size) {
>> 318f: 48 01 fa add %rdi,%rdx
>> for (offset = 0; offset < bs->used_length;
>> 3192: 48 39 ca cmp %rcx,%rdx
>> 3195: 72 f1 jb 3188
>> <ram_write_tracking_prepare+0x98>
>>
>>
>> [1]
>> https://programfan.github.io/blog/2015/04/27/prevent-gcc-optimize-away-code/
>>
>>
>> I'll send patches soon to take care of virtio-mem via RamDiscardManager -
>> to skip populating the parts that are supposed to remain discarded and
>> not migrated.
>> Unfortunately, the RamDiscardManager patches are still stuck waiting for
>> acks ... and now we're in soft-freeze.
>>
> RamDiscardManager patches - do they also modify migration code?
> I mean which part is responsible of not migrating discarded ranges.
I haven't shared relevant patches yet that touch migration code. I'm
planning on doing that once the generic RamDiscardManager has all
relevant acks. I'll put you on cc.
--
Thanks,
David / dhildenb
* Re: [PATCH 3/3] migration: Pre-fault memory before starting background snapshot
2021-03-19 11:27 ` David Hildenbrand
@ 2021-03-19 12:37 ` Andrey Gruzdev
0 siblings, 0 replies; 14+ messages in thread
From: Andrey Gruzdev @ 2021-03-19 12:37 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
Dr . David Alan Gilbert, Markus Armbruster, Peter Xu
On 19.03.2021 14:27, David Hildenbrand wrote:
> On 19.03.21 12:05, Andrey Gruzdev wrote:
>> On 19.03.2021 12:28, David Hildenbrand wrote:
>>>> +/*
>>>> + * ram_block_populate_pages: populate memory in the RAM block by
>>>> reading
>>>> + * an integer from the beginning of each page.
>>>> + *
>>>> + * Since it's solely used for userfault_fd WP feature, here we just
>>>> + * hardcode page size to TARGET_PAGE_SIZE.
>>>> + *
>>>> + * @bs: RAM block to populate
>>>> + */
>>>> +volatile int ram_block_populate_pages__tmp;
>>>> +static void ram_block_populate_pages(RAMBlock *bs)
>>>> +{
>>>> + ram_addr_t offset = 0;
>>>> + int tmp = 0;
>>>> +
>>>> + for (char *ptr = (char *) bs->host; offset < bs->used_length;
>>>> + ptr += TARGET_PAGE_SIZE, offset += TARGET_PAGE_SIZE) {
>>>
>>> You'll want qemu_real_host_page_size instead of TARGET_PAGE_SIZE
>>>
>> Ok.
>>>> + /* Try to do it without memory writes */
>>>> + tmp += *(volatile int *) ptr;
>>>> + }
>>>
>>>
>>> The following is slightly simpler and doesn't rely on volatile
>>> semantics [1].
>>> Should work on any arch I guess.
>>>
>>> static void ram_block_populate_pages(RAMBlock *bs)
>>> {
>>> char *ptr = (char *) bs->host;
>>> ram_addr_t offset;
>>>
>>> for (offset = 0; offset < bs->used_length;
>>> offset += qemu_real_host_page_size) {
>>> char tmp = *(volatile char *)(ptr + offset)
>>>
>>> /* Don't optimize the read out. */
>>> asm volatile ("" : "+r" (tmp));
>>> }
>>>
>> Thanks, good option, I'll change the code.
>>
>>> Compiles to
>>>
>>> for (offset = 0; offset < bs->used_length;
>>> 316d: 48 8b 4b 30 mov 0x30(%rbx),%rcx
>>> char *ptr = (char *) bs->host;
>>> 3171: 48 8b 73 18 mov 0x18(%rbx),%rsi
>>> for (offset = 0; offset < bs->used_length;
>>> 3175: 48 85 c9 test %rcx,%rcx
>>> 3178: 74 ce je 3148
>>> <ram_write_tracking_prepare+0x58>
>>> offset += qemu_real_host_page_size) {
>>> 317a: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax #
>>> 3181 <ram_write_tracking_prepare+0x91>
>>> 3181: 48 8b 38 mov (%rax),%rdi
>>> 3184: 0f 1f 40 00 nopl 0x0(%rax)
>>> char tmp = *(volatile char *)(ptr + offset);
>>> 3188: 48 8d 04 16 lea (%rsi,%rdx,1),%rax
>>> 318c: 0f b6 00 movzbl (%rax),%eax
>>> offset += qemu_real_host_page_size) {
>>> 318f: 48 01 fa add %rdi,%rdx
>>> for (offset = 0; offset < bs->used_length;
>>> 3192: 48 39 ca cmp %rcx,%rdx
>>> 3195: 72 f1 jb 3188
>>> <ram_write_tracking_prepare+0x98>
>>>
>>>
>>> [1]
>>> https://programfan.github.io/blog/2015/04/27/prevent-gcc-optimize-away-code/
>>>
>>>
>>>
>>> I'll send patches soon to take care of virtio-mem via
>>> RamDiscardManager -
>>> to skip populating the parts that are supposed to remain discarded and
>>> not migrated.
>>> Unfortunately, the RamDiscardManager patches are still stuck waiting
>>> for
>>> acks ... and now we're in soft-freeze.
>>>
>> RamDiscardManager patches - do they also modify migration code?
>> I mean, which part is responsible for not migrating discarded ranges?
>
> I haven't shared relevant patches yet that touch migration code. I'm
> planning on doing that once the generic RamDiscardManager has all
> relevant acks. I'll put you on cc.
>
Got it, thanks.
--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH +7-903-247-6397
virtuzzo.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
2021-03-18 17:46 ` [PATCH 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
@ 2021-03-19 12:39 ` David Hildenbrand
2021-03-19 13:13 ` Andrey Gruzdev
0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2021-03-19 12:39 UTC (permalink / raw)
To: Andrey Gruzdev, qemu-devel
Cc: Juan Quintela, Dr . David Alan Gilbert, Peter Xu,
Markus Armbruster, Paolo Bonzini, Den Lunev
On 18.03.21 18:46, Andrey Gruzdev wrote:
> Added missing qemu_fflush() on buffer file holding precopy device state.
> Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
> Typical configurations often require >200KB for device state and VMDESC.
>
> Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
> ---
> migration/migration.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 36768391b6..496cf6e17b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3857,7 +3857,7 @@ static void *bg_migration_thread(void *opaque)
> * with vCPUs running and, finally, write stashed non-RAM part of
> * the vmstate from the buffer to the migration stream.
> */
> - s->bioc = qio_channel_buffer_new(128 * 1024);
> + s->bioc = qio_channel_buffer_new(512 * 1024);
^ would that be better as a separate patch? It sounds more like an
improvement than a fix.
> qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
> fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
> object_unref(OBJECT(s->bioc));
> @@ -3911,6 +3911,8 @@ static void *bg_migration_thread(void *opaque)
> if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
> goto fail;
> }
> + qemu_fflush(fb);
Fixes: ?
> +
> /* Now initialize UFFD context and start tracking RAM writes */
> if (ram_write_tracking_start()) {
> goto fail;
>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
2021-03-19 12:39 ` David Hildenbrand
@ 2021-03-19 13:13 ` Andrey Gruzdev
0 siblings, 0 replies; 14+ messages in thread
From: Andrey Gruzdev @ 2021-03-19 13:13 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
Dr . David Alan Gilbert, Markus Armbruster, Peter Xu
On 19.03.2021 15:39, David Hildenbrand wrote:
> On 18.03.21 18:46, Andrey Gruzdev wrote:
>> Added missing qemu_fflush() on buffer file holding precopy device state.
>> Increased initial QIOChannelBuffer allocation to 512KB to avoid
>> reallocs.
>> Typical configurations often require >200KB for device state and VMDESC.
>>
>> Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
>> ---
>> migration/migration.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 36768391b6..496cf6e17b 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3857,7 +3857,7 @@ static void *bg_migration_thread(void *opaque)
>> * with vCPUs running and, finally, write stashed non-RAM part of
>> * the vmstate from the buffer to the migration stream.
>> */
>> - s->bioc = qio_channel_buffer_new(128 * 1024);
>> + s->bioc = qio_channel_buffer_new(512 * 1024);
>
> ^ would that be better as a separate patch? It sounds more like an
> improvement than a fix.
>
>> qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
>> fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
>> object_unref(OBJECT(s->bioc));
>> @@ -3911,6 +3911,8 @@ static void *bg_migration_thread(void *opaque)
>> if (qemu_savevm_state_complete_precopy_non_iterable(fb, false,
>> false)) {
>> goto fail;
>> }
>> + qemu_fflush(fb);
>
> Fixes: ?
>
It fixes an unflushed QEMUFile, so it's a fix rather than an improvement.
If not flushed, migrate_get_current()->bioc->data
would be missing some bytes at the tail.
>> +
>> /* Now initialize UFFD context and start tracking RAM writes */
>> if (ram_write_tracking_start()) {
>> goto fail;
>>
>
>
--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH +7-903-247-6397
virtuzzo.com
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-03-19 13:14 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-18 17:46 [PATCH 0/3] migration: Fixes to the 'background-snapshot' code Andrey Gruzdev
2021-03-18 17:46 ` [PATCH 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
2021-03-19 12:39 ` David Hildenbrand
2021-03-19 13:13 ` Andrey Gruzdev
2021-03-18 17:46 ` [PATCH 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot Andrey Gruzdev
2021-03-18 18:16 ` David Hildenbrand
2021-03-19 8:27 ` Andrey Gruzdev
2021-03-18 17:46 ` [PATCH 3/3] migration: Pre-fault memory before starting background snasphot Andrey Gruzdev
2021-03-19 9:28 ` David Hildenbrand
2021-03-19 9:32 ` David Hildenbrand
2021-03-19 11:09 ` Andrey Gruzdev
2021-03-19 11:05 ` Andrey Gruzdev
2021-03-19 11:27 ` David Hildenbrand
2021-03-19 12:37 ` Andrey Gruzdev