qemu-devel.nongnu.org archive mirror
* [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code
@ 2021-03-19 14:52 Andrey Gruzdev
  2021-03-19 14:52 ` [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Andrey Gruzdev @ 2021-03-19 14:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Peter Xu,
	David Hildenbrand, Andrey Gruzdev

Changes v0->v1:
 * Using qemu_real_host_page_size instead of TARGET_PAGE_SIZE for host
   page size in ram_block_populate_pages()
 * More elegant implementation of ram_block_populate_pages()

This patch series contains:
 * Fix to the issue with occasionally truncated non-iterable device state
 * Solution to compatibility issues with virtio-balloon device
 * Fix to the issue when discarded or never populated pages miss UFFD
   write protection and get into migration stream in dirty state

Andrey Gruzdev (3):
  migration: Fix missing qemu_fflush() on buffer file in
    bg_migration_thread
  migration: Inhibit virtio-balloon for the duration of background
    snapshot
  migration: Pre-fault memory before starting background snapshot

 hw/virtio/virtio-balloon.c |  8 +++++--
 include/migration/misc.h   |  2 ++
 migration/migration.c      | 18 +++++++++++++-
 migration/ram.c            | 48 ++++++++++++++++++++++++++++++++++++++
 migration/ram.h            |  1 +
 5 files changed, 74 insertions(+), 3 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
  2021-03-19 14:52 [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code Andrey Gruzdev
@ 2021-03-19 14:52 ` Andrey Gruzdev
  2021-03-22 20:17   ` Peter Xu
  2021-03-19 14:52 ` [PATCH v1 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot Andrey Gruzdev
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Andrey Gruzdev @ 2021-03-19 14:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Peter Xu,
	David Hildenbrand, Andrey Gruzdev

Added missing qemu_fflush() on buffer file holding precopy device state.
Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
Typical configurations often require >200KB for device state and VMDESC.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 migration/migration.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index ca8b97baa5..32b48fe9f5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3812,7 +3812,7 @@ static void *bg_migration_thread(void *opaque)
      * with vCPUs running and, finally, write stashed non-RAM part of
      * the vmstate from the buffer to the migration stream.
      */
-    s->bioc = qio_channel_buffer_new(128 * 1024);
+    s->bioc = qio_channel_buffer_new(512 * 1024);
     qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
     fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
     object_unref(OBJECT(s->bioc));
@@ -3866,6 +3866,8 @@ static void *bg_migration_thread(void *opaque)
     if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
         goto fail;
     }
+    qemu_fflush(fb);
+
     /* Now initialize UFFD context and start tracking RAM writes */
     if (ram_write_tracking_start()) {
         goto fail;
-- 
2.25.1




* [PATCH v1 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot
  2021-03-19 14:52 [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code Andrey Gruzdev
  2021-03-19 14:52 ` [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
@ 2021-03-19 14:52 ` Andrey Gruzdev
  2021-03-19 14:52 ` [PATCH v1 3/3] migration: Pre-fault memory before starting background snapshot Andrey Gruzdev
  2021-03-23 22:21 ` [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code Peter Xu
  3 siblings, 0 replies; 16+ messages in thread
From: Andrey Gruzdev @ 2021-03-19 14:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Peter Xu,
	David Hildenbrand, Andrey Gruzdev

As with incoming postcopy, we cannot deal with concurrent RAM discards
when using the background snapshot feature in outgoing migration.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-balloon.c | 8 ++++++--
 include/migration/misc.h   | 2 ++
 migration/migration.c      | 8 ++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e770955176..d120bf8f43 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -66,8 +66,12 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
 
 static bool virtio_balloon_inhibited(void)
 {
-    /* Postcopy cannot deal with concurrent discards, so it's special. */
-    return ram_block_discard_is_disabled() || migration_in_incoming_postcopy();
+    /*
+     * Postcopy cannot deal with concurrent discards,
+     * so it's special, as well as background snapshots.
+     */
+    return ram_block_discard_is_disabled() || migration_in_incoming_postcopy() ||
+            migration_in_bg_snapshot();
 }
 
 static void balloon_inflate_page(VirtIOBalloon *balloon,
diff --git a/include/migration/misc.h b/include/migration/misc.h
index bccc1b6b44..738675ef52 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -70,6 +70,8 @@ bool migration_in_postcopy_after_devices(MigrationState *);
 void migration_global_dump(Monitor *mon);
 /* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
+/* True if background snapshot is active */
+bool migration_in_bg_snapshot(void);
 
 /* migration/block-dirty-bitmap.c */
 void dirty_bitmap_mig_init(void);
diff --git a/migration/migration.c b/migration/migration.c
index 32b48fe9f5..32aee2c5bd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1976,6 +1976,14 @@ bool migration_in_incoming_postcopy(void)
     return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END;
 }
 
+bool migration_in_bg_snapshot(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return migrate_background_snapshot() &&
+            migration_is_setup_or_active(s->state);
+}
+
 bool migration_is_idle(void)
 {
     MigrationState *s = current_migration;
-- 
2.25.1




* [PATCH v1 3/3] migration: Pre-fault memory before starting background snapshot
  2021-03-19 14:52 [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code Andrey Gruzdev
  2021-03-19 14:52 ` [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
  2021-03-19 14:52 ` [PATCH v1 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot Andrey Gruzdev
@ 2021-03-19 14:52 ` Andrey Gruzdev
  2021-03-23 22:21 ` [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code Peter Xu
  3 siblings, 0 replies; 16+ messages in thread
From: Andrey Gruzdev @ 2021-03-19 14:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, Peter Xu,
	David Hildenbrand, Andrey Gruzdev

This commit solves an issue with the userfault_fd WP feature that
background snapshot is based on. For any never-populated or discarded
memory page, the UFFDIO_WRITEPROTECT ioctl() would skip updating the
PTE for that page, thereby losing the WP setting for it.

So we need to pre-fault pages for each RAM block to be protected
before making the userfault_fd wr-protect ioctl().

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 migration/migration.c |  6 ++++++
 migration/ram.c       | 48 +++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h       |  1 +
 3 files changed, 55 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 32aee2c5bd..ae7855155f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3827,6 +3827,12 @@ static void *bg_migration_thread(void *opaque)
 
     update_iteration_initial_status(s);
 
+    /*
+     * Prepare for tracking memory writes with UFFD-WP - populate
+     * RAM pages before protecting.
+     */
+    ram_write_tracking_prepare();
+
     qemu_savevm_state_header(s->to_dst_file);
     qemu_savevm_state_setup(s->to_dst_file);
 
diff --git a/migration/ram.c b/migration/ram.c
index 40e78952ad..24c8627214 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1560,6 +1560,54 @@ out:
     return ret;
 }
 
+/*
+ * ram_block_populate_pages: populate memory in the RAM block by reading
+ *   an integer from the beginning of each page.
+ *
+ * Since it's solely used for userfault_fd WP feature, here we just
+ *   hardcode page size to qemu_real_host_page_size.
+ *
+ * @bs: RAM block to populate
+ */
+static void ram_block_populate_pages(RAMBlock *bs)
+{
+    char *ptr = (char *) bs->host;
+
+    for (ram_addr_t offset = 0; offset < bs->used_length;
+            offset += qemu_real_host_page_size) {
+        char tmp = *(ptr + offset);
+        /* Don't optimize the read out */
+        asm volatile("" : "+r" (tmp));
+    }
+}
+
+/*
+ * ram_write_tracking_prepare: prepare for UFFD-WP memory tracking
+ */
+void ram_write_tracking_prepare(void)
+{
+    RAMBlock *bs;
+
+    RCU_READ_LOCK_GUARD();
+
+    RAMBLOCK_FOREACH_NOT_IGNORED(bs) {
+        /* Nothing to do with read-only and MMIO-writable regions */
+        if (bs->mr->readonly || bs->mr->rom_device) {
+            continue;
+        }
+
+        /*
+         * Populate pages of the RAM block before enabling userfault_fd
+         * write protection.
+         *
+         * This stage is required since ioctl(UFFDIO_WRITEPROTECT) with
+         * UFFDIO_WRITEPROTECT_MODE_WP mode setting would silently skip
+         * pages with pte_none() entries in page table.
+         */
+        ram_block_populate_pages(bs);
+    }
+}
+
 /*
  * ram_write_tracking_start: start UFFD-WP memory tracking
  *
diff --git a/migration/ram.h b/migration/ram.h
index 6378bb3ebc..4833e9fd5b 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -82,6 +82,7 @@ void colo_incoming_start_dirty_log(void);
 /* Background snapshot */
 bool ram_write_tracking_available(void);
 bool ram_write_tracking_compatible(void);
+void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
-- 
2.25.1




* Re: [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
  2021-03-19 14:52 ` [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
@ 2021-03-22 20:17   ` Peter Xu
  2021-03-23  7:51     ` Andrey Gruzdev
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2021-03-22 20:17 UTC (permalink / raw)
  To: Andrey Gruzdev
  Cc: Juan Quintela, Markus Armbruster, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Den Lunev

On Fri, Mar 19, 2021 at 05:52:47PM +0300, Andrey Gruzdev wrote:
> Added missing qemu_fflush() on buffer file holding precopy device state.
> Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
> Typical configurations often require >200KB for device state and VMDESC.
> 
> Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
> ---
>  migration/migration.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index ca8b97baa5..32b48fe9f5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3812,7 +3812,7 @@ static void *bg_migration_thread(void *opaque)
>       * with vCPUs running and, finally, write stashed non-RAM part of
>       * the vmstate from the buffer to the migration stream.
>       */
> -    s->bioc = qio_channel_buffer_new(128 * 1024);
> +    s->bioc = qio_channel_buffer_new(512 * 1024);
>      qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
>      fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
>      object_unref(OBJECT(s->bioc));
> @@ -3866,6 +3866,8 @@ static void *bg_migration_thread(void *opaque)
>      if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
>          goto fail;
>      }
> +    qemu_fflush(fb);

What will happen if the vmstates are bigger than 512KB?  Would the extra data
be dropped?

In that case, I'm wondering whether we'll need a qemu_file_get_error() after
the flush to detect it, and whether we need to retry with a bigger buffer size?

Thanks,

-- 
Peter Xu




* Re: [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
  2021-03-22 20:17   ` Peter Xu
@ 2021-03-23  7:51     ` Andrey Gruzdev
  2021-03-23 14:54       ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Gruzdev @ 2021-03-23  7:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, David Hildenbrand


On 22.03.2021 23:17, Peter Xu wrote:
> On Fri, Mar 19, 2021 at 05:52:47PM +0300, Andrey Gruzdev wrote:
>> Added missing qemu_fflush() on buffer file holding precopy device state.
>> Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
>> Typical configurations often require >200KB for device state and VMDESC.
>>
>> Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
>> ---
>>   migration/migration.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index ca8b97baa5..32b48fe9f5 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3812,7 +3812,7 @@ static void *bg_migration_thread(void *opaque)
>>        * with vCPUs running and, finally, write stashed non-RAM part of
>>        * the vmstate from the buffer to the migration stream.
>>        */
>> -    s->bioc = qio_channel_buffer_new(128 * 1024);
>> +    s->bioc = qio_channel_buffer_new(512 * 1024);
>>       qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
>>       fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
>>       object_unref(OBJECT(s->bioc));
>> @@ -3866,6 +3866,8 @@ static void *bg_migration_thread(void *opaque)
>>       if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
>>           goto fail;
>>       }
>> +    qemu_fflush(fb);
> What will happen if the vmstates are bigger than 512KB?  Would the extra data
> be dropped?

No, the buffer shall be reallocated and the original content shall be kept.

> In that case, I'm wondering whether we'll need a qemu_file_get_error() after
> the flush to detect it, and whether we need to retry with a bigger buffer size?
>
> Thanks,
>
I don't think we need that for QIOChannelBuffer, as far as I can see from
the buffer channel implementation.




* Re: [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
  2021-03-23  7:51     ` Andrey Gruzdev
@ 2021-03-23 14:54       ` Peter Xu
  2021-03-23 17:21         ` Andrey Gruzdev
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2021-03-23 14:54 UTC (permalink / raw)
  To: Andrey Gruzdev
  Cc: Juan Quintela, Markus Armbruster, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Den Lunev

On Tue, Mar 23, 2021 at 10:51:57AM +0300, Andrey Gruzdev wrote:
> On 22.03.2021 23:17, Peter Xu wrote:
> > On Fri, Mar 19, 2021 at 05:52:47PM +0300, Andrey Gruzdev wrote:
> > > Added missing qemu_fflush() on buffer file holding precopy device state.
> > > Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
> > > Typical configurations often require >200KB for device state and VMDESC.
> > > 
> > > Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
> > > ---
> > >   migration/migration.c | 4 +++-
> > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index ca8b97baa5..32b48fe9f5 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -3812,7 +3812,7 @@ static void *bg_migration_thread(void *opaque)
> > >        * with vCPUs running and, finally, write stashed non-RAM part of
> > >        * the vmstate from the buffer to the migration stream.
> > >        */
> > > -    s->bioc = qio_channel_buffer_new(128 * 1024);
> > > +    s->bioc = qio_channel_buffer_new(512 * 1024);
> > >       qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
> > >       fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
> > >       object_unref(OBJECT(s->bioc));
> > > @@ -3866,6 +3866,8 @@ static void *bg_migration_thread(void *opaque)
> > >       if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
> > >           goto fail;
> > >       }
> > > +    qemu_fflush(fb);
> > What will happen if the vmstates are bigger than 512KB?  Would the extra data
> > be dropped?
> 
> No, the buffer shall be reallocated and the original content shall be kept.

Oh right..

Would you comment above qemu_fflush() about it (maybe also move it right before
the call to qemu_put_buffer)?  Otherwise it indeed looks more like an
optimization rather than a bugfix.

For the long term I think we'd better have a helper:

        qemu_put_qio_channel_buffer(QEMUFile *file, QIOChannelBuffer *bioc)

So as to hide this flush operation, which is tricky. We'll have two users so
far:

        bg_migration_completion
        colo_do_checkpoint_transaction

IMHO it'll be nicer if you'd do it in this patch altogether!

Thanks,

-- 
Peter Xu




* Re: [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
  2021-03-23 14:54       ` Peter Xu
@ 2021-03-23 17:21         ` Andrey Gruzdev
  2021-03-23 18:35           ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Gruzdev @ 2021-03-23 17:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, David Hildenbrand


On 23.03.2021 17:54, Peter Xu wrote:
> On Tue, Mar 23, 2021 at 10:51:57AM +0300, Andrey Gruzdev wrote:
>> On 22.03.2021 23:17, Peter Xu wrote:
>>> On Fri, Mar 19, 2021 at 05:52:47PM +0300, Andrey Gruzdev wrote:
>>>> Added missing qemu_fflush() on buffer file holding precopy device state.
>>>> Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
>>>> Typical configurations often require >200KB for device state and VMDESC.
>>>>
>>>> Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
>>>> ---
>>>>    migration/migration.c | 4 +++-
>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index ca8b97baa5..32b48fe9f5 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -3812,7 +3812,7 @@ static void *bg_migration_thread(void *opaque)
>>>>         * with vCPUs running and, finally, write stashed non-RAM part of
>>>>         * the vmstate from the buffer to the migration stream.
>>>>         */
>>>> -    s->bioc = qio_channel_buffer_new(128 * 1024);
>>>> +    s->bioc = qio_channel_buffer_new(512 * 1024);
>>>>        qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
>>>>        fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
>>>>        object_unref(OBJECT(s->bioc));
>>>> @@ -3866,6 +3866,8 @@ static void *bg_migration_thread(void *opaque)
>>>>        if (qemu_savevm_state_complete_precopy_non_iterable(fb, false, false)) {
>>>>            goto fail;
>>>>        }
>>>> +    qemu_fflush(fb);
>>> What will happen if the vmstates are bigger than 512KB?  Would the extra data
>>> be dropped?
>> No, the buffer shall be reallocated and the original content shall be kept.
> Oh right..
>
> Would you comment above qemu_fflush() about it (maybe also move it right before
> the call to qemu_put_buffer)?  Otherwise it indeed looks more like an
> optimization rather than a bugfix.

Agree, better to have a comment here.

> For the long term I think we'd better have a helper:
>
>          qemu_put_qio_channel_buffer(QEMUFile *file, QIOChannelBuffer *bioc)
>
> So as to hide this flush operation, which is tricky. We'll have two users so
> far:
>
>          bg_migration_completion
>          colo_do_checkpoint_transaction
>
> IMHO it'll be nicer if you'd do it in this patch altogether!
>
> Thanks,
>
Sorry, I can't get the idea - what's wrong with the fix?




* Re: [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
  2021-03-23 17:21         ` Andrey Gruzdev
@ 2021-03-23 18:35           ` Peter Xu
  2021-03-24  7:48             ` Andrey Gruzdev
  2021-03-24 17:35             ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 16+ messages in thread
From: Peter Xu @ 2021-03-23 18:35 UTC (permalink / raw)
  To: Andrey Gruzdev
  Cc: Juan Quintela, Markus Armbruster, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Den Lunev


On Tue, Mar 23, 2021 at 08:21:43PM +0300, Andrey Gruzdev wrote:
> > For the long term I think we'd better have a helper:
> > 
> >          qemu_put_qio_channel_buffer(QEMUFile *file, QIOChannelBuffer *bioc)
> > 
> > So as to hide this flush operation, which is tricky. We'll have two users so
> > far:
> > 
> >          bg_migration_completion
> >          colo_do_checkpoint_transaction
> > 
> > IMHO it'll be nicer if you'd do it in this patch altogether!
> > 
> > Thanks,
> > 
> Sorry, I can't get the idea - what's wrong with the fix?

I'm fine with the fix, but I've got one patch attached just to show what I
meant, so it comes without any testing, for sure.

Looks more complicated than I thought, but again I think we should hide that
buffer flush into another helper to avoid overlooking it.

Thanks,

-- 
Peter Xu

[-- Attachment #2: 0001-migration-Introduce-qemu_put_qio_channel_buffer.patch --]
[-- Type: text/plain, Size: 6038 bytes --]

From f3004948e2498fb9ac4646a6a5225c58ea3f1623 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 23 Mar 2021 14:30:24 -0400
Subject: [PATCH] migration: Introduce qemu_put_qio_channel_buffer()

Meanwhile use it in background snapshot code, so as to dump one QEMUFile buffer
(which is created by QIOChannelBuffer) into another QEMUFile.

Similar thing should be able to be applied to colo_do_checkpoint_transaction()
too.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c         | 16 +++++++++-------
 migration/migration.h         |  1 -
 migration/qemu-file-channel.c | 14 ++++++++++++++
 migration/qemu-file-channel.h |  2 ++
 migration/qemu-file.c         |  4 ++++
 migration/qemu-file.h         |  1 +
 6 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index a5ddf43559e..9d73c236b16 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3243,8 +3243,9 @@ fail:
  *   RAM has been saved. The caller 'breaks' the loop when this returns.
  *
  * @s: Current migration state
+ * @vmstates: The QEMUFile* buffer that contains all the device vmstates
  */
-static void bg_migration_completion(MigrationState *s)
+static void bg_migration_completion(MigrationState *s, QEMUFile *vmstates)
 {
     int current_active_state = s->state;
 
@@ -3262,7 +3263,7 @@ static void bg_migration_completion(MigrationState *s)
          * right after the ram content. The device state has been stored into
          * the temporary buffer before RAM saving started.
          */
-        qemu_put_buffer(s->to_dst_file, s->bioc->data, s->bioc->usage);
+        qemu_put_qio_channel_buffer(s->to_dst_file, vmstates);
         qemu_fflush(s->to_dst_file);
     } else if (s->state == MIGRATION_STATUS_CANCELLING) {
         goto fail;
@@ -3656,7 +3657,6 @@ static MigIterateState bg_migration_iteration_run(MigrationState *s)
 
     res = qemu_savevm_state_iterate(s->to_dst_file, false);
     if (res > 0) {
-        bg_migration_completion(s);
         return MIG_ITERATE_BREAK;
     }
 
@@ -3843,6 +3843,7 @@ static void *bg_migration_thread(void *opaque)
     MigThrError thr_error;
     QEMUFile *fb;
     bool early_fail = true;
+    QIOChannelBuffer *bioc;
 
     rcu_register_thread();
     object_ref(OBJECT(s));
@@ -3859,10 +3860,10 @@ static void *bg_migration_thread(void *opaque)
      * with vCPUs running and, finally, write stashed non-RAM part of
      * the vmstate from the buffer to the migration stream.
      */
-    s->bioc = qio_channel_buffer_new(128 * 1024);
-    qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
-    fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
-    object_unref(OBJECT(s->bioc));
+    bioc = qio_channel_buffer_new(128 * 1024);
+    qio_channel_set_name(QIO_CHANNEL(bioc), "vmstate-buffer");
+    fb = qemu_fopen_channel_output(QIO_CHANNEL(bioc));
+    object_unref(OBJECT(bioc));
 
     update_iteration_initial_status(s);
 
@@ -3935,6 +3936,7 @@ static void *bg_migration_thread(void *opaque)
         if (iter_state == MIG_ITERATE_SKIP) {
             continue;
         } else if (iter_state == MIG_ITERATE_BREAK) {
+            bg_migration_completion(s, fb);
             break;
         }
 
diff --git a/migration/migration.h b/migration/migration.h
index db6708326b1..bdcd74ce084 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -151,7 +151,6 @@ struct MigrationState {
     QEMUBH *vm_start_bh;
     QEMUBH *cleanup_bh;
     QEMUFile *to_dst_file;
-    QIOChannelBuffer *bioc;
     /*
      * Protects to_dst_file pointer.  We need to make sure we won't
      * yield or hang during the critical section, since this lock will
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index afc3a7f642a..c1bf71510f8 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -26,6 +26,7 @@
 #include "qemu-file-channel.h"
 #include "qemu-file.h"
 #include "io/channel-socket.h"
+#include "io/channel-buffer.h"
 #include "qemu/iov.h"
 #include "qemu/yank.h"
 
@@ -196,3 +197,16 @@ QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc)
     object_ref(OBJECT(ioc));
     return qemu_fopen_ops(ioc, &channel_output_ops);
 }
+
+/*
+ * Splice all the data in `buffer' into `dst'.  Note that `buffer' must point
+ * to a QEMUFile that was created with qemu_fopen_channel_output().
+ */
+void qemu_put_qio_channel_buffer(QEMUFile *dst, QEMUFile *buffer)
+{
+    QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(qemu_file_get_opaque(buffer));
+
+    /* Make sure data cached in QEMUFile flushed to QIOChannel buffers */
+    qemu_fflush(buffer);
+    qemu_put_buffer(dst, bioc->data, bioc->usage);
+}
diff --git a/migration/qemu-file-channel.h b/migration/qemu-file-channel.h
index 0028a09eb61..5f165527616 100644
--- a/migration/qemu-file-channel.h
+++ b/migration/qemu-file-channel.h
@@ -29,4 +29,6 @@
 
 QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc);
 QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc);
+void qemu_put_qio_channel_buffer(QEMUFile *dst, QEMUFile *buffer);
+
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index d6e03dbc0e0..317f98fc8f5 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -112,6 +112,10 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
     return f;
 }
 
+void *qemu_file_get_opaque(QEMUFile *f)
+{
+    return f->opaque;
+}
 
 void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
 {
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index a9b6d6ccb7d..30c8ec855ac 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -161,6 +161,7 @@ int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
+void *qemu_file_get_opaque(QEMUFile *f);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-- 
2.26.2



* Re: [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code
  2021-03-19 14:52 [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code Andrey Gruzdev
                   ` (2 preceding siblings ...)
  2021-03-19 14:52 ` [PATCH v1 3/3] migration: Pre-fault memory before starting background snapshot Andrey Gruzdev
@ 2021-03-23 22:21 ` Peter Xu
  2021-03-24  8:09   ` Andrey Gruzdev
  3 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2021-03-23 22:21 UTC (permalink / raw)
  To: Andrey Gruzdev
  Cc: Juan Quintela, Markus Armbruster, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Den Lunev

On Fri, Mar 19, 2021 at 05:52:46PM +0300, Andrey Gruzdev wrote:
> Changes v0->v1:
>  * Using qemu_real_host_page_size instead of TARGET_PAGE_SIZE for host
>    page size in ram_block_populate_pages()
>  * More elegant implementation of ram_block_populate_pages()
> 
> This patch series contains:
>  * Fix to the issue with occasionally truncated non-iterable device state
>  * Solution to compatibility issues with virtio-balloon device
>  * Fix to the issue when discarded or never populated pages miss UFFD
>    write protection and get into migration stream in dirty state
> 
> Andrey Gruzdev (3):
>   migration: Fix missing qemu_fflush() on buffer file in
>     bg_migration_thread
>   migration: Inhibit virtio-balloon for the duration of background
>     snapshot
>   migration: Pre-fault memory before starting background snapshot

Unless Andrey would like to respin a new version, this version looks good to me
(I don't think the adding new helper issue in patch 1 is a blocker):

Reviewed-by: Peter Xu <peterx@redhat.com>

I'm also looking into introducing UFFD_FEATURE_WP_UNALLOCATED so as to
wr-protect page holes too for a uffd-wp region when the feature bit is set.
With that feature we should be able to avoid pre-fault as what we do in the
last patch of this series.  However even if that can work out, we'll still need
this for old kernel anyways.

Thanks,

-- 
Peter Xu




* Re: [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
  2021-03-23 18:35           ` Peter Xu
@ 2021-03-24  7:48             ` Andrey Gruzdev
  2021-03-24 17:35             ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 16+ messages in thread
From: Andrey Gruzdev @ 2021-03-24  7:48 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, David Hildenbrand


On 23.03.2021 21:35, Peter Xu wrote:
> On Tue, Mar 23, 2021 at 08:21:43PM +0300, Andrey Gruzdev wrote:
>>> For the long term I think we'd better have a helper:
>>>
>>>           qemu_put_qio_channel_buffer(QEMUFile *file, QIOChannelBuffer *bioc)
>>>
>>> So as to hide this flush operation, which is tricky. We'll have two users so
>>> far:
>>>
>>>           bg_migration_completion
>>>           colo_do_checkpoint_transaction
>>>
>>> IMHO it'll be nicer if you'd do it in this patch altogether!
>>>
>>> Thanks,
>>>
>> Sorry, I can't get the idea; what's wrong with the fix?
> I'm fine with the fix, but I've got one patch attached just to show what I
> meant, so without any testing for sure..
>
> Looks more complicated than I thought, but again I think we should hide that
> buffer flush into another helper to avoid overlooking it.

Thanks, Peter, now I've got what you meant: not to overlook the flush on the buffer channel.
But it seems that adding a reasonable comment is enough here.

>
> Thanks,
>




* Re: [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code
  2021-03-23 22:21 ` [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code Peter Xu
@ 2021-03-24  8:09   ` Andrey Gruzdev
  2021-03-24 15:41     ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Gruzdev @ 2021-03-24  8:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, David Hildenbrand


On 24.03.2021 01:21, Peter Xu wrote:
> On Fri, Mar 19, 2021 at 05:52:46PM +0300, Andrey Gruzdev wrote:
>> Changes v0->v1:
>>   * Using qemu_real_host_page_size instead of TARGET_PAGE_SIZE for host
>>     page size in ram_block_populate_pages()
>>   * More elegant implementation of ram_block_populate_pages()
>>
>> This patch series contains:
>>   * Fix to the issue with occasionally truncated non-iterable device state
>>   * Solution to compatibility issues with virtio-balloon device
>>   * Fix to the issue when discarded or never populated pages miss UFFD
>>     write protection and get into migration stream in dirty state
>>
>> Andrey Gruzdev (3):
>>    migration: Fix missing qemu_fflush() on buffer file in
>>      bg_migration_thread
>>    migration: Inhibit virtio-balloon for the duration of background
>>      snapshot
>>    migration: Pre-fault memory before starting background snapshot
> Unless Andrey would like to respin a new version, this version looks good to me
> (I don't think the issue of adding a new helper in patch 1 is a blocker):
>
> Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks.

> I'm also looking into introducing UFFD_FEATURE_WP_UNALLOCATED so as to
> wr-protect page holes too for a uffd-wp region when the feature bit is set.
> With that feature we should be able to avoid the pre-fault we do in the
> last patch of this series.  However, even if that can work out, we'll still
> need this for old kernels anyway.

I'm curious: is this new feature based on adding wr-protection at the level of VMAs,
so that we won't miss write faults for missing pages?

> Thanks,
>




* Re: [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code
  2021-03-24  8:09   ` Andrey Gruzdev
@ 2021-03-24 15:41     ` Peter Xu
  2021-03-25  9:51       ` Andrey Gruzdev
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2021-03-24 15:41 UTC (permalink / raw)
  To: Andrey Gruzdev
  Cc: Juan Quintela, Markus Armbruster, David Hildenbrand, qemu-devel,
	Dr . David Alan Gilbert, Paolo Bonzini, Den Lunev

On Wed, Mar 24, 2021 at 11:09:27AM +0300, Andrey Gruzdev wrote:
> > I'm also looking into introducing UFFD_FEATURE_WP_UNALLOCATED so as to
> > wr-protect page holes too for a uffd-wp region when the feature bit is set.
> > With that feature we should be able to avoid the pre-fault we do in the
> > last patch of this series.  However, even if that can work out, we'll still
> > need this for old kernels anyway.
> 
> I'm curious: is this new feature based on adding wr-protection at the level of VMAs,
> so that we won't miss write faults for missing pages?

I think we can do it in multiple ways.

The most efficient one would be to wr-protect the range during uffd-wp
registration, so, as you said, it'll be a per-vma attribute.  However, that'll
change the general semantics of uffd-wp, as normally we need registration plus
an explicit wr-protect.  Then it'll still be pte-based for faulted-in pages (the
ones we wr-protected during registration will still be), but for the rest
it'll become vma-based.  It's indeed a bit confusing.

The other way is that we can fault in the zero page during UFFDIO_WRITEPROTECT.
However, that's less efficient, since it's close to a pre-fault on read and only
slightly cleaner than doing it in userspace.  When I rethink about this, it may
not be worth doing in the kernel if userspace can achieve something similar.

So let's stick with the current solution; that idea may need more thought.

Thanks,

-- 
Peter Xu




* Re: [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
  2021-03-23 18:35           ` Peter Xu
  2021-03-24  7:48             ` Andrey Gruzdev
@ 2021-03-24 17:35             ` Dr. David Alan Gilbert
  2021-03-24 18:50               ` Peter Xu
  1 sibling, 1 reply; 16+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-24 17:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, David Hildenbrand, qemu-devel, Markus Armbruster,
	Paolo Bonzini, Den Lunev, Andrey Gruzdev

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Mar 23, 2021 at 08:21:43PM +0300, Andrey Gruzdev wrote:
> > > For the long term I think we'd better have a helper:
> > > 
> > >          qemu_put_qio_channel_buffer(QEMUFile *file, QIOChannelBuffer *bioc)
> > > 
> > > So as to hide this flush operation, which is tricky. We'll have two users so
> > > far:
> > > 
> > >          bg_migration_completion
> > >          colo_do_checkpoint_transaction
> > > 
> > > IMHO it'll be nicer if you'd do it in this patch altogether!
> > > 
> > > Thanks,
> > > 
> > Sorry, I can't get the idea; what's wrong with the fix?
> 
> I'm fine with the fix, but I've got one patch attached just to show what I
> meant, so without any testing for sure..
> 
> Looks more complicated than I thought, but again I think we should hide that
> buffer flush into another helper to avoid overlooking it.

I was wondering if I was missing the same fflush in postcopy, but I
don't *think* so, although it's a bit roundabout; before sending the
data I call:

  qemu_savevm_send_postcopy_run(fb)

and that calls qemu_savevm_command_send(), which ends in a qemu_fflush();
that is non-obvious.

While I'd leave that in there, it might be good to use that same thing.

Dave

> Thanks,
> 
> -- 
> Peter Xu

> From f3004948e2498fb9ac4646a6a5225c58ea3f1623 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Tue, 23 Mar 2021 14:30:24 -0400
> Subject: [PATCH] migration: Introduce qemu_put_qio_channel_buffer()
> 
> Meanwhile use it in background snapshot code, so as to dump one QEMUFile buffer
> (which is created by QIOChannelBuffer) into another QEMUFile.
> 
> Similar thing should be able to be applied to colo_do_checkpoint_transaction()
> too.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c         | 16 +++++++++-------
>  migration/migration.h         |  1 -
>  migration/qemu-file-channel.c | 14 ++++++++++++++
>  migration/qemu-file-channel.h |  2 ++
>  migration/qemu-file.c         |  4 ++++
>  migration/qemu-file.h         |  1 +
>  6 files changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index a5ddf43559e..9d73c236b16 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3243,8 +3243,9 @@ fail:
>   *   RAM has been saved. The caller 'breaks' the loop when this returns.
>   *
>   * @s: Current migration state
> + * @vmstates: The QEMUFile* buffer that contains all the device vmstates
>   */
> -static void bg_migration_completion(MigrationState *s)
> +static void bg_migration_completion(MigrationState *s, QEMUFile *vmstates)
>  {
>      int current_active_state = s->state;
>  
> @@ -3262,7 +3263,7 @@ static void bg_migration_completion(MigrationState *s)
>           * right after the ram content. The device state has been stored into
>           * the temporary buffer before RAM saving started.
>           */
> -        qemu_put_buffer(s->to_dst_file, s->bioc->data, s->bioc->usage);
> +        qemu_put_qio_channel_buffer(s->to_dst_file, vmstates);
>          qemu_fflush(s->to_dst_file);
>      } else if (s->state == MIGRATION_STATUS_CANCELLING) {
>          goto fail;
> @@ -3656,7 +3657,6 @@ static MigIterateState bg_migration_iteration_run(MigrationState *s)
>  
>      res = qemu_savevm_state_iterate(s->to_dst_file, false);
>      if (res > 0) {
> -        bg_migration_completion(s);
>          return MIG_ITERATE_BREAK;
>      }
>  
> @@ -3843,6 +3843,7 @@ static void *bg_migration_thread(void *opaque)
>      MigThrError thr_error;
>      QEMUFile *fb;
>      bool early_fail = true;
> +    QIOChannelBuffer *bioc;
>  
>      rcu_register_thread();
>      object_ref(OBJECT(s));
> @@ -3859,10 +3860,10 @@ static void *bg_migration_thread(void *opaque)
>       * with vCPUs running and, finally, write stashed non-RAM part of
>       * the vmstate from the buffer to the migration stream.
>       */
> -    s->bioc = qio_channel_buffer_new(128 * 1024);
> -    qio_channel_set_name(QIO_CHANNEL(s->bioc), "vmstate-buffer");
> -    fb = qemu_fopen_channel_output(QIO_CHANNEL(s->bioc));
> -    object_unref(OBJECT(s->bioc));
> +    bioc = qio_channel_buffer_new(128 * 1024);
> +    qio_channel_set_name(QIO_CHANNEL(bioc), "vmstate-buffer");
> +    fb = qemu_fopen_channel_output(QIO_CHANNEL(bioc));
> +    object_unref(OBJECT(bioc));
>  
>      update_iteration_initial_status(s);
>  
> @@ -3935,6 +3936,7 @@ static void *bg_migration_thread(void *opaque)
>          if (iter_state == MIG_ITERATE_SKIP) {
>              continue;
>          } else if (iter_state == MIG_ITERATE_BREAK) {
> +            bg_migration_completion(s, fb);
>              break;
>          }
>  
> diff --git a/migration/migration.h b/migration/migration.h
> index db6708326b1..bdcd74ce084 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -151,7 +151,6 @@ struct MigrationState {
>      QEMUBH *vm_start_bh;
>      QEMUBH *cleanup_bh;
>      QEMUFile *to_dst_file;
> -    QIOChannelBuffer *bioc;
>      /*
>       * Protects to_dst_file pointer.  We need to make sure we won't
>       * yield or hang during the critical section, since this lock will
> diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
> index afc3a7f642a..c1bf71510f8 100644
> --- a/migration/qemu-file-channel.c
> +++ b/migration/qemu-file-channel.c
> @@ -26,6 +26,7 @@
>  #include "qemu-file-channel.h"
>  #include "qemu-file.h"
>  #include "io/channel-socket.h"
> +#include "io/channel-buffer.h"
>  #include "qemu/iov.h"
>  #include "qemu/yank.h"
>  
> @@ -196,3 +197,16 @@ QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc)
>      object_ref(OBJECT(ioc));
>      return qemu_fopen_ops(ioc, &channel_output_ops);
>  }
> +
> +/*
> + * Splice all the data in `buffer' into `dst'.  Note that `buffer' must point
> + * to a QEMUFile that was created with qemu_fopen_channel_output().
> + */
> +void qemu_put_qio_channel_buffer(QEMUFile *dst, QEMUFile *buffer)
> +{
> +    QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(qemu_file_get_opaque(buffer));
> +
> +    /* Make sure data cached in QEMUFile flushed to QIOChannel buffers */
> +    qemu_fflush(buffer);
> +    qemu_put_buffer(dst, bioc->data, bioc->usage);
> +}
> diff --git a/migration/qemu-file-channel.h b/migration/qemu-file-channel.h
> index 0028a09eb61..5f165527616 100644
> --- a/migration/qemu-file-channel.h
> +++ b/migration/qemu-file-channel.h
> @@ -29,4 +29,6 @@
>  
>  QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc);
>  QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc);
> +void qemu_put_qio_channel_buffer(QEMUFile *dst, QEMUFile *buffer);
> +
>  #endif
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index d6e03dbc0e0..317f98fc8f5 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -112,6 +112,10 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
>      return f;
>  }
>  
> +void *qemu_file_get_opaque(QEMUFile *f)
> +{
> +    return f->opaque;
> +}
>  
>  void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
>  {
> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index a9b6d6ccb7d..30c8ec855ac 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -161,6 +161,7 @@ int qemu_file_shutdown(QEMUFile *f);
>  QEMUFile *qemu_file_get_return_path(QEMUFile *f);
>  void qemu_fflush(QEMUFile *f);
>  void qemu_file_set_blocking(QEMUFile *f, bool block);
> +void *qemu_file_get_opaque(QEMUFile *f);
>  
>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> -- 
> 2.26.2
> 

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
  2021-03-24 17:35             ` Dr. David Alan Gilbert
@ 2021-03-24 18:50               ` Peter Xu
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2021-03-24 18:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Juan Quintela, David Hildenbrand, qemu-devel, Markus Armbruster,
	Paolo Bonzini, Den Lunev, Andrey Gruzdev

On Wed, Mar 24, 2021 at 05:35:44PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Mar 23, 2021 at 08:21:43PM +0300, Andrey Gruzdev wrote:
> > > > For the long term I think we'd better have a helper:
> > > > 
> > > >          qemu_put_qio_channel_buffer(QEMUFile *file, QIOChannelBuffer *bioc)
> > > > 
> > > > So as to hide this flush operation, which is tricky. We'll have two users so
> > > > far:
> > > > 
> > > >          bg_migration_completion
> > > >          colo_do_checkpoint_transaction
> > > > 
> > > > IMHO it'll be nicer if you'd do it in this patch altogether!
> > > > 
> > > > Thanks,
> > > > 
> > > Sorry, I can't get the idea; what's wrong with the fix?
> > 
> > I'm fine with the fix, but I've got one patch attached just to show what I
> > meant, so without any testing for sure..
> > 
> > Looks more complicated than I thought, but again I think we should hide that
> > buffer flush into another helper to avoid overlooking it.
> 
> I was wondering if I was missing the same fflush in postcopy, but I
> don't *think* so, although it's a bit round about; before sending the
> data I call:
> 
>   qemu_savevm_send_postcopy_run(fb)
> 
> and that calls qemu_savevm_command_send that ends in a fflish;  which is
> non-obvious.
> 
> While I'd leave that in there, it might be good to use that same thing.

Right, I was grepping for qemu_put_buffer() previously, so
qemu_savevm_send_packaged() was easily overlooked..

Maybe I can make it a small patch series after the snapshot fixes get in.

Thanks,

-- 
Peter Xu




* Re: [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code
  2021-03-24 15:41     ` Peter Xu
@ 2021-03-25  9:51       ` Andrey Gruzdev
  0 siblings, 0 replies; 16+ messages in thread
From: Andrey Gruzdev @ 2021-03-25  9:51 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
	Dr . David Alan Gilbert, Markus Armbruster, David Hildenbrand

On 24.03.2021 18:41, Peter Xu wrote:
> On Wed, Mar 24, 2021 at 11:09:27AM +0300, Andrey Gruzdev wrote:
>>> I'm also looking into introducing UFFD_FEATURE_WP_UNALLOCATED so as to
>>> wr-protect page holes too for a uffd-wp region when the feature bit is set.
>>> With that feature we should be able to avoid the pre-fault we do in the
>>> last patch of this series.  However, even if that can work out, we'll still
>>> need this for old kernels anyway.
>> I'm curious this new feature is based on adding wr-protection at the level of VMAs,
>> so we won't miss write faults for missing pages?
> I think we can do it with multiple ways.
>
> The most efficient one would be wr-protect the range during uffd-wp
> registration, so as you said it'll be per-vma attribute.  However that'll
> change the general semantics of uffd-wp as normally we need registration and
> explicit wr-protect.  Then it'll still be pte-based for faulted in pages (the
> ones we wr-protected during registration will still be), however for the rest
> it'll become vma-based.  It's indeed a bit confusing.
>
> The other way is we can fault in zero page during UFFDIO_WRITEPROTECT.  However
> that's less efficient, since it's close to pre-fault on read but it's just
> slightly more cleaner than doing it in userspace.  When I rethink about this it
> may not worth it to do in kernel if userspace can achieve things similar.
>
> So let's stick with current solution; that idea may need more thoughts..
>
> Thanks,
>
Agree, let's stick with the current solution. For the future, I think having
a registration flag like WP_MISSING to induce per-vma wr-protection is not a
bad choice.

The reason is that usage of the UFFDIO_WRITEPROTECT ioctl is often
asymmetrical; we usually write-protect the whole registration range but
un-protect in small chunks.

So if we stay with the current symmetric protect/un-protect API but add a
registration flag to handle protection for unpopulated pages, that may be
worth doing.

-- 
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
                 virtuzzo.com




end of thread, other threads:[~2021-03-25 10:08 UTC | newest]

Thread overview: 16+ messages
2021-03-19 14:52 [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code Andrey Gruzdev
2021-03-19 14:52 ` [PATCH v1 1/3] migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Andrey Gruzdev
2021-03-22 20:17   ` Peter Xu
2021-03-23  7:51     ` Andrey Gruzdev
2021-03-23 14:54       ` Peter Xu
2021-03-23 17:21         ` Andrey Gruzdev
2021-03-23 18:35           ` Peter Xu
2021-03-24  7:48             ` Andrey Gruzdev
2021-03-24 17:35             ` Dr. David Alan Gilbert
2021-03-24 18:50               ` Peter Xu
2021-03-19 14:52 ` [PATCH v1 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot Andrey Gruzdev
2021-03-19 14:52 ` [PATCH v1 3/3] migration: Pre-fault memory before starting background snapshot Andrey Gruzdev
2021-03-23 22:21 ` [PATCH v1 0/3] migration: Fixes to the 'background-snapshot' code Peter Xu
2021-03-24  8:09   ` Andrey Gruzdev
2021-03-24 15:41     ` Peter Xu
2021-03-25  9:51       ` Andrey Gruzdev
