All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
@ 2021-07-07 14:06 David Hildenbrand
  2021-07-07 14:06 ` [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" David Hildenbrand
  2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand
  0 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-07-07 14:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Michael S. Tsirkin, David Hildenbrand,
	Dr. David Alan Gilbert, Peter Xu, Wei Wang, Alexander Duyck,
	Philippe Mathieu-Daudé

Working on getting migration for virtio-mem completely right [1] I realized
that virtio-balloon with VIRTIO_BALLOON_F_FREE_PAGE_HINT paired with
postcopy might be shaky. Actually testing it, I directly found two issues,
one of both being far from trivial to fix.

Let's disallow postcopy with "free-page-hint=on".

[1] https://lkml.kernel.org/r/20210616162940.28630-1-david@redhat.com

David Hildenbrand (2):
  migration/postcopy-ram: define type for "struct PostcopyNotifyData"
  virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT

 hw/virtio/vhost-user.c             |  2 +-
 hw/virtio/virtio-balloon.c         | 26 ++++++++++++++++++++++++++
 include/hw/virtio/virtio-balloon.h |  1 +
 migration/postcopy-ram.c           |  2 +-
 migration/postcopy-ram.h           |  4 ++--
 5 files changed, 31 insertions(+), 4 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData"
  2021-07-07 14:06 [PATCH v1 0/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand
@ 2021-07-07 14:06 ` David Hildenbrand
  2021-07-07 20:30   ` Pankaj Gupta
  2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand
  1 sibling, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2021-07-07 14:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Michael S . Tsirkin, David Hildenbrand,
	Dr. David Alan Gilbert, Alexander Duyck, Wei Wang, Peter Xu,
	Philippe Mathieu-Daudé

Let's define a proper type, just as we do for PrecopyNotifyData already.

Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/vhost-user.c   | 2 +-
 migration/postcopy-ram.c | 2 +-
 migration/postcopy-ram.h | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1ac4a2ebec..42dad711bf 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1827,9 +1827,9 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
 static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
                                         void *opaque)
 {
-    struct PostcopyNotifyData *pnd = opaque;
     struct vhost_user *u = container_of(notifier, struct vhost_user,
                                          postcopy_notifier);
+    PostcopyNotifyData *pnd = opaque;
     struct vhost_dev *dev = u->dev;
 
     switch (pnd->reason) {
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 2e9697bdd2..ee4e1c7cf5 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -69,7 +69,7 @@ void postcopy_remove_notifier(NotifierWithReturn *n)
 
 int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
 {
-    struct PostcopyNotifyData pnd;
+    PostcopyNotifyData pnd;
     pnd.reason = reason;
     pnd.errp = errp;
 
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 6d2b3cf124..01829c3562 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -125,10 +125,10 @@ enum PostcopyNotifyReason {
     POSTCOPY_NOTIFY_INBOUND_END,
 };
 
-struct PostcopyNotifyData {
+typedef struct PostcopyNotifyData {
     enum PostcopyNotifyReason reason;
     Error **errp;
-};
+} PostcopyNotifyData;
 
 void postcopy_add_notifier(NotifierWithReturn *nn);
 void postcopy_remove_notifier(NotifierWithReturn *n);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 14:06 [PATCH v1 0/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand
  2021-07-07 14:06 ` [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" David Hildenbrand
@ 2021-07-07 14:06 ` David Hildenbrand
  2021-07-07 18:02   ` Peter Xu
  2021-07-07 19:05   ` Michael S. Tsirkin
  1 sibling, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-07-07 14:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, David Hildenbrand, qemu-stable,
	Alexander Duyck, Dr. David Alan Gilbert, Juan Quintela, Wei Wang,
	Peter Xu, Philippe Mathieu-Daudé

Postcopy never worked properly with 'free-page-hint=on', as there are
at least two issues:

1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
   and consequently won't release free pages back to the OS once
   migration finishes.

   The issue is that for postcopy, we won't do a final bitmap sync while
   the guest is stopped on the source and
   virtio_balloon_free_page_hint_notify() will only call
   virtio_balloon_free_page_done() on the source during
   PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
   the destination.

2) Once the VM touches a page on the destination that has been excluded
   from migration on the source via qemu_guest_free_page_hint() while
   postcopy is active, that thread will stall until postcopy finishes
   and all threads are woken up. (with older Linux kernels that won't
   retry faults when woken up via userfaultfd, we might actually get a
   SEGFAULT)

   The issue is that the source will refuse to migrate any pages that
   are not marked as dirty in the dirty bmap -- for example, because the
   page might just have been sent. Consequently, the faulting thread will
   stall, waiting for the page to be migrated -- which could take quite
   a while and result in guest OS issues.

While we could fix 1), for example, by calling
virtio_balloon_free_page_done() via pre_save callbacks of the
vmstate, 2) is mostly impossible to fix without additional tracking,
such that we can actually identify these hinted pages and handle
them accordingly.

As it never worked properly, let's disable it via the postcopy notifier on
the destination. Trying to set "migrate_set_capability postcopy-ram on"
on the destination now results in "virtio-balloon: 'free-page-hint' does
not support postcopy Error: Postcopy is not supported".

Note 1: We could let qemu_guest_free_page_hint() mark postcopy
        as broken once actually clearing bits on the source. However, it's
        harder to realize as we can race with users starting postcopy
        and we cannot produce an expressive error message easily.

Note 2: virtio-mem has similar issues, however, access to "unplugged"
        memory by the guest is very rare and we would have to be very
        lucky for it to happen during migration. The spec states
        "The driver SHOULD NOT read from unplugged memory blocks ..."
        and "The driver MUST NOT write to unplugged memory blocks".
        virtio-mem will move away from virtio_balloon_free_page_done()
        soon and handle this case explicitly on the destination.

Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: qemu-stable@nongnu.org
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-balloon.c         | 26 ++++++++++++++++++++++++++
 include/hw/virtio/virtio-balloon.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 4b5d9e5e50..d0c9dc677c 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -30,6 +30,7 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "migration/misc.h"
+#include "migration/postcopy-ram.h"
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -692,6 +693,28 @@ virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
     return 0;
 }
 
+
+static int virtio_balloon_postcopy_notify(NotifierWithReturn *n, void *opaque)
+{
+    VirtIOBalloon *dev = container_of(n, VirtIOBalloon, postcopy_notifier);
+    PostcopyNotifyData *pnd = opaque;
+
+    /* We register the notifier only with 'free-page-hint=on' for now. */
+    g_assert(virtio_has_feature(dev->host_features,
+                                VIRTIO_BALLOON_F_FREE_PAGE_HINT));
+
+    /*
+     * Pages hinted via qemu_guest_free_page_hint() are cleared from the dirty
+     * bitmap and will not get migrated, especially also not when the postcopy
+     * destination starts using them and requests migration from the source; the
+     * faulting thread will stall until postcopy migration finishes and
+     * all threads are woken up.
+     */
+    error_setg(pnd->errp,
+               "virtio-balloon: 'free-page-hint' does not support postcopy");
+    return -ENOENT;
+}
+
 static size_t virtio_balloon_config_size(VirtIOBalloon *s)
 {
     uint64_t features = s->host_features;
@@ -911,6 +934,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
         s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
                                            virtio_balloon_handle_free_page_vq);
         precopy_add_notifier(&s->free_page_hint_notify);
+        postcopy_add_notifier(&s->postcopy_notifier);
 
         object_ref(OBJECT(s->iothread));
         s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
@@ -935,6 +959,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev)
         object_unref(OBJECT(s->iothread));
         virtio_balloon_free_page_stop(s);
         precopy_remove_notifier(&s->free_page_hint_notify);
+        postcopy_remove_notifier(&s->postcopy_notifier);
     }
     balloon_stats_destroy_timer(s);
     qemu_remove_balloon_handler(s);
@@ -1008,6 +1033,7 @@ static void virtio_balloon_instance_init(Object *obj)
     qemu_cond_init(&s->free_page_cond);
     s->free_page_hint_cmd_id = VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
     s->free_page_hint_notify.notify = virtio_balloon_free_page_hint_notify;
+    s->postcopy_notifier.notify = virtio_balloon_postcopy_notify;
 
     object_property_add(obj, "guest-stats", "guest statistics",
                         balloon_stats_get_all, NULL, NULL, s);
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 5139cf8ab6..d0d5b793b9 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -65,6 +65,7 @@ struct VirtIOBalloon {
      */
     bool block_iothread;
     NotifierWithReturn free_page_hint_notify;
+    NotifierWithReturn postcopy_notifier;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
     uint32_t host_features;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand
@ 2021-07-07 18:02   ` Peter Xu
  2021-07-07 18:57     ` David Hildenbrand
  2021-07-07 19:05   ` Michael S. Tsirkin
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Xu @ 2021-07-07 18:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michael S . Tsirkin, Juan Quintela, qemu-devel, Alexander Duyck,
	qemu-stable, Wei Wang, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> As it never worked properly, let's disable it via the postcopy notifier on
> the destination. Trying to set "migrate_set_capability postcopy-ram on"
> on the destination now results in "virtio-balloon: 'free-page-hint' does
> not support postcopy Error: Postcopy is not supported".

Would it be possible to do this in reversed order?  Say, dynamically disable
free-page-hinting if postcopy capability is set when migration starts? Perhaps
it can also be re-enabled automatically when migration completes?

I see postcopy a "functional" feature and free-page-hint a "performance"
feature, from that pov IMHO it's better to not block function for performance.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 18:02   ` Peter Xu
@ 2021-07-07 18:57     ` David Hildenbrand
  2021-07-07 19:07       ` Michael S. Tsirkin
  2021-07-07 20:08       ` Peter Xu
  0 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-07-07 18:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S . Tsirkin, Juan Quintela, qemu-devel, Alexander Duyck,
	qemu-stable, Wei Wang, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On 07.07.21 20:02, Peter Xu wrote:
> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
>> As it never worked properly, let's disable it via the postcopy notifier on
>> the destination. Trying to set "migrate_set_capability postcopy-ram on"
>> on the destination now results in "virtio-balloon: 'free-page-hint' does
>> not support postcopy Error: Postcopy is not supported".
> 
> Would it be possible to do this in reversed order?  Say, dynamically disable
> free-page-hinting if postcopy capability is set when migration starts? Perhaps
> it can also be re-enabled automatically when migration completes?

I remember that this might be quite racy. We would have to make sure 
that no hinting happens before we enable the capability.

As soon as we messed with the dirty bitmap (during precopy), postcopy is 
no longer safe. As noted in the patch, the only runtime alternative is 
to disable postcopy as soon as we actually do clear a bit. 
Alternatively, we could ignore any hints if the postcopy capability was 
enabled.

Whatever we do, we have to make sure that a user cannot trick the system 
into an inconsistent state. Like enabling hinting, starting migration, 
then enabling the postcopy capability and kicking of postcopy. I did not 
check if we allow for that, though.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand
  2021-07-07 18:02   ` Peter Xu
@ 2021-07-07 19:05   ` Michael S. Tsirkin
  2021-07-07 19:14     ` David Hildenbrand
  1 sibling, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-07-07 19:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable,
	Wei Wang, Peter Xu, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> Postcopy never worked properly with 'free-page-hint=on', as there are
> at least two issues:
> 
> 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
>    and consequently won't release free pages back to the OS once
>    migration finishes.
> 
>    The issue is that for postcopy, we won't do a final bitmap sync while
>    the guest is stopped on the source and
>    virtio_balloon_free_page_hint_notify() will only call
>    virtio_balloon_free_page_done() on the source during
>    PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
>    the destination.
> 
> 2) Once the VM touches a page on the destination that has been excluded
>    from migration on the source via qemu_guest_free_page_hint() while
>    postcopy is active, that thread will stall until postcopy finishes
>    and all threads are woken up. (with older Linux kernels that won't
>    retry faults when woken up via userfaultfd, we might actually get a
>    SEGFAULT)
> 
>    The issue is that the source will refuse to migrate any pages that
>    are not marked as dirty in the dirty bmap -- for example, because the
>    page might just have been sent. Consequently, the faulting thread will
>    stall, waiting for the page to be migrated -- which could take quite
>    a while and result in guest OS issues.

OK so if source gets a request for a page which is not dirty
it does not respond immediately? Why not just teach it to
respond? It would seem that if destination wants a page we
should just give it to the destination ...


> 
> While we could fix 1), for example, by calling
> virtio_balloon_free_page_done() via pre_save callbacks of the
> vmstate, 2) is mostly impossible to fix without additional tracking,
> such that we can actually identify these hinted pages and handle
> them accordingly.
> As it never worked properly, let's disable it via the postcopy notifier on
> the destination. Trying to set "migrate_set_capability postcopy-ram on"
> on the destination now results in "virtio-balloon: 'free-page-hint' does
> not support postcopy Error: Postcopy is not supported".
> Note 1: We could let qemu_guest_free_page_hint() mark postcopy
>         as broken once actually clearing bits on the source. However, it's
>         harder to realize as we can race with users starting postcopy
>         and we cannot produce an expressive error message easily.


How about the reverse? Ignore qemu_guest_free_page_hint if postcopy
started?  Seems better than making it user/guest visible ..

> Note 2: virtio-mem has similar issues, however, access to "unplugged"
>         memory by the guest is very rare and we would have to be very
>         lucky for it to happen during migration. The spec states
>         "The driver SHOULD NOT read from unplugged memory blocks ..."
>         and "The driver MUST NOT write to unplugged memory blocks".
>         virtio-mem will move away from virtio_balloon_free_page_done()
>         soon and handle this case explicitly on the destination.
> 
> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")

OK it's not too bad, but I wonder whether above aideas have been
explored.

> Cc: qemu-stable@nongnu.org
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/virtio/virtio-balloon.c         | 26 ++++++++++++++++++++++++++
>  include/hw/virtio/virtio-balloon.h |  1 +
>  2 files changed, 27 insertions(+)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 4b5d9e5e50..d0c9dc677c 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -30,6 +30,7 @@
>  #include "trace.h"
>  #include "qemu/error-report.h"
>  #include "migration/misc.h"
> +#include "migration/postcopy-ram.h"
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> @@ -692,6 +693,28 @@ virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
>      return 0;
>  }
>  
> +
> +static int virtio_balloon_postcopy_notify(NotifierWithReturn *n, void *opaque)
> +{
> +    VirtIOBalloon *dev = container_of(n, VirtIOBalloon, postcopy_notifier);
> +    PostcopyNotifyData *pnd = opaque;
> +
> +    /* We register the notifier only with 'free-page-hint=on' for now. */
> +    g_assert(virtio_has_feature(dev->host_features,
> +                                VIRTIO_BALLOON_F_FREE_PAGE_HINT));
> +
> +    /*
> +     * Pages hinted via qemu_guest_free_page_hint() are cleared from the dirty
> +     * bitmap and will not get migrated, especially also not when the postcopy
> +     * destination starts using them and requests migration from the source; the
> +     * faulting thread will stall until postcopy migration finishes and
> +     * all threads are woken up.
> +     */
> +    error_setg(pnd->errp,
> +               "virtio-balloon: 'free-page-hint' does not support postcopy");
> +    return -ENOENT;
> +}
> +
>  static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>  {
>      uint64_t features = s->host_features;
> @@ -911,6 +934,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>          s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
>                                             virtio_balloon_handle_free_page_vq);
>          precopy_add_notifier(&s->free_page_hint_notify);
> +        postcopy_add_notifier(&s->postcopy_notifier);
>  
>          object_ref(OBJECT(s->iothread));
>          s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
> @@ -935,6 +959,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev)
>          object_unref(OBJECT(s->iothread));
>          virtio_balloon_free_page_stop(s);
>          precopy_remove_notifier(&s->free_page_hint_notify);
> +        postcopy_remove_notifier(&s->postcopy_notifier);
>      }
>      balloon_stats_destroy_timer(s);
>      qemu_remove_balloon_handler(s);
> @@ -1008,6 +1033,7 @@ static void virtio_balloon_instance_init(Object *obj)
>      qemu_cond_init(&s->free_page_cond);
>      s->free_page_hint_cmd_id = VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
>      s->free_page_hint_notify.notify = virtio_balloon_free_page_hint_notify;
> +    s->postcopy_notifier.notify = virtio_balloon_postcopy_notify;
>  
>      object_property_add(obj, "guest-stats", "guest statistics",
>                          balloon_stats_get_all, NULL, NULL, s);
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 5139cf8ab6..d0d5b793b9 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -65,6 +65,7 @@ struct VirtIOBalloon {
>       */
>      bool block_iothread;
>      NotifierWithReturn free_page_hint_notify;
> +    NotifierWithReturn postcopy_notifier;
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> -- 
> 2.31.1



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 18:57     ` David Hildenbrand
@ 2021-07-07 19:07       ` Michael S. Tsirkin
  2021-07-07 20:08       ` Peter Xu
  1 sibling, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-07-07 19:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable,
	Wei Wang, Peter Xu, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote:
> On 07.07.21 20:02, Peter Xu wrote:
> > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > As it never worked properly, let's disable it via the postcopy notifier on
> > > the destination. Trying to set "migrate_set_capability postcopy-ram on"
> > > on the destination now results in "virtio-balloon: 'free-page-hint' does
> > > not support postcopy Error: Postcopy is not supported".
> > 
> > Would it be possible to do this in reversed order?  Say, dynamically disable
> > free-page-hinting if postcopy capability is set when migration starts? Perhaps
> > it can also be re-enabled automatically when migration completes?
> 
> I remember that this might be quite racy. We would have to make sure that no
> hinting happens before we enable the capability.
> 
> As soon as we messed with the dirty bitmap (during precopy), postcopy is no
> longer safe. As noted in the patch, the only runtime alternative is to
> disable postcopy as soon as we actually do clear a bit. Alternatively, we
> could ignore any hints if the postcopy capability was enabled.
> 
> Whatever we do, we have to make sure that a user cannot trick the system
> into an inconsistent state. Like enabling hinting, starting migration, then
> enabling the postcopy capability and kicking of postcopy. I did not check if
> we allow for that, though.

What bothers me with limitations like this is we train users about
this lack of orthogonality, it's then very hard to retrain them that
a given feature is safe to use.

> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 19:05   ` Michael S. Tsirkin
@ 2021-07-07 19:14     ` David Hildenbrand
  2021-07-07 19:19       ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2021-07-07 19:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable,
	Wei Wang, Peter Xu, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On 07.07.21 21:05, Michael S. Tsirkin wrote:
> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
>> Postcopy never worked properly with 'free-page-hint=on', as there are
>> at least two issues:
>>
>> 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
>>     and consequently won't release free pages back to the OS once
>>     migration finishes.
>>
>>     The issue is that for postcopy, we won't do a final bitmap sync while
>>     the guest is stopped on the source and
>>     virtio_balloon_free_page_hint_notify() will only call
>>     virtio_balloon_free_page_done() on the source during
>>     PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
>>     the destination.
>>
>> 2) Once the VM touches a page on the destination that has been excluded
>>     from migration on the source via qemu_guest_free_page_hint() while
>>     postcopy is active, that thread will stall until postcopy finishes
>>     and all threads are woken up. (with older Linux kernels that won't
>>     retry faults when woken up via userfaultfd, we might actually get a
>>     SEGFAULT)
>>
>>     The issue is that the source will refuse to migrate any pages that
>>     are not marked as dirty in the dirty bmap -- for example, because the
>>     page might just have been sent. Consequently, the faulting thread will
>>     stall, waiting for the page to be migrated -- which could take quite
>>     a while and result in guest OS issues.
> 
> OK so if source gets a request for a page which is not dirty
> it does not respond immediately? Why not just teach it to
> respond? It would seem that if destination wants a page we
> should just give it to the destination ...

The source does not know if a page has already been sent (e.g., via the 
background migration thread that moves all data over) vs. the page has 
not been send because the page was hinted. This is the part where we'd 
need additional tracking on the source to actually know that.

We must not send a page twice, otherwise bad things can happen when 
placing pages that already have been migrated, because that scenario can 
easily happen with ordinary postcopy (page has already been sent and 
we're dealing with a stale request from the destination).

> 
> 
>>
>> While we could fix 1), for example, by calling
>> virtio_balloon_free_page_done() via pre_save callbacks of the
>> vmstate, 2) is mostly impossible to fix without additional tracking,
>> such that we can actually identify these hinted pages and handle
>> them accordingly.
>> As it never worked properly, let's disable it via the postcopy notifier on
>> the destination. Trying to set "migrate_set_capability postcopy-ram on"
>> on the destination now results in "virtio-balloon: 'free-page-hint' does
>> not support postcopy Error: Postcopy is not supported".
>> Note 1: We could let qemu_guest_free_page_hint() mark postcopy
>>          as broken once actually clearing bits on the source. However, it's
>>          harder to realize as we can race with users starting postcopy
>>          and we cannot produce an expressive error message easily.
> 
> 
> How about the reverse? Ignore qemu_guest_free_page_hint if postcopy
> started?  Seems better than making it user/guest visible ..

Might be an option, but we let the user configure something that does 
not work in combination ... essentially ignoring one of both user 
settings. Also not perfect IMHO.

> 
>> Note 2: virtio-mem has similar issues, however, access to "unplugged"
>>          memory by the guest is very rare and we would have to be very
>>          lucky for it to happen during migration. The spec states
>>          "The driver SHOULD NOT read from unplugged memory blocks ..."
>>          and "The driver MUST NOT write to unplugged memory blocks".
>>          virtio-mem will move away from virtio_balloon_free_page_done()
>>          soon and handle this case explicitly on the destination.
>>
>> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> 
> OK it's not too bad, but I wonder whether above aideas have been
> explored.

TBH, it's been broken all along and I'd rather have a simple fix. If 
somebody ever cares about this, we could investigate making it work (or 
making postcopy overrule free page hinting). But I'm open for suggestions.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 19:14     ` David Hildenbrand
@ 2021-07-07 19:19       ` Michael S. Tsirkin
  2021-07-07 19:47         ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-07-07 19:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable,
	Wei Wang, Peter Xu, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote:
> On 07.07.21 21:05, Michael S. Tsirkin wrote:
> > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > Postcopy never worked properly with 'free-page-hint=on', as there are
> > > at least two issues:
> > > 
> > > 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
> > >     and consequently won't release free pages back to the OS once
> > >     migration finishes.
> > > 
> > >     The issue is that for postcopy, we won't do a final bitmap sync while
> > >     the guest is stopped on the source and
> > >     virtio_balloon_free_page_hint_notify() will only call
> > >     virtio_balloon_free_page_done() on the source during
> > >     PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
> > >     the destination.
> > > 
> > > 2) Once the VM touches a page on the destination that has been excluded
> > >     from migration on the source via qemu_guest_free_page_hint() while
> > >     postcopy is active, that thread will stall until postcopy finishes
> > >     and all threads are woken up. (with older Linux kernels that won't
> > >     retry faults when woken up via userfaultfd, we might actually get a
> > >     SEGFAULT)
> > > 
> > >     The issue is that the source will refuse to migrate any pages that
> > >     are not marked as dirty in the dirty bmap -- for example, because the
> > >     page might just have been sent. Consequently, the faulting thread will
> > >     stall, waiting for the page to be migrated -- which could take quite
> > >     a while and result in guest OS issues.
> > 
> > OK so if source gets a request for a page which is not dirty
> > it does not respond immediately? Why not just teach it to
> > respond? It would seem that if destination wants a page we
> > should just give it to the destination ...
> 
> The source does not know if a page has already been sent (e.g., via the
> background migration thread that moves all data over) vs. the page has not
> been send because the page was hinted. This is the part where we'd need
> additional tracking on the source to actually know that.
> 
> We must not send a page twice, otherwise bad things can happen when placing
> pages that already have been migrated, because that scenario can easily
> happen with ordinary postcopy (page has already been sent and we're dealing
> with a stale request from the destination).

OK let me get this straight

A. source sends page
B. destination requests page
C. destination changes page
D. source sends page
E. destination overwrites page

this is what you are worried about right?

the fix is to mark page clean in A.
then in D to not send page if it's clean?

And the problem with hinting is this:

A. page is marked clean
B. destination requests page
C. destination changes page
D. source sends page <- does not happen, page is clean!
E. destination overwrites page


did I get it right?


> > 
> > 
> > > 
> > > While we could fix 1), for example, by calling
> > > virtio_balloon_free_page_done() via pre_save callbacks of the
> > > vmstate, 2) is mostly impossible to fix without additional tracking,
> > > such that we can actually identify these hinted pages and handle
> > > them accordingly.
> > > As it never worked properly, let's disable it via the postcopy notifier on
> > > the destination. Trying to set "migrate_set_capability postcopy-ram on"
> > > on the destination now results in "virtio-balloon: 'free-page-hint' does
> > > not support postcopy Error: Postcopy is not supported".
> > > Note 1: We could let qemu_guest_free_page_hint() mark postcopy
> > >          as broken once actually clearing bits on the source. However, it's
> > >          harder to realize as we can race with users starting postcopy
> > >          and we cannot produce an expressive error message easily.
> > 
> > 
> > How about the reverse? Ignore qemu_guest_free_page_hint if postcopy
> > started?  Seems better than making it user/guest visible ..
> 
> Might be an option, but we let the user configure something that does not
> work in combination ... essentially ignoring one of both user settings. Also
> not perfect IMHO.
> 
> > 
> > > Note 2: virtio-mem has similar issues, however, access to "unplugged"
> > >          memory by the guest is very rare and we would have to be very
> > >          lucky for it to happen during migration. The spec states
> > >          "The driver SHOULD NOT read from unplugged memory blocks ..."
> > >          and "The driver MUST NOT write to unplugged memory blocks".
> > >          virtio-mem will move away from virtio_balloon_free_page_done()
> > >          soon and handle this case explicitly on the destination.
> > > 
> > > Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> > 
> > OK it's not too bad, but I wonder whether above aideas have been
> > explored.
> 
> TBH, it's been broken all along and I'd rather have a simple fix. If
> somebody ever cares about this, we could investigate making it work (or
> making postcopy overrule free page hinting). But I'm open for suggestions.
> 
> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 19:19       ` Michael S. Tsirkin
@ 2021-07-07 19:47         ` David Hildenbrand
  2021-07-07 19:57           ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2021-07-07 19:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable,
	Wei Wang, Peter Xu, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On 07.07.21 21:19, Michael S. Tsirkin wrote:
> On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote:
>> On 07.07.21 21:05, Michael S. Tsirkin wrote:
>>> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
>>>> Postcopy never worked properly with 'free-page-hint=on', as there are
>>>> at least two issues:
>>>>
>>>> 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
>>>>      and consequently won't release free pages back to the OS once
>>>>      migration finishes.
>>>>
>>>>      The issue is that for postcopy, we won't do a final bitmap sync while
>>>>      the guest is stopped on the source and
>>>>      virtio_balloon_free_page_hint_notify() will only call
>>>>      virtio_balloon_free_page_done() on the source during
>>>>      PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
>>>>      the destination.
>>>>
>>>> 2) Once the VM touches a page on the destination that has been excluded
>>>>      from migration on the source via qemu_guest_free_page_hint() while
>>>>      postcopy is active, that thread will stall until postcopy finishes
>>>>      and all threads are woken up. (with older Linux kernels that won't
>>>>      retry faults when woken up via userfaultfd, we might actually get a
>>>>      SEGFAULT)
>>>>
>>>>      The issue is that the source will refuse to migrate any pages that
>>>>      are not marked as dirty in the dirty bmap -- for example, because the
>>>>      page might just have been sent. Consequently, the faulting thread will
>>>>      stall, waiting for the page to be migrated -- which could take quite
>>>>      a while and result in guest OS issues.
>>>
>>> OK so if source gets a request for a page which is not dirty
>>> it does not respond immediately? Why not just teach it to
>>> respond? It would seem that if destination wants a page we
>>> should just give it to the destination ...
>>
>> The source does not know if a page has already been sent (e.g., via the
>> background migration thread that moves all data over) vs. the page has not
>> been send because the page was hinted. This is the part where we'd need
>> additional tracking on the source to actually know that.
>>
>> We must not send a page twice, otherwise bad things can happen when placing
>> pages that already have been migrated, because that scenario can easily
>> happen with ordinary postcopy (page has already been sent and we're dealing
>> with a stale request from the destination).
> 
> OK let me get this straight
> 
> A. source sends page
> B. destination requests page
> C. destination changes page
> D. source sends page
> E. destination overwrites page
> 
> this is what you are worried about right?

IIRC E. is with recent kernels:

E. placing the page fails with -EEXIST and postcopy migration fails

However, the man page (man ioctl_userfaultfd) doesn't describe what is 
actually supposed to happen when double-placing. Could be that it's 
"undefined behavior".

I did not try, though.


This is how it works today:

A. source sends page and marks it clean
B. destination requests page
C. destination receives page and places it
D. source ignores request as page is clean

> 
> the fix is to mark page clean in A.
> then in D to not send page if it's clean?
> 
> And the problem with hinting is this:
> 
> A. page is marked clean
> B. destination requests page
> C. destination changes page
> D. source sends page <- does not happen, page is clean!
> E. destination overwrites page

Simplified it's

A. page is marked clean by hinting code
B. destination requests page
D. source ignores request as page is clean
E. destination stalls until postcopy unregisters uffd


Some thoughts

1. We do have a a recv bitmap where we track received pages on the 
destination (e.g., ramblock_recv_bitmap_test()), however we only use it 
to avoid sending duplicate requests to the hypervisor AFAIKs, and don't 
check it when placing pages.

2. Changing the migration behavior unconditionally on the source will 
break migration to old QEMU binaries that cannot handle this change.

3. I think the current behavior is in place to make debugging easier. If 
only a single instance of a page will ever be migrated from source to 
destination, there cannot be silent data corruption. Further, we avoid 
migrating unnecessarily pages twice.


Maybe Dave and Peter can spot any flaws in my understanding.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 19:47         ` David Hildenbrand
@ 2021-07-07 19:57           ` Michael S. Tsirkin
  2021-07-08  8:19             ` David Hildenbrand
  2021-07-08 19:07             ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-07-07 19:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable,
	Wei Wang, Peter Xu, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote:
> On 07.07.21 21:19, Michael S. Tsirkin wrote:
> > On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote:
> > > On 07.07.21 21:05, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > > > Postcopy never worked properly with 'free-page-hint=on', as there are
> > > > > at least two issues:
> > > > > 
> > > > > 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
> > > > >      and consequently won't release free pages back to the OS once
> > > > >      migration finishes.
> > > > > 
> > > > >      The issue is that for postcopy, we won't do a final bitmap sync while
> > > > >      the guest is stopped on the source and
> > > > >      virtio_balloon_free_page_hint_notify() will only call
> > > > >      virtio_balloon_free_page_done() on the source during
> > > > >      PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
> > > > >      the destination.
> > > > > 
> > > > > 2) Once the VM touches a page on the destination that has been excluded
> > > > >      from migration on the source via qemu_guest_free_page_hint() while
> > > > >      postcopy is active, that thread will stall until postcopy finishes
> > > > >      and all threads are woken up. (with older Linux kernels that won't
> > > > >      retry faults when woken up via userfaultfd, we might actually get a
> > > > >      SEGFAULT)
> > > > > 
> > > > >      The issue is that the source will refuse to migrate any pages that
> > > > >      are not marked as dirty in the dirty bmap -- for example, because the
> > > > >      page might just have been sent. Consequently, the faulting thread will
> > > > >      stall, waiting for the page to be migrated -- which could take quite
> > > > >      a while and result in guest OS issues.
> > > > 
> > > > OK so if source gets a request for a page which is not dirty
> > > > it does not respond immediately? Why not just teach it to
> > > > respond? It would seem that if destination wants a page we
> > > > should just give it to the destination ...
> > > 
> > > The source does not know if a page has already been sent (e.g., via the
> > > background migration thread that moves all data over) vs. the page has not
> > > been send because the page was hinted. This is the part where we'd need
> > > additional tracking on the source to actually know that.
> > > 
> > > We must not send a page twice, otherwise bad things can happen when placing
> > > pages that already have been migrated, because that scenario can easily
> > > happen with ordinary postcopy (page has already been sent and we're dealing
> > > with a stale request from the destination).
> > 
> > OK let me get this straight
> > 
> > A. source sends page
> > B. destination requests page
> > C. destination changes page
> > D. source sends page
> > E. destination overwrites page
> > 
> > this is what you are worried about right?
> 
> IIRC E. is with recent kernels:
> 
> E. placing the page fails with -EEXIST and postcopy migration fails
> 
> However, the man page (man ioctl_userfaultfd) doesn't describe what is
> actually supposed to happen when double-placing. Could be that it's
> "undefined behavior".
> 
> I did not try, though.
> 
> 
> This is how it works today:
> 
> A. source sends page and marks it clean
> B. destination requests page
> C. destination receives page and places it
> D. source ignores request as page is clean

If it's actually -EEXIST then we could just resend it
and teach destination to ignore -EEXIST errors right?

Will actually make things a bit more robust: destination
handles its own consistency instead of relying on source.



> > 
> > the fix is to mark page clean in A.
> > then in D to not send page if it's clean?
> > 
> > And the problem with hinting is this:
> > 
> > A. page is marked clean
> > B. destination requests page
> > C. destination changes page
> > D. source sends page <- does not happen, page is clean!
> > E. destination overwrites page
> 
> Simplified it's
> 
> A. page is marked clean by hinting code
> B. destination requests page
> D. source ignores request as page is clean
> E. destination stalls until postcopy unregisters uffd
> 
> 
> Some thoughts
> 
> 1. We do have a a recv bitmap where we track received pages on the
> destination (e.g., ramblock_recv_bitmap_test()), however we only use it to
> avoid sending duplicate requests to the hypervisor AFAIKs, and don't check
> it when placing pages.
> 
> 2. Changing the migration behavior unconditionally on the source will break
> migration to old QEMU binaries that cannot handle this change.

We can always make this depend on new machine types.


> 3. I think the current behavior is in place to make debugging easier. If
> only a single instance of a page will ever be migrated from source to
> destination, there cannot be silent data corruption. Further, we avoid
> migrating unnecessarily pages twice.
> 

Likely does not matter much for performance, it seems unlikely that
the race is all that common.

> Maybe Dave and Peter can spot any flaws in my understanding.
> 
> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 18:57     ` David Hildenbrand
  2021-07-07 19:07       ` Michael S. Tsirkin
@ 2021-07-07 20:08       ` Peter Xu
  2021-07-07 21:22         ` Alexander Duyck
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Xu @ 2021-07-07 20:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michael S . Tsirkin, Juan Quintela, qemu-devel, Alexander Duyck,
	qemu-stable, Wei Wang, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote:
> On 07.07.21 20:02, Peter Xu wrote:
> > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > As it never worked properly, let's disable it via the postcopy notifier on
> > > the destination. Trying to set "migrate_set_capability postcopy-ram on"
> > > on the destination now results in "virtio-balloon: 'free-page-hint' does
> > > not support postcopy Error: Postcopy is not supported".
> > 
> > Would it be possible to do this in reversed order?  Say, dynamically disable
> > free-page-hinting if postcopy capability is set when migration starts? Perhaps
> > it can also be re-enabled automatically when migration completes?
> 
> I remember that this might be quite racy. We would have to make sure that no
> hinting happens before we enable the capability.
> 
> As soon as we messed with the dirty bitmap (during precopy), postcopy is no
> longer safe. As noted in the patch, the only runtime alternative is to
> disable postcopy as soon as we actually do clear a bit. Alternatively, we
> could ignore any hints if the postcopy capability was enabled.

Logically migration capabilities are applied at VM starts, and these
capabilities should be constant during migration (I didn't check if there's a
hard requirement; easy to add that if we want to assure it), and in most cases
for the lifecycle of the vm.

> 
> Whatever we do, we have to make sure that a user cannot trick the system
> into an inconsistent state. Like enabling hinting, starting migration, then
> enabling the postcopy capability and kicking of postcopy. I did not check if
> we allow for that, though.

We could turn free page hinting off when migration starts with postcopy-ram=on,
then re-enable it after migration finishes.  That looks very safe to me.  And I
don't even worry on user trying to mess it up - as that only put their own VM
at risk; that's mostly fine to me.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData"
  2021-07-07 14:06 ` [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" David Hildenbrand
@ 2021-07-07 20:30   ` Pankaj Gupta
  0 siblings, 0 replies; 20+ messages in thread
From: Pankaj Gupta @ 2021-07-07 20:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michael S . Tsirkin, Juan Quintela, Qemu Developers,
	Alexander Duyck, Dr. David Alan Gilbert, Wei Wang, Peter Xu,
	Philippe Mathieu-Daudé

> Let's define a proper type, just as we do for PrecopyNotifyData already.
>
> Cc: Wei Wang <wei.w.wang@intel.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/virtio/vhost-user.c   | 2 +-
>  migration/postcopy-ram.c | 2 +-
>  migration/postcopy-ram.h | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 1ac4a2ebec..42dad711bf 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1827,9 +1827,9 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
>  static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>                                          void *opaque)
>  {
> -    struct PostcopyNotifyData *pnd = opaque;
>      struct vhost_user *u = container_of(notifier, struct vhost_user,
>                                           postcopy_notifier);
> +    PostcopyNotifyData *pnd = opaque;
>      struct vhost_dev *dev = u->dev;
>
>      switch (pnd->reason) {
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 2e9697bdd2..ee4e1c7cf5 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -69,7 +69,7 @@ void postcopy_remove_notifier(NotifierWithReturn *n)
>
>  int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
>  {
> -    struct PostcopyNotifyData pnd;
> +    PostcopyNotifyData pnd;
>      pnd.reason = reason;
>      pnd.errp = errp;
>
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 6d2b3cf124..01829c3562 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -125,10 +125,10 @@ enum PostcopyNotifyReason {
>      POSTCOPY_NOTIFY_INBOUND_END,
>  };
>
> -struct PostcopyNotifyData {
> +typedef struct PostcopyNotifyData {
>      enum PostcopyNotifyReason reason;
>      Error **errp;
> -};
> +} PostcopyNotifyData;
>
>  void postcopy_add_notifier(NotifierWithReturn *nn);
>  void postcopy_remove_notifier(NotifierWithReturn *n);
> --

Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 20:08       ` Peter Xu
@ 2021-07-07 21:22         ` Alexander Duyck
  2021-07-07 22:40           ` Peter Xu
  2021-07-08  7:23           ` David Hildenbrand
  0 siblings, 2 replies; 20+ messages in thread
From: Alexander Duyck @ 2021-07-07 21:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S . Tsirkin, David Hildenbrand, qemu-stable, qemu-devel,
	Juan Quintela, Wei Wang, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote:
> > On 07.07.21 20:02, Peter Xu wrote:
> > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > > As it never worked properly, let's disable it via the postcopy notifier on
> > > > the destination. Trying to set "migrate_set_capability postcopy-ram on"
> > > > on the destination now results in "virtio-balloon: 'free-page-hint' does
> > > > not support postcopy Error: Postcopy is not supported".
> > >
> > > Would it be possible to do this in reversed order?  Say, dynamically disable
> > > free-page-hinting if postcopy capability is set when migration starts? Perhaps
> > > it can also be re-enabled automatically when migration completes?
> >
> > I remember that this might be quite racy. We would have to make sure that no
> > hinting happens before we enable the capability.
> >
> > As soon as we messed with the dirty bitmap (during precopy), postcopy is no
> > longer safe. As noted in the patch, the only runtime alternative is to
> > disable postcopy as soon as we actually do clear a bit. Alternatively, we
> > could ignore any hints if the postcopy capability was enabled.
>
> Logically migration capabilities are applied at VM starts, and these
> capabilities should be constant during migration (I didn't check if there's a
> hard requirement; easy to add that if we want to assure it), and in most cases
> for the lifecycle of the vm.

Would it make sense to maybe just look at adding a postcopy value to
the PrecopyNotifyData that you could populate with
migration_in_postcopy() in precopy_notify()?

Then all you would need to do is check for that value and if it is set
you shut down the page hinting or don't start it since I suspect it
wouldn't likely add any value anyway since I would think flagging
unused pages doesn't add much value in a postcopy environment anyway.

> >
> > Whatever we do, we have to make sure that a user cannot trick the system
> > into an inconsistent state. Like enabling hinting, starting migration, then
> > enabling the postcopy capability and kicking of postcopy. I did not check if
> > we allow for that, though.
>
> We could turn free page hinting off when migration starts with postcopy-ram=on,
> then re-enable it after migration finishes.  That looks very safe to me.  And I
> don't even worry on user trying to mess it up - as that only put their own VM
> at risk; that's mostly fine to me.

We wouldn't necessarily even need to really turn it off, just don't
start it. I wonder if we couldn't just get away with adding a check to
the existing virtio_balloon_free_page_hint_notify to see if we are in
the postcopy state there and just shut things down or not start them.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 21:22         ` Alexander Duyck
@ 2021-07-07 22:40           ` Peter Xu
  2021-07-08  7:14             ` David Hildenbrand
  2021-07-08  7:23           ` David Hildenbrand
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Xu @ 2021-07-07 22:40 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Michael S . Tsirkin, David Hildenbrand, qemu-stable, qemu-devel,
	Juan Quintela, Wei Wang, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Wed, Jul 07, 2021 at 02:22:32PM -0700, Alexander Duyck wrote:
> On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote:
> > > On 07.07.21 20:02, Peter Xu wrote:
> > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > > > As it never worked properly, let's disable it via the postcopy notifier on
> > > > > the destination. Trying to set "migrate_set_capability postcopy-ram on"
> > > > > on the destination now results in "virtio-balloon: 'free-page-hint' does
> > > > > not support postcopy Error: Postcopy is not supported".
> > > >
> > > > Would it be possible to do this in reversed order?  Say, dynamically disable
> > > > free-page-hinting if postcopy capability is set when migration starts? Perhaps
> > > > it can also be re-enabled automatically when migration completes?
> > >
> > > I remember that this might be quite racy. We would have to make sure that no
> > > hinting happens before we enable the capability.
> > >
> > > As soon as we messed with the dirty bitmap (during precopy), postcopy is no
> > > longer safe. As noted in the patch, the only runtime alternative is to
> > > disable postcopy as soon as we actually do clear a bit. Alternatively, we
> > > could ignore any hints if the postcopy capability was enabled.
> >
> > Logically migration capabilities are applied at VM starts, and these
> > capabilities should be constant during migration (I didn't check if there's a
> > hard requirement; easy to add that if we want to assure it), and in most cases
> > for the lifecycle of the vm.
> 
> Would it make sense to maybe just look at adding a postcopy value to
> the PrecopyNotifyData that you could populate with
> migration_in_postcopy() in precopy_notify()?

Should we check migrate_postcopy_ram() rather than migration_in_postcopy()?
It's the precopy phase that's dropping the dirty bits and can potentially hang
a postcopy vcpu, afaiu.

> 
> Then all you would need to do is check for that value and if it is set
> you shut down the page hinting or don't start it since I suspect it
> wouldn't likely add any value anyway since I would think flagging
> unused pages doesn't add much value in a postcopy environment anyway.
> 
> > >
> > > Whatever we do, we have to make sure that a user cannot trick the system
> > > into an inconsistent state. Like enabling hinting, starting migration, then
> > > enabling the postcopy capability and kicking of postcopy. I did not check if
> > > we allow for that, though.
> >
> > We could turn free page hinting off when migration starts with postcopy-ram=on,
> > then re-enable it after migration finishes.  That looks very safe to me.  And I
> > don't even worry on user trying to mess it up - as that only put their own VM
> > at risk; that's mostly fine to me.
> 
> We wouldn't necessarily even need to really turn it off, just don't
> start it. I wonder if we couldn't just get away with adding a check to
> the existing virtio_balloon_free_page_hint_notify to see if we are in
> the postcopy state there and just shut things down or not start them.

This makes me wonder whether qemu_guest_free_page_hint() should be called at
all on destination host when incoming postcopy migration is in progress.

Right now the check migration_is_setup_or_active() should return true on
destination host, however I am not sure if that's necessary as we don't track
dirty at all there.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 22:40           ` Peter Xu
@ 2021-07-08  7:14             ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-07-08  7:14 UTC (permalink / raw)
  To: Peter Xu, Alexander Duyck
  Cc: Juan Quintela, Michael S . Tsirkin, qemu-devel, qemu-stable,
	Wei Wang, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On 08.07.21 00:40, Peter Xu wrote:
> On Wed, Jul 07, 2021 at 02:22:32PM -0700, Alexander Duyck wrote:
>> On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <peterx@redhat.com> wrote:
>>>
>>> On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote:
>>>> On 07.07.21 20:02, Peter Xu wrote:
>>>>> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
>>>>>> As it never worked properly, let's disable it via the postcopy notifier on
>>>>>> the destination. Trying to set "migrate_set_capability postcopy-ram on"
>>>>>> on the destination now results in "virtio-balloon: 'free-page-hint' does
>>>>>> not support postcopy Error: Postcopy is not supported".
>>>>>
>>>>> Would it be possible to do this in reversed order?  Say, dynamically disable
>>>>> free-page-hinting if postcopy capability is set when migration starts? Perhaps
>>>>> it can also be re-enabled automatically when migration completes?
>>>>
>>>> I remember that this might be quite racy. We would have to make sure that no
>>>> hinting happens before we enable the capability.
>>>>
>>>> As soon as we messed with the dirty bitmap (during precopy), postcopy is no
>>>> longer safe. As noted in the patch, the only runtime alternative is to
>>>> disable postcopy as soon as we actually do clear a bit. Alternatively, we
>>>> could ignore any hints if the postcopy capability was enabled.
>>>
>>> Logically migration capabilities are applied at VM starts, and these
>>> capabilities should be constant during migration (I didn't check if there's a
>>> hard requirement; easy to add that if we want to assure it), and in most cases
>>> for the lifecycle of the vm.
>>
>> Would it make sense to maybe just look at adding a postcopy value to
>> the PrecopyNotifyData that you could populate with
>> migration_in_postcopy() in precopy_notify()?
> 
> Should we check migrate_postcopy_ram() rather than migration_in_postcopy()?
Right, we care about the source only -- if postcopy could be started.

> 
>>
>> Then all you would need to do is check for that value and if it is set
>> you shut down the page hinting or don't start it since I suspect it
>> wouldn't likely add any value anyway since I would think flagging
>> unused pages doesn't add much value in a postcopy environment anyway.
>>

We'd have to never kick it off right from the start as I explained 
previously. As soon as you messed with the bitmaps it's problematic.

>>>>
>>>> Whatever we do, we have to make sure that a user cannot trick the system
>>>> into an inconsistent state. Like enabling hinting, starting migration, then
>>>> enabling the postcopy capability and kicking of postcopy. I did not check if
>>>> we allow for that, though.
>>>
>>> We could turn free page hinting off when migration starts with postcopy-ram=on,
>>> then re-enable it after migration finishes.  That looks very safe to me.  And I
>>> don't even worry on user trying to mess it up - as that only put their own VM
>>> at risk; that's mostly fine to me.
>>
>> We wouldn't necessarily even need to really turn it off, just don't
>> start it. I wonder if we couldn't just get away with adding a check to
>> the existing virtio_balloon_free_page_hint_notify to see if we are in
>> the postcopy state there and just shut things down or not start them.
> 
> This makes me wonder whether qemu_guest_free_page_hint() should be called at
> all on destination host when incoming postcopy migration is in progress.

It really shouldn't. And if it would currently happen, it would be due 
to issue 1. described in the patch description that will be fixed 
independently, such that hinting is completely done once running on the 
destination.

> 
> Right now the check migration_is_setup_or_active() should return true on
> destination host, however I am not sure if that's necessary as we don't track
> dirty at all there.

migration_is_setup_or_active(s->state) uses migrate_get_current(), which 
gives us the outgoing state (source) not the incoming state (destination).

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 21:22         ` Alexander Duyck
  2021-07-07 22:40           ` Peter Xu
@ 2021-07-08  7:23           ` David Hildenbrand
  1 sibling, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-07-08  7:23 UTC (permalink / raw)
  To: Alexander Duyck, Peter Xu
  Cc: Juan Quintela, Michael S . Tsirkin, qemu-devel, qemu-stable,
	Wei Wang, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On 07.07.21 23:22, Alexander Duyck wrote:
> On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <peterx@redhat.com> wrote:
>>
>> On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote:
>>> On 07.07.21 20:02, Peter Xu wrote:
>>>> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
>>>>> As it never worked properly, let's disable it via the postcopy notifier on
>>>>> the destination. Trying to set "migrate_set_capability postcopy-ram on"
>>>>> on the destination now results in "virtio-balloon: 'free-page-hint' does
>>>>> not support postcopy Error: Postcopy is not supported".
>>>>
>>>> Would it be possible to do this in reversed order?  Say, dynamically disable
>>>> free-page-hinting if postcopy capability is set when migration starts? Perhaps
>>>> it can also be re-enabled automatically when migration completes?
>>>
>>> I remember that this might be quite racy. We would have to make sure that no
>>> hinting happens before we enable the capability.
>>>
>>> As soon as we messed with the dirty bitmap (during precopy), postcopy is no
>>> longer safe. As noted in the patch, the only runtime alternative is to
>>> disable postcopy as soon as we actually do clear a bit. Alternatively, we
>>> could ignore any hints if the postcopy capability was enabled.
>>
>> Logically migration capabilities are applied at VM starts, and these
>> capabilities should be constant during migration (I didn't check if there's a
>> hard requirement; easy to add that if we want to assure it), and in most cases
>> for the lifecycle of the vm.
> 
> Would it make sense to maybe just look at adding a postcopy value to
> the PrecopyNotifyData that you could populate with
> migration_in_postcopy() in precopy_notify()?
> 
> Then all you would need to do is check for that value and if it is set
> you shut down the page hinting or don't start it since I suspect it
> wouldn't likely add any value anyway since I would think flagging
> unused pages doesn't add much value in a postcopy environment anyway.

I don't think that's true. With free page hinting you reduce the 
effective VM size you have to migrate. Any page that has to be migrated 
will consume bandwidth.

1. Although postcopy transfers only the currently requested pages, the 
background thread will keep pushing pages, making postcopy eventually 
run longer. While in postcopy (well, and in precopy) we are faced with a 
clear performance degradation, so we want to minimize the overall time 
spent.

2. Usually you let precopy run for a while before switching to postcopy. 
With free page hinting you might be able to greatly reduce the number of 
pages you'll have to migrate later in the same amount of time.


So there would be value, but at least I am not too interested in making 
it work in combination perfectly if it results in significant migration 
code changes; my goal is to not silently break guests when used in 
combination -- once there is the actual requirement to optimize this 
setup, we can work on that optimization (as discussed with MST here).

So I'll explore going the migrate_postcopy_ram() way to silently (or at 
least warn) disable free page hinting. Thanks.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 19:57           ` Michael S. Tsirkin
@ 2021-07-08  8:19             ` David Hildenbrand
  2021-07-08 19:07             ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-07-08  8:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable,
	Wei Wang, Peter Xu, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On 07.07.21 21:57, Michael S. Tsirkin wrote:
> On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote:
>> On 07.07.21 21:19, Michael S. Tsirkin wrote:
>>> On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote:
>>>> On 07.07.21 21:05, Michael S. Tsirkin wrote:
>>>>> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
>>>>>> Postcopy never worked properly with 'free-page-hint=on', as there are
>>>>>> at least two issues:
>>>>>>
>>>>>> 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
>>>>>>       and consequently won't release free pages back to the OS once
>>>>>>       migration finishes.
>>>>>>
>>>>>>       The issue is that for postcopy, we won't do a final bitmap sync while
>>>>>>       the guest is stopped on the source and
>>>>>>       virtio_balloon_free_page_hint_notify() will only call
>>>>>>       virtio_balloon_free_page_done() on the source during
>>>>>>       PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
>>>>>>       the destination.
>>>>>>
>>>>>> 2) Once the VM touches a page on the destination that has been excluded
>>>>>>       from migration on the source via qemu_guest_free_page_hint() while
>>>>>>       postcopy is active, that thread will stall until postcopy finishes
>>>>>>       and all threads are woken up. (with older Linux kernels that won't
>>>>>>       retry faults when woken up via userfaultfd, we might actually get a
>>>>>>       SEGFAULT)
>>>>>>
>>>>>>       The issue is that the source will refuse to migrate any pages that
>>>>>>       are not marked as dirty in the dirty bmap -- for example, because the
>>>>>>       page might just have been sent. Consequently, the faulting thread will
>>>>>>       stall, waiting for the page to be migrated -- which could take quite
>>>>>>       a while and result in guest OS issues.
>>>>>
>>>>> OK so if source gets a request for a page which is not dirty
>>>>> it does not respond immediately? Why not just teach it to
>>>>> respond? It would seem that if destination wants a page we
>>>>> should just give it to the destination ...
>>>>
>>>> The source does not know if a page has already been sent (e.g., via the
>>>> background migration thread that moves all data over) vs. the page has not
>>>> been send because the page was hinted. This is the part where we'd need
>>>> additional tracking on the source to actually know that.
>>>>
>>>> We must not send a page twice, otherwise bad things can happen when placing
>>>> pages that already have been migrated, because that scenario can easily
>>>> happen with ordinary postcopy (page has already been sent and we're dealing
>>>> with a stale request from the destination).
>>>
>>> OK let me get this straight
>>>
>>> A. source sends page
>>> B. destination requests page
>>> C. destination changes page
>>> D. source sends page
>>> E. destination overwrites page
>>>
>>> this is what you are worried about right?
>>
>> IIRC E. is with recent kernels:
>>
>> E. placing the page fails with -EEXIST and postcopy migration fails
>>
>> However, the man page (man ioctl_userfaultfd) doesn't describe what is
>> actually supposed to happen when double-placing. Could be that it's
>> "undefined behavior".
>>
>> I did not try, though.
>>
>>
>> This is how it works today:
>>
>> A. source sends page and marks it clean
>> B. destination requests page
>> C. destination receives page and places it
>> D. source ignores request as page is clean
> 
> If it's actually -EEXIST then we could just resend it
> and teach destination to ignore -EEXIST errors right?

I think checking the received bitmap would be more robust.

> 
> Will actually make things a bit more robust: destination
> handles its own consistency instead of relying on source.

TBH, I don't think having multiple copies of the same page in flight is 
neither a very good design, nor robust.

In an idea world, the destination would make sure to send a page only 
once and the source would expect to receive a page only once. This is 
currently the case except for free page hinting, where a page might not 
be sent as we're relying on the dirty bitmap to also track what has been 
already sent.

The destination does handle consistently right now by bailing out if it 
receives the page twice (e.g., -EEXIST). In addition, we could consult 
the received bitmap to make sure we really only receive stuff once 
instead of relying on undocumented userfaultfd behavior. On the source, 
we'd ideally have a "sent bitmap", but I'll really avoid introducing new 
bitmaps because that can't be the ultimate solution (dirty, clean, 
received, ...).


I just found the comment that describes the current design: 
migration/ram.c:get_queued_page()

"We're sending this page, and since it's postcopy nothing else will 
dirty it, and we must make sure it doesn't get sent again even if this 
queue request was received after the background search already sent it."

> 
>>>
>>> the fix is to mark page clean in A.
>>> then in D to not send page if it's clean?
>>>
>>> And the problem with hinting is this:
>>>
>>> A. page is marked clean
>>> B. destination requests page
>>> C. destination changes page
>>> D. source sends page <- does not happen, page is clean!
>>> E. destination overwrites page
>>
>> Simplified it's
>>
>> A. page is marked clean by hinting code
>> B. destination requests page
>> D. source ignores request as page is clean
>> E. destination stalls until postcopy unregisters uffd
>>
>>
>> Some thoughts
>>
>> 1. We do have a a recv bitmap where we track received pages on the
>> destination (e.g., ramblock_recv_bitmap_test()), however we only use it to
>> avoid sending duplicate requests to the hypervisor AFAIKs, and don't check
>> it when placing pages.
>>
>> 2. Changing the migration behavior unconditionally on the source will break
>> migration to old QEMU binaries that cannot handle this change.
> 
> We can always make this depend on new machine types.

Yes, we could if we want to go down that path.

> 
> 
>> 3. I think the current behavior is in place to make debugging easier. If
>> only a single instance of a page will ever be migrated from source to
>> destination, there cannot be silent data corruption. Further, we avoid
>> migrating unnecessarily pages twice.
>>
> 
> Likely does not matter much for performance, it seems unlikely that
> the race is all that common.

I cannot really tell, I'd guess it can happen but shouldn't happen too 
often.


For now I'm going to keep it simple and ignore free page hints while 
postcopy has been enabled as suggested in the other discussion. That 
should be the easy fix and if someone wants to optimize this scenarios, 
we can think about a proper way to make it fly (as I said, nobody seemed 
to really have cared in the past as it's mostly broken right now).

Thanks for the helpful discussion!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-07 19:57           ` Michael S. Tsirkin
  2021-07-08  8:19             ` David Hildenbrand
@ 2021-07-08 19:07             ` Dr. David Alan Gilbert
  2021-07-09 11:27               ` Michael S. Tsirkin
  1 sibling, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2021-07-08 19:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Hildenbrand, Juan Quintela, qemu-stable, Alexander Duyck,
	qemu-devel, Wei Wang, Peter Xu, Philippe Mathieu-Daudé

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote:
> > On 07.07.21 21:19, Michael S. Tsirkin wrote:
> > > On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote:
> > > > On 07.07.21 21:05, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > > > > Postcopy never worked properly with 'free-page-hint=on', as there are
> > > > > > at least two issues:
> > > > > > 
> > > > > > 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
> > > > > >      and consequently won't release free pages back to the OS once
> > > > > >      migration finishes.
> > > > > > 
> > > > > >      The issue is that for postcopy, we won't do a final bitmap sync while
> > > > > >      the guest is stopped on the source and
> > > > > >      virtio_balloon_free_page_hint_notify() will only call
> > > > > >      virtio_balloon_free_page_done() on the source during
> > > > > >      PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
> > > > > >      the destination.
> > > > > > 
> > > > > > 2) Once the VM touches a page on the destination that has been excluded
> > > > > >      from migration on the source via qemu_guest_free_page_hint() while
> > > > > >      postcopy is active, that thread will stall until postcopy finishes
> > > > > >      and all threads are woken up. (with older Linux kernels that won't
> > > > > >      retry faults when woken up via userfaultfd, we might actually get a
> > > > > >      SEGFAULT)
> > > > > > 
> > > > > >      The issue is that the source will refuse to migrate any pages that
> > > > > >      are not marked as dirty in the dirty bmap -- for example, because the
> > > > > >      page might just have been sent. Consequently, the faulting thread will
> > > > > >      stall, waiting for the page to be migrated -- which could take quite
> > > > > >      a while and result in guest OS issues.
> > > > > 
> > > > > OK so if source gets a request for a page which is not dirty
> > > > > it does not respond immediately? Why not just teach it to
> > > > > respond? It would seem that if destination wants a page we
> > > > > should just give it to the destination ...
> > > > 
> > > > The source does not know if a page has already been sent (e.g., via the
> > > > background migration thread that moves all data over) vs. the page has not
> > > > been send because the page was hinted. This is the part where we'd need
> > > > additional tracking on the source to actually know that.
> > > > 
> > > > We must not send a page twice, otherwise bad things can happen when placing
> > > > pages that already have been migrated, because that scenario can easily
> > > > happen with ordinary postcopy (page has already been sent and we're dealing
> > > > with a stale request from the destination).
> > > 
> > > OK let me get this straight
> > > 
> > > A. source sends page
> > > B. destination requests page
> > > C. destination changes page
> > > D. source sends page
> > > E. destination overwrites page
> > > 
> > > this is what you are worried about right?
> > 
> > IIRC E. is with recent kernels:
> > 
> > E. placing the page fails with -EEXIST and postcopy migration fails
> > 
> > However, the man page (man ioctl_userfaultfd) doesn't describe what is
> > actually supposed to happen when double-placing. Could be that it's
> > "undefined behavior".
> > 
> > I did not try, though.
> > 
> > 
> > This is how it works today:
> > 
> > A. source sends page and marks it clean
> > B. destination requests page
> > C. destination receives page and places it
> > D. source ignores request as page is clean
> 
> If it's actually -EEXIST then we could just resend it
> and teach destination to ignore -EEXIST errors right?
> 
> Will actually make things a bit more robust: destination
> handles its own consistency instead of relying on source.

You have to be careful of a few things;
  a) If the destination has modified the page, then you must
definitely not under any circumstances lose those modifications
and replace them by an old version from the source.

  b) With postcopy recovery I think there is a bitmap to track some
of this; but you have to be careful since you don't know whether
pages that were sent were actually received.

Dave

> 
> 
> > > 
> > > the fix is to mark page clean in A.
> > > then in D to not send page if it's clean?
> > > 
> > > And the problem with hinting is this:
> > > 
> > > A. page is marked clean
> > > B. destination requests page
> > > C. destination changes page
> > > D. source sends page <- does not happen, page is clean!
> > > E. destination overwrites page
> > 
> > Simplified it's
> > 
> > A. page is marked clean by hinting code
> > B. destination requests page
> > D. source ignores request as page is clean
> > E. destination stalls until postcopy unregisters uffd
> > 
> > 
> > Some thoughts
> > 
> > 1. We do have a a recv bitmap where we track received pages on the
> > destination (e.g., ramblock_recv_bitmap_test()), however we only use it to
> > avoid sending duplicate requests to the hypervisor AFAIKs, and don't check
> > it when placing pages.
> > 
> > 2. Changing the migration behavior unconditionally on the source will break
> > migration to old QEMU binaries that cannot handle this change.
> 
> We can always make this depend on new machine types.
> 
> 
> > 3. I think the current behavior is in place to make debugging easier. If
> > only a single instance of a page will ever be migrated from source to
> > destination, there cannot be silent data corruption. Further, we avoid
> > migrating unnecessarily pages twice.
> > 
> 
> Likely does not matter much for performance, it seems unlikely that
> the race is all that common.
> 
> > Maybe Dave and Peter can spot any flaws in my understanding.
> > 
> > -- 
> > Thanks,
> > 
> > David / dhildenb
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2021-07-08 19:07             ` Dr. David Alan Gilbert
@ 2021-07-09 11:27               ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-07-09 11:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: David Hildenbrand, Juan Quintela, qemu-stable, Alexander Duyck,
	qemu-devel, Wei Wang, Peter Xu, Philippe Mathieu-Daudé

On Thu, Jul 08, 2021 at 08:07:44PM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote:
> > > On 07.07.21 21:19, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote:
> > > > > On 07.07.21 21:05, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
> > > > > > > Postcopy never worked properly with 'free-page-hint=on', as there are
> > > > > > > at least two issues:
> > > > > > > 
> > > > > > > 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
> > > > > > >      and consequently won't release free pages back to the OS once
> > > > > > >      migration finishes.
> > > > > > > 
> > > > > > >      The issue is that for postcopy, we won't do a final bitmap sync while
> > > > > > >      the guest is stopped on the source and
> > > > > > >      virtio_balloon_free_page_hint_notify() will only call
> > > > > > >      virtio_balloon_free_page_done() on the source during
> > > > > > >      PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
> > > > > > >      the destination.
> > > > > > > 
> > > > > > > 2) Once the VM touches a page on the destination that has been excluded
> > > > > > >      from migration on the source via qemu_guest_free_page_hint() while
> > > > > > >      postcopy is active, that thread will stall until postcopy finishes
> > > > > > >      and all threads are woken up. (with older Linux kernels that won't
> > > > > > >      retry faults when woken up via userfaultfd, we might actually get a
> > > > > > >      SEGFAULT)
> > > > > > > 
> > > > > > >      The issue is that the source will refuse to migrate any pages that
> > > > > > >      are not marked as dirty in the dirty bmap -- for example, because the
> > > > > > >      page might just have been sent. Consequently, the faulting thread will
> > > > > > >      stall, waiting for the page to be migrated -- which could take quite
> > > > > > >      a while and result in guest OS issues.
> > > > > > 
> > > > > > OK so if source gets a request for a page which is not dirty
> > > > > > it does not respond immediately? Why not just teach it to
> > > > > > respond? It would seem that if destination wants a page we
> > > > > > should just give it to the destination ...
> > > > > 
> > > > > The source does not know if a page has already been sent (e.g., via the
> > > > > background migration thread that moves all data over) vs. the page has not
> > > > > been send because the page was hinted. This is the part where we'd need
> > > > > additional tracking on the source to actually know that.
> > > > > 
> > > > > We must not send a page twice, otherwise bad things can happen when placing
> > > > > pages that already have been migrated, because that scenario can easily
> > > > > happen with ordinary postcopy (page has already been sent and we're dealing
> > > > > with a stale request from the destination).
> > > > 
> > > > OK let me get this straight
> > > > 
> > > > A. source sends page
> > > > B. destination requests page
> > > > C. destination changes page
> > > > D. source sends page
> > > > E. destination overwrites page
> > > > 
> > > > this is what you are worried about right?
> > > 
> > > IIRC E. is with recent kernels:
> > > 
> > > E. placing the page fails with -EEXIST and postcopy migration fails
> > > 
> > > However, the man page (man ioctl_userfaultfd) doesn't describe what is
> > > actually supposed to happen when double-placing. Could be that it's
> > > "undefined behavior".
> > > 
> > > I did not try, though.
> > > 
> > > 
> > > This is how it works today:
> > > 
> > > A. source sends page and marks it clean
> > > B. destination requests page
> > > C. destination receives page and places it
> > > D. source ignores request as page is clean
> > 
> > If it's actually -EEXIST then we could just resend it
> > and teach destination to ignore -EEXIST errors right?
> > 
> > Will actually make things a bit more robust: destination
> > handles its own consistency instead of relying on source.
> 
> You have to be careful of a few things;
>   a) If the destination has modified the page, then you must
> definitely not under any circumstances lose those modifications
> and replace them by an old version from the source.
> 
>   b) With postcopy recovery I think there is a bitmap to track some
> of this; but you have to be careful since you don't know whether
> pages that were sent were actually received.
> 
> Dave

what I am trying to say is that userfaultfd already tracks these
things in the kernel for us. Ideally we'd just use that ...

> > 
> > 
> > > > 
> > > > the fix is to mark page clean in A.
> > > > then in D to not send page if it's clean?
> > > > 
> > > > And the problem with hinting is this:
> > > > 
> > > > A. page is marked clean
> > > > B. destination requests page
> > > > C. destination changes page
> > > > D. source sends page <- does not happen, page is clean!
> > > > E. destination overwrites page
> > > 
> > > Simplified it's
> > > 
> > > A. page is marked clean by hinting code
> > > B. destination requests page
> > > D. source ignores request as page is clean
> > > E. destination stalls until postcopy unregisters uffd
> > > 
> > > 
> > > Some thoughts
> > > 
> > > 1. We do have a a recv bitmap where we track received pages on the
> > > destination (e.g., ramblock_recv_bitmap_test()), however we only use it to
> > > avoid sending duplicate requests to the hypervisor AFAIKs, and don't check
> > > it when placing pages.
> > > 
> > > 2. Changing the migration behavior unconditionally on the source will break
> > > migration to old QEMU binaries that cannot handle this change.
> > 
> > We can always make this depend on new machine types.
> > 
> > 
> > > 3. I think the current behavior is in place to make debugging easier. If
> > > only a single instance of a page will ever be migrated from source to
> > > destination, there cannot be silent data corruption. Further, we avoid
> > > migrating unnecessarily pages twice.
> > > 
> > 
> > Likely does not matter much for performance, it seems unlikely that
> > the race is all that common.
> > 
> > > Maybe Dave and Peter can spot any flaws in my understanding.
> > > 
> > > -- 
> > > Thanks,
> > > 
> > > David / dhildenb
> > 
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-07-09 11:36 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-07 14:06 [PATCH v1 0/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand
2021-07-07 14:06 ` [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" David Hildenbrand
2021-07-07 20:30   ` Pankaj Gupta
2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand
2021-07-07 18:02   ` Peter Xu
2021-07-07 18:57     ` David Hildenbrand
2021-07-07 19:07       ` Michael S. Tsirkin
2021-07-07 20:08       ` Peter Xu
2021-07-07 21:22         ` Alexander Duyck
2021-07-07 22:40           ` Peter Xu
2021-07-08  7:14             ` David Hildenbrand
2021-07-08  7:23           ` David Hildenbrand
2021-07-07 19:05   ` Michael S. Tsirkin
2021-07-07 19:14     ` David Hildenbrand
2021-07-07 19:19       ` Michael S. Tsirkin
2021-07-07 19:47         ` David Hildenbrand
2021-07-07 19:57           ` Michael S. Tsirkin
2021-07-08  8:19             ` David Hildenbrand
2021-07-08 19:07             ` Dr. David Alan Gilbert
2021-07-09 11:27               ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.