* [PATCH v1 0/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT @ 2021-07-07 14:06 David Hildenbrand 2021-07-07 14:06 ` [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" David Hildenbrand 2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand 0 siblings, 2 replies; 20+ messages in thread From: David Hildenbrand @ 2021-07-07 14:06 UTC (permalink / raw) To: qemu-devel Cc: Juan Quintela, Michael S. Tsirkin, David Hildenbrand, Dr. David Alan Gilbert, Peter Xu, Wei Wang, Alexander Duyck, Philippe Mathieu-Daudé Working on getting migration for virtio-mem completely right [1] I realized that virtio-balloon with VIRTIO_BALLOON_F_FREE_PAGE_HINT paired with postcopy might be shaky. Actually testing it, I directly found two issues, one of both being far from trivial to fix. Let's disallow postcopy with "free-page-hint=on". [1] https://lkml.kernel.org/r/20210616162940.28630-1-david@redhat.com David Hildenbrand (2): migration/postcopy-ram: define type for "struct PostcopyNotifyData" virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT hw/virtio/vhost-user.c | 2 +- hw/virtio/virtio-balloon.c | 26 ++++++++++++++++++++++++++ include/hw/virtio/virtio-balloon.h | 1 + migration/postcopy-ram.c | 2 +- migration/postcopy-ram.h | 4 ++-- 5 files changed, 31 insertions(+), 4 deletions(-) -- 2.31.1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" 2021-07-07 14:06 [PATCH v1 0/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand @ 2021-07-07 14:06 ` David Hildenbrand 2021-07-07 20:30 ` Pankaj Gupta 2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand 1 sibling, 1 reply; 20+ messages in thread From: David Hildenbrand @ 2021-07-07 14:06 UTC (permalink / raw) To: qemu-devel Cc: Juan Quintela, Michael S . Tsirkin, David Hildenbrand, Dr. David Alan Gilbert, Alexander Duyck, Wei Wang, Peter Xu, Philippe Mathieu-Daudé Let's define a proper type, just as we do for PrecopyNotifyData already. Cc: Wei Wang <wei.w.wang@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> --- hw/virtio/vhost-user.c | 2 +- migration/postcopy-ram.c | 2 +- migration/postcopy-ram.h | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 1ac4a2ebec..42dad711bf 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -1827,9 +1827,9 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp) static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier, void *opaque) { - struct PostcopyNotifyData *pnd = opaque; struct vhost_user *u = container_of(notifier, struct vhost_user, postcopy_notifier); + PostcopyNotifyData *pnd = opaque; struct vhost_dev *dev = u->dev; switch (pnd->reason) { diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 2e9697bdd2..ee4e1c7cf5 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -69,7 +69,7 @@ void postcopy_remove_notifier(NotifierWithReturn *n) int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp) { - struct PostcopyNotifyData pnd; + PostcopyNotifyData pnd; pnd.reason = reason; pnd.errp = errp; diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h index 6d2b3cf124..01829c3562 100644 --- a/migration/postcopy-ram.h +++ b/migration/postcopy-ram.h @@ -125,10 +125,10 @@ enum PostcopyNotifyReason { POSTCOPY_NOTIFY_INBOUND_END, }; -struct PostcopyNotifyData { +typedef struct PostcopyNotifyData { enum PostcopyNotifyReason reason; Error **errp; -}; +} PostcopyNotifyData; void postcopy_add_notifier(NotifierWithReturn *nn); void postcopy_remove_notifier(NotifierWithReturn *n); -- 2.31.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" 2021-07-07 14:06 ` [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" David Hildenbrand @ 2021-07-07 20:30 ` Pankaj Gupta 0 siblings, 0 replies; 20+ messages in thread From: Pankaj Gupta @ 2021-07-07 20:30 UTC (permalink / raw) To: David Hildenbrand Cc: Michael S . Tsirkin, Juan Quintela, Qemu Developers, Alexander Duyck, Dr. David Alan Gilbert, Wei Wang, Peter Xu, Philippe Mathieu-Daudé > Let's define a proper type, just as we do for PrecopyNotifyData already. > > Cc: Wei Wang <wei.w.wang@intel.com> > Cc: Michael S. Tsirkin <mst@redhat.com> > Cc: Philippe Mathieu-Daudé <philmd@redhat.com> > Cc: Alexander Duyck <alexander.duyck@gmail.com> > Cc: Juan Quintela <quintela@redhat.com> > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > Cc: Peter Xu <peterx@redhat.com> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > hw/virtio/vhost-user.c | 2 +- > migration/postcopy-ram.c | 2 +- > migration/postcopy-ram.h | 4 ++-- > 3 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > index 1ac4a2ebec..42dad711bf 100644 > --- a/hw/virtio/vhost-user.c > +++ b/hw/virtio/vhost-user.c > @@ -1827,9 +1827,9 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp) > static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier, > void *opaque) > { > - struct PostcopyNotifyData *pnd = opaque; > struct vhost_user *u = container_of(notifier, struct vhost_user, > postcopy_notifier); > + PostcopyNotifyData *pnd = opaque; > struct vhost_dev *dev = u->dev; > > switch (pnd->reason) { > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c > index 2e9697bdd2..ee4e1c7cf5 100644 > --- a/migration/postcopy-ram.c > +++ b/migration/postcopy-ram.c > @@ -69,7 +69,7 @@ void postcopy_remove_notifier(NotifierWithReturn *n) > > int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp) > { > - struct PostcopyNotifyData pnd; > + PostcopyNotifyData pnd; > pnd.reason = reason; > pnd.errp = errp; > > diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h > index 6d2b3cf124..01829c3562 100644 > --- a/migration/postcopy-ram.h > +++ b/migration/postcopy-ram.h > @@ -125,10 +125,10 @@ enum PostcopyNotifyReason { > POSTCOPY_NOTIFY_INBOUND_END, > }; > > -struct PostcopyNotifyData { > +typedef struct PostcopyNotifyData { > enum PostcopyNotifyReason reason; > Error **errp; > -}; > +} PostcopyNotifyData; > > void postcopy_add_notifier(NotifierWithReturn *nn); > void postcopy_remove_notifier(NotifierWithReturn *n); > -- Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 14:06 [PATCH v1 0/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand 2021-07-07 14:06 ` [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" David Hildenbrand @ 2021-07-07 14:06 ` David Hildenbrand 2021-07-07 18:02 ` Peter Xu 2021-07-07 19:05 ` Michael S. Tsirkin 1 sibling, 2 replies; 20+ messages in thread From: David Hildenbrand @ 2021-07-07 14:06 UTC (permalink / raw) To: qemu-devel Cc: Michael S . Tsirkin, David Hildenbrand, qemu-stable, Alexander Duyck, Dr. David Alan Gilbert, Juan Quintela, Wei Wang, Peter Xu, Philippe Mathieu-Daudé Postcopy never worked properly with 'free-page-hint=on', as there are at least two issues: 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE and consequently won't release free pages back to the OS once migration finishes. The issue is that for postcopy, we won't do a final bitmap sync while the guest is stopped on the source and virtio_balloon_free_page_hint_notify() will only call virtio_balloon_free_page_done() on the source during PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to the destination. 2) Once the VM touches a page on the destination that has been excluded from migration on the source via qemu_guest_free_page_hint() while postcopy is active, that thread will stall until postcopy finishes and all threads are woken up. (with older Linux kernels that won't retry faults when woken up via userfaultfd, we might actually get a SEGFAULT) The issue is that the source will refuse to migrate any pages that are not marked as dirty in the dirty bmap -- for example, because the page might just have been sent. Consequently, the faulting thread will stall, waiting for the page to be migrated -- which could take quite a while and result in guest OS issues. While we could fix 1), for example, by calling virtio_balloon_free_page_done() via pre_save callbacks of the vmstate, 2) is mostly impossible to fix without additional tracking, such that we can actually identify these hinted pages and handle them accordingly. As it never worked properly, let's disable it via the postcopy notifier on the destination. Trying to set "migrate_set_capability postcopy-ram on" on the destination now results in "virtio-balloon: 'free-page-hint' does not support postcopy Error: Postcopy is not supported". Note 1: We could let qemu_guest_free_page_hint() mark postcopy as broken once actually clearing bits on the source. However, it's harder to realize as we can race with users starting postcopy and we cannot produce an expressive error message easily. Note 2: virtio-mem has similar issues, however, access to "unplugged" memory by the guest is very rare and we would have to be very lucky for it to happen during migration. The spec states "The driver SHOULD NOT read from unplugged memory blocks ..." and "The driver MUST NOT write to unplugged memory blocks". virtio-mem will move away from virtio_balloon_free_page_done() soon and handle this case explicitly on the destination. Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Cc: qemu-stable@nongnu.org Cc: Wei Wang <wei.w.wang@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> --- hw/virtio/virtio-balloon.c | 26 ++++++++++++++++++++++++++ include/hw/virtio/virtio-balloon.h | 1 + 2 files changed, 27 insertions(+) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 4b5d9e5e50..d0c9dc677c 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -30,6 +30,7 @@ #include "trace.h" #include "qemu/error-report.h" #include "migration/misc.h" +#include "migration/postcopy-ram.h" #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio-access.h" @@ -692,6 +693,28 @@ virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data) return 0; } + +static int virtio_balloon_postcopy_notify(NotifierWithReturn *n, void *opaque) +{ + VirtIOBalloon *dev = container_of(n, VirtIOBalloon, postcopy_notifier); + PostcopyNotifyData *pnd = opaque; + + /* We register the notifier only with 'free-page-hint=on' for now. */ + g_assert(virtio_has_feature(dev->host_features, + VIRTIO_BALLOON_F_FREE_PAGE_HINT)); + + /* + * Pages hinted via qemu_guest_free_page_hint() are cleared from the dirty + * bitmap and will not get migrated, especially also not when the postcopy + * destination starts using them and requests migration from the source; the + * faulting thread will stall until postcopy migration finishes and + * all threads are woken up. + */ + error_setg(pnd->errp, + "virtio-balloon: 'free-page-hint' does not support postcopy"); + return -ENOENT; +} + static size_t virtio_balloon_config_size(VirtIOBalloon *s) { uint64_t features = s->host_features; @@ -911,6 +934,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp) s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, virtio_balloon_handle_free_page_vq); precopy_add_notifier(&s->free_page_hint_notify); + postcopy_add_notifier(&s->postcopy_notifier); object_ref(OBJECT(s->iothread)); s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread), @@ -935,6 +959,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev) object_unref(OBJECT(s->iothread)); virtio_balloon_free_page_stop(s); precopy_remove_notifier(&s->free_page_hint_notify); + postcopy_remove_notifier(&s->postcopy_notifier); } balloon_stats_destroy_timer(s); qemu_remove_balloon_handler(s); @@ -1008,6 +1033,7 @@ static void virtio_balloon_instance_init(Object *obj) qemu_cond_init(&s->free_page_cond); s->free_page_hint_cmd_id = VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN; s->free_page_hint_notify.notify = virtio_balloon_free_page_hint_notify; + s->postcopy_notifier.notify = virtio_balloon_postcopy_notify; object_property_add(obj, "guest-stats", "guest statistics", balloon_stats_get_all, NULL, NULL, s); diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h index 5139cf8ab6..d0d5b793b9 100644 --- a/include/hw/virtio/virtio-balloon.h +++ b/include/hw/virtio/virtio-balloon.h @@ -65,6 +65,7 @@ struct VirtIOBalloon { */ bool block_iothread; NotifierWithReturn free_page_hint_notify; + NotifierWithReturn postcopy_notifier; int64_t stats_last_update; int64_t stats_poll_interval; uint32_t host_features; -- 2.31.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand @ 2021-07-07 18:02 ` Peter Xu 2021-07-07 18:57 ` David Hildenbrand 2021-07-07 19:05 ` Michael S. Tsirkin 1 sibling, 1 reply; 20+ messages in thread From: Peter Xu @ 2021-07-07 18:02 UTC (permalink / raw) To: David Hildenbrand Cc: Michael S . Tsirkin, Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > As it never worked properly, let's disable it via the postcopy notifier on > the destination. Trying to set "migrate_set_capability postcopy-ram on" > on the destination now results in "virtio-balloon: 'free-page-hint' does > not support postcopy Error: Postcopy is not supported". Would it be possible to do this in reversed order? Say, dynamically disable free-page-hinting if postcopy capability is set when migration starts? Perhaps it can also be re-enabled automatically when migration completes? I see postcopy a "functional" feature and free-page-hint a "performance" feature, from that pov IMHO it's better to not block function for performance. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 18:02 ` Peter Xu @ 2021-07-07 18:57 ` David Hildenbrand 2021-07-07 19:07 ` Michael S. Tsirkin 2021-07-07 20:08 ` Peter Xu 0 siblings, 2 replies; 20+ messages in thread From: David Hildenbrand @ 2021-07-07 18:57 UTC (permalink / raw) To: Peter Xu Cc: Michael S . Tsirkin, Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On 07.07.21 20:02, Peter Xu wrote: > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: >> As it never worked properly, let's disable it via the postcopy notifier on >> the destination. Trying to set "migrate_set_capability postcopy-ram on" >> on the destination now results in "virtio-balloon: 'free-page-hint' does >> not support postcopy Error: Postcopy is not supported". > > Would it be possible to do this in reversed order? Say, dynamically disable > free-page-hinting if postcopy capability is set when migration starts? Perhaps > it can also be re-enabled automatically when migration completes? I remember that this might be quite racy. We would have to make sure that no hinting happens before we enable the capability. As soon as we messed with the dirty bitmap (during precopy), postcopy is no longer safe. As noted in the patch, the only runtime alternative is to disable postcopy as soon as we actually do clear a bit. Alternatively, we could ignore any hints if the postcopy capability was enabled. Whatever we do, we have to make sure that a user cannot trick the system into an inconsistent state. Like enabling hinting, starting migration, then enabling the postcopy capability and kicking of postcopy. I did not check if we allow for that, though. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 18:57 ` David Hildenbrand @ 2021-07-07 19:07 ` Michael S. Tsirkin 2021-07-07 20:08 ` Peter Xu 1 sibling, 0 replies; 20+ messages in thread From: Michael S. Tsirkin @ 2021-07-07 19:07 UTC (permalink / raw) To: David Hildenbrand Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Peter Xu, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote: > On 07.07.21 20:02, Peter Xu wrote: > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > > > As it never worked properly, let's disable it via the postcopy notifier on > > > the destination. Trying to set "migrate_set_capability postcopy-ram on" > > > on the destination now results in "virtio-balloon: 'free-page-hint' does > > > not support postcopy Error: Postcopy is not supported". > > > > Would it be possible to do this in reversed order? Say, dynamically disable > > free-page-hinting if postcopy capability is set when migration starts? Perhaps > > it can also be re-enabled automatically when migration completes? > > I remember that this might be quite racy. We would have to make sure that no > hinting happens before we enable the capability. > > As soon as we messed with the dirty bitmap (during precopy), postcopy is no > longer safe. As noted in the patch, the only runtime alternative is to > disable postcopy as soon as we actually do clear a bit. Alternatively, we > could ignore any hints if the postcopy capability was enabled. > > Whatever we do, we have to make sure that a user cannot trick the system > into an inconsistent state. Like enabling hinting, starting migration, then > enabling the postcopy capability and kicking of postcopy. I did not check if > we allow for that, though. What bothers me with limitations like this is we train users about this lack of orthogonality, it's then very hard to retrain them that a given feature is safe to use. > -- > Thanks, > > David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 18:57 ` David Hildenbrand 2021-07-07 19:07 ` Michael S. Tsirkin @ 2021-07-07 20:08 ` Peter Xu 2021-07-07 21:22 ` Alexander Duyck 1 sibling, 1 reply; 20+ messages in thread From: Peter Xu @ 2021-07-07 20:08 UTC (permalink / raw) To: David Hildenbrand Cc: Michael S . Tsirkin, Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote: > On 07.07.21 20:02, Peter Xu wrote: > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > > > As it never worked properly, let's disable it via the postcopy notifier on > > > the destination. Trying to set "migrate_set_capability postcopy-ram on" > > > on the destination now results in "virtio-balloon: 'free-page-hint' does > > > not support postcopy Error: Postcopy is not supported". > > > > Would it be possible to do this in reversed order? Say, dynamically disable > > free-page-hinting if postcopy capability is set when migration starts? Perhaps > > it can also be re-enabled automatically when migration completes? > > I remember that this might be quite racy. We would have to make sure that no > hinting happens before we enable the capability. > > As soon as we messed with the dirty bitmap (during precopy), postcopy is no > longer safe. As noted in the patch, the only runtime alternative is to > disable postcopy as soon as we actually do clear a bit. Alternatively, we > could ignore any hints if the postcopy capability was enabled. Logically migration capabilities are applied at VM starts, and these capabilities should be constant during migration (I didn't check if there's a hard requirement; easy to add that if we want to assure it), and in most cases for the lifecycle of the vm. > > Whatever we do, we have to make sure that a user cannot trick the system > into an inconsistent state. Like enabling hinting, starting migration, then > enabling the postcopy capability and kicking of postcopy. I did not check if > we allow for that, though. We could turn free page hinting off when migration starts with postcopy-ram=on, then re-enable it after migration finishes. That looks very safe to me. And I don't even worry on user trying to mess it up - as that only put their own VM at risk; that's mostly fine to me. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 20:08 ` Peter Xu @ 2021-07-07 21:22 ` Alexander Duyck 2021-07-07 22:40 ` Peter Xu 2021-07-08 7:23 ` David Hildenbrand 0 siblings, 2 replies; 20+ messages in thread From: Alexander Duyck @ 2021-07-07 21:22 UTC (permalink / raw) To: Peter Xu Cc: Michael S . Tsirkin, David Hildenbrand, qemu-stable, qemu-devel, Juan Quintela, Wei Wang, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <peterx@redhat.com> wrote: > > On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote: > > On 07.07.21 20:02, Peter Xu wrote: > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > > > > As it never worked properly, let's disable it via the postcopy notifier on > > > > the destination. Trying to set "migrate_set_capability postcopy-ram on" > > > > on the destination now results in "virtio-balloon: 'free-page-hint' does > > > > not support postcopy Error: Postcopy is not supported". > > > > > > Would it be possible to do this in reversed order? Say, dynamically disable > > > free-page-hinting if postcopy capability is set when migration starts? Perhaps > > > it can also be re-enabled automatically when migration completes? > > > > I remember that this might be quite racy. We would have to make sure that no > > hinting happens before we enable the capability. > > > > As soon as we messed with the dirty bitmap (during precopy), postcopy is no > > longer safe. As noted in the patch, the only runtime alternative is to > > disable postcopy as soon as we actually do clear a bit. Alternatively, we > > could ignore any hints if the postcopy capability was enabled. > > Logically migration capabilities are applied at VM starts, and these > capabilities should be constant during migration (I didn't check if there's a > hard requirement; easy to add that if we want to assure it), and in most cases > for the lifecycle of the vm. Would it make sense to maybe just look at adding a postcopy value to the PrecopyNotifyData that you could populate with migration_in_postcopy() in precopy_notify()? Then all you would need to do is check for that value and if it is set you shut down the page hinting or don't start it since I suspect it wouldn't likely add any value anyway since I would think flagging unused pages doesn't add much value in a postcopy environment anyway. > > > > Whatever we do, we have to make sure that a user cannot trick the system > > into an inconsistent state. Like enabling hinting, starting migration, then > > enabling the postcopy capability and kicking of postcopy. I did not check if > > we allow for that, though. > > We could turn free page hinting off when migration starts with postcopy-ram=on, > then re-enable it after migration finishes. That looks very safe to me. And I > don't even worry on user trying to mess it up - as that only put their own VM > at risk; that's mostly fine to me. We wouldn't necessarily even need to really turn it off, just don't start it. I wonder if we couldn't just get away with adding a check to the existing virtio_balloon_free_page_hint_notify to see if we are in the postcopy state there and just shut things down or not start them. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 21:22 ` Alexander Duyck @ 2021-07-07 22:40 ` Peter Xu 2021-07-08 7:14 ` David Hildenbrand 2021-07-08 7:23 ` David Hildenbrand 1 sibling, 1 reply; 20+ messages in thread From: Peter Xu @ 2021-07-07 22:40 UTC (permalink / raw) To: Alexander Duyck Cc: Michael S . Tsirkin, David Hildenbrand, qemu-stable, qemu-devel, Juan Quintela, Wei Wang, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On Wed, Jul 07, 2021 at 02:22:32PM -0700, Alexander Duyck wrote: > On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <peterx@redhat.com> wrote: > > > > On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote: > > > On 07.07.21 20:02, Peter Xu wrote: > > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > > > > > As it never worked properly, let's disable it via the postcopy notifier on > > > > > the destination. Trying to set "migrate_set_capability postcopy-ram on" > > > > > on the destination now results in "virtio-balloon: 'free-page-hint' does > > > > > not support postcopy Error: Postcopy is not supported". > > > > > > > > Would it be possible to do this in reversed order? Say, dynamically disable > > > > free-page-hinting if postcopy capability is set when migration starts? Perhaps > > > > it can also be re-enabled automatically when migration completes? > > > > > > I remember that this might be quite racy. We would have to make sure that no > > > hinting happens before we enable the capability. > > > > > > As soon as we messed with the dirty bitmap (during precopy), postcopy is no > > > longer safe. As noted in the patch, the only runtime alternative is to > > > disable postcopy as soon as we actually do clear a bit. Alternatively, we > > > could ignore any hints if the postcopy capability was enabled. > > > > Logically migration capabilities are applied at VM starts, and these > > capabilities should be constant during migration (I didn't check if there's a > > hard requirement; easy to add that if we want to assure it), and in most cases > > for the lifecycle of the vm. > > Would it make sense to maybe just look at adding a postcopy value to > the PrecopyNotifyData that you could populate with > migration_in_postcopy() in precopy_notify()? Should we check migrate_postcopy_ram() rather than migration_in_postcopy()? It's the precopy phase that's dropping the dirty bits and can potentially hang a postcopy vcpu, afaiu. > > Then all you would need to do is check for that value and if it is set > you shut down the page hinting or don't start it since I suspect it > wouldn't likely add any value anyway since I would think flagging > unused pages doesn't add much value in a postcopy environment anyway. > > > > > > > Whatever we do, we have to make sure that a user cannot trick the system > > > into an inconsistent state. Like enabling hinting, starting migration, then > > > enabling the postcopy capability and kicking of postcopy. I did not check if > > > we allow for that, though. > > > > We could turn free page hinting off when migration starts with postcopy-ram=on, > > then re-enable it after migration finishes. That looks very safe to me. And I > > don't even worry on user trying to mess it up - as that only put their own VM > > at risk; that's mostly fine to me. > > We wouldn't necessarily even need to really turn it off, just don't > start it. I wonder if we couldn't just get away with adding a check to > the existing virtio_balloon_free_page_hint_notify to see if we are in > the postcopy state there and just shut things down or not start them. This makes me wonder whether qemu_guest_free_page_hint() should be called at all on destination host when incoming postcopy migration is in progress. Right now the check migration_is_setup_or_active() should return true on destination host, however I am not sure if that's necessary as we don't track dirty at all there. -- Peter Xu ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 22:40 ` Peter Xu @ 2021-07-08 7:14 ` David Hildenbrand 0 siblings, 0 replies; 20+ messages in thread From: David Hildenbrand @ 2021-07-08 7:14 UTC (permalink / raw) To: Peter Xu, Alexander Duyck Cc: Juan Quintela, Michael S . Tsirkin, qemu-devel, qemu-stable, Wei Wang, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On 08.07.21 00:40, Peter Xu wrote: > On Wed, Jul 07, 2021 at 02:22:32PM -0700, Alexander Duyck wrote: >> On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <peterx@redhat.com> wrote: >>> >>> On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote: >>>> On 07.07.21 20:02, Peter Xu wrote: >>>>> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: >>>>>> As it never worked properly, let's disable it via the postcopy notifier on >>>>>> the destination. Trying to set "migrate_set_capability postcopy-ram on" >>>>>> on the destination now results in "virtio-balloon: 'free-page-hint' does >>>>>> not support postcopy Error: Postcopy is not supported". >>>>> >>>>> Would it be possible to do this in reversed order? Say, dynamically disable >>>>> free-page-hinting if postcopy capability is set when migration starts? Perhaps >>>>> it can also be re-enabled automatically when migration completes? >>>> >>>> I remember that this might be quite racy. We would have to make sure that no >>>> hinting happens before we enable the capability. >>>> >>>> As soon as we messed with the dirty bitmap (during precopy), postcopy is no >>>> longer safe. As noted in the patch, the only runtime alternative is to >>>> disable postcopy as soon as we actually do clear a bit. Alternatively, we >>>> could ignore any hints if the postcopy capability was enabled. >>> >>> Logically migration capabilities are applied at VM starts, and these >>> capabilities should be constant during migration (I didn't check if there's a >>> hard requirement; easy to add that if we want to assure it), and in most cases >>> for the lifecycle of the vm. >> >> Would it make sense to maybe just look at adding a postcopy value to >> the PrecopyNotifyData that you could populate with >> migration_in_postcopy() in precopy_notify()? > > Should we check migrate_postcopy_ram() rather than migration_in_postcopy()? Right, we care about the source only -- if postcopy could be started. > >> >> Then all you would need to do is check for that value and if it is set >> you shut down the page hinting or don't start it since I suspect it >> wouldn't likely add any value anyway since I would think flagging >> unused pages doesn't add much value in a postcopy environment anyway. >> We'd have to never kick it off right from the start as I explained previously. As soon as you messed with the bitmaps it's problematic. >>>> >>>> Whatever we do, we have to make sure that a user cannot trick the system >>>> into an inconsistent state. Like enabling hinting, starting migration, then >>>> enabling the postcopy capability and kicking of postcopy. I did not check if >>>> we allow for that, though. >>> >>> We could turn free page hinting off when migration starts with postcopy-ram=on, >>> then re-enable it after migration finishes. That looks very safe to me. And I >>> don't even worry on user trying to mess it up - as that only put their own VM >>> at risk; that's mostly fine to me. >> >> We wouldn't necessarily even need to really turn it off, just don't >> start it. I wonder if we couldn't just get away with adding a check to >> the existing virtio_balloon_free_page_hint_notify to see if we are in >> the postcopy state there and just shut things down or not start them. > > This makes me wonder whether qemu_guest_free_page_hint() should be called at > all on destination host when incoming postcopy migration is in progress. It really shouldn't. And if it would currently happen, it would be due to issue 1. described in the patch description that will be fixed independently, such that hinting is completely done once running on the destination. > > Right now the check migration_is_setup_or_active() should return true on > destination host, however I am not sure if that's necessary as we don't track > dirty at all there. migration_is_setup_or_active(s->state) uses migrate_get_current(), which gives us the outgoing state (source) not the incoming state (destination). -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 21:22 ` Alexander Duyck 2021-07-07 22:40 ` Peter Xu @ 2021-07-08 7:23 ` David Hildenbrand 1 sibling, 0 replies; 20+ messages in thread From: David Hildenbrand @ 2021-07-08 7:23 UTC (permalink / raw) To: Alexander Duyck, Peter Xu Cc: Juan Quintela, Michael S . Tsirkin, qemu-devel, qemu-stable, Wei Wang, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On 07.07.21 23:22, Alexander Duyck wrote: > On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <peterx@redhat.com> wrote: >> >> On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote: >>> On 07.07.21 20:02, Peter Xu wrote: >>>> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: >>>>> As it never worked properly, let's disable it via the postcopy notifier on >>>>> the destination. Trying to set "migrate_set_capability postcopy-ram on" >>>>> on the destination now results in "virtio-balloon: 'free-page-hint' does >>>>> not support postcopy Error: Postcopy is not supported". >>>> >>>> Would it be possible to do this in reversed order? Say, dynamically disable >>>> free-page-hinting if postcopy capability is set when migration starts? Perhaps >>>> it can also be re-enabled automatically when migration completes? >>> >>> I remember that this might be quite racy. We would have to make sure that no >>> hinting happens before we enable the capability. >>> >>> As soon as we messed with the dirty bitmap (during precopy), postcopy is no >>> longer safe. As noted in the patch, the only runtime alternative is to >>> disable postcopy as soon as we actually do clear a bit. Alternatively, we >>> could ignore any hints if the postcopy capability was enabled. >> >> Logically migration capabilities are applied at VM starts, and these >> capabilities should be constant during migration (I didn't check if there's a >> hard requirement; easy to add that if we want to assure it), and in most cases >> for the lifecycle of the vm. > > Would it make sense to maybe just look at adding a postcopy value to > the PrecopyNotifyData that you could populate with > migration_in_postcopy() in precopy_notify()? > > Then all you would need to do is check for that value and if it is set > you shut down the page hinting or don't start it since I suspect it > wouldn't likely add any value anyway since I would think flagging > unused pages doesn't add much value in a postcopy environment anyway. I don't think that's true. With free page hinting you reduce the effective VM size you have to migrate. Any page that has to be migrated will consume bandwidth. 1. Although postcopy transfers only the currently requested pages, the background thread will keep pushing pages, making postcopy eventually run longer. While in postcopy (well, and in precopy) we are faced with a clear performance degradation, so we want to minimize the overall time spent. 2. Usually you let precopy run for a while before switching to postcopy. With free page hinting you might be able to greatly reduce the number of pages you'll have to migrate later in the same amount of time. So there would be value, but at least I am not too interested in making it work in combination perfectly if it results in significant migration code changes; my goal is to not silently break guests when used in combination -- once there is the actual requirement to optimize this setup, we can work on that optimization (as discussed with MST here). So I'll explore going the migrate_postcopy_ram() way to silently (or at least warn) disable free page hinting. Thanks. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand 2021-07-07 18:02 ` Peter Xu @ 2021-07-07 19:05 ` Michael S. Tsirkin 2021-07-07 19:14 ` David Hildenbrand 1 sibling, 1 reply; 20+ messages in thread From: Michael S. Tsirkin @ 2021-07-07 19:05 UTC (permalink / raw) To: David Hildenbrand Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Peter Xu, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > Postcopy never worked properly with 'free-page-hint=on', as there are > at least two issues: > > 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE > and consequently won't release free pages back to the OS once > migration finishes. > > The issue is that for postcopy, we won't do a final bitmap sync while > the guest is stopped on the source and > virtio_balloon_free_page_hint_notify() will only call > virtio_balloon_free_page_done() on the source during > PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to > the destination. > > 2) Once the VM touches a page on the destination that has been excluded > from migration on the source via qemu_guest_free_page_hint() while > postcopy is active, that thread will stall until postcopy finishes > and all threads are woken up. (with older Linux kernels that won't > retry faults when woken up via userfaultfd, we might actually get a > SEGFAULT) > > The issue is that the source will refuse to migrate any pages that > are not marked as dirty in the dirty bmap -- for example, because the > page might just have been sent. Consequently, the faulting thread will > stall, waiting for the page to be migrated -- which could take quite > a while and result in guest OS issues. OK so if source gets a request for a page which is not dirty it does not respond immediately? Why not just teach it to respond? It would seem that if destination wants a page we should just give it to the destination ... > > While we could fix 1), for example, by calling > virtio_balloon_free_page_done() via pre_save callbacks of the > vmstate, 2) is mostly impossible to fix without additional tracking, > such that we can actually identify these hinted pages and handle > them accordingly. > As it never worked properly, let's disable it via the postcopy notifier on > the destination. Trying to set "migrate_set_capability postcopy-ram on" > on the destination now results in "virtio-balloon: 'free-page-hint' does > not support postcopy Error: Postcopy is not supported". > Note 1: We could let qemu_guest_free_page_hint() mark postcopy > as broken once actually clearing bits on the source. However, it's > harder to realize as we can race with users starting postcopy > and we cannot produce an expressive error message easily. How about the reverse? Ignore qemu_guest_free_page_hint if postcopy started? Seems better than making it user/guest visible .. > Note 2: virtio-mem has similar issues, however, access to "unplugged" > memory by the guest is very rare and we would have to be very > lucky for it to happen during migration. The spec states > "The driver SHOULD NOT read from unplugged memory blocks ..." > and "The driver MUST NOT write to unplugged memory blocks". > virtio-mem will move away from virtio_balloon_free_page_done() > soon and handle this case explicitly on the destination. > > Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") OK it's not too bad, but I wonder whether above aideas have been explored. > Cc: qemu-stable@nongnu.org > Cc: Wei Wang <wei.w.wang@intel.com> > Cc: Michael S. Tsirkin <mst@redhat.com> > Cc: Philippe Mathieu-Daudé <philmd@redhat.com> > Cc: Alexander Duyck <alexander.duyck@gmail.com> > Cc: Juan Quintela <quintela@redhat.com> > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > Cc: Peter Xu <peterx@redhat.com> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > hw/virtio/virtio-balloon.c | 26 ++++++++++++++++++++++++++ > include/hw/virtio/virtio-balloon.h | 1 + > 2 files changed, 27 insertions(+) > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > index 4b5d9e5e50..d0c9dc677c 100644 > --- a/hw/virtio/virtio-balloon.c > +++ b/hw/virtio/virtio-balloon.c > @@ -30,6 +30,7 @@ > #include "trace.h" > #include "qemu/error-report.h" > #include "migration/misc.h" > +#include "migration/postcopy-ram.h" > > #include "hw/virtio/virtio-bus.h" > #include "hw/virtio/virtio-access.h" > @@ -692,6 +693,28 @@ virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data) > return 0; > } > > + > +static int virtio_balloon_postcopy_notify(NotifierWithReturn *n, void *opaque) > +{ > + VirtIOBalloon *dev = container_of(n, VirtIOBalloon, postcopy_notifier); > + PostcopyNotifyData *pnd = opaque; > + > + /* We register the notifier only with 'free-page-hint=on' for now. */ > + g_assert(virtio_has_feature(dev->host_features, > + VIRTIO_BALLOON_F_FREE_PAGE_HINT)); > + > + /* > + * Pages hinted via qemu_guest_free_page_hint() are cleared from the dirty > + * bitmap and will not get migrated, especially also not when the postcopy > + * destination starts using them and requests migration from the source; the > + * faulting thread will stall until postcopy migration finishes and > + * all threads are woken up. > + */ > + error_setg(pnd->errp, > + "virtio-balloon: 'free-page-hint' does not support postcopy"); > + return -ENOENT; > +} > + > static size_t virtio_balloon_config_size(VirtIOBalloon *s) > { > uint64_t features = s->host_features; > @@ -911,6 +934,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp) > s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, > virtio_balloon_handle_free_page_vq); > precopy_add_notifier(&s->free_page_hint_notify); > + postcopy_add_notifier(&s->postcopy_notifier); > > object_ref(OBJECT(s->iothread)); > s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread), > @@ -935,6 +959,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev) > object_unref(OBJECT(s->iothread)); > virtio_balloon_free_page_stop(s); > precopy_remove_notifier(&s->free_page_hint_notify); > + postcopy_remove_notifier(&s->postcopy_notifier); > } > balloon_stats_destroy_timer(s); > qemu_remove_balloon_handler(s); > @@ -1008,6 +1033,7 @@ static void virtio_balloon_instance_init(Object *obj) > qemu_cond_init(&s->free_page_cond); > s->free_page_hint_cmd_id = VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN; > s->free_page_hint_notify.notify = virtio_balloon_free_page_hint_notify; > + s->postcopy_notifier.notify = virtio_balloon_postcopy_notify; > > object_property_add(obj, "guest-stats", "guest statistics", > balloon_stats_get_all, NULL, NULL, s); > diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h > index 5139cf8ab6..d0d5b793b9 100644 > --- a/include/hw/virtio/virtio-balloon.h > +++ b/include/hw/virtio/virtio-balloon.h > @@ -65,6 +65,7 @@ struct VirtIOBalloon { > */ > bool block_iothread; > NotifierWithReturn free_page_hint_notify; > + NotifierWithReturn postcopy_notifier; > int64_t stats_last_update; > int64_t stats_poll_interval; > uint32_t host_features; > -- > 2.31.1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 19:05 ` Michael S. Tsirkin @ 2021-07-07 19:14 ` David Hildenbrand 2021-07-07 19:19 ` Michael S. Tsirkin 0 siblings, 1 reply; 20+ messages in thread From: David Hildenbrand @ 2021-07-07 19:14 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Peter Xu, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On 07.07.21 21:05, Michael S. Tsirkin wrote: > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: >> Postcopy never worked properly with 'free-page-hint=on', as there are >> at least two issues: >> >> 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE >> and consequently won't release free pages back to the OS once >> migration finishes. >> >> The issue is that for postcopy, we won't do a final bitmap sync while >> the guest is stopped on the source and >> virtio_balloon_free_page_hint_notify() will only call >> virtio_balloon_free_page_done() on the source during >> PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to >> the destination. >> >> 2) Once the VM touches a page on the destination that has been excluded >> from migration on the source via qemu_guest_free_page_hint() while >> postcopy is active, that thread will stall until postcopy finishes >> and all threads are woken up. (with older Linux kernels that won't >> retry faults when woken up via userfaultfd, we might actually get a >> SEGFAULT) >> >> The issue is that the source will refuse to migrate any pages that >> are not marked as dirty in the dirty bmap -- for example, because the >> page might just have been sent. Consequently, the faulting thread will >> stall, waiting for the page to be migrated -- which could take quite >> a while and result in guest OS issues. > > OK so if source gets a request for a page which is not dirty > it does not respond immediately? Why not just teach it to > respond? It would seem that if destination wants a page we > should just give it to the destination ... The source does not know if a page has already been sent (e.g., via the background migration thread that moves all data over) vs. the page has not been send because the page was hinted. This is the part where we'd need additional tracking on the source to actually know that. We must not send a page twice, otherwise bad things can happen when placing pages that already have been migrated, because that scenario can easily happen with ordinary postcopy (page has already been sent and we're dealing with a stale request from the destination). > > >> >> While we could fix 1), for example, by calling >> virtio_balloon_free_page_done() via pre_save callbacks of the >> vmstate, 2) is mostly impossible to fix without additional tracking, >> such that we can actually identify these hinted pages and handle >> them accordingly. >> As it never worked properly, let's disable it via the postcopy notifier on >> the destination. Trying to set "migrate_set_capability postcopy-ram on" >> on the destination now results in "virtio-balloon: 'free-page-hint' does >> not support postcopy Error: Postcopy is not supported". >> Note 1: We could let qemu_guest_free_page_hint() mark postcopy >> as broken once actually clearing bits on the source. However, it's >> harder to realize as we can race with users starting postcopy >> and we cannot produce an expressive error message easily. > > > How about the reverse? Ignore qemu_guest_free_page_hint if postcopy > started? Seems better than making it user/guest visible .. Might be an option, but we let the user configure something that does not work in combination ... essentially ignoring one of both user settings. Also not perfect IMHO. > >> Note 2: virtio-mem has similar issues, however, access to "unplugged" >> memory by the guest is very rare and we would have to be very >> lucky for it to happen during migration. The spec states >> "The driver SHOULD NOT read from unplugged memory blocks ..." >> and "The driver MUST NOT write to unplugged memory blocks". >> virtio-mem will move away from virtio_balloon_free_page_done() >> soon and handle this case explicitly on the destination. >> >> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") > > OK it's not too bad, but I wonder whether above aideas have been > explored. TBH, it's been broken all along and I'd rather have a simple fix. If somebody ever cares about this, we could investigate making it work (or making postcopy overrule free page hinting). But I'm open for suggestions. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 19:14 ` David Hildenbrand @ 2021-07-07 19:19 ` Michael S. Tsirkin 2021-07-07 19:47 ` David Hildenbrand 0 siblings, 1 reply; 20+ messages in thread From: Michael S. Tsirkin @ 2021-07-07 19:19 UTC (permalink / raw) To: David Hildenbrand Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Peter Xu, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote: > On 07.07.21 21:05, Michael S. Tsirkin wrote: > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > > > Postcopy never worked properly with 'free-page-hint=on', as there are > > > at least two issues: > > > > > > 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE > > > and consequently won't release free pages back to the OS once > > > migration finishes. > > > > > > The issue is that for postcopy, we won't do a final bitmap sync while > > > the guest is stopped on the source and > > > virtio_balloon_free_page_hint_notify() will only call > > > virtio_balloon_free_page_done() on the source during > > > PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to > > > the destination. > > > > > > 2) Once the VM touches a page on the destination that has been excluded > > > from migration on the source via qemu_guest_free_page_hint() while > > > postcopy is active, that thread will stall until postcopy finishes > > > and all threads are woken up. (with older Linux kernels that won't > > > retry faults when woken up via userfaultfd, we might actually get a > > > SEGFAULT) > > > > > > The issue is that the source will refuse to migrate any pages that > > > are not marked as dirty in the dirty bmap -- for example, because the > > > page might just have been sent. Consequently, the faulting thread will > > > stall, waiting for the page to be migrated -- which could take quite > > > a while and result in guest OS issues. > > > > OK so if source gets a request for a page which is not dirty > > it does not respond immediately? Why not just teach it to > > respond? It would seem that if destination wants a page we > > should just give it to the destination ... > > The source does not know if a page has already been sent (e.g., via the > background migration thread that moves all data over) vs. the page has not > been send because the page was hinted. This is the part where we'd need > additional tracking on the source to actually know that. > > We must not send a page twice, otherwise bad things can happen when placing > pages that already have been migrated, because that scenario can easily > happen with ordinary postcopy (page has already been sent and we're dealing > with a stale request from the destination). OK let me get this straight A. source sends page B. destination requests page C. destination changes page D. source sends page E. destination overwrites page this is what you are worried about right? the fix is to mark page clean in A. then in D to not send page if it's clean? And the problem with hinting is this: A. page is marked clean B. destination requests page C. destination changes page D. source sends page <- does not happen, page is clean! E. destination overwrites page did I get it right? > > > > > > > > > > While we could fix 1), for example, by calling > > > virtio_balloon_free_page_done() via pre_save callbacks of the > > > vmstate, 2) is mostly impossible to fix without additional tracking, > > > such that we can actually identify these hinted pages and handle > > > them accordingly. > > > As it never worked properly, let's disable it via the postcopy notifier on > > > the destination. Trying to set "migrate_set_capability postcopy-ram on" > > > on the destination now results in "virtio-balloon: 'free-page-hint' does > > > not support postcopy Error: Postcopy is not supported". > > > Note 1: We could let qemu_guest_free_page_hint() mark postcopy > > > as broken once actually clearing bits on the source. However, it's > > > harder to realize as we can race with users starting postcopy > > > and we cannot produce an expressive error message easily. > > > > > > How about the reverse? Ignore qemu_guest_free_page_hint if postcopy > > started? Seems better than making it user/guest visible .. > > Might be an option, but we let the user configure something that does not > work in combination ... essentially ignoring one of both user settings. Also > not perfect IMHO. > > > > > > Note 2: virtio-mem has similar issues, however, access to "unplugged" > > > memory by the guest is very rare and we would have to be very > > > lucky for it to happen during migration. The spec states > > > "The driver SHOULD NOT read from unplugged memory blocks ..." > > > and "The driver MUST NOT write to unplugged memory blocks". > > > virtio-mem will move away from virtio_balloon_free_page_done() > > > soon and handle this case explicitly on the destination. > > > > > > Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") > > > > OK it's not too bad, but I wonder whether above aideas have been > > explored. > > TBH, it's been broken all along and I'd rather have a simple fix. If > somebody ever cares about this, we could investigate making it work (or > making postcopy overrule free page hinting). But I'm open for suggestions. > > -- > Thanks, > > David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 19:19 ` Michael S. Tsirkin @ 2021-07-07 19:47 ` David Hildenbrand 2021-07-07 19:57 ` Michael S. Tsirkin 0 siblings, 1 reply; 20+ messages in thread From: David Hildenbrand @ 2021-07-07 19:47 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Peter Xu, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On 07.07.21 21:19, Michael S. Tsirkin wrote: > On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote: >> On 07.07.21 21:05, Michael S. Tsirkin wrote: >>> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: >>>> Postcopy never worked properly with 'free-page-hint=on', as there are >>>> at least two issues: >>>> >>>> 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE >>>> and consequently won't release free pages back to the OS once >>>> migration finishes. >>>> >>>> The issue is that for postcopy, we won't do a final bitmap sync while >>>> the guest is stopped on the source and >>>> virtio_balloon_free_page_hint_notify() will only call >>>> virtio_balloon_free_page_done() on the source during >>>> PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to >>>> the destination. >>>> >>>> 2) Once the VM touches a page on the destination that has been excluded >>>> from migration on the source via qemu_guest_free_page_hint() while >>>> postcopy is active, that thread will stall until postcopy finishes >>>> and all threads are woken up. (with older Linux kernels that won't >>>> retry faults when woken up via userfaultfd, we might actually get a >>>> SEGFAULT) >>>> >>>> The issue is that the source will refuse to migrate any pages that >>>> are not marked as dirty in the dirty bmap -- for example, because the >>>> page might just have been sent. Consequently, the faulting thread will >>>> stall, waiting for the page to be migrated -- which could take quite >>>> a while and result in guest OS issues. >>> >>> OK so if source gets a request for a page which is not dirty >>> it does not respond immediately? Why not just teach it to >>> respond? It would seem that if destination wants a page we >>> should just give it to the destination ... >> >> The source does not know if a page has already been sent (e.g., via the >> background migration thread that moves all data over) vs. the page has not >> been send because the page was hinted. This is the part where we'd need >> additional tracking on the source to actually know that. >> >> We must not send a page twice, otherwise bad things can happen when placing >> pages that already have been migrated, because that scenario can easily >> happen with ordinary postcopy (page has already been sent and we're dealing >> with a stale request from the destination). > > OK let me get this straight > > A. source sends page > B. destination requests page > C. destination changes page > D. source sends page > E. destination overwrites page > > this is what you are worried about right? IIRC E. is with recent kernels: E. placing the page fails with -EEXIST and postcopy migration fails However, the man page (man ioctl_userfaultfd) doesn't describe what is actually supposed to happen when double-placing. Could be that it's "undefined behavior". I did not try, though. This is how it works today: A. source sends page and marks it clean B. destination requests page C. destination receives page and places it D. source ignores request as page is clean > > the fix is to mark page clean in A. > then in D to not send page if it's clean? > > And the problem with hinting is this: > > A. page is marked clean > B. destination requests page > C. destination changes page > D. source sends page <- does not happen, page is clean! > E. destination overwrites page Simplified it's A. page is marked clean by hinting code B. destination requests page D. source ignores request as page is clean E. destination stalls until postcopy unregisters uffd Some thoughts 1. We do have a a recv bitmap where we track received pages on the destination (e.g., ramblock_recv_bitmap_test()), however we only use it to avoid sending duplicate requests to the hypervisor AFAIKs, and don't check it when placing pages. 2. Changing the migration behavior unconditionally on the source will break migration to old QEMU binaries that cannot handle this change. 3. I think the current behavior is in place to make debugging easier. If only a single instance of a page will ever be migrated from source to destination, there cannot be silent data corruption. Further, we avoid migrating unnecessarily pages twice. Maybe Dave and Peter can spot any flaws in my understanding. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 19:47 ` David Hildenbrand @ 2021-07-07 19:57 ` Michael S. Tsirkin 2021-07-08 8:19 ` David Hildenbrand 2021-07-08 19:07 ` Dr. David Alan Gilbert 0 siblings, 2 replies; 20+ messages in thread From: Michael S. Tsirkin @ 2021-07-07 19:57 UTC (permalink / raw) To: David Hildenbrand Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Peter Xu, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote: > On 07.07.21 21:19, Michael S. Tsirkin wrote: > > On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote: > > > On 07.07.21 21:05, Michael S. Tsirkin wrote: > > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > > > > > Postcopy never worked properly with 'free-page-hint=on', as there are > > > > > at least two issues: > > > > > > > > > > 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE > > > > > and consequently won't release free pages back to the OS once > > > > > migration finishes. > > > > > > > > > > The issue is that for postcopy, we won't do a final bitmap sync while > > > > > the guest is stopped on the source and > > > > > virtio_balloon_free_page_hint_notify() will only call > > > > > virtio_balloon_free_page_done() on the source during > > > > > PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to > > > > > the destination. > > > > > > > > > > 2) Once the VM touches a page on the destination that has been excluded > > > > > from migration on the source via qemu_guest_free_page_hint() while > > > > > postcopy is active, that thread will stall until postcopy finishes > > > > > and all threads are woken up. (with older Linux kernels that won't > > > > > retry faults when woken up via userfaultfd, we might actually get a > > > > > SEGFAULT) > > > > > > > > > > The issue is that the source will refuse to migrate any pages that > > > > > are not marked as dirty in the dirty bmap -- for example, because the > > > > > page might just have been sent. Consequently, the faulting thread will > > > > > stall, waiting for the page to be migrated -- which could take quite > > > > > a while and result in guest OS issues. > > > > > > > > OK so if source gets a request for a page which is not dirty > > > > it does not respond immediately? Why not just teach it to > > > > respond? It would seem that if destination wants a page we > > > > should just give it to the destination ... > > > > > > The source does not know if a page has already been sent (e.g., via the > > > background migration thread that moves all data over) vs. the page has not > > > been send because the page was hinted. This is the part where we'd need > > > additional tracking on the source to actually know that. > > > > > > We must not send a page twice, otherwise bad things can happen when placing > > > pages that already have been migrated, because that scenario can easily > > > happen with ordinary postcopy (page has already been sent and we're dealing > > > with a stale request from the destination). > > > > OK let me get this straight > > > > A. source sends page > > B. destination requests page > > C. destination changes page > > D. source sends page > > E. destination overwrites page > > > > this is what you are worried about right? > > IIRC E. is with recent kernels: > > E. placing the page fails with -EEXIST and postcopy migration fails > > However, the man page (man ioctl_userfaultfd) doesn't describe what is > actually supposed to happen when double-placing. Could be that it's > "undefined behavior". > > I did not try, though. > > > This is how it works today: > > A. source sends page and marks it clean > B. destination requests page > C. destination receives page and places it > D. source ignores request as page is clean If it's actually -EEXIST then we could just resend it and teach destination to ignore -EEXIST errors right? Will actually make things a bit more robust: destination handles its own consistency instead of relying on source. > > > > the fix is to mark page clean in A. > > then in D to not send page if it's clean? > > > > And the problem with hinting is this: > > > > A. page is marked clean > > B. destination requests page > > C. destination changes page > > D. source sends page <- does not happen, page is clean! > > E. destination overwrites page > > Simplified it's > > A. page is marked clean by hinting code > B. destination requests page > D. source ignores request as page is clean > E. destination stalls until postcopy unregisters uffd > > > Some thoughts > > 1. We do have a a recv bitmap where we track received pages on the > destination (e.g., ramblock_recv_bitmap_test()), however we only use it to > avoid sending duplicate requests to the hypervisor AFAIKs, and don't check > it when placing pages. > > 2. Changing the migration behavior unconditionally on the source will break > migration to old QEMU binaries that cannot handle this change. We can always make this depend on new machine types. > 3. I think the current behavior is in place to make debugging easier. If > only a single instance of a page will ever be migrated from source to > destination, there cannot be silent data corruption. Further, we avoid > migrating unnecessarily pages twice. > Likely does not matter much for performance, it seems unlikely that the race is all that common. > Maybe Dave and Peter can spot any flaws in my understanding. > > -- > Thanks, > > David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 19:57 ` Michael S. Tsirkin @ 2021-07-08 8:19 ` David Hildenbrand 2021-07-08 19:07 ` Dr. David Alan Gilbert 1 sibling, 0 replies; 20+ messages in thread From: David Hildenbrand @ 2021-07-08 8:19 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Juan Quintela, qemu-devel, Alexander Duyck, qemu-stable, Wei Wang, Peter Xu, Philippe Mathieu-Daudé, Dr. David Alan Gilbert On 07.07.21 21:57, Michael S. Tsirkin wrote: > On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote: >> On 07.07.21 21:19, Michael S. Tsirkin wrote: >>> On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote: >>>> On 07.07.21 21:05, Michael S. Tsirkin wrote: >>>>> On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: >>>>>> Postcopy never worked properly with 'free-page-hint=on', as there are >>>>>> at least two issues: >>>>>> >>>>>> 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE >>>>>> and consequently won't release free pages back to the OS once >>>>>> migration finishes. >>>>>> >>>>>> The issue is that for postcopy, we won't do a final bitmap sync while >>>>>> the guest is stopped on the source and >>>>>> virtio_balloon_free_page_hint_notify() will only call >>>>>> virtio_balloon_free_page_done() on the source during >>>>>> PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to >>>>>> the destination. >>>>>> >>>>>> 2) Once the VM touches a page on the destination that has been excluded >>>>>> from migration on the source via qemu_guest_free_page_hint() while >>>>>> postcopy is active, that thread will stall until postcopy finishes >>>>>> and all threads are woken up. (with older Linux kernels that won't >>>>>> retry faults when woken up via userfaultfd, we might actually get a >>>>>> SEGFAULT) >>>>>> >>>>>> The issue is that the source will refuse to migrate any pages that >>>>>> are not marked as dirty in the dirty bmap -- for example, because the >>>>>> page might just have been sent. Consequently, the faulting thread will >>>>>> stall, waiting for the page to be migrated -- which could take quite >>>>>> a while and result in guest OS issues. >>>>> >>>>> OK so if source gets a request for a page which is not dirty >>>>> it does not respond immediately? Why not just teach it to >>>>> respond? It would seem that if destination wants a page we >>>>> should just give it to the destination ... >>>> >>>> The source does not know if a page has already been sent (e.g., via the >>>> background migration thread that moves all data over) vs. the page has not >>>> been send because the page was hinted. This is the part where we'd need >>>> additional tracking on the source to actually know that. >>>> >>>> We must not send a page twice, otherwise bad things can happen when placing >>>> pages that already have been migrated, because that scenario can easily >>>> happen with ordinary postcopy (page has already been sent and we're dealing >>>> with a stale request from the destination). >>> >>> OK let me get this straight >>> >>> A. source sends page >>> B. destination requests page >>> C. destination changes page >>> D. source sends page >>> E. destination overwrites page >>> >>> this is what you are worried about right? >> >> IIRC E. is with recent kernels: >> >> E. placing the page fails with -EEXIST and postcopy migration fails >> >> However, the man page (man ioctl_userfaultfd) doesn't describe what is >> actually supposed to happen when double-placing. Could be that it's >> "undefined behavior". >> >> I did not try, though. >> >> >> This is how it works today: >> >> A. source sends page and marks it clean >> B. destination requests page >> C. destination receives page and places it >> D. source ignores request as page is clean > > If it's actually -EEXIST then we could just resend it > and teach destination to ignore -EEXIST errors right? I think checking the received bitmap would be more robust. > > Will actually make things a bit more robust: destination > handles its own consistency instead of relying on source. TBH, I don't think having multiple copies of the same page in flight is neither a very good design, nor robust. In an idea world, the destination would make sure to send a page only once and the source would expect to receive a page only once. This is currently the case except for free page hinting, where a page might not be sent as we're relying on the dirty bitmap to also track what has been already sent. The destination does handle consistently right now by bailing out if it receives the page twice (e.g., -EEXIST). In addition, we could consult the received bitmap to make sure we really only receive stuff once instead of relying on undocumented userfaultfd behavior. On the source, we'd ideally have a "sent bitmap", but I'll really avoid introducing new bitmaps because that can't be the ultimate solution (dirty, clean, received, ...). I just found the comment that describes the current design: migration/ram.c:get_queued_page() "We're sending this page, and since it's postcopy nothing else will dirty it, and we must make sure it doesn't get sent again even if this queue request was received after the background search already sent it." > >>> >>> the fix is to mark page clean in A. >>> then in D to not send page if it's clean? >>> >>> And the problem with hinting is this: >>> >>> A. page is marked clean >>> B. destination requests page >>> C. destination changes page >>> D. source sends page <- does not happen, page is clean! >>> E. destination overwrites page >> >> Simplified it's >> >> A. page is marked clean by hinting code >> B. destination requests page >> D. source ignores request as page is clean >> E. destination stalls until postcopy unregisters uffd >> >> >> Some thoughts >> >> 1. We do have a a recv bitmap where we track received pages on the >> destination (e.g., ramblock_recv_bitmap_test()), however we only use it to >> avoid sending duplicate requests to the hypervisor AFAIKs, and don't check >> it when placing pages. >> >> 2. Changing the migration behavior unconditionally on the source will break >> migration to old QEMU binaries that cannot handle this change. > > We can always make this depend on new machine types. Yes, we could if we want to go down that path. > > >> 3. I think the current behavior is in place to make debugging easier. If >> only a single instance of a page will ever be migrated from source to >> destination, there cannot be silent data corruption. Further, we avoid >> migrating unnecessarily pages twice. >> > > Likely does not matter much for performance, it seems unlikely that > the race is all that common. I cannot really tell, I'd guess it can happen but shouldn't happen too often. For now I'm going to keep it simple and ignore free page hints while postcopy has been enabled as suggested in the other discussion. That should be the easy fix and if someone wants to optimize this scenarios, we can think about a proper way to make it fly (as I said, nobody seemed to really have cared in the past as it's mostly broken right now). Thanks for the helpful discussion! -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-07 19:57 ` Michael S. Tsirkin 2021-07-08 8:19 ` David Hildenbrand @ 2021-07-08 19:07 ` Dr. David Alan Gilbert 2021-07-09 11:27 ` Michael S. Tsirkin 1 sibling, 1 reply; 20+ messages in thread From: Dr. David Alan Gilbert @ 2021-07-08 19:07 UTC (permalink / raw) To: Michael S. Tsirkin Cc: David Hildenbrand, Juan Quintela, qemu-stable, Alexander Duyck, qemu-devel, Wei Wang, Peter Xu, Philippe Mathieu-Daudé * Michael S. Tsirkin (mst@redhat.com) wrote: > On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote: > > On 07.07.21 21:19, Michael S. Tsirkin wrote: > > > On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote: > > > > On 07.07.21 21:05, Michael S. Tsirkin wrote: > > > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > > > > > > Postcopy never worked properly with 'free-page-hint=on', as there are > > > > > > at least two issues: > > > > > > > > > > > > 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE > > > > > > and consequently won't release free pages back to the OS once > > > > > > migration finishes. > > > > > > > > > > > > The issue is that for postcopy, we won't do a final bitmap sync while > > > > > > the guest is stopped on the source and > > > > > > virtio_balloon_free_page_hint_notify() will only call > > > > > > virtio_balloon_free_page_done() on the source during > > > > > > PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to > > > > > > the destination. > > > > > > > > > > > > 2) Once the VM touches a page on the destination that has been excluded > > > > > > from migration on the source via qemu_guest_free_page_hint() while > > > > > > postcopy is active, that thread will stall until postcopy finishes > > > > > > and all threads are woken up. (with older Linux kernels that won't > > > > > > retry faults when woken up via userfaultfd, we might actually get a > > > > > > SEGFAULT) > > > > > > > > > > > > The issue is that the source will refuse to migrate any pages that > > > > > > are not marked as dirty in the dirty bmap -- for example, because the > > > > > > page might just have been sent. Consequently, the faulting thread will > > > > > > stall, waiting for the page to be migrated -- which could take quite > > > > > > a while and result in guest OS issues. > > > > > > > > > > OK so if source gets a request for a page which is not dirty > > > > > it does not respond immediately? Why not just teach it to > > > > > respond? It would seem that if destination wants a page we > > > > > should just give it to the destination ... > > > > > > > > The source does not know if a page has already been sent (e.g., via the > > > > background migration thread that moves all data over) vs. the page has not > > > > been send because the page was hinted. This is the part where we'd need > > > > additional tracking on the source to actually know that. > > > > > > > > We must not send a page twice, otherwise bad things can happen when placing > > > > pages that already have been migrated, because that scenario can easily > > > > happen with ordinary postcopy (page has already been sent and we're dealing > > > > with a stale request from the destination). > > > > > > OK let me get this straight > > > > > > A. source sends page > > > B. destination requests page > > > C. destination changes page > > > D. source sends page > > > E. destination overwrites page > > > > > > this is what you are worried about right? > > > > IIRC E. is with recent kernels: > > > > E. placing the page fails with -EEXIST and postcopy migration fails > > > > However, the man page (man ioctl_userfaultfd) doesn't describe what is > > actually supposed to happen when double-placing. Could be that it's > > "undefined behavior". > > > > I did not try, though. > > > > > > This is how it works today: > > > > A. source sends page and marks it clean > > B. destination requests page > > C. destination receives page and places it > > D. source ignores request as page is clean > > If it's actually -EEXIST then we could just resend it > and teach destination to ignore -EEXIST errors right? > > Will actually make things a bit more robust: destination > handles its own consistency instead of relying on source. You have to be careful of a few things; a) If the destination has modified the page, then you must definitely not under any circumstances lose those modifications and replace them by an old version from the source. b) With postcopy recovery I think there is a bitmap to track some of this; but you have to be careful since you don't know whether pages that were sent were actually received. Dave > > > > > > > > the fix is to mark page clean in A. > > > then in D to not send page if it's clean? > > > > > > And the problem with hinting is this: > > > > > > A. page is marked clean > > > B. destination requests page > > > C. destination changes page > > > D. source sends page <- does not happen, page is clean! > > > E. destination overwrites page > > > > Simplified it's > > > > A. page is marked clean by hinting code > > B. destination requests page > > D. source ignores request as page is clean > > E. destination stalls until postcopy unregisters uffd > > > > > > Some thoughts > > > > 1. We do have a a recv bitmap where we track received pages on the > > destination (e.g., ramblock_recv_bitmap_test()), however we only use it to > > avoid sending duplicate requests to the hypervisor AFAIKs, and don't check > > it when placing pages. > > > > 2. Changing the migration behavior unconditionally on the source will break > > migration to old QEMU binaries that cannot handle this change. > > We can always make this depend on new machine types. > > > > 3. I think the current behavior is in place to make debugging easier. If > > only a single instance of a page will ever be migrated from source to > > destination, there cannot be silent data corruption. Further, we avoid > > migrating unnecessarily pages twice. > > > > Likely does not matter much for performance, it seems unlikely that > the race is all that common. > > > Maybe Dave and Peter can spot any flaws in my understanding. > > > > -- > > Thanks, > > > > David / dhildenb > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT 2021-07-08 19:07 ` Dr. David Alan Gilbert @ 2021-07-09 11:27 ` Michael S. Tsirkin 0 siblings, 0 replies; 20+ messages in thread From: Michael S. Tsirkin @ 2021-07-09 11:27 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: David Hildenbrand, Juan Quintela, qemu-stable, Alexander Duyck, qemu-devel, Wei Wang, Peter Xu, Philippe Mathieu-Daudé On Thu, Jul 08, 2021 at 08:07:44PM +0100, Dr. David Alan Gilbert wrote: > * Michael S. Tsirkin (mst@redhat.com) wrote: > > On Wed, Jul 07, 2021 at 09:47:31PM +0200, David Hildenbrand wrote: > > > On 07.07.21 21:19, Michael S. Tsirkin wrote: > > > > On Wed, Jul 07, 2021 at 09:14:00PM +0200, David Hildenbrand wrote: > > > > > On 07.07.21 21:05, Michael S. Tsirkin wrote: > > > > > > On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote: > > > > > > > Postcopy never worked properly with 'free-page-hint=on', as there are > > > > > > > at least two issues: > > > > > > > > > > > > > > 1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE > > > > > > > and consequently won't release free pages back to the OS once > > > > > > > migration finishes. > > > > > > > > > > > > > > The issue is that for postcopy, we won't do a final bitmap sync while > > > > > > > the guest is stopped on the source and > > > > > > > virtio_balloon_free_page_hint_notify() will only call > > > > > > > virtio_balloon_free_page_done() on the source during > > > > > > > PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to > > > > > > > the destination. > > > > > > > > > > > > > > 2) Once the VM touches a page on the destination that has been excluded > > > > > > > from migration on the source via qemu_guest_free_page_hint() while > > > > > > > postcopy is active, that thread will stall until postcopy finishes > > > > > > > and all threads are woken up. (with older Linux kernels that won't > > > > > > > retry faults when woken up via userfaultfd, we might actually get a > > > > > > > SEGFAULT) > > > > > > > > > > > > > > The issue is that the source will refuse to migrate any pages that > > > > > > > are not marked as dirty in the dirty bmap -- for example, because the > > > > > > > page might just have been sent. Consequently, the faulting thread will > > > > > > > stall, waiting for the page to be migrated -- which could take quite > > > > > > > a while and result in guest OS issues. > > > > > > > > > > > > OK so if source gets a request for a page which is not dirty > > > > > > it does not respond immediately? Why not just teach it to > > > > > > respond? It would seem that if destination wants a page we > > > > > > should just give it to the destination ... > > > > > > > > > > The source does not know if a page has already been sent (e.g., via the > > > > > background migration thread that moves all data over) vs. the page has not > > > > > been send because the page was hinted. This is the part where we'd need > > > > > additional tracking on the source to actually know that. > > > > > > > > > > We must not send a page twice, otherwise bad things can happen when placing > > > > > pages that already have been migrated, because that scenario can easily > > > > > happen with ordinary postcopy (page has already been sent and we're dealing > > > > > with a stale request from the destination). > > > > > > > > OK let me get this straight > > > > > > > > A. source sends page > > > > B. destination requests page > > > > C. destination changes page > > > > D. source sends page > > > > E. destination overwrites page > > > > > > > > this is what you are worried about right? > > > > > > IIRC E. is with recent kernels: > > > > > > E. placing the page fails with -EEXIST and postcopy migration fails > > > > > > However, the man page (man ioctl_userfaultfd) doesn't describe what is > > > actually supposed to happen when double-placing. Could be that it's > > > "undefined behavior". > > > > > > I did not try, though. > > > > > > > > > This is how it works today: > > > > > > A. source sends page and marks it clean > > > B. destination requests page > > > C. destination receives page and places it > > > D. source ignores request as page is clean > > > > If it's actually -EEXIST then we could just resend it > > and teach destination to ignore -EEXIST errors right? > > > > Will actually make things a bit more robust: destination > > handles its own consistency instead of relying on source. > > You have to be careful of a few things; > a) If the destination has modified the page, then you must > definitely not under any circumstances lose those modifications > and replace them by an old version from the source. > > b) With postcopy recovery I think there is a bitmap to track some > of this; but you have to be careful since you don't know whether > pages that were sent were actually received. > > Dave what I am trying to say is that userfaultfd already tracks these things in the kernel for us. Ideally we'd just use that ... > > > > > > > > > > > > the fix is to mark page clean in A. > > > > then in D to not send page if it's clean? > > > > > > > > And the problem with hinting is this: > > > > > > > > A. page is marked clean > > > > B. destination requests page > > > > C. destination changes page > > > > D. source sends page <- does not happen, page is clean! > > > > E. destination overwrites page > > > > > > Simplified it's > > > > > > A. page is marked clean by hinting code > > > B. destination requests page > > > D. source ignores request as page is clean > > > E. destination stalls until postcopy unregisters uffd > > > > > > > > > Some thoughts > > > > > > 1. We do have a a recv bitmap where we track received pages on the > > > destination (e.g., ramblock_recv_bitmap_test()), however we only use it to > > > avoid sending duplicate requests to the hypervisor AFAIKs, and don't check > > > it when placing pages. > > > > > > 2. Changing the migration behavior unconditionally on the source will break > > > migration to old QEMU binaries that cannot handle this change. > > > > We can always make this depend on new machine types. > > > > > > > 3. I think the current behavior is in place to make debugging easier. If > > > only a single instance of a page will ever be migrated from source to > > > destination, there cannot be silent data corruption. Further, we avoid > > > migrating unnecessarily pages twice. > > > > > > > Likely does not matter much for performance, it seems unlikely that > > the race is all that common. > > > > > Maybe Dave and Peter can spot any flaws in my understanding. > > > > > > -- > > > Thanks, > > > > > > David / dhildenb > > > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2021-07-09 11:36 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-07 14:06 [PATCH v1 0/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand 2021-07-07 14:06 ` [PATCH v1 1/2] migration/postcopy-ram: define type for "struct PostcopyNotifyData" David Hildenbrand 2021-07-07 20:30 ` Pankaj Gupta 2021-07-07 14:06 ` [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT David Hildenbrand 2021-07-07 18:02 ` Peter Xu 2021-07-07 18:57 ` David Hildenbrand 2021-07-07 19:07 ` Michael S. Tsirkin 2021-07-07 20:08 ` Peter Xu 2021-07-07 21:22 ` Alexander Duyck 2021-07-07 22:40 ` Peter Xu 2021-07-08 7:14 ` David Hildenbrand 2021-07-08 7:23 ` David Hildenbrand 2021-07-07 19:05 ` Michael S. Tsirkin 2021-07-07 19:14 ` David Hildenbrand 2021-07-07 19:19 ` Michael S. Tsirkin 2021-07-07 19:47 ` David Hildenbrand 2021-07-07 19:57 ` Michael S. Tsirkin 2021-07-08 8:19 ` David Hildenbrand 2021-07-08 19:07 ` Dr. David Alan Gilbert 2021-07-09 11:27 ` Michael S. Tsirkin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.