All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] failover: don't allow to migrate a paused VM that needs PCI unplug
@ 2021-09-29 14:43 Laurent Vivier
  2021-09-29 14:43 ` [PATCH 1/2] migration: provide an error message to migration_cancel() Laurent Vivier
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Laurent Vivier @ 2021-09-29 14:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jason Wang, Juan Quintela, Dr. David Alan Gilbert, Michael S. Tsirkin

As the guest OS is paused, we will never receive the unplug event
from the kernel and the migration cannot continue.

The first patch is optional, it provides the error message to display
to migration_cancel() rather than to have to call migrate_set_error()
from the caller.

Laurent Vivier (2):
  migration: provide an error message to migration_cancel()
  failover: don't allow to migrate a paused VM that needs PCI unplug

 migration/migration.h |  2 +-
 hw/net/virtio-net.c   | 10 +++++++++-
 migration/migration.c |  9 ++++++---
 migration/ram.c       |  3 +--
 4 files changed, 17 insertions(+), 7 deletions(-)

-- 
2.31.1




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] migration: provide an error message to migration_cancel()
  2021-09-29 14:43 [PATCH 0/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
@ 2021-09-29 14:43 ` Laurent Vivier
  2021-11-02  9:04   ` Juan Quintela
  2021-09-29 14:43 ` [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Laurent Vivier @ 2021-09-29 14:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jason Wang, Juan Quintela, Dr. David Alan Gilbert, Michael S. Tsirkin

This avoids to call migrate_get_current() in the caller function
whereas migration_cancel() already needs the pointer to the current
migration state.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 migration/migration.h | 2 +-
 migration/migration.c | 9 ++++++---
 migration/ram.c       | 3 +--
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 7a5aa8c2fdde..8130b703eb95 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -388,7 +388,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
-void migration_cancel(void);
+void migration_cancel(const Error *error);
 
 void populate_vfio_info(MigrationInfo *info);
 
diff --git a/migration/migration.c b/migration/migration.c
index bb909781b7f5..565cbc859744 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,8 +215,11 @@ void migration_object_init(void)
     dirty_bitmap_mig_init();
 }
 
-void migration_cancel(void)
+void migration_cancel(const Error *error)
 {
+    if (error) {
+        migrate_set_error(current_migration, error);
+    }
     migrate_fd_cancel(current_migration);
 }
 
@@ -226,7 +229,7 @@ void migration_shutdown(void)
      * Cancel the current migration - that will (eventually)
      * stop the migration using this structure
      */
-    migration_cancel();
+    migration_cancel(NULL);
     object_unref(OBJECT(current_migration));
 
     /*
@@ -2316,7 +2319,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 void qmp_migrate_cancel(Error **errp)
 {
-    migration_cancel();
+    migration_cancel(NULL);
 }
 
 void qmp_migrate_continue(MigrationStatus state, Error **errp)
diff --git a/migration/ram.c b/migration/ram.c
index 7a43bfd7afcb..50809b59ee70 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4178,9 +4178,8 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
          * Abort and indicate a proper reason.
          */
         error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
-        migrate_set_error(migrate_get_current(), err);
+        migration_cancel(err);
         error_free(err);
-        migration_cancel();
     }
 
     switch (ps) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-09-29 14:43 [PATCH 0/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
  2021-09-29 14:43 ` [PATCH 1/2] migration: provide an error message to migration_cancel() Laurent Vivier
@ 2021-09-29 14:43 ` Laurent Vivier
  2021-11-02  9:04   ` Juan Quintela
  2021-11-02 15:04   ` Michael S. Tsirkin
  2021-10-06  9:22 ` [PATCH 0/2] " Laurent Vivier
  2021-10-21  8:49 ` Laurent Vivier
  3 siblings, 2 replies; 15+ messages in thread
From: Laurent Vivier @ 2021-09-29 14:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jason Wang, Juan Quintela, Dr. David Alan Gilbert, Michael S. Tsirkin

As the guest OS is paused, we will never receive the unplug event
from the kernel and the migration cannot continue.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/net/virtio-net.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f205331dcf8c..e54b6c8cd86c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -37,8 +37,10 @@
 #include "qapi/qapi-events-migration.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "migration/migration.h"
 #include "standard-headers/linux/ethtool.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
 #include "trace.h"
 #include "monitor/qdev.h"
 #include "hw/pci/pci.h"
@@ -3279,7 +3281,13 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
     should_be_hidden = qatomic_read(&n->failover_primary_hidden);
 
     if (migration_in_setup(s) && !should_be_hidden) {
-        if (failover_unplug_primary(n, dev)) {
+        if (!runstate_is_running()) {
+            Error *err = NULL;
+            error_setg(&err,
+                       "cannot unplug primary device while VM is paused");
+            migration_cancel(err);
+            error_free(err);
+        } else if (failover_unplug_primary(n, dev)) {
             vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
             qapi_event_send_unplug_primary(dev->id);
             qatomic_set(&n->failover_primary_hidden, true);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-09-29 14:43 [PATCH 0/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
  2021-09-29 14:43 ` [PATCH 1/2] migration: provide an error message to migration_cancel() Laurent Vivier
  2021-09-29 14:43 ` [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
@ 2021-10-06  9:22 ` Laurent Vivier
  2021-10-21  8:49 ` Laurent Vivier
  3 siblings, 0 replies; 15+ messages in thread
From: Laurent Vivier @ 2021-10-06  9:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jason Wang, Michael S. Tsirkin, Dr. David Alan Gilbert, Juan Quintela

On 29/09/2021 16:43, Laurent Vivier wrote:
> As the guest OS is paused, we will never receive the unplug event
> from the kernel and the migration cannot continue.
> 
> The first patch is optional, it provides the error message to display
> to migration_cancel() rather than to have to call migrate_set_error()
> from the caller.
> 
> Laurent Vivier (2):
>    migration: provide an error message to migration_cancel()
>    failover: don't allow to migrate a paused VM that needs PCI unplug
> 
>   migration/migration.h |  2 +-
>   hw/net/virtio-net.c   | 10 +++++++++-
>   migration/migration.c |  9 ++++++---
>   migration/ram.c       |  3 +--
>   4 files changed, 17 insertions(+), 7 deletions(-)
> 

Any comment?

Thanks,
Laurent



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-09-29 14:43 [PATCH 0/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
                   ` (2 preceding siblings ...)
  2021-10-06  9:22 ` [PATCH 0/2] " Laurent Vivier
@ 2021-10-21  8:49 ` Laurent Vivier
  3 siblings, 0 replies; 15+ messages in thread
From: Laurent Vivier @ 2021-10-21  8:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jason Wang, Michael S. Tsirkin, Dr. David Alan Gilbert, Juan Quintela

On 29/09/2021 16:43, Laurent Vivier wrote:
> As the guest OS is paused, we will never receive the unplug event
> from the kernel and the migration cannot continue.
> 
> The first patch is optional, it provides the error message to display
> to migration_cancel() rather than to have to call migrate_set_error()
> from the caller.
> 
> Laurent Vivier (2):
>    migration: provide an error message to migration_cancel()
>    failover: don't allow to migrate a paused VM that needs PCI unplug
> 
>   migration/migration.h |  2 +-
>   hw/net/virtio-net.c   | 10 +++++++++-
>   migration/migration.c |  9 ++++++---
>   migration/ram.c       |  3 +--
>   4 files changed, 17 insertions(+), 7 deletions(-)
> 

ping...



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] migration: provide an error message to migration_cancel()
  2021-09-29 14:43 ` [PATCH 1/2] migration: provide an error message to migration_cancel() Laurent Vivier
@ 2021-11-02  9:04   ` Juan Quintela
  0 siblings, 0 replies; 15+ messages in thread
From: Juan Quintela @ 2021-11-02  9:04 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Jason Wang, qemu-devel, Dr. David Alan Gilbert, Michael S. Tsirkin

Laurent Vivier <lvivier@redhat.com> wrote:
> This avoids to call migrate_get_current() in the caller function
> whereas migration_cancel() already needs the pointer to the current
> migration state.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.


>          error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
> -        migrate_set_error(migrate_get_current(), err);
> +        migration_cancel(err);
>          error_free(err);
> -        migration_cancel();

This was already there, but it is ugly, isn't it?

Later, Juan.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-09-29 14:43 ` [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
@ 2021-11-02  9:04   ` Juan Quintela
  2021-11-02 15:04   ` Michael S. Tsirkin
  1 sibling, 0 replies; 15+ messages in thread
From: Juan Quintela @ 2021-11-02  9:04 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Jason Wang, qemu-devel, Dr. David Alan Gilbert, Michael S. Tsirkin

Laurent Vivier <lvivier@redhat.com> wrote:
> As the guest OS is paused, we will never receive the unplug event
> from the kernel and the migration cannot continue.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.

I can't think of a better solution.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-09-29 14:43 ` [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
  2021-11-02  9:04   ` Juan Quintela
@ 2021-11-02 15:04   ` Michael S. Tsirkin
  2021-11-02 15:28     ` Juan Quintela
  2021-11-02 17:06     ` Laurent Vivier
  1 sibling, 2 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2021-11-02 15:04 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Jason Wang, Juan Quintela, qemu-devel, Dr. David Alan Gilbert

On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
> As the guest OS is paused, we will never receive the unplug event
> from the kernel and the migration cannot continue.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Well ... what if user previously did

pause
start migration
unpause

we are breaking it now for no good reason.

Further, how about

start migration
pause

are we going to break this too? by failing pause?


> ---
>  hw/net/virtio-net.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index f205331dcf8c..e54b6c8cd86c 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -37,8 +37,10 @@
>  #include "qapi/qapi-events-migration.h"
>  #include "hw/virtio/virtio-access.h"
>  #include "migration/misc.h"
> +#include "migration/migration.h"
>  #include "standard-headers/linux/ethtool.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/runstate.h"
>  #include "trace.h"
>  #include "monitor/qdev.h"
>  #include "hw/pci/pci.h"
> @@ -3279,7 +3281,13 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
>      should_be_hidden = qatomic_read(&n->failover_primary_hidden);
>  
>      if (migration_in_setup(s) && !should_be_hidden) {
> -        if (failover_unplug_primary(n, dev)) {
> +        if (!runstate_is_running()) {
> +            Error *err = NULL;
> +            error_setg(&err,
> +                       "cannot unplug primary device while VM is paused");
> +            migration_cancel(err);
> +            error_free(err);
> +        } else if (failover_unplug_primary(n, dev)) {
>              vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
>              qapi_event_send_unplug_primary(dev->id);
>              qatomic_set(&n->failover_primary_hidden, true);
> -- 
> 2.31.1



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-11-02 15:04   ` Michael S. Tsirkin
@ 2021-11-02 15:28     ` Juan Quintela
  2021-11-02 17:06     ` Laurent Vivier
  1 sibling, 0 replies; 15+ messages in thread
From: Juan Quintela @ 2021-11-02 15:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Laurent Vivier, Jason Wang, qemu-devel, Dr. David Alan Gilbert

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
>> As the guest OS is paused, we will never receive the unplug event
>> from the kernel and the migration cannot continue.
>> 
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>
> Well ... what if user previously did
>
> pause
> start migration
> unpause
>
> we are breaking it now for no good reason.

No.  we are canceling the migration.  Migration can not finish on that
state.  We are inside the test:

      if (migration_in_setup(s) && !should_be_hidden) {

If you don't have any really weird setup[1], migration setup just takes
milliseconds (low units for small guest, and 200-300ms for really huge
ones).

So I still think this is right.


1: Weird here means things like RDMA, locking all the memory of one
   guest can take forever.  To get an idea about this, until we
   introduce RDMA, we didn't meassured the setup stage time, because it
   was so small that it didn't matter at all.

Unplug from guest is other operation that can take quite a long time,
because it depends on guest cooperation.

> Further, how about
>
> start migration
> pause
>
> are we going to break this too? by failing pause?

I haven't thougth about this one, but it shouldn't matter (famous last
words), beacuse there are to cases:

- migration has started and unplug has already finished, no problem.

- migration has started but we haven't yet arrived to
  virtio_net_handle_migration_primary().  We are paused, and we give the
  guest a good error message about why are we failing.  notice that
  migration can't finish anyways, it would stuck there forever waiting
  for the (stopped guest to unplug the device).

So the only case that I can see that *could* matter is:

- start migration
- pause the guest
   this implies pausing the migration
- unpause
   at this point we can continue the migration

do we really care about this scenary?

I think not, because the migration has advanced so few, that starting
from zero would be the best option anyways.

Later, Juan.

PD1: No, I am not sure what happens if you run "pause" after the event
     to guest is sent, but before that the guest finish the unplug (I
     guess it would stall).  But in this case, we are doing something at
     least fishy.  On the other hand, we know that "pause; migration"
     will never really work.

PD2: Perhaps we could "invet" another state that means:
IN_SETUP_AND_WE_CANT_BE_PAUSED, and change it between we ask for the
device to unplug, and that it unplugs.  But it looks really complicated.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-11-02 15:04   ` Michael S. Tsirkin
  2021-11-02 15:28     ` Juan Quintela
@ 2021-11-02 17:06     ` Laurent Vivier
  2021-11-02 17:08       ` Michael S. Tsirkin
  1 sibling, 1 reply; 15+ messages in thread
From: Laurent Vivier @ 2021-11-02 17:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Juan Quintela, qemu-devel, Dr. David Alan Gilbert

On 02/11/2021 16:04, Michael S. Tsirkin wrote:
> On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
>> As the guest OS is paused, we will never receive the unplug event
>> from the kernel and the migration cannot continue.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> 
> Well ... what if user previously did
> 
> pause
> start migration
> unpause
> 
> we are breaking it now for no good reason.
> 
> Further, how about
> 
> start migration
> pause
> 
> are we going to break this too? by failing pause?
> 
> 

TL;DR: This patch only prevents to migrate a VFIO device as failover allows to start a 
migration with a VFIO device plugged in.

Long Story:

* before this patch:

- pause and start migration and unpause-> fails if we unpause too late because we migrate 
a VFIO device, works otherwise
- start migration and pause before we unplug the card -> hangs forever
- start migration and pause after we unplug the card -> it works fine

* After this patch:

- pause and start migration and unpause-> fails if we unpause too late because of the new 
error checking, works otherwise
- start migration and pause before we unplug the card -> fails because of the new error 
checking
- start migration and pause after we unplug the card -> it works fine

Thanks,
Laurent




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-11-02 17:06     ` Laurent Vivier
@ 2021-11-02 17:08       ` Michael S. Tsirkin
  2021-11-02 17:26         ` Juan Quintela
  2021-11-02 17:43         ` Laurent Vivier
  0 siblings, 2 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2021-11-02 17:08 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Jason Wang, Juan Quintela, qemu-devel, Dr. David Alan Gilbert

On Tue, Nov 02, 2021 at 06:06:51PM +0100, Laurent Vivier wrote:
> On 02/11/2021 16:04, Michael S. Tsirkin wrote:
> > On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
> > > As the guest OS is paused, we will never receive the unplug event
> > > from the kernel and the migration cannot continue.
> > > 
> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > 
> > Well ... what if user previously did
> > 
> > pause
> > start migration
> > unpause
> > 
> > we are breaking it now for no good reason.
> > 
> > Further, how about
> > 
> > start migration
> > pause
> > 
> > are we going to break this too? by failing pause?
> > 
> > 
> 
> TL;DR: This patch only prevents to migrate a VFIO device as failover allows
> to start a migration with a VFIO device plugged in.
> 
> Long Story:
> 
> * before this patch:
> 
> - pause and start migration and unpause-> fails if we unpause too late
> because we migrate a VFIO device, works otherwise


confused about this one. can you explain pls?

> - start migration and pause before we unplug the card -> hangs forever
> - start migration and pause after we unplug the card -> it works fine
> 
> * After this patch:
> 
> - pause and start migration and unpause-> fails if we unpause too late
> because of the new error checking, works otherwise
> - start migration and pause before we unplug the card -> fails because of
> the new error checking
> - start migration and pause after we unplug the card -> it works fine
> 
> Thanks,
> Laurent
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-11-02 17:08       ` Michael S. Tsirkin
@ 2021-11-02 17:26         ` Juan Quintela
  2021-11-02 17:47           ` Laurent Vivier
  2021-11-02 17:43         ` Laurent Vivier
  1 sibling, 1 reply; 15+ messages in thread
From: Juan Quintela @ 2021-11-02 17:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Laurent Vivier, Jason Wang, qemu-devel, Dr. David Alan Gilbert

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Nov 02, 2021 at 06:06:51PM +0100, Laurent Vivier wrote:
>> On 02/11/2021 16:04, Michael S. Tsirkin wrote:
>> > On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
>> > > As the guest OS is paused, we will never receive the unplug event
>> > > from the kernel and the migration cannot continue.
>> > > 
>> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> > 
>> > Well ... what if user previously did
>> > 
>> > pause
>> > start migration
>> > unpause
>> > 
>> > we are breaking it now for no good reason.
>> > 
>> > Further, how about
>> > 
>> > start migration
>> > pause
>> > 
>> > are we going to break this too? by failing pause?
>> > 
>> > 
>> 
>> TL;DR: This patch only prevents to migrate a VFIO device as failover allows
>> to start a migration with a VFIO device plugged in.
>> 
>> Long Story:
>> 
>> * before this patch:
>> 
>> - pause and start migration and unpause-> fails if we unpause too late
>> because we migrate a VFIO device, works otherwise
>
>
> confused about this one. can you explain pls?

Pause the guest.
Start migration.

     if (migration_in_setup(s) && !should_be_hidden) {
        if (failover_unplug_primary(n, dev)) {
             vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
             qapi_event_send_unplug_primary(dev->id);

We send the unplug request, but the guest is paused.

             qatomic_set(&n->failover_primary_hidden, true);

callbacks, callbacks, callbacks.

        while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
               qemu_savevm_state_guest_unplug_pending()) {
            qemu_sem_timedwait(&s->wait_unplug_sem, 250);
        }

And we are not able to get out of that loop, because we never get to the
point where the guest send the unplug command.

So, the only other thing that I can think of is putting one timeout
there, but how much?  That is a good question.

Later, Juan.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-11-02 17:08       ` Michael S. Tsirkin
  2021-11-02 17:26         ` Juan Quintela
@ 2021-11-02 17:43         ` Laurent Vivier
  1 sibling, 0 replies; 15+ messages in thread
From: Laurent Vivier @ 2021-11-02 17:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Juan Quintela, qemu-devel, Dr. David Alan Gilbert

On 02/11/2021 18:08, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2021 at 06:06:51PM +0100, Laurent Vivier wrote:
>> On 02/11/2021 16:04, Michael S. Tsirkin wrote:
>>> On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
>>>> As the guest OS is paused, we will never receive the unplug event
>>>> from the kernel and the migration cannot continue.
>>>>
>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>>>
>>> Well ... what if user previously did
>>>
>>> pause
>>> start migration
>>> unpause
>>>
>>> we are breaking it now for no good reason.
>>>
>>> Further, how about
>>>
>>> start migration
>>> pause
>>>
>>> are we going to break this too? by failing pause?
>>>
>>>
>>
>> TL;DR: This patch only prevents to migrate a VFIO device as failover allows
>> to start a migration with a VFIO device plugged in.
>>
>> Long Story:
>>
>> * before this patch:
>>
>> - pause and start migration and unpause-> fails if we unpause too late
>> because we migrate a VFIO device, works otherwise
> 
> 
> confused about this one. can you explain pls?
> 

Sorry, I've been confused by another bug: with ACPI unplug, we don't wait the unplug, and 
so if the machine is paused the VFIO is "migrated" and we have an error message on the 
destination side as the card cannot be plugged back.

but with PCIe native hotplug ("-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off") 
we have:

before this patch:

if we pause and then start the migration, migration hangs until we unpause the VM. But the 
migration can hangs forever if the VM is never unpaused. Normally migration of a paused VM 
should not hang.

after this patch:

if we pause and then start the migration, the migration fails because of the new error.

Remember that the migration of a VM with a VFIO device normally fails, so a user should 
not try to migrate a VM with a VFIO device except if he knows he is using failover, and in 
this case he should know he must not pause the VM.

Thanks,
Laurent



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-11-02 17:26         ` Juan Quintela
@ 2021-11-02 17:47           ` Laurent Vivier
  2021-11-02 18:09             ` Juan Quintela
  0 siblings, 1 reply; 15+ messages in thread
From: Laurent Vivier @ 2021-11-02 17:47 UTC (permalink / raw)
  To: quintela, Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, Dr. David Alan Gilbert

On 02/11/2021 18:26, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> On Tue, Nov 02, 2021 at 06:06:51PM +0100, Laurent Vivier wrote:
>>> On 02/11/2021 16:04, Michael S. Tsirkin wrote:
>>>> On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
>>>>> As the guest OS is paused, we will never receive the unplug event
>>>>> from the kernel and the migration cannot continue.
>>>>>
>>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>>>>
>>>> Well ... what if user previously did
>>>>
>>>> pause
>>>> start migration
>>>> unpause
>>>>
>>>> we are breaking it now for no good reason.
>>>>
>>>> Further, how about
>>>>
>>>> start migration
>>>> pause
>>>>
>>>> are we going to break this too? by failing pause?
>>>>
>>>>
>>>
>>> TL;DR: This patch only prevents to migrate a VFIO device as failover allows
>>> to start a migration with a VFIO device plugged in.
>>>
>>> Long Story:
>>>
>>> * before this patch:
>>>
>>> - pause and start migration and unpause-> fails if we unpause too late
>>> because we migrate a VFIO device, works otherwise
>>
>>
>> confused about this one. can you explain pls?
> 
> Pause the guest.
> Start migration.
> 
>       if (migration_in_setup(s) && !should_be_hidden) {
>          if (failover_unplug_primary(n, dev)) {
>               vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
>               qapi_event_send_unplug_primary(dev->id);
> 
> We send the unplug request, but the guest is paused.
> 
>               qatomic_set(&n->failover_primary_hidden, true);
> 
> callbacks, callbacks, callbacks.
> 
>          while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
>                 qemu_savevm_state_guest_unplug_pending()) {
>              qemu_sem_timedwait(&s->wait_unplug_sem, 250);
>          }
> 
> And we are not able to get out of that loop, because we never get to the
> point where the guest send the unplug command.
> 
> So, the only other thing that I can think of is putting one timeout
> there, but how much?  That is a good question.
> 

Please, no timeout, IMHO timeout is worse than a clean exit on failure.

Thanks,
Laurent



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
  2021-11-02 17:47           ` Laurent Vivier
@ 2021-11-02 18:09             ` Juan Quintela
  0 siblings, 0 replies; 15+ messages in thread
From: Juan Quintela @ 2021-11-02 18:09 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Jason Wang, open list:All patches CC here,
	Dr. David Alan Gilbert, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 2396 bytes --]

On Tue, Nov 2, 2021, 18:47 Laurent Vivier <lvivier@redhat.com> wrote:

> On 02/11/2021 18:26, Juan Quintela wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> On Tue, Nov 02, 2021 at 06:06:51PM +0100, Laurent Vivier wrote:
> >>> On 02/11/2021 16:04, Michael S. Tsirkin wrote:
> >>>> On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
> >>>>> As the guest OS is paused, we will never receive the unplug event
> >>>>> from the kernel and the migration cannot continue.
> >>>>>
> >>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> >>>>
> >>>> Well ... what if user previously did
> >>>>
> >>>> pause
> >>>> start migration
> >>>> unpause
> >>>>
> >>>> we are breaking it now for no good reason.
> >>>>
> >>>> Further, how about
> >>>>
> >>>> start migration
> >>>> pause
> >>>>
> >>>> are we going to break this too? by failing pause?
> >>>>
> >>>>
> >>>
> >>> TL;DR: This patch only prevents to migrate a VFIO device as failover
> allows
> >>> to start a migration with a VFIO device plugged in.
> >>>
> >>> Long Story:
> >>>
> >>> * before this patch:
> >>>
> >>> - pause and start migration and unpause-> fails if we unpause too late
> >>> because we migrate a VFIO device, works otherwise
> >>
> >>
> >> confused about this one. can you explain pls?
> >
> > Pause the guest.
> > Start migration.
> >
> >       if (migration_in_setup(s) && !should_be_hidden) {
> >          if (failover_unplug_primary(n, dev)) {
> >               vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev),
> dev);
> >               qapi_event_send_unplug_primary(dev->id);
> >
> > We send the unplug request, but the guest is paused.
> >
> >               qatomic_set(&n->failover_primary_hidden, true);
> >
> > callbacks, callbacks, callbacks.
> >
> >          while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
> >                 qemu_savevm_state_guest_unplug_pending()) {
> >              qemu_sem_timedwait(&s->wait_unplug_sem, 250);
> >          }
> >
> > And we are not able to get out of that loop, because we never get to the
> > point where the guest send the unplug command.
> >
> > So, the only other thing that I can think of is putting one timeout
> > there, but how much?  That is a good question.
> >
>
> Please, no timeout, IMHO timeout is worse than a clean exit on failure.
>


How long should we wait for the guest? If not a timeout....


> Thanks,
> Laurent
>
>

[-- Attachment #2: Type: text/html, Size: 3898 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-11-02 18:21 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-29 14:43 [PATCH 0/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
2021-09-29 14:43 ` [PATCH 1/2] migration: provide an error message to migration_cancel() Laurent Vivier
2021-11-02  9:04   ` Juan Quintela
2021-09-29 14:43 ` [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
2021-11-02  9:04   ` Juan Quintela
2021-11-02 15:04   ` Michael S. Tsirkin
2021-11-02 15:28     ` Juan Quintela
2021-11-02 17:06     ` Laurent Vivier
2021-11-02 17:08       ` Michael S. Tsirkin
2021-11-02 17:26         ` Juan Quintela
2021-11-02 17:47           ` Laurent Vivier
2021-11-02 18:09             ` Juan Quintela
2021-11-02 17:43         ` Laurent Vivier
2021-10-06  9:22 ` [PATCH 0/2] " Laurent Vivier
2021-10-21  8:49 ` Laurent Vivier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.