* [Qemu-devel] [PATCH v2 0/2] migration: fix virtio-rng
From: Laurent Vivier @ 2017-04-12 13:53 UTC
  To: Dr . David Alan Gilbert
  Cc: Michael S . Tsirkin, Stefan Hajnoczi, Amit Shah, qemu-devel

When post-copy migration is enabled, the destination
guest can ask for memory from the source when the
vmstate is restored.

In the case of virtio, one part of the virtqueue state
is migrated in the vmstate structure (last_avail_idx) and
another part is migrated inside guest RAM (used_idx).
On the source side, the virtqueue can still be modified
after the vmstate has been migrated, and the destination
side can then ask for the value in RAM. In this case we have
an inconsistency that can generate this kind of error:
    "VQ 0 size 0x8 < last_avail_idx 0xa - used_idx 0"
in hw/virtio/virtio.c:2180, virtio_load().
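
For illustration only, the consistency check behind this message can be
sketched as below. This is a simplified stand-in, not the actual
virtio_load() code: vq_indices_consistent() and its parameter names are
made up for this example, and it assumes free-running 16-bit indices, so
the subtraction wraps modulo 2^16.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): returns true when the
 * number of in-flight elements implied by the two indices fits in a
 * ring of 'num' entries.
 */
static bool vq_indices_consistent(uint16_t num, uint16_t last_avail_idx,
                                  uint16_t used_idx)
{
    /* Both indices are free-running counters, so wrap on subtraction. */
    uint16_t inuse = (uint16_t)(last_avail_idx - used_idx);

    return inuse <= num;
}
```

With the values from the error above, vq_indices_consistent(0x8, 0xa, 0)
is false: 10 elements would be in flight on a ring of 8 entries.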

This happens with virtio-rng because chr_read(), the
function which modifies the virtqueue, is called
by the rng backend, and the rng backend keeps running
while the migration is in progress and the CPU is stopped.

This series fixes this problem by ignoring chr_read()
calls while the CPU is stopped. The first patch of the
series fixes another problem triggered by this error
case: a use-after-free.

The probability of hitting this problem is very low:
the post-copy phase is generally very short, so the window
in which the virtqueue can be modified after the vmstate has
been sent is very small... unless you are doing a
trans-continental guest migration with high latency and a
post-copy phase that can run for minutes.

I've been able to reproduce the problem locally on a single
host by adding network latency with "tc". A second condition
is that an rng daemon must be running in the guest to generate
events in the virtio-rng device.

v2:
- add a vm state change handler to restart the virtio-rng
  process when the CPU restarts (it also replaces
  the post_load function).

Laurent Vivier (2):
  migration: don't close a file descriptor while it can be in use
  virtio-rng: stop virtqueue while the CPU is stopped

 hw/virtio/trace-events         |  3 +++
 hw/virtio/virtio-rng.c         | 29 +++++++++++++++++++++++------
 include/hw/virtio/virtio-rng.h |  2 ++
 migration/migration.c          |  6 +++---
 4 files changed, 31 insertions(+), 9 deletions(-)

-- 
2.9.3


* [Qemu-devel] [PATCH v2 1/2] migration: don't close a file descriptor while it can be in use
From: Laurent Vivier @ 2017-04-12 13:53 UTC
  To: Dr . David Alan Gilbert
  Cc: Michael S . Tsirkin, Stefan Hajnoczi, Amit Shah, qemu-devel

If we close the QEMUFile descriptor in process_incoming_migration_co()
after loading has been stopped by an error, postcopy_ram_listen_thread()
can still try to use it. As the memory has been freed, the thread
works with an invalid pointer and crashes.

Fix this by releasing the memory only after the error case has been
handled (which, in fact, calls exit()).

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
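
Not part of the commit message: a minimal sketch of the ordering this
patch establishes, under simplifying assumptions. struct mock_stream and
finish_incoming() are hypothetical names, the "open" flag stands in for
the QEMUFile lifetime, and the failure path returns instead of calling
exit() so that the sketch can be exercised.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the incoming migration stream. */
struct mock_stream {
    bool open;
};

enum mock_outcome {
    MOCK_FAILED,
    MOCK_OK
};

/*
 * Handle the error case first and only then release the stream.
 * On failure the stream is left untouched, because another thread
 * may still be reading from it.
 */
static enum mock_outcome finish_incoming(struct mock_stream *s, int ret)
{
    if (ret < 0) {
        /* The real code reports the error and calls exit() here. */
        return MOCK_FAILED;
    }

    /* Success path only: stands in for qemu_fclose(f). */
    s->open = false;
    return MOCK_OK;
}
```

The point of the sketch is only the order of the two steps: with the
release placed before the error check, the failure path would free the
stream while a listener could still hold a pointer to it.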
 migration/migration.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ad4036f..e024e0a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -436,9 +436,6 @@ static void process_incoming_migration_co(void *opaque)
         qemu_thread_join(&mis->colo_incoming_thread);
     }
 
-    qemu_fclose(f);
-    free_xbzrle_decoded_buf();
-
     if (ret < 0) {
         migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                           MIGRATION_STATUS_FAILED);
@@ -447,6 +444,9 @@ static void process_incoming_migration_co(void *opaque)
         exit(EXIT_FAILURE);
     }
 
+    qemu_fclose(f);
+    free_xbzrle_decoded_buf();
+
     mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
     qemu_bh_schedule(mis->bh);
 }
-- 
2.9.3


* [Qemu-devel] [PATCH v2 2/2] virtio-rng: stop virtqueue while the CPU is stopped
From: Laurent Vivier @ 2017-04-12 13:53 UTC
  To: Dr . David Alan Gilbert
  Cc: Michael S . Tsirkin, Stefan Hajnoczi, Amit Shah, qemu-devel

If we modify the virtio-rng virtqueue after the
vmstate has been migrated, the virtqueue state
and the memory content can become
inconsistent.

To avoid this, stop the virtqueue while the CPU
is stopped.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
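
Not part of the commit message: a self-contained sketch of the pattern
used by this patch, with mocked types. struct mock_rng, mock_chr_read(),
mock_process() and mock_vm_state_change() are hypothetical names that
only mirror the shape of the real functions; no QEMU API is called, and
the counters exist only to make the behaviour observable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; the fields are illustrative only. */
struct mock_rng {
    bool vm_running;          /* tracks the last reported run state */
    bool guest_ready;         /* driver has set up the virtqueue */
    size_t bytes_pushed;      /* data delivered to the guest */
    size_t bytes_dropped;     /* data ignored while the CPU was stopped */
    unsigned requests_issued; /* times the backend was asked for data */
};

/* Stands in for virtio_rng_process(): ask the backend for more data. */
static void mock_process(struct mock_rng *r)
{
    r->requests_issued++;
}

/* Backend callback: must not touch the virtqueue while stopped. */
static void mock_chr_read(struct mock_rng *r, size_t size)
{
    if (!r->guest_ready) {
        return;
    }
    if (!r->vm_running) {
        r->bytes_dropped += size;
        return;
    }
    r->bytes_pushed += size;
}

/* Run-state handler: retry pending work once the CPU runs again. */
static void mock_vm_state_change(struct mock_rng *r, bool running)
{
    r->vm_running = running;
    if (running && r->guest_ready) {
        mock_process(r);
    }
}
```

Data that arrives while the CPU is stopped is dropped rather than
queued, so the handler has to issue a fresh request on restart;
otherwise a pending guest request would never be served.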
 hw/virtio/trace-events         |  3 +++
 hw/virtio/virtio-rng.c         | 29 +++++++++++++++++++++++------
 include/hw/virtio/virtio-rng.h |  2 ++
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 6926eed..1f7a7c1 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -11,8 +11,11 @@ virtio_set_status(void *vdev, uint8_t val) "vdev %p val %u"
 
 # hw/virtio/virtio-rng.c
 virtio_rng_guest_not_ready(void *rng) "rng %p: guest not ready"
+virtio_rng_cpu_is_stopped(void *rng, int size) "rng %p: cpu is stopped, dropping %d bytes"
+virtio_rng_popped(void *rng) "rng %p: elem popped"
 virtio_rng_pushed(void *rng, size_t len) "rng %p: %zd bytes pushed"
 virtio_rng_request(void *rng, size_t size, unsigned quota) "rng %p: %zd bytes requested, %u bytes quota left"
+virtio_rng_vm_state_change(void *rng, int running, int state) "rng %p: state change to running %d state %d"
 
 # hw/virtio/virtio-balloon.c
 #
diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
index 9639f4e..a6ee501 100644
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -53,6 +53,15 @@ static void chr_read(void *opaque, const void *buf, size_t size)
         return;
     }
 
+    /* we can't modify the virtqueue until
+     * our state is fully synced
+     */
+
+    if (!runstate_check(RUN_STATE_RUNNING)) {
+        trace_virtio_rng_cpu_is_stopped(vrng, size);
+        return;
+    }
+
     vrng->quota_remaining -= size;
 
     offset = 0;
@@ -61,6 +70,7 @@ static void chr_read(void *opaque, const void *buf, size_t size)
         if (!elem) {
             break;
         }
+        trace_virtio_rng_popped(vrng);
         len = iov_from_buf(elem->in_sg, elem->in_num,
                            0, buf + offset, size - offset);
         offset += len;
@@ -120,17 +130,21 @@ static uint64_t get_features(VirtIODevice *vdev, uint64_t f, Error **errp)
     return f;
 }
 
-static int virtio_rng_post_load(void *opaque, int version_id)
+static void virtio_rng_vm_state_change(void *opaque, int running,
+                                       RunState state)
 {
     VirtIORNG *vrng = opaque;
 
+    trace_virtio_rng_vm_state_change(vrng, running, state);
+
     /* We may have an element ready but couldn't process it due to a quota
-     * limit.  Make sure to try again after live migration when the quota may
-     * have been reset.
+     * limit or because the CPU was stopped.  Make sure to try again when the
+     * CPU restarts.
      */
-    virtio_rng_process(vrng);
 
-    return 0;
+    if (running && is_guest_ready(vrng)) {
+        virtio_rng_process(vrng);
+    }
 }
 
 static void check_rate_limit(void *opaque)
@@ -198,6 +212,9 @@ static void virtio_rng_device_realize(DeviceState *dev, Error **errp)
     vrng->rate_limit_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
                                                check_rate_limit, vrng);
     vrng->activate_timer = true;
+
+    vrng->vmstate = qemu_add_vm_change_state_handler(virtio_rng_vm_state_change,
+                                                     vrng);
 }
 
 static void virtio_rng_device_unrealize(DeviceState *dev, Error **errp)
@@ -205,6 +222,7 @@ static void virtio_rng_device_unrealize(DeviceState *dev, Error **errp)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIORNG *vrng = VIRTIO_RNG(dev);
 
+    qemu_del_vm_change_state_handler(vrng->vmstate);
     timer_del(vrng->rate_limit_timer);
     timer_free(vrng->rate_limit_timer);
     virtio_cleanup(vdev);
@@ -218,7 +236,6 @@ static const VMStateDescription vmstate_virtio_rng = {
         VMSTATE_VIRTIO_DEVICE,
         VMSTATE_END_OF_LIST()
     },
-    .post_load =  virtio_rng_post_load,
 };
 
 static Property virtio_rng_properties[] = {
diff --git a/include/hw/virtio/virtio-rng.h b/include/hw/virtio/virtio-rng.h
index 2d40abd..922dce7 100644
--- a/include/hw/virtio/virtio-rng.h
+++ b/include/hw/virtio/virtio-rng.h
@@ -45,6 +45,8 @@ typedef struct VirtIORNG {
     QEMUTimer *rate_limit_timer;
     int64_t quota_remaining;
     bool activate_timer;
+
+    VMChangeStateEntry *vmstate;
 } VirtIORNG;
 
 #endif
-- 
2.9.3


* Re: [Qemu-devel] [PATCH v2 0/2] migration: fix virtio-rng
From: Stefan Hajnoczi @ 2017-04-13 14:51 UTC
  To: Laurent Vivier
  Cc: Dr . David Alan Gilbert, Michael S . Tsirkin, Amit Shah, qemu-devel



Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [Qemu-devel] [PATCH v2 0/2] migration: fix virtio-rng
From: Amit Shah @ 2017-04-17 19:33 UTC
  To: Laurent Vivier
  Cc: Dr . David Alan Gilbert, Michael S . Tsirkin, Stefan Hajnoczi,
	qemu-devel


Acked-by: Amit Shah <amit@kernel.org>

		Amit
-- 
http://log.amitshah.net/


* Re: [Qemu-devel] [PATCH v2 1/2] migration: don't close a file descriptor while it can be in use
From: Dr. David Alan Gilbert @ 2017-04-20 18:48 UTC
  To: Laurent Vivier
  Cc: Michael S . Tsirkin, Stefan Hajnoczi, Amit Shah, qemu-devel

* Laurent Vivier (lvivier@redhat.com) wrote:
> If we close the QEMUFile descriptor in process_incoming_migration_co()
> while it has been stopped by an error, the postcopy_ram_listen_thread()
> can try to continue to use it. And as the memory has been freed
> it is working with an invalid pointer and crashes.
> 
> Fix this by releasing the memory after having managed the error
> case (which, in fact, calls exit())
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Yes, this took me some thinking about why it got there.
(I only managed to reproduce this case once, even with 'tc')

'LISTEN' message via loadvm_postcopy_handle_listen,
postcopy state is set to LISTENING
sets mis->have_listen_thread
starts 'listen' thread
Errors while 'loading state of instance...' so fails
   qemu_loadvm_state_main in loadvm_handle_cmd_packaged
   fails loadvm_process_command
   fails qemu_loadvm_state_main
   fails in qemu_loadvm_state
       has mis->have_listen_thread
   process_incoming_migration_co
      since ret < 0 fails now rather than leaving it to the
      'listening thread' - which is probably still alive

Dave


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v2 1/2] migration: don't close a file descriptor while it can be in use
From: Juan Quintela @ 2017-04-21  9:19 UTC
  To: Laurent Vivier
  Cc: Dr . David Alan Gilbert, Amit Shah, qemu-devel, Stefan Hajnoczi,
	Michael S . Tsirkin

Laurent Vivier <lvivier@redhat.com> wrote:
> If we close the QEMUFile descriptor in process_incoming_migration_co()
> while it has been stopped by an error, the postcopy_ram_listen_thread()
> can try to continue to use it. And as the memory has been freed
> it is working with an invalid pointer and crashes.
>
> Fix this by releasing the memory after having managed the error
> case (which, in fact, calls exit())
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [PATCH v2 2/2] virtio-rng: stop virtqueue while the CPU is stopped
From: Juan Quintela @ 2017-04-21  9:20 UTC
  To: Laurent Vivier
  Cc: Dr . David Alan Gilbert, Amit Shah, qemu-devel, Stefan Hajnoczi,
	Michael S . Tsirkin

Laurent Vivier <lvivier@redhat.com> wrote:
> If we modify the virtio-rng virtqueue while the
> vmstate is already migrated we can have some
> inconsistencies between the virtqueue state and
> the memory content.
>
> To avoid this, stop the virtqueue while the CPU
> is stopped.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---

Reviewed-by: Juan Quintela <quintela@redhat.com>

