* [Qemu-devel] [PATCH 0/2] migration: fix virtio-rng
@ 2017-04-11 13:17 Laurent Vivier
  2017-04-11 13:17 ` [Qemu-devel] [PATCH 1/2] migration: don't close a file descriptor while it can be in use Laurent Vivier
  2017-04-11 13:17 ` [Qemu-devel] [PATCH 2/2] virtio-rng: stop virtqueue while the CPU is stopped Laurent Vivier
  0 siblings, 2 replies; 6+ messages in thread
From: Laurent Vivier @ 2017-04-11 13:17 UTC (permalink / raw)
  To: Dr . David Alan Gilbert
  Cc: Michael S . Tsirkin, Stefan Hajnoczi, Amit Shah, qemu-devel

When post-copy migration is enabled, the destination
guest can request memory from the source when the
vmstate is restored.

In the case of virtio, one part of the virtqueue state
is migrated by the vmstate structure (last_avail_idx),
while another part is migrated inside the RAM (used_idx).
On the source side, the virtqueue can still be modified
after the vmstate has been migrated, and the destination
side can then request the value from RAM. In this case we
have an inconsistency that can generate this kind of error:
    "VQ 0 size 0x8 < last_avail_idx 0xa - used_idx 0"
in hw/virtio/virtio.c:2180, virtio_load().
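
As an illustration of why mismatched indices trip this check, here is a
minimal, self-contained sketch. It is not the QEMU code: the struct and
function names (toy_vq, toy_vq_inuse, toy_vq_state_is_consistent) are
hypothetical, and the sketch only assumes that the check compares the
number of in-flight buffers (last_avail_idx - used_idx, computed on
16-bit counters) against the ring size, as the error message suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a virtqueue: only the fields
 * that appear in the error message. */
struct toy_vq {
    unsigned int num;        /* ring size, e.g. 0x8 */
    uint16_t last_avail_idx; /* carried in the vmstate stream */
    uint16_t used_idx;       /* read back from guest RAM */
};

/* Buffers popped by the device but not yet returned to the guest.
 * Both indices are free-running 16-bit counters, so the
 * subtraction is taken modulo 2^16. */
static uint16_t toy_vq_inuse(const struct toy_vq *vq)
{
    return (uint16_t)(vq->last_avail_idx - vq->used_idx);
}

/* The pair of indices is plausible only if no more buffers are
 * in flight than the ring can hold. */
static bool toy_vq_state_is_consistent(const struct toy_vq *vq)
{
    assert(vq->num > 0);
    return toy_vq_inuse(vq) <= vq->num;
}
```

With the values from the message above (size 0x8, last_avail_idx 0xa,
used_idx 0), ten buffers would be in flight on an eight-entry ring, so
the sketch reports the state as inconsistent.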

This happens with virtio-rng because the chr_read()
function, which modifies the virtqueue, is called
by the rng backend, and the rng backend continues to
run while the migration is running and the CPU is stopped.

This series fixes this problem by ignoring chr_read()
calls while the CPU is stopped. The first patch of the
series fixes another problem triggered by this error
case: a use-after-free case.

The probability of hitting this problem is very low, as
the post-copy phase is generally very short, so the window
in which the virtqueue can be modified after the vmstate
has been sent is very small... unless you are doing
trans-continental guest migration with high latency and a
post-copy phase that can run for minutes.

I've been able to reproduce the problem locally on a host,
by adding network latency with "tc". Another condition is
to have an rng daemon running in the guest to generate
events in the virtio-rng device.

Laurent Vivier (2):
  migration: don't close a file descriptor while it can be in use
  virtio-rng: stop virtqueue while the CPU is stopped

 hw/virtio/trace-events |  2 ++
 hw/virtio/virtio-rng.c | 10 ++++++++++
 migration/migration.c  |  6 +++---
 3 files changed, 15 insertions(+), 3 deletions(-)

-- 
2.9.3

* [Qemu-devel] [PATCH 1/2] migration: don't close a file descriptor while it can be in use
  2017-04-11 13:17 [Qemu-devel] [PATCH 0/2] migration: fix virtio-rng Laurent Vivier
@ 2017-04-11 13:17 ` Laurent Vivier
  2017-04-11 13:17 ` [Qemu-devel] [PATCH 2/2] virtio-rng: stop virtqueue while the CPU is stopped Laurent Vivier
  1 sibling, 0 replies; 6+ messages in thread
From: Laurent Vivier @ 2017-04-11 13:17 UTC (permalink / raw)
  To: Dr . David Alan Gilbert
  Cc: Michael S . Tsirkin, Stefan Hajnoczi, Amit Shah, qemu-devel

If we close the QEMUFile descriptor in process_incoming_migration_co()
after the incoming migration has been stopped by an error,
postcopy_ram_listen_thread() can still try to use it. As the memory
has been freed, the thread works with an invalid pointer and crashes.

Fix this by releasing the memory only after the error case has been
handled (which, in fact, calls exit()).

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 migration/migration.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ad4036f..e024e0a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -436,9 +436,6 @@ static void process_incoming_migration_co(void *opaque)
         qemu_thread_join(&mis->colo_incoming_thread);
     }
 
-    qemu_fclose(f);
-    free_xbzrle_decoded_buf();
-
     if (ret < 0) {
         migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                           MIGRATION_STATUS_FAILED);
@@ -447,6 +444,9 @@ static void process_incoming_migration_co(void *opaque)
         exit(EXIT_FAILURE);
     }
 
+    qemu_fclose(f);
+    free_xbzrle_decoded_buf();
+
     mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
     qemu_bh_schedule(mis->bh);
 }
-- 
2.9.3
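
The reordering in the patch above can be modelled with a small,
self-contained sketch. All names here (toy_stream, toy_finish_incoming)
are hypothetical and stand in for the QEMUFile handling; the real error
path reports the failure and calls exit(), which the sketch replaces
with a return value so that the behaviour can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the incoming stream, which another
 * thread may still be reading from. */
struct toy_stream {
    bool closed;
};

enum toy_outcome { TOY_FAILED, TOY_COMPLETED };

/* Finish an incoming migration: decide on the error first and
 * release the stream only on the success path. */
static enum toy_outcome toy_finish_incoming(struct toy_stream *s, int ret)
{
    assert(s != NULL);

    if (ret < 0) {
        /* Error path: the stream is left untouched, so a thread
         * that still uses it never sees freed memory. The real
         * code terminates the process at this point. */
        return TOY_FAILED;
    }

    /* Success path: nobody needs the stream any more. */
    s->closed = true;
    return TOY_COMPLETED;
}
```

The point of the ordering is that the release happens in exactly one
place, and only once the error case can no longer occur.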

* [Qemu-devel] [PATCH 2/2] virtio-rng: stop virtqueue while the CPU is stopped
  2017-04-11 13:17 [Qemu-devel] [PATCH 0/2] migration: fix virtio-rng Laurent Vivier
  2017-04-11 13:17 ` [Qemu-devel] [PATCH 1/2] migration: don't close a file descriptor while it can be in use Laurent Vivier
@ 2017-04-11 13:17 ` Laurent Vivier
  2017-04-11 17:02   ` Stefan Hajnoczi
  1 sibling, 1 reply; 6+ messages in thread
From: Laurent Vivier @ 2017-04-11 13:17 UTC (permalink / raw)
  To: Dr . David Alan Gilbert
  Cc: Michael S . Tsirkin, Stefan Hajnoczi, Amit Shah, qemu-devel

If we modify the virtio-rng virtqueue after the
vmstate has already been migrated, we can get
inconsistencies between the virtqueue state and
the memory content.

To avoid this, stop the virtqueue while the CPU
is stopped.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/virtio/trace-events |  2 ++
 hw/virtio/virtio-rng.c | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 6926eed..564a4b8 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -11,6 +11,8 @@ virtio_set_status(void *vdev, uint8_t val) "vdev %p val %u"
 
 # hw/virtio/virtio-rng.c
 virtio_rng_guest_not_ready(void *rng) "rng %p: guest not ready"
+virtio_rng_cpu_is_stopped(void *rng) "rng %p: cpu is stopped"
+virtio_rng_popped(void *rng) "rng %p: elem popped"
 virtio_rng_pushed(void *rng, size_t len) "rng %p: %zd bytes pushed"
 virtio_rng_request(void *rng, size_t size, unsigned quota) "rng %p: %zd bytes requested, %u bytes quota left"
 
diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
index 9639f4e..d270d56 100644
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -53,6 +53,15 @@ static void chr_read(void *opaque, const void *buf, size_t size)
         return;
     }
 
+    /* we can't modify the virtqueue until
+     * our state is fully synced
+     */
+
+    if (!runstate_check(RUN_STATE_RUNNING)) {
+        trace_virtio_rng_cpu_is_stopped(vrng);
+        return;
+    }
+
     vrng->quota_remaining -= size;
 
     offset = 0;
@@ -61,6 +70,7 @@ static void chr_read(void *opaque, const void *buf, size_t size)
         if (!elem) {
             break;
         }
+        trace_virtio_rng_popped(vrng);
         len = iov_from_buf(elem->in_sg, elem->in_num,
                            0, buf + offset, size - offset);
         offset += len;
-- 
2.9.3

* Re: [Qemu-devel] [PATCH 2/2] virtio-rng: stop virtqueue while the CPU is stopped
  2017-04-11 13:17 ` [Qemu-devel] [PATCH 2/2] virtio-rng: stop virtqueue while the CPU is stopped Laurent Vivier
@ 2017-04-11 17:02   ` Stefan Hajnoczi
  2017-04-11 17:42     ` Laurent Vivier
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2017-04-11 17:02 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Dr . David Alan Gilbert, Michael S . Tsirkin, Amit Shah, qemu-devel

On Tue, Apr 11, 2017 at 03:17:33PM +0200, Laurent Vivier wrote:
> diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
> index 9639f4e..d270d56 100644
> --- a/hw/virtio/virtio-rng.c
> +++ b/hw/virtio/virtio-rng.c
> @@ -53,6 +53,15 @@ static void chr_read(void *opaque, const void *buf, size_t size)
>          return;
>      }
>  
> +    /* we can't modify the virtqueue until
> +     * our state is fully synced
> +     */
> +
> +    if (!runstate_check(RUN_STATE_RUNNING)) {
> +        trace_virtio_rng_cpu_is_stopped(vrng);
> +        return;
> +    }
> +

I'm concerned about what happens when the guest is stopped and resumed
(e.g. 'stop' and 'cont' monitor commands).  Since we throw away the
chr_read() callback the device will hang unless the guest kicks it
again?

It's not clear to me that the rate limit timer will help us...

Stefan

* Re: [Qemu-devel] [PATCH 2/2] virtio-rng: stop virtqueue while the CPU is stopped
  2017-04-11 17:02   ` Stefan Hajnoczi
@ 2017-04-11 17:42     ` Laurent Vivier
  2017-04-12  9:57       ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Laurent Vivier @ 2017-04-11 17:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Dr . David Alan Gilbert, Michael S . Tsirkin, Amit Shah, qemu-devel

On 11/04/2017 19:02, Stefan Hajnoczi wrote:
> On Tue, Apr 11, 2017 at 03:17:33PM +0200, Laurent Vivier wrote:
>> diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
>> index 9639f4e..d270d56 100644
>> --- a/hw/virtio/virtio-rng.c
>> +++ b/hw/virtio/virtio-rng.c
>> @@ -53,6 +53,15 @@ static void chr_read(void *opaque, const void *buf, size_t size)
>>          return;
>>      }
>>  
>> +    /* we can't modify the virtqueue until
>> +     * our state is fully synced
>> +     */
>> +
>> +    if (!runstate_check(RUN_STATE_RUNNING)) {
>> +        trace_virtio_rng_cpu_is_stopped(vrng);
>> +        return;
>> +    }
>> +
> 
> I'm concerned about what happens when the guest is stopped and resumed
> (e.g. 'stop' and 'cont' monitor commands).  Since we throw away the
> chr_read() callback the device will hang unless the guest kicks it
> again?
> 
> It's not clear to me that the rate limit timer will help us...

I think you're right (even if it seems hard to generate this case)

What is the best solution:

- re-arming the timer/the backend request by calling
virtio_rng_process() before the "return;"

or

- adding a vmstate change handler to call virtio_rng_process()?

Thanks,
Laurent

* Re: [Qemu-devel] [PATCH 2/2] virtio-rng: stop virtqueue while the CPU is stopped
  2017-04-11 17:42     ` Laurent Vivier
@ 2017-04-12  9:57       ` Stefan Hajnoczi
  0 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2017-04-12  9:57 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Dr . David Alan Gilbert, Michael S . Tsirkin, Amit Shah, qemu-devel

On Tue, Apr 11, 2017 at 07:42:02PM +0200, Laurent Vivier wrote:
> On 11/04/2017 19:02, Stefan Hajnoczi wrote:
> > On Tue, Apr 11, 2017 at 03:17:33PM +0200, Laurent Vivier wrote:
> >> diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
> >> index 9639f4e..d270d56 100644
> >> --- a/hw/virtio/virtio-rng.c
> >> +++ b/hw/virtio/virtio-rng.c
> >> @@ -53,6 +53,15 @@ static void chr_read(void *opaque, const void *buf, size_t size)
> >>          return;
> >>      }
> >>  
> >> +    /* we can't modify the virtqueue until
> >> +     * our state is fully synced
> >> +     */
> >> +
> >> +    if (!runstate_check(RUN_STATE_RUNNING)) {
> >> +        trace_virtio_rng_cpu_is_stopped(vrng);
> >> +        return;
> >> +    }
> >> +
> > 
> > I'm concerned about what happens when the guest is stopped and resumed
> > (e.g. 'stop' and 'cont' monitor commands).  Since we throw away the
> > chr_read() callback the device will hang unless the guest kicks it
> > again?
> > 
> > It's not clear to me that the rate limit timer will help us...
> 
> I think you're right (even if it seems hard to generate this case)
> 
> What is the best solution:
> 
> - re-arming the timer/the backend request by calling
> virtio_rng_process() before the "return;"

This would waste CPU and throw away entropy bits.

> or
> 
> - adding a vmstate change handler to call virtio_rng_process()?

Yes, I think vm change handlers solve the problem.
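
To make the stop/cont concern and the change-handler idea concrete,
here is a toy model, not QEMU code. Every name in it (toy_rng,
toy_rng_process, toy_chr_read, toy_vm_state_change) is hypothetical.
In QEMU such a handler would presumably be registered with
qemu_add_vm_change_state_handler() and would call
virtio_rng_process(); the sketch only models the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy device; none of these names are QEMU APIs. */
struct toy_rng {
    bool vm_running;        /* mirrors the VM run state */
    bool request_pending;   /* an entropy request is outstanding */
    size_t bytes_delivered; /* bytes handed to the guest so far */
};

/* Ask the backend for entropy unless a request is outstanding. */
static void toy_rng_process(struct toy_rng *rng)
{
    assert(rng != NULL);
    if (rng->vm_running && !rng->request_pending) {
        rng->request_pending = true;
    }
}

/* Backend completion callback, with the guard from the patch. */
static void toy_chr_read(struct toy_rng *rng, size_t size)
{
    assert(rng != NULL);
    rng->request_pending = false; /* the request is consumed either way */
    if (!rng->vm_running) {
        return; /* data dropped; nothing re-arms the request here */
    }
    rng->bytes_delivered += size;
}

/* Run-state change handler: on resume, re-issue the request that
 * was thrown away while the VM was stopped. */
static void toy_vm_state_change(struct toy_rng *rng, bool running)
{
    assert(rng != NULL);
    rng->vm_running = running;
    if (running) {
        toy_rng_process(rng);
    }
}
```

Without the call in toy_vm_state_change(), a completion that arrives
while the VM is stopped leaves request_pending false forever, which is
the hang described earlier in the thread. Re-arming only on resume also
avoids requesting entropy that would just be discarded.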
