* [Qemu-devel] [PATCH v3 1/1] coroutine-lock: do not touch coroutine after another one has been entered
@ 2017-06-01 16:08 Roman Pen
  2017-06-02  8:35 ` Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Roman Pen @ 2017-06-01 16:08 UTC (permalink / raw)
  Cc: Roman Pen, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Kevin Wolf,
	qemu-devel

Submission of requests on linux aio is a bit tricky and can lead to
request completions on the submission path:

44713c9e8547 ("linux-aio: Handle io_submit() failure gracefully")
0ed93d84edab ("linux-aio: process completions from ioq_submit()")

That means that any coroutine which has yielded in order to wait for
completion can be resumed from the submission path and eventually be
terminated (freed).
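
For illustration, here is a small hedged sketch (hypothetical types and
names, not the actual linux-aio code) of what "completion on the
submission path" means: the completion callback can run synchronously,
inside the submit call itself.

  /* Hypothetical sketch, not the real linux-aio code. */
  typedef struct Req {
      void (*complete)(struct Req *r);  /* e.g. resumes the waiting coroutine */
  } Req;

  static int submit(Req *r)
  {
      /* For the purpose of this sketch the completion is handled right
       * here, on the submission path (the real code only falls back to
       * this on a full or failed submission queue) ...                 */
      r->complete(r);   /* ... so the waiter is resumed before submit()
                         * even returns to its caller.                  */
      return 0;
  }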

The following use-after-free crash was observed when IO throttling
was enabled:

 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7f5813dff700 (LWP 56417)]
 virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
 (gdb) bt
 #0  virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
                              ^^^^^^^^^^^^^^
                              remember the address

 #1  virtqueue_fill (vq=0x5598b20d21b0, elem=0x7f5804009a30, len=1, idx=0) at virtio.c:282
 #2  virtqueue_push (vq=0x5598b20d21b0, elem=elem@entry=0x7f5804009a30, len=<optimized out>) at virtio.c:308
 #3  virtio_blk_req_complete (req=req@entry=0x7f5804009a30, status=status@entry=0 '\000') at virtio-blk.c:61
 #4  virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at virtio-blk.c:126
 #5  blk_aio_complete (acb=0x7f58040068d0) at block-backend.c:923
 #6  coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:78

 (gdb) p * elem
 $8 = {index = 77, out_num = 2, in_num = 1,
       in_addr = 0x7f5804009ad8, out_addr = 0x7f5804009ae0,
       in_sg = 0x0, out_sg = 0x7f5804009a50}
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       'in_sg' and 'out_sg' are invalid.
       E.g. it is impossible for 'in_sg' to be zero;
       instead its value must be equal to:

       (gdb) p/x 0x7f5804009ad8 + sizeof(elem->in_addr[0]) + 2 * sizeof(elem->out_addr[0])
       $26 = 0x7f5804009af0

It seems 'elem' was corrupted.  Meanwhile, another thread raised an abort:

 Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)):
 #0  raise () from /lib/x86_64-linux-gnu/libc.so.6
 #1  abort () from /lib/x86_64-linux-gnu/libc.so.6
 #2  qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113
 #3  qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60
 #4  qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119
                           ^^^^^^^^^^^^^^^^^^
                           WTF?? this is equal to 'elem' from the crashed thread

 #5  qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60
 #6  qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119
 #7  qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60
 #8  qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119
 #9  qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60
 #10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119
 #11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60
 #12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119
 #13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106
 #14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419

The crash can be explained by the access to the 'co' object in the loop
inside qemu_co_queue_run_restart():

  while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
      QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
                           ^^^^^^^^^^^^^^^^^^^^
                           on each iteration 'co' is accessed,
                           but 'co' may already have been freed

      qemu_coroutine_enter(next);
  }

When the 'next' coroutine is resumed (entered), it can in turn resume
'co' and eventually free it.  That's why 'co' (which was freed) has the
same address as 'elem' from the first backtrace.

The fix is obvious: use a temporary queue and do not touch the coroutine
after the first qemu_coroutine_enter() is invoked.
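
Outside the QEMU tree, the same pattern and fix can be shown with a
minimal standalone C sketch (hypothetical types, not QEMU code): the
unsafe loop keeps re-reading a list head embedded in an object that one
of the invoked callbacks may free, while the safe variant first moves
the whole list to a local head and never touches the owner again.

  #include <stddef.h>

  typedef struct Waiter {
      struct Waiter *next;
      void (*wake)(struct Waiter *w);   /* entering it may free the owner */
  } Waiter;

  typedef struct Owner {
      Waiter *wakeup;                   /* intrusive list, like co_queue_wakeup */
  } Owner;

  /* BROKEN: re-reads owner->wakeup after every callback; if a callback
   * frees 'owner' (just as entering 'next' can free 'co'), this is a
   * use-after-free. */
  static void drain_unsafe(Owner *owner)
  {
      while (owner->wakeup) {
          Waiter *w = owner->wakeup;
          owner->wakeup = w->next;
          w->wake(w);
      }
  }

  /* FIXED: move the whole list to a local head first and never touch
   * 'owner' again once the first callback has been invoked. */
  static void drain_safe(Owner *owner)
  {
      Waiter *local = owner->wakeup;
      owner->wakeup = NULL;
      while (local) {
          Waiter *w = local;
          local = w->next;
          w->wake(w);                   /* 'owner' may be freed in here */
      }
  }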

The issue is quite rare and happens every ~12 hours under very high IO
and CPU load (building the Linux kernel with -j512 inside the guest)
when IO throttling is enabled.  With the fix applied the guest has been
running for ~35 hours and is still alive so far.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org
---
 v3:
     Comments tweaks suggested by Stefan.
 v2:
     Comments tweaks suggested by Paolo.

 util/qemu-coroutine-lock.c | 19 +++++++++++++++++--
 util/qemu-coroutine.c      |  5 +++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 6328eed26bc6..b44b5d55ebad 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -77,10 +77,25 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
 void qemu_co_queue_run_restart(Coroutine *co)
 {
     Coroutine *next;
+    QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup =
+        QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup);
 
     trace_qemu_co_queue_run_restart(co);
-    while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
-        QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
+
+    /* Because "co" has yielded, any coroutine that we wakeup can resume it.
+     * If this happens and "co" terminates, co->co_queue_wakeup becomes
+     * invalid memory.  Therefore, use a temporary queue and do not touch
+     * the "co" coroutine as soon as you enter another one.
+     *
+     * In its turn, the resumed "co" can populate "co_queue_wakeup" with
+     * new coroutines to be woken up.  The caller, who has resumed "co",
+     * will be responsible for traversing the same queue, which may cause
+     * a different wakeup order but not any missing wakeups.
+     */
+    QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup);
+
+    while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
+        QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
         qemu_coroutine_enter(next);
     }
 }
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 486af9a62275..d6095c1d5aa4 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -126,6 +126,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
 
     qemu_co_queue_run_restart(co);
 
+    /* Beware, if ret == COROUTINE_YIELD and qemu_co_queue_run_restart()
+     * has started any other coroutine, "co" might have been reentered
+     * and even freed by now!  So be careful and do not touch it.
+     */
+
     switch (ret) {
     case COROUTINE_YIELD:
         return;
-- 
2.11.0


* Re: [Qemu-devel] [PATCH v3 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-06-01 16:08 [Qemu-devel] [PATCH v3 1/1] coroutine-lock: do not touch coroutine after another one has been entered Roman Pen
@ 2017-06-02  8:35 ` Stefan Hajnoczi
  2017-06-02 15:29 ` Roman Penyaev
  2017-06-07 13:54 ` Stefan Hajnoczi
  2 siblings, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2017-06-02  8:35 UTC (permalink / raw)
  To: Roman Pen; +Cc: Paolo Bonzini, Fam Zheng, Kevin Wolf, qemu-devel


On Thu, Jun 01, 2017 at 06:08:47PM +0200, Roman Pen wrote:
> Submission of requests on linux aio is a bit tricky and can lead to
> request completions on the submission path:

[cut]

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH v3 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-06-01 16:08 [Qemu-devel] [PATCH v3 1/1] coroutine-lock: do not touch coroutine after another one has been entered Roman Pen
  2017-06-02  8:35 ` Stefan Hajnoczi
@ 2017-06-02 15:29 ` Roman Penyaev
  2017-06-07 13:54 ` Stefan Hajnoczi
  2 siblings, 0 replies; 4+ messages in thread
From: Roman Penyaev @ 2017-06-02 15:29 UTC (permalink / raw)
  To: Roman Pen
  Cc: Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Kevin Wolf, qemu-devel

On Thu, Jun 1, 2017 at 6:08 PM, Roman Pen
<roman.penyaev@profitbricks.com> wrote:

[cut]

>  Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)):
>  #0  raise () from /lib/x86_64-linux-gnu/libc.so.6
>  #1  abort () from /lib/x86_64-linux-gnu/libc.so.6
>  #2  qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113
>  #3  qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60
>  #4  qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119
>                            ^^^^^^^^^^^^^^^^^^
>                            WTF?? this is equal to elem from crashed thread
>
>  #5  qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60
>  #6  qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119
>  #7  qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60
>  #8  qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119
>  #9  qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60
>  #10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119
>  #11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60
>  #12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119
>  #13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106
>  #14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419

BTW, while chasing the current bug I observed extremely long recursive
backtraces with repeating frames:

qemu_co_queue_run_restart
qemu_coroutine_enter
....

like in the current bug but much much longer.

It is clear that each qemu_coroutine_enter()/qemu_co_queue_run_restart()
pair corresponds to one request/coroutine, so in the worst case the stack
depth equals the number of throttled requests in flight.

But the real number of requests with an MQ virtio device can be very large.
When the virtio device is created with num-queues=$VCPU, the total number of
requests equals nr_requests x num_queues.  Say the guest has 16 vCPUs: the
virtio block device can then have 128 x 16 = 2048 requests in flight.  If my
calculations are correct, each loop iteration costs around 112 bytes of stack,
so 2048 x 112 = 224K.  Definitely not nice.

A straightforward fix could be to keep queue_wakeup on the stack.  I did
just a quick and dirty diff (below), but the idea should be clear.  That
would iron out the recursion and save 8 bytes per coroutine.
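
In a simplified standalone form (hypothetical names, not the real QEMU
API) the idea looks roughly like this: the outermost enter owns a single
caller-local worklist, nested wakeups append to it, and everything is
drained in one flat loop instead of one stack frame per woken coroutine.

  #include <stddef.h>

  typedef struct Task {
      struct Task *next;
      /* run() may schedule further wakeups by pushing onto 'worklist' */
      void (*run)(struct Task *t, struct Task **worklist);
  } Task;

  static void worklist_push(Task **worklist, Task *t)
  {
      t->next = *worklist;
      *worklist = t;
  }

  static void enter_outermost(Task *first)
  {
      Task *worklist = NULL;           /* lives on the caller's stack */

      first->run(first, &worklist);    /* may push follow-up wakeups */
      while (worklist) {               /* flat loop: no recursion, O(1) stack */
          Task *t = worklist;
          worklist = t->next;
          t->run(t, &worklist);        /* nested wakeups land on the same list */
      }
  }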

--
Roman


diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index e46a32d0fcc2..4f1a3278e432 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -65,10 +65,13 @@ typedef void coroutine_fn CoroutineEntry(void *opaque);
  */
 Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque);

+typedef QSIMPLEQ_HEAD(CoroutineQueue, Coroutine) CoroutineQueue;
+
 /**
  * Transfer control to a coroutine
  */
 void qemu_coroutine_enter(Coroutine *coroutine);
+void __qemu_coroutine_enter(Coroutine *coroutine, CoroutineQueue *queue_wakeup);

 /**
  * Transfer control back to a coroutine's caller
diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 581a7f514075..edf1c0a4edbb 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -41,7 +41,7 @@ struct Coroutine {
     QSLIST_ENTRY(Coroutine) pool_next;

     /* Coroutines that should be woken up when we yield or terminate */
-    QSIMPLEQ_HEAD(, Coroutine) co_queue_wakeup;
+    CoroutineQueue *co_queue_wakeup;
     QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
 };

@@ -49,6 +49,6 @@ Coroutine *qemu_coroutine_new(void);
 void qemu_coroutine_delete(Coroutine *co);
 CoroutineAction qemu_coroutine_switch(Coroutine *from, Coroutine *to,
                                       CoroutineAction action);
-void coroutine_fn qemu_co_queue_run_restart(Coroutine *co);
+void coroutine_fn qemu_co_queue_run_restart(CoroutineQueue *queue_wakeup);

 #endif
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 0db8e4fc6d98..9282e5b82824 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -50,27 +50,13 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue)
  * invoked by the core coroutine code when the current coroutine yields or
  * terminates.
  */
-void qemu_co_queue_run_restart(Coroutine *co)
+void qemu_co_queue_run_restart(CoroutineQueue *queue_wakeup)
 {
     Coroutine *next;
-    QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup =
-        QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup);
-
-    trace_qemu_co_queue_run_restart(co);
-
-    /*
-     * We use temporal queue in order not to touch 'co' after entering
-     * 'next' coroutine.  The thing is that 'next' coroutine can resume
-     * current 'co' coroutine and eventually terminate (free) it (see
-     * linux-aio.c: ioq_submit() where qemu_laio_process_completions()
-     * is invoked).  The rule of thumb is simple: do not touch coroutine
-     * when you enter another one.
-     */
-    QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup);
-
-    while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
-        QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
-        qemu_coroutine_enter(next);
+
+    while ((next = QSIMPLEQ_FIRST(queue_wakeup))) {
+        QSIMPLEQ_REMOVE_HEAD(queue_wakeup, co_queue_next);
+        __qemu_coroutine_enter(next, queue_wakeup);
     }
 }

@@ -85,7 +71,7 @@ static bool qemu_co_queue_do_restart(CoQueue *queue, bool single)

     while ((next = QSIMPLEQ_FIRST(&queue->entries)) != NULL) {
         QSIMPLEQ_REMOVE_HEAD(&queue->entries, co_queue_next);
-        QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, next, co_queue_next);
+        QSIMPLEQ_INSERT_TAIL(self->co_queue_wakeup, next, co_queue_next);
         trace_qemu_co_queue_next(next);
         if (single) {
             break;
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 1a89e04238fa..6d280921c161 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -77,7 +77,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)

     co->entry = entry;
     co->entry_arg = opaque;
-    QSIMPLEQ_INIT(&co->co_queue_wakeup);
+    co->co_queue_wakeup = NULL;
     return co;
 }

@@ -101,7 +101,7 @@ static void coroutine_delete(Coroutine *co)
     qemu_coroutine_delete(co);
 }

-void qemu_coroutine_enter(Coroutine *co)
+void __qemu_coroutine_enter(Coroutine *co, CoroutineQueue *queue_wakeup)
 {
     Coroutine *self = qemu_coroutine_self();
     CoroutineAction ret;
@@ -114,17 +114,9 @@ void qemu_coroutine_enter(Coroutine *co)
     }

     co->caller = self;
+    co->co_queue_wakeup = queue_wakeup;
     ret = qemu_coroutine_switch(self, co, COROUTINE_ENTER);

-    qemu_co_queue_run_restart(co);
-
-    /*
-     * BEWARE: in case of ret == COROUTINE_YIELD here at this point
-     *         after qemu_co_queue_run_restart() 'co' can be already
-     *         freed by other coroutine, which has entered 'co'. So
-     *         be careful and do not touch it.
-     */
-
     switch (ret) {
     case COROUTINE_YIELD:
         return;
@@ -137,6 +129,15 @@ void qemu_coroutine_enter(Coroutine *co)
     }
 }

+void qemu_coroutine_enter(Coroutine *co)
+{
+    CoroutineQueue queue_wakeup =
+        QSIMPLEQ_HEAD_INITIALIZER(queue_wakeup);
+
+    __qemu_coroutine_enter(co, &queue_wakeup);
+    qemu_co_queue_run_restart(&queue_wakeup);
+}
+
 void coroutine_fn qemu_coroutine_yield(void)
 {
     Coroutine *self = qemu_coroutine_self();
@@ -150,6 +151,7 @@ void coroutine_fn qemu_coroutine_yield(void)
     }

     self->caller = NULL;
+    self->co_queue_wakeup = NULL;
     qemu_coroutine_switch(self, to, COROUTINE_YIELD);
 }


* Re: [Qemu-devel] [PATCH v3 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-06-01 16:08 [Qemu-devel] [PATCH v3 1/1] coroutine-lock: do not touch coroutine after another one has been entered Roman Pen
  2017-06-02  8:35 ` Stefan Hajnoczi
  2017-06-02 15:29 ` Roman Penyaev
@ 2017-06-07 13:54 ` Stefan Hajnoczi
  2 siblings, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2017-06-07 13:54 UTC (permalink / raw)
  To: Roman Pen
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi, Paolo Bonzini


On Thu, Jun 01, 2017 at 06:08:47PM +0200, Roman Pen wrote:
> Submission of requests on linux aio is a bit tricky and can lead to
> request completions on the submission path:

[cut]

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


