* [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
@ 2017-05-30 10:07 Roman Pen
  2017-05-30 11:35 ` Fam Zheng
  2017-05-31 13:06 ` Stefan Hajnoczi
  0 siblings, 2 replies; 11+ messages in thread
From: Roman Pen @ 2017-05-30 10:07 UTC (permalink / raw)
  Cc: Roman Pen, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Kevin Wolf,
	qemu-devel

Submitting requests with Linux AIO is a bit tricky and can lead to
request completions on the submission path:

44713c9e8547 ("linux-aio: Handle io_submit() failure gracefully")
0ed93d84edab ("linux-aio: process completions from ioq_submit()")

That means that any coroutine which has yielded in order to wait
for completion can be resumed from the submission path and eventually
be terminated (freed).

The following use-after-free crash was observed when IO throttling
was enabled:

 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7f5813dff700 (LWP 56417)]
 virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
 (gdb) bt
 #0  virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
                              ^^^^^^^^^^^^^^
                              remember the address

 #1  virtqueue_fill (vq=0x5598b20d21b0, elem=0x7f5804009a30, len=1, idx=0) at virtio.c:282
 #2  virtqueue_push (vq=0x5598b20d21b0, elem=elem@entry=0x7f5804009a30, len=<optimized out>) at virtio.c:308
 #3  virtio_blk_req_complete (req=req@entry=0x7f5804009a30, status=status@entry=0 '\000') at virtio-blk.c:61
 #4  virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at virtio-blk.c:126
 #5  blk_aio_complete (acb=0x7f58040068d0) at block-backend.c:923
 #6  coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:78

 (gdb) p * elem
 $8 = {index = 77, out_num = 2, in_num = 1,
       in_addr = 0x7f5804009ad8, out_addr = 0x7f5804009ae0,
       in_sg = 0x0, out_sg = 0x7f5804009a50}
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       'in_sg' and 'out_sg' are invalid.
       e.g. it is impossible that 'in_sg' is zero,
       instead its value must be equal to:

       (gdb) p/x 0x7f5804009ad8 + sizeof(elem->in_addr[0]) + 2 * sizeof(elem->out_addr[0])
       $26 = 0x7f5804009af0

It seems 'elem' was corrupted.  Meanwhile another thread raised an abort:

 Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)):
 #0  raise () from /lib/x86_64-linux-gnu/libc.so.6
 #1  abort () from /lib/x86_64-linux-gnu/libc.so.6
 #2  qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113
 #3  qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60
 #4  qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119
                           ^^^^^^^^^^^^^^^^^^
                           WTF?? this is equal to elem from crashed thread

 #5  qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60
 #6  qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119
 #7  qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60
 #8  qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119
 #9  qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60
 #10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119
 #11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60
 #12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119
 #13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106
 #14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419

The crash can be explained by the access to the 'co' object from the loop
inside qemu_co_queue_run_restart():

  while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
      QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
                           ^^^^^^^^^^^^^^^^^^^^
                           on each iteration 'co' is accessed,
                           but 'co' can be already freed

      qemu_coroutine_enter(next);
  }

When the 'next' coroutine is resumed (entered), it can in turn resume
'co' and eventually free it.  That is why 'co' (which was freed)
has the same address as 'elem' from the first backtrace.

The fix is obvious: use a temporary queue and do not touch the coroutine
after the first qemu_coroutine_enter() is invoked.

The issue is quite rare and happens about every 12 hours under very high
IO and CPU load (building the Linux kernel with -j512 inside the guest)
when IO throttling is enabled.  With the fix applied, the guest has been
running for ~35 hours and is still alive so far.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org
---
 v2:
     Comments tweaks suggested by Paolo.

 util/qemu-coroutine-lock.c | 14 ++++++++++++--
 util/qemu-coroutine.c      |  5 +++++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 6328eed26bc6..d589d8c66d5e 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -77,10 +77,20 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
 void qemu_co_queue_run_restart(Coroutine *co)
 {
     Coroutine *next;
+    QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup =
+        QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup);
 
     trace_qemu_co_queue_run_restart(co);
-    while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
-        QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
+
+    /* Because "co" has yielded, any coroutine that we wakeup can resume it.
+     * If this happens and "co" terminates, co->co_queue_wakeup becomes
+     * invalid memory.  Therefore, use a temporary queue and do not touch
+     * the "co" coroutine as soon as you enter another one.
+     */
+    QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup);
+
+    while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
+        QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
         qemu_coroutine_enter(next);
     }
 }
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 486af9a62275..d6095c1d5aa4 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -126,6 +126,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
 
     qemu_co_queue_run_restart(co);
 
+    /* Beware, if ret == COROUTINE_YIELD and qemu_co_queue_run_restart()
+     * has started any other coroutine, "co" might have been reentered
+     * and even freed by now!  So be careful and do not touch it.
+     */
+
     switch (ret) {
     case COROUTINE_YIELD:
         return;
-- 
2.11.0


* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-05-30 10:07 [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered Roman Pen
@ 2017-05-30 11:35 ` Fam Zheng
  2017-06-01  9:38   ` Roman Penyaev
  2017-05-31 13:06 ` Stefan Hajnoczi
  1 sibling, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2017-05-30 11:35 UTC (permalink / raw)
  To: Roman Pen; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Tue, 05/30 12:07, Roman Pen wrote:
[cut]

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-05-30 10:07 [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered Roman Pen
  2017-05-30 11:35 ` Fam Zheng
@ 2017-05-31 13:06 ` Stefan Hajnoczi
  2017-05-31 13:22   ` Paolo Bonzini
  2017-05-31 13:23   ` Roman Penyaev
  1 sibling, 2 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2017-05-31 13:06 UTC (permalink / raw)
  To: Roman Pen
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Tue, May 30, 2017 at 12:07:36PM +0200, Roman Pen wrote:
> diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
> index 6328eed26bc6..d589d8c66d5e 100644
> --- a/util/qemu-coroutine-lock.c
> +++ b/util/qemu-coroutine-lock.c
> @@ -77,10 +77,20 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
>  void qemu_co_queue_run_restart(Coroutine *co)
>  {
>      Coroutine *next;
> +    QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup =
> +        QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup);
>  
>      trace_qemu_co_queue_run_restart(co);
> -    while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
> -        QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
> +
> +    /* Because "co" has yielded, any coroutine that we wakeup can resume it.
> +     * If this happens and "co" terminates, co->co_queue_wakeup becomes
> +     * invalid memory.  Therefore, use a temporary queue and do not touch
> +     * the "co" coroutine as soon as you enter another one.
> +     */
> +    QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup);
> +
> +    while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
> +        QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
>          qemu_coroutine_enter(next);
>      }
>  }

What happens if co remains alive and qemu_coroutine_enter(next) causes
additional coroutines to add themselves to co->co_queue_wakeup?

I think they used to be entered but not anymore after this patch.  Not
sure if anything depends on this behavior...


* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-05-31 13:06 ` Stefan Hajnoczi
@ 2017-05-31 13:22   ` Paolo Bonzini
  2017-05-31 13:25     ` Roman Penyaev
  2017-05-31 13:23   ` Roman Penyaev
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2017-05-31 13:22 UTC (permalink / raw)
  To: Stefan Hajnoczi, Roman Pen
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi



On 31/05/2017 15:06, Stefan Hajnoczi wrote:
>> +
>> +    while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
>> +        QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
>>          qemu_coroutine_enter(next);
>>      }
>>  }
> What happens if co remains alive and qemu_coroutine_enter(next) causes
> additional coroutines to add themselves to co->co_queue_wakeup?

Wouldn't that happen only if co is entered again?  Then it will also
reenter qemu_co_queue_run_restart, which may cause a different wakeup
order but not any missing wakeups.

Paolo



* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-05-31 13:06 ` Stefan Hajnoczi
  2017-05-31 13:22   ` Paolo Bonzini
@ 2017-05-31 13:23   ` Roman Penyaev
  2017-06-01 13:15     ` Stefan Hajnoczi
  1 sibling, 1 reply; 11+ messages in thread
From: Roman Penyaev @ 2017-05-31 13:23 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Wed, May 31, 2017 at 3:06 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, May 30, 2017 at 12:07:36PM +0200, Roman Pen wrote:
>> diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
>> index 6328eed26bc6..d589d8c66d5e 100644
>> --- a/util/qemu-coroutine-lock.c
>> +++ b/util/qemu-coroutine-lock.c
>> @@ -77,10 +77,20 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
>>  void qemu_co_queue_run_restart(Coroutine *co)
>>  {
>>      Coroutine *next;
>> +    QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup =
>> +        QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup);
>>
>>      trace_qemu_co_queue_run_restart(co);
>> -    while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
>> -        QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
>> +
>> +    /* Because "co" has yielded, any coroutine that we wakeup can resume it.
>> +     * If this happens and "co" terminates, co->co_queue_wakeup becomes
>> +     * invalid memory.  Therefore, use a temporary queue and do not touch
>> +     * the "co" coroutine as soon as you enter another one.
>> +     */
>> +    QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup);
>> +
>> +    while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
>> +        QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
>>          qemu_coroutine_enter(next);
>>      }
>>  }
>
> What happens if co remains alive and qemu_coroutine_enter(next) causes
> additional coroutines to add themselves to co->co_queue_wakeup?

Yeah, I thought about it.  But according to my understanding the only
path where you add something to the tail of a queue is:

void aio_co_enter(AioContext *ctx, struct Coroutine *co)
{
...
    if (qemu_in_coroutine()) {
        Coroutine *self = qemu_coroutine_self();
        assert(self != co);
        QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next); <<<< HERE

So you should be in *that* coroutine to chain other coroutines.
That means that caller of your 'co' will be responsible to complete
what it has in the list.  Something like that:


co1 YIELDED,
foreach co in co1.queue{co2}
   enter(co) -------------->  co2 does something and
                              eventually enter(co1):  -----> co1 does something and
                                                             add co4 to the queue
                                                             terminates
                                                      <-----
                              co2 iterates over the queue of co1 and
                               foreach co in co1.queue{co4}


Sorry, the explanation is totally crap, but the key is:
caller is responsible for cleaning the queue no matter what
happens.  Sounds sane?

--
Roman


>
> I think they used to be entered but not anymore after this patch.  Not
> sure if anything depends on this behavior...


* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-05-31 13:22   ` Paolo Bonzini
@ 2017-05-31 13:25     ` Roman Penyaev
  0 siblings, 0 replies; 11+ messages in thread
From: Roman Penyaev @ 2017-05-31 13:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefan Hajnoczi, Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

On Wed, May 31, 2017 at 3:22 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 31/05/2017 15:06, Stefan Hajnoczi wrote:
>>> +
>>> +    while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
>>> +        QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
>>>          qemu_coroutine_enter(next);
>>>      }
>>>  }
>> What happens if co remains alive and qemu_coroutine_enter(next) causes
>> additional coroutines to add themselves to co->co_queue_wakeup?
>
> Wouldn't that happen only if co is entered again?  Then it will also
> reenter qemu_co_queue_run_restart, which may cause a different wakeup
> order but not any missing wakeups.

Exactly, that is what I tried to show with the stupid arrows in another mail.
I am not sure that I've succeeded :)

--
Roman


* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-05-30 11:35 ` Fam Zheng
@ 2017-06-01  9:38   ` Roman Penyaev
  2017-06-01  9:42     ` Paolo Bonzini
  2017-06-01  9:48     ` Fam Zheng
  0 siblings, 2 replies; 11+ messages in thread
From: Roman Penyaev @ 2017-06-01  9:38 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Tue, May 30, 2017 at 1:35 PM, Fam Zheng <famz@redhat.com> wrote:
[cut]
> Reviewed-by: Fam Zheng <famz@redhat.com>

Do I need to resend the patch with 'reviewed-by' line?  Or it is
already queued?

--
Roman


* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-06-01  9:38   ` Roman Penyaev
@ 2017-06-01  9:42     ` Paolo Bonzini
  2017-06-01  9:48     ` Fam Zheng
  1 sibling, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2017-06-01  9:42 UTC (permalink / raw)
  To: Roman Penyaev, Fam Zheng; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi



On 01/06/2017 11:38, Roman Penyaev wrote:
> On Tue, May 30, 2017 at 1:35 PM, Fam Zheng <famz@redhat.com> wrote:
> [cut]
>> Reviewed-by: Fam Zheng <famz@redhat.com>
> 
> Do I need to resend the patch with 'reviewed-by' line?  Or it is
> already queued?

No need to resend, thanks!

Paolo


* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-06-01  9:38   ` Roman Penyaev
  2017-06-01  9:42     ` Paolo Bonzini
@ 2017-06-01  9:48     ` Fam Zheng
  1 sibling, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2017-06-01  9:48 UTC (permalink / raw)
  To: Roman Penyaev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Thu, 06/01 11:38, Roman Penyaev wrote:
> On Tue, May 30, 2017 at 1:35 PM, Fam Zheng <famz@redhat.com> wrote:
> [cut]
> > Reviewed-by: Fam Zheng <famz@redhat.com>
> 
> Do I need to resend the patch with 'reviewed-by' line?  Or it is
> already queued?

No need to resend just for adding reviewed-by, it can be picked up by
maintainers when applying. The question is who's going to take it :)

Fam


* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-05-31 13:23   ` Roman Penyaev
@ 2017-06-01 13:15     ` Stefan Hajnoczi
  2017-06-01 16:08       ` Roman Penyaev
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2017-06-01 13:15 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Stefan Hajnoczi, Kevin Wolf, Fam Zheng, qemu-devel, Paolo Bonzini

On Wed, May 31, 2017 at 03:23:25PM +0200, Roman Penyaev wrote:
> On Wed, May 31, 2017 at 3:06 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Tue, May 30, 2017 at 12:07:36PM +0200, Roman Pen wrote:
> >> diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
> >> index 6328eed26bc6..d589d8c66d5e 100644
> >> --- a/util/qemu-coroutine-lock.c
> >> +++ b/util/qemu-coroutine-lock.c
> >> @@ -77,10 +77,20 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
> >>  void qemu_co_queue_run_restart(Coroutine *co)
> >>  {
> >>      Coroutine *next;
> >> +    QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup =
> >> +        QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup);
> >>
> >>      trace_qemu_co_queue_run_restart(co);
> >> -    while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
> >> -        QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
> >> +
> >> +    /* Because "co" has yielded, any coroutine that we wakeup can resume it.
> >> +     * If this happens and "co" terminates, co->co_queue_wakeup becomes
> >> +     * invalid memory.  Therefore, use a temporary queue and do not touch
> >> +     * the "co" coroutine as soon as you enter another one.
> >> +     */
> >> +    QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup);
> >> +
> >> +    while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
> >> +        QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
> >>          qemu_coroutine_enter(next);
> >>      }
> >>  }
> >
> > What happens if co remains alive and qemu_coroutine_enter(next) causes
> > additional coroutines to add themselves to co->co_queue_wakeup?
> 
> Yeah, I thought about it.  But according to my understanding the only
> path where you add something to the tail of a queue is:
> 
> void aio_co_enter(AioContext *ctx, struct Coroutine *co)
> {
> ...
>     if (qemu_in_coroutine()) {
>         Coroutine *self = qemu_coroutine_self();
>         assert(self != co);
>         QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next); <<<< HERE
> 
> So you should be in *that* coroutine to chain other coroutines.
> That means that caller of your 'co' will be responsible to complete
> what it has in the list.  Something like that:
> 
> 
> co1 YIELDED,
> foreach co in co1.queue{co2}
>    enter(co) -------------->  co2 does something and
>                               eventually enter(co1):  -----> co1 does something and
>                                                              add co4 to the queue
>                                                              terminates
>                                                       <-----
>                               co2 iterates over the queue of co1 and
>                                foreach co in co1.queue{co4}
> 
> 
> Sorry, the explanation is totally crap, but the key is:
> caller is responsible for cleaning the queue no matter what
> happens.  Sounds sane?

Yes, that makes sense.  A comment in the code would be helpful.

Stefan


* Re: [Qemu-devel] [PATCH v2 1/1] coroutine-lock: do not touch coroutine after another one has been entered
  2017-06-01 13:15     ` Stefan Hajnoczi
@ 2017-06-01 16:08       ` Roman Penyaev
  0 siblings, 0 replies; 11+ messages in thread
From: Roman Penyaev @ 2017-06-01 16:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Kevin Wolf, Fam Zheng, qemu-devel, Paolo Bonzini

On Thu, Jun 1, 2017 at 3:15 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Wed, May 31, 2017 at 03:23:25PM +0200, Roman Penyaev wrote:
>> On Wed, May 31, 2017 at 3:06 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> > On Tue, May 30, 2017 at 12:07:36PM +0200, Roman Pen wrote:
>> >> diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
>> >> index 6328eed26bc6..d589d8c66d5e 100644
>> >> --- a/util/qemu-coroutine-lock.c
>> >> +++ b/util/qemu-coroutine-lock.c
>> >> @@ -77,10 +77,20 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
>> >>  void qemu_co_queue_run_restart(Coroutine *co)
>> >>  {
>> >>      Coroutine *next;
>> >> +    QSIMPLEQ_HEAD(, Coroutine) tmp_queue_wakeup =
>> >> +        QSIMPLEQ_HEAD_INITIALIZER(tmp_queue_wakeup);
>> >>
>> >>      trace_qemu_co_queue_run_restart(co);
>> >> -    while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
>> >> -        QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
>> >> +
>> >> +    /* Because "co" has yielded, any coroutine that we wakeup can resume it.
>> >> +     * If this happens and "co" terminates, co->co_queue_wakeup becomes
>> >> +     * invalid memory.  Therefore, use a temporary queue and do not touch
>> >> +     * the "co" coroutine as soon as you enter another one.
>> >> +     */
>> >> +    QSIMPLEQ_CONCAT(&tmp_queue_wakeup, &co->co_queue_wakeup);
>> >> +
>> >> +    while ((next = QSIMPLEQ_FIRST(&tmp_queue_wakeup))) {
>> >> +        QSIMPLEQ_REMOVE_HEAD(&tmp_queue_wakeup, co_queue_next);
>> >>          qemu_coroutine_enter(next);
>> >>      }
>> >>  }
>> >
>> > What happens if co remains alive and qemu_coroutine_enter(next) causes
>> > additional coroutines to add themselves to co->co_queue_wakeup?
>>
>> Yeah, I thought about it.  But according to my understanding the only
>> path where you add something to the tail of a queue is:
>>
>> void aio_co_enter(AioContext *ctx, struct Coroutine *co)
>> {
>> ...
>>     if (qemu_in_coroutine()) {
>>         Coroutine *self = qemu_coroutine_self();
>>         assert(self != co);
>>         QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next); <<<< HERE
>>
>> So you should be in *that* coroutine to chain other coroutines.
>> That means that caller of your 'co' will be responsible to complete
>> what it has in the list.  Something like that:
>>
>>
>> co1 YIELDED,
>> foreach co in co1.queue{co2}
>>    enter(co) -------------->  co2 does something and
>>                               eventually enter(co1):  -----> co1 does something and
>>                                                              add co4 to the queue
>>                                                              terminates
>>                                                       <-----
>>                               co2 iterates over the queue of co1 and
>>                                foreach co in co1.queue{co4}
>>
>>
>> Sorry, the explanation is totally crap, but the key is:
>> caller is responsible for cleaning the queue no matter what
>> happens.  Sounds sane?
>
> Yes, that makes sense.  A comment in the code would be helpful.

Will resend v3 then.

--
Roman
