All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] linux-aio: fix "Co-routine re-entered recursively" error
@ 2016-09-27 14:06 Stefan Hajnoczi
  2016-09-27 14:06 ` [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2016-09-27 14:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, qemu-block, Roman Pen, Fam Zheng, Stefan Hajnoczi

It's possible to hit the "Co-routine re-entered recursively" error with -drive
format=qcow2,aio=native.  This is a regression introduced by a linux-aio.c
optimization.  See Patch 3 for details.

Patches 1 & 2 add a new coroutine API called qemu_coroutine_entered() for
checking whether a coroutine is currently "entered".  This makes it possible to
avoid re-entering coroutines recursively.

Stefan Hajnoczi (3):
  coroutine: add qemu_coroutine_entered() function
  test-coroutine: test qemu_coroutine_entered()
  linux-aio: fix re-entrant completion processing

 block/linux-aio.c        |  9 ++++++---
 include/qemu/coroutine.h | 13 +++++++++++++
 tests/test-coroutine.c   | 42 ++++++++++++++++++++++++++++++++++++++++++
 util/qemu-coroutine.c    |  5 +++++
 4 files changed, 66 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function
  2016-09-27 14:06 [Qemu-devel] [PATCH 0/3] linux-aio: fix "Co-routine re-entered recursively" error Stefan Hajnoczi
@ 2016-09-27 14:06 ` Stefan Hajnoczi
  2016-09-27 16:20   ` Paolo Bonzini
  2016-09-27 14:06 ` [Qemu-devel] [PATCH 2/3] test-coroutine: test qemu_coroutine_entered() Stefan Hajnoczi
  2016-09-27 14:06 ` [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing Stefan Hajnoczi
  2 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2016-09-27 14:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, qemu-block, Roman Pen, Fam Zheng, Stefan Hajnoczi

See the doc comments for a description of this new coroutine API.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/coroutine.h | 13 +++++++++++++
 util/qemu-coroutine.c    |  5 +++++
 2 files changed, 18 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 29a2078..e6a60d5 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -92,6 +92,19 @@ Coroutine *coroutine_fn qemu_coroutine_self(void);
  */
 bool qemu_in_coroutine(void);
 
+/**
+ * Return true if the coroutine is currently entered
+ *
+ * A coroutine is "entered" if it has not yielded from the current
+ * qemu_coroutine_enter() call used to run it.  This does not mean that the
+ * coroutine is currently executing code since it may have transferred control
+ * to another coroutine using qemu_coroutine_enter().
+ *
+ * When several coroutines enter each other there may be no way to know which
+ * ones have already been entered.  In such situations this function can be
+ * used to avoid recursively entering coroutines.
+ */
+bool qemu_coroutine_entered(Coroutine *co);
 
 
 /**
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 3cbf225..737bffa 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -146,3 +146,8 @@ void coroutine_fn qemu_coroutine_yield(void)
     self->caller = NULL;
     qemu_coroutine_switch(self, to, COROUTINE_YIELD);
 }
+
+bool qemu_coroutine_entered(Coroutine *co)
+{
+    return co->caller;
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 2/3] test-coroutine: test qemu_coroutine_entered()
  2016-09-27 14:06 [Qemu-devel] [PATCH 0/3] linux-aio: fix "Co-routine re-entered recursively" error Stefan Hajnoczi
  2016-09-27 14:06 ` [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function Stefan Hajnoczi
@ 2016-09-27 14:06 ` Stefan Hajnoczi
  2016-09-27 14:06 ` [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing Stefan Hajnoczi
  2 siblings, 0 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2016-09-27 14:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, qemu-block, Roman Pen, Fam Zheng, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/test-coroutine.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index 6431dd6..abd97c2 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -53,6 +53,47 @@ static void test_self(void)
 }
 
 /*
+ * Check that qemu_coroutine_entered() works
+ */
+
+static void coroutine_fn verify_entered_step_2(void *opaque)
+{
+    Coroutine *caller = (Coroutine *)opaque;
+
+    g_assert(qemu_coroutine_entered(caller));
+    g_assert(qemu_coroutine_entered(qemu_coroutine_self()));
+    qemu_coroutine_yield();
+
+    /* Once more to check it still works after yielding */
+    g_assert(qemu_coroutine_entered(caller));
+    g_assert(qemu_coroutine_entered(qemu_coroutine_self()));
+    qemu_coroutine_yield();
+}
+
+static void coroutine_fn verify_entered_step_1(void *opaque)
+{
+    Coroutine *self = qemu_coroutine_self();
+    Coroutine *coroutine;
+
+    g_assert(qemu_coroutine_entered(self));
+
+    coroutine = qemu_coroutine_create(verify_entered_step_2, self);
+    g_assert(!qemu_coroutine_entered(coroutine));
+    qemu_coroutine_enter(coroutine);
+    g_assert(!qemu_coroutine_entered(coroutine));
+    qemu_coroutine_enter(coroutine);
+}
+
+static void test_entered(void)
+{
+    Coroutine *coroutine;
+
+    coroutine = qemu_coroutine_create(verify_entered_step_1, NULL);
+    g_assert(!qemu_coroutine_entered(coroutine));
+    qemu_coroutine_enter(coroutine);
+}
+
+/*
  * Check that coroutines may nest multiple levels
  */
 
@@ -389,6 +430,7 @@ int main(int argc, char **argv)
     g_test_add_func("/basic/yield", test_yield);
     g_test_add_func("/basic/nesting", test_nesting);
     g_test_add_func("/basic/self", test_self);
+    g_test_add_func("/basic/entered", test_entered);
     g_test_add_func("/basic/in_coroutine", test_in_coroutine);
     g_test_add_func("/basic/order", test_order);
     if (g_test_perf()) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
  2016-09-27 14:06 [Qemu-devel] [PATCH 0/3] linux-aio: fix "Co-routine re-entered recursively" error Stefan Hajnoczi
  2016-09-27 14:06 ` [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function Stefan Hajnoczi
  2016-09-27 14:06 ` [Qemu-devel] [PATCH 2/3] test-coroutine: test qemu_coroutine_entered() Stefan Hajnoczi
@ 2016-09-27 14:06 ` Stefan Hajnoczi
  2016-09-27 14:29   ` Roman Penyaev
  2 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2016-09-27 14:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, qemu-block, Roman Pen, Fam Zheng, Stefan Hajnoczi

Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
completions from ioq_submit()") added an optimization that processes
completions each time ioq_submit() returns with requests in flight.
This commit introduces a "Co-routine re-entered recursively" error which
can be triggered with -drive format=qcow2,aio=native.

Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I
debugged the following backtrace:

  (gdb) bt
  #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
  #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
  #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
  #3  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
  #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
  #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
  #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
  #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
  #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
  #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
  #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
  #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
  #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
  #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
  #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
  #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
  #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78

It turned out that re-entrant ioq_submit() and completion processing
between three requests caused this error.  The following check is not
sufficient to prevent recursively entering coroutines:

  if (laiocb->co != qemu_coroutine_self()) {
      qemu_coroutine_enter(laiocb->co);
  }

As the following coroutine backtrace shows, not just the current
coroutine (self) can be entered.  There might also be other coroutines
that are currently entered and transferred control due to the qcow2 lock
(CoMutex):

(gdb) qemu coroutine 0x5555583464d0

Use the new qemu_coroutine_entered() function instead of comparing
against qemu_coroutine_self().  This is correct because:

1. If a coroutine is not entered then it must have yielded to wait for
   I/O completion.  It is therefore safe to enter.

2. If a coroutine is entered then it must be in
   ioq_submit()/qemu_laio_process_completions() because otherwise it
   would be yielded while waiting for I/O completion.  Therefore it will
   check laio->ret and return from ioq_submit() instead of yielding,
   i.e. it's guaranteed not to hang.

Reported-by: Fam Zheng <famz@redhat.com>
Tested-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/linux-aio.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index d4e19d4..1685ec2 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -94,9 +94,12 @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
 
     laiocb->ret = ret;
     if (laiocb->co) {
-        /* Jump and continue completion for foreign requests, don't do
-         * anything for current request, it will be completed shortly. */
-        if (laiocb->co != qemu_coroutine_self()) {
+        /* If the coroutine is already entered it must be in ioq_submit() and
+         * will notice laio->ret has been filled in when it eventually runs
+         * later.  Coroutines cannot be entered recursively so avoid doing
+         * that!
+         */
+        if (!qemu_coroutine_entered(laiocb->co)) {
             qemu_coroutine_enter(laiocb->co);
         }
     } else {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
  2016-09-27 14:06 ` [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing Stefan Hajnoczi
@ 2016-09-27 14:29   ` Roman Penyaev
  2016-09-27 15:25     ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Roman Penyaev @ 2016-09-27 14:29 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, Kevin Wolf, qemu-block, Fam Zheng

Hey Stefan,

On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> completions from ioq_submit()") added an optimization that processes
> completions each time ioq_submit() returns with requests in flight.
> This commit introduces a "Co-routine re-entered recursively" error which
> can be triggered with -drive format=qcow2,aio=native.
>
> Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I
> debugged the following backtrace:
>
>   (gdb) bt
>   #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
>   #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
>   #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
>   #3  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
>   #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
>   #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
>   #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
>   #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
>   #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
>   #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
>   #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
>   #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
>   #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
>   #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
>   #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
>   #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
>   #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
>
> It turned out that re-entrant ioq_submit() and completion processing
> between three requests caused this error.  The following check is not
> sufficient to prevent recursively entering coroutines:
>
>   if (laiocb->co != qemu_coroutine_self()) {
>       qemu_coroutine_enter(laiocb->co);
>   }
>
> As the following coroutine backtrace shows, not just the current
> coroutine (self) can be entered.  There might also be other coroutines
> that are currently entered and transferred control due to the qcow2 lock
> (CoMutex):

I doubt that that was introduced by the commit you've specified:
0ed93d84edab.

Before my patch coroutine was unconditionally entered.  The following
is what was changed by 0ed93d84edab:

     if (laiocb->co) {
-        qemu_coroutine_enter(laiocb->co);
+        /* Jump and continue completion for foreign requests, don't do
+         * anything for current request, it will be completed shortly. */
+        if (laiocb->co != qemu_coroutine_self()) {
+            qemu_coroutine_enter(laiocb->co);
+        }

If you have a strong reproduction, could you please verify that.

What worries me is the following:

 1. Issue was introduced before and was unnoticed (ok).
 2. Issue - is something else and/or was really introduced by commit
    0ed93d84edab (not ok).

Of course the 2. is not nice.
Thanks.

--
Roman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
  2016-09-27 14:29   ` Roman Penyaev
@ 2016-09-27 15:25     ` Stefan Hajnoczi
  2016-09-27 17:55       ` Roman Penyaev
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2016-09-27 15:25 UTC (permalink / raw)
  To: Roman Penyaev; +Cc: qemu-devel, Kevin Wolf, qemu-block, Fam Zheng

[-- Attachment #1: Type: text/plain, Size: 4954 bytes --]

On Tue, Sep 27, 2016 at 04:29:55PM +0200, Roman Penyaev wrote:
> On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> > completions from ioq_submit()") added an optimization that processes
> > completions each time ioq_submit() returns with requests in flight.
> > This commit introduces a "Co-routine re-entered recursively" error which
> > can be triggered with -drive format=qcow2,aio=native.
> >
> > Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I
> > debugged the following backtrace:
> >
> >   (gdb) bt
> >   #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
> >   #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
> >   #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
> >   #3  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
> >   #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
> >   #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
> >   #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
> >   #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
> >   #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
> >   #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
> >   #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
> >   #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
> >   #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
> >   #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
> >   #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
> >   #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
> >   #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
> >
> > It turned out that re-entrant ioq_submit() and completion processing
> > between three requests caused this error.  The following check is not
> > sufficient to prevent recursively entering coroutines:
> >
> >   if (laiocb->co != qemu_coroutine_self()) {
> >       qemu_coroutine_enter(laiocb->co);
> >   }
> >
> > As the following coroutine backtrace shows, not just the current
> > coroutine (self) can be entered.  There might also be other coroutines
> > that are currently entered and transferred control due to the qcow2 lock
> > (CoMutex):
> 
> I doubt that that was introduced by the commit you've specified:
> 0ed93d84edab.
> 
> Before my patch coroutine was unconditionally entered.  The following
> is what was changed by 0ed93d84edab:
> 
>      if (laiocb->co) {
> -        qemu_coroutine_enter(laiocb->co);
> +        /* Jump and continue completion for foreign requests, don't do
> +         * anything for current request, it will be completed shortly. */
> +        if (laiocb->co != qemu_coroutine_self()) {
> +            qemu_coroutine_enter(laiocb->co);
> +        }

Unconditionally entering was safe prior to 0ed93d84edab since all
coroutines yielded and qemu_coroutine_entered() would be false all the
time.  Therefore it wasn't necessary to protect against re-entering a
coroutine.

> If you have a strong reproduction, could you please verify that.

The bug is 100% deterministic.  Just boot up a guest with -drive
format=qcow2,aio=native.

I noticed that I forgot to include a second backtrace in the commit
description.  I am resending the patch series with the missing
backtrace.  Hopefully that will make the bug clearer.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function
  2016-09-27 14:06 ` [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function Stefan Hajnoczi
@ 2016-09-27 16:20   ` Paolo Bonzini
  2016-09-27 16:29     ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2016-09-27 16:20 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Fam Zheng, Roman Pen, qemu-block



On 27/09/2016 16:06, Stefan Hajnoczi wrote:
> See the doc comments for a description of this new coroutine API.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/qemu/coroutine.h | 13 +++++++++++++
>  util/qemu-coroutine.c    |  5 +++++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> index 29a2078..e6a60d5 100644
> --- a/include/qemu/coroutine.h
> +++ b/include/qemu/coroutine.h
> @@ -92,6 +92,19 @@ Coroutine *coroutine_fn qemu_coroutine_self(void);
>   */
>  bool qemu_in_coroutine(void);
>  
> +/**
> + * Return true if the coroutine is currently entered
> + *
> + * A coroutine is "entered" if it has not yielded from the current
> + * qemu_coroutine_enter() call used to run it.  This does not mean that the
> + * coroutine is currently executing code since it may have transferred control
> + * to another coroutine using qemu_coroutine_enter().
> + *
> + * When several coroutines enter each other there may be no way to know which
> + * ones have already been entered.  In such situations this function can be
> + * used to avoid recursively entering coroutines.
> + */
> +bool qemu_coroutine_entered(Coroutine *co);

Perhaps qemu_coroutine_running is a better name?

Paolo

>  
>  
>  /**
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index 3cbf225..737bffa 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -146,3 +146,8 @@ void coroutine_fn qemu_coroutine_yield(void)
>      self->caller = NULL;
>      qemu_coroutine_switch(self, to, COROUTINE_YIELD);
>  }
> +
> +bool qemu_coroutine_entered(Coroutine *co)
> +{
> +    return co->caller;
> +}
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function
  2016-09-27 16:20   ` Paolo Bonzini
@ 2016-09-27 16:29     ` Stefan Hajnoczi
  2016-09-27 16:55       ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2016-09-27 16:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, Fam Zheng, qemu block,
	Roman Pen

On Tue, Sep 27, 2016 at 5:20 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 27/09/2016 16:06, Stefan Hajnoczi wrote:
>> See the doc comments for a description of this new coroutine API.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  include/qemu/coroutine.h | 13 +++++++++++++
>>  util/qemu-coroutine.c    |  5 +++++
>>  2 files changed, 18 insertions(+)
>>
>> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
>> index 29a2078..e6a60d5 100644
>> --- a/include/qemu/coroutine.h
>> +++ b/include/qemu/coroutine.h
>> @@ -92,6 +92,19 @@ Coroutine *coroutine_fn qemu_coroutine_self(void);
>>   */
>>  bool qemu_in_coroutine(void);
>>
>> +/**
>> + * Return true if the coroutine is currently entered
>> + *
>> + * A coroutine is "entered" if it has not yielded from the current
>> + * qemu_coroutine_enter() call used to run it.  This does not mean that the
>> + * coroutine is currently executing code since it may have transferred control
>> + * to another coroutine using qemu_coroutine_enter().
>> + *
>> + * When several coroutines enter each other there may be no way to know which
>> + * ones have already been entered.  In such situations this function can be
>> + * used to avoid recursively entering coroutines.
>> + */
>> +bool qemu_coroutine_entered(Coroutine *co);
>
> Perhaps qemu_coroutine_running is a better name?

I find "running" confusing since the coroutine may not actually be
currently executing (as mentioned in the doc comment).

Stefan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function
  2016-09-27 16:29     ` Stefan Hajnoczi
@ 2016-09-27 16:55       ` Paolo Bonzini
  2016-09-28  9:50         ` Kevin Wolf
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2016-09-27 16:55 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, qemu-devel, Kevin Wolf, Fam Zheng, qemu block,
	Roman Pen



On 27/09/2016 18:29, Stefan Hajnoczi wrote:
> On Tue, Sep 27, 2016 at 5:20 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 27/09/2016 16:06, Stefan Hajnoczi wrote:
>>> See the doc comments for a description of this new coroutine API.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>  include/qemu/coroutine.h | 13 +++++++++++++
>>>  util/qemu-coroutine.c    |  5 +++++
>>>  2 files changed, 18 insertions(+)
>>>
>>> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
>>> index 29a2078..e6a60d5 100644
>>> --- a/include/qemu/coroutine.h
>>> +++ b/include/qemu/coroutine.h
>>> @@ -92,6 +92,19 @@ Coroutine *coroutine_fn qemu_coroutine_self(void);
>>>   */
>>>  bool qemu_in_coroutine(void);
>>>
>>> +/**
>>> + * Return true if the coroutine is currently entered
>>> + *
>>> + * A coroutine is "entered" if it has not yielded from the current
>>> + * qemu_coroutine_enter() call used to run it.  This does not mean that the
>>> + * coroutine is currently executing code since it may have transferred control
>>> + * to another coroutine using qemu_coroutine_enter().
>>> + *
>>> + * When several coroutines enter each other there may be no way to know which
>>> + * ones have already been entered.  In such situations this function can be
>>> + * used to avoid recursively entering coroutines.
>>> + */
>>> +bool qemu_coroutine_entered(Coroutine *co);
>>
>> Perhaps qemu_coroutine_running is a better name?
> 
> I find "running" confusing since the coroutine may not actually be
> currently executing (as mentioned in the doc comment).

Ok, makes sense.  Another possibility is qemu_coroutine_on_stack, but
I'm not sure it's better...

Paolo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
  2016-09-27 15:25     ` Stefan Hajnoczi
@ 2016-09-27 17:55       ` Roman Penyaev
  2016-09-28  3:01         ` Fam Zheng
  0 siblings, 1 reply; 15+ messages in thread
From: Roman Penyaev @ 2016-09-27 17:55 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, Kevin Wolf, qemu-block, Fam Zheng

On Tue, Sep 27, 2016 at 5:25 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Tue, Sep 27, 2016 at 04:29:55PM +0200, Roman Penyaev wrote:
>> On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>> > Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
>> > completions from ioq_submit()") added an optimization that processes
>> > completions each time ioq_submit() returns with requests in flight.
>> > This commit introduces a "Co-routine re-entered recursively" error which
>> > can be triggered with -drive format=qcow2,aio=native.
>> >
>> > Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I
>> > debugged the following backtrace:
>> >
>> >   (gdb) bt
>> >   #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
>> >   #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
>> >   #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
>> >   #3  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
>> >   #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
>> >   #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
>> >   #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
>> >   #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
>> >   #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
>> >   #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
>> >   #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
>> >   #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
>> >   #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
>> >   #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
>> >   #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
>> >   #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
>> >   #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
>> >
>> > It turned out that re-entrant ioq_submit() and completion processing
>> > between three requests caused this error.  The following check is not
>> > sufficient to prevent recursively entering coroutines:
>> >
>> >   if (laiocb->co != qemu_coroutine_self()) {
>> >       qemu_coroutine_enter(laiocb->co);
>> >   }
>> >
>> > As the following coroutine backtrace shows, not just the current
>> > coroutine (self) can be entered.  There might also be other coroutines
>> > that are currently entered and transferred control due to the qcow2 lock
>> > (CoMutex):
>>
>> I doubt that that was introduced by the commit you've specified:
>> 0ed93d84edab.
>>
>> Before my patch coroutine was unconditionally entered.  The following
>> is what was changed by 0ed93d84edab:
>>
>>      if (laiocb->co) {
>> -        qemu_coroutine_enter(laiocb->co);
>> +        /* Jump and continue completion for foreign requests, don't do
>> +         * anything for current request, it will be completed shortly. */
>> +        if (laiocb->co != qemu_coroutine_self()) {
>> +            qemu_coroutine_enter(laiocb->co);
>> +        }
>
> Unconditionally entering was safe prior to 0ed93d84edab since all
> coroutines yielded and qemu_coroutine_entered() would be false all the
> time.  Therefore it wasn't necessary to protect against re-entering a
> coroutine.
>
>> If you have a strong reproduction, could you please verify that.
>
> The bug is 100% deterministic.  Just boot up a guest with -drive
> format=qcow2,aio=native.

It turns out to be that everything is broken.  I started all my
tests with format=raw,aio=native and immediately got coroutine
recursive.  That is completely weird.

So, what I did is the following:

1. Took latest master (nothing works)
2. Did interactive rebase to 12c8720
12c8720 2016-06-28 | Merge remote-tracking branch
'remotes/stefanha/tags/block-pull-request' into staging [Peter
Maydell]

this merge request includes all your patches related to
virtio-blk and MQ support.

3. Applied 0ed93d84edab. Everything works fine.

4. Rebased up till 0647d47:
0647d47 2016-09-13 | qcow2: avoid memcpy(dst, NULL, len) [Stefan Hajnoczi]

this is the point, after which 0ed93d84edab was applied
on master.

Got recursive coroutine, so nothing works.

5. Did a besect, which shows this commit:

--
commit 1a62d0accdf85fbeac149018ee8d1728e754de73
Author: Eric Blake <eblake@redhat.com>
Date:   Fri Jul 15 12:31:59 2016 -0600

    block: Fragment reads to max transfer length
--

So after this commit my commit 0ed93d84edab stops working.
And now for me is completely not clear what is happening there.

--
Roman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
  2016-09-27 17:55       ` Roman Penyaev
@ 2016-09-28  3:01         ` Fam Zheng
  2016-09-28  9:14           ` Roman Penyaev
  0 siblings, 1 reply; 15+ messages in thread
From: Fam Zheng @ 2016-09-28  3:01 UTC (permalink / raw)
  To: Roman Penyaev; +Cc: Stefan Hajnoczi, Kevin Wolf, qemu-devel, qemu-block

On Tue, 09/27 19:55, Roman Penyaev wrote:
> > The bug is 100% deterministic.  Just boot up a guest with -drive
> > format=qcow2,aio=native.
> 
> It turns out to be that everything is broken.  I started all my
> tests with format=raw,aio=native and immediately got coroutine
> recursive.  That is completely weird.
> 
> So, what I did is the following:
> 
> 1. Took latest master (nothing works)
> 2. Did interactive rebase to 12c8720
> 12c8720 2016-06-28 | Merge remote-tracking branch
> 'remotes/stefanha/tags/block-pull-request' into staging [Peter
> Maydell]
> 
> this merge request includes all your patches related to
> virtio-blk and MQ support.
> 
> 3. Applied 0ed93d84edab. Everything works fine.

Have you tried qcow2 at this point? raw crashes with 1a62d0accdf85 doesn't mean
qcow2 is fine without it.

Fam

> 
> 4. Rebased up till 0647d47:
> 0647d47 2016-09-13 | qcow2: avoid memcpy(dst, NULL, len) [Stefan Hajnoczi]
> 
> this is the point, after which 0ed93d84edab was applied
> on master.
> 
> Got recursive coroutine, so nothing works.
> 
> 5. Did a besect, which shows this commit:
> 
> --
> commit 1a62d0accdf85fbeac149018ee8d1728e754de73
> Author: Eric Blake <eblake@redhat.com>
> Date:   Fri Jul 15 12:31:59 2016 -0600
> 
>     block: Fragment reads to max transfer length
> --
> 
> So after this commit my commit 0ed93d84edab stops working.
> And now for me is completely not clear what is happening there.
> 
> --
> Roman
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
  2016-09-28  3:01         ` Fam Zheng
@ 2016-09-28  9:14           ` Roman Penyaev
  2016-09-28  9:34             ` Fam Zheng
  0 siblings, 1 reply; 15+ messages in thread
From: Roman Penyaev @ 2016-09-28  9:14 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Stefan Hajnoczi, Kevin Wolf, qemu-devel, qemu-block

On Wed, Sep 28, 2016 at 5:01 AM, Fam Zheng <famz@redhat.com> wrote:
> On Tue, 09/27 19:55, Roman Penyaev wrote:
>> > The bug is 100% deterministic.  Just boot up a guest with -drive
>> > format=qcow2,aio=native.
>>
>> It turns out to be that everything is broken.  I started all my
>> tests with format=raw,aio=native and immediately got coroutine
>> recursive.  That is completely weird.
>>
>> So, what I did is the following:
>>
>> 1. Took latest master (nothing works)
>> 2. Did interactive rebase to 12c8720
>> 12c8720 2016-06-28 | Merge remote-tracking branch
>> 'remotes/stefanha/tags/block-pull-request' into staging [Peter
>> Maydell]
>>
>> this merge request includes all your patches related to
>> virtio-blk and MQ support.
>>
>> 3. Applied 0ed93d84edab. Everything works fine.
>
> Have you tried qcow2 at this point? raw crashes with 1a62d0accdf85 doesn't mean
> qcow2 is fine without it.
>

That's true.  qcow2 IO path is different, and presence of the
patch 1a62d0accdf85 does not affect - coroutine still enters
recursively.

But for me it is quite surprising that IO fragmentation (what
was done in 1a62d0accdf85) rises the misbehavior on raw IO path.

But of course originally issue was introduced by me.  Stefan,
thanks for a fix.

--
Roman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
  2016-09-28  9:14           ` Roman Penyaev
@ 2016-09-28  9:34             ` Fam Zheng
  2016-09-28  9:38               ` Roman Penyaev
  0 siblings, 1 reply; 15+ messages in thread
From: Fam Zheng @ 2016-09-28  9:34 UTC (permalink / raw)
  To: Roman Penyaev; +Cc: Stefan Hajnoczi, Kevin Wolf, qemu-devel, qemu-block

On Wed, 09/28 11:14, Roman Penyaev wrote:
> On Wed, Sep 28, 2016 at 5:01 AM, Fam Zheng <famz@redhat.com> wrote:
> > On Tue, 09/27 19:55, Roman Penyaev wrote:
> >> > The bug is 100% deterministic.  Just boot up a guest with -drive
> >> > format=qcow2,aio=native.
> >>
> >> It turns out to be that everything is broken.  I started all my
> >> tests with format=raw,aio=native and immediately got coroutine
> >> recursive.  That is completely weird.
> >>
> >> So, what I did is the following:
> >>
> >> 1. Took latest master (nothing works)
> >> 2. Did interactive rebase to 12c8720
> >> 12c8720 2016-06-28 | Merge remote-tracking branch
> >> 'remotes/stefanha/tags/block-pull-request' into staging [Peter
> >> Maydell]
> >>
> >> this merge request includes all your patches related to
> >> virtio-blk and MQ support.
> >>
> >> 3. Applied 0ed93d84edab. Everything works fine.
> >
> > Have you tried qcow2 at this point? raw crashes with 1a62d0accdf85 doesn't mean
> > qcow2 is fine without it.
> >
> 
> That's true.  qcow2 IO path is different, and presence of the
> patch 1a62d0accdf85 does not affect - coroutine still enters
> recursively.
> 
> But for me it is quite surprising that IO fragmentation (what
> was done in 1a62d0accdf85) rises the misbehavior on raw IO path.

Maybe the mystery with this change is your particular I/O pattern on the raw
image is change thereafter, from ioq = 1 to ioq > 1 (from the linux-aio.c's
PoV, due to fragmentation), then multiple coroutines are created for one big
request, to trigger the crash.

Fam

> 
> But of course originally issue was introduced by me.  Stefan,
> thanks for a fix.
> 
> --
> Roman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
  2016-09-28  9:34             ` Fam Zheng
@ 2016-09-28  9:38               ` Roman Penyaev
  0 siblings, 0 replies; 15+ messages in thread
From: Roman Penyaev @ 2016-09-28  9:38 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Stefan Hajnoczi, Kevin Wolf, qemu-devel, qemu-block

On Wed, Sep 28, 2016 at 11:34 AM, Fam Zheng <famz@redhat.com> wrote:
> On Wed, 09/28 11:14, Roman Penyaev wrote:
>> On Wed, Sep 28, 2016 at 5:01 AM, Fam Zheng <famz@redhat.com> wrote:
>> > On Tue, 09/27 19:55, Roman Penyaev wrote:
>> >> > The bug is 100% deterministic.  Just boot up a guest with -drive
>> >> > format=qcow2,aio=native.
>> >>
>> >> It turns out to be that everything is broken.  I started all my
>> >> tests with format=raw,aio=native and immediately got coroutine
>> >> recursive.  That is completely weird.
>> >>
>> >> So, what I did is the following:
>> >>
>> >> 1. Took latest master (nothing works)
>> >> 2. Did interactive rebase to 12c8720
>> >> 12c8720 2016-06-28 | Merge remote-tracking branch
>> >> 'remotes/stefanha/tags/block-pull-request' into staging [Peter
>> >> Maydell]
>> >>
>> >> this merge request includes all your patches related to
>> >> virtio-blk and MQ support.
>> >>
>> >> 3. Applied 0ed93d84edab. Everything works fine.
>> >
>> > Have you tried qcow2 at this point? raw crashes with 1a62d0accdf85 doesn't mean
>> > qcow2 is fine without it.
>> >
>>
>> That's true.  qcow2 IO path is different, and presence of the
>> patch 1a62d0accdf85 does not affect - coroutine still enters
>> recursively.
>>
>> But for me it is quite surprising that IO fragmentation (what
>> was done in 1a62d0accdf85) rises the misbehavior on raw IO path.
>
> Maybe the mystery with this change is your particular I/O pattern on the raw
> image is change thereafter, from ioq = 1 to ioq > 1 (from the linux-aio.c's
> PoV, due to fragmentation), then multiple coroutines are created for one big
> request, to trigger the crash.

Yes, could be.  The only major difference in this patch is a loop over
cut requests.

--
Roman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function
  2016-09-27 16:55       ` Paolo Bonzini
@ 2016-09-28  9:50         ` Kevin Wolf
  0 siblings, 0 replies; 15+ messages in thread
From: Kevin Wolf @ 2016-09-28  9:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefan Hajnoczi, Stefan Hajnoczi, qemu-devel, Fam Zheng,
	qemu block, Roman Pen

Am 27.09.2016 um 18:55 hat Paolo Bonzini geschrieben:
> 
> 
> On 27/09/2016 18:29, Stefan Hajnoczi wrote:
> > On Tue, Sep 27, 2016 at 5:20 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >>
> >> On 27/09/2016 16:06, Stefan Hajnoczi wrote:
> >>> See the doc comments for a description of this new coroutine API.
> >>>
> >>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >>> ---
> >>>  include/qemu/coroutine.h | 13 +++++++++++++
> >>>  util/qemu-coroutine.c    |  5 +++++
> >>>  2 files changed, 18 insertions(+)
> >>>
> >>> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> >>> index 29a2078..e6a60d5 100644
> >>> --- a/include/qemu/coroutine.h
> >>> +++ b/include/qemu/coroutine.h
> >>> @@ -92,6 +92,19 @@ Coroutine *coroutine_fn qemu_coroutine_self(void);
> >>>   */
> >>>  bool qemu_in_coroutine(void);
> >>>
> >>> +/**
> >>> + * Return true if the coroutine is currently entered
> >>> + *
> >>> + * A coroutine is "entered" if it has not yielded from the current
> >>> + * qemu_coroutine_enter() call used to run it.  This does not mean that the
> >>> + * coroutine is currently executing code since it may have transferred control
> >>> + * to another coroutine using qemu_coroutine_enter().
> >>> + *
> >>> + * When several coroutines enter each other there may be no way to know which
> >>> + * ones have already been entered.  In such situations this function can be
> >>> + * used to avoid recursively entering coroutines.
> >>> + */
> >>> +bool qemu_coroutine_entered(Coroutine *co);
> >>
> >> Perhaps qemu_coroutine_running is a better name?
> > 
> > I find "running" confusing since the coroutine may not actually be
> > currently executing (as mentioned in the doc comment).
> 
> Ok, makes sense.  Another possibility is qemu_coroutine_on_stack, but
> I'm not sure it's better...

Maybe qemu_coroutine_active()?

Kevin

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-09-28  9:50 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-27 14:06 [Qemu-devel] [PATCH 0/3] linux-aio: fix "Co-routine re-entered recursively" error Stefan Hajnoczi
2016-09-27 14:06 ` [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function Stefan Hajnoczi
2016-09-27 16:20   ` Paolo Bonzini
2016-09-27 16:29     ` Stefan Hajnoczi
2016-09-27 16:55       ` Paolo Bonzini
2016-09-28  9:50         ` Kevin Wolf
2016-09-27 14:06 ` [Qemu-devel] [PATCH 2/3] test-coroutine: test qemu_coroutine_entered() Stefan Hajnoczi
2016-09-27 14:06 ` [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing Stefan Hajnoczi
2016-09-27 14:29   ` Roman Penyaev
2016-09-27 15:25     ` Stefan Hajnoczi
2016-09-27 17:55       ` Roman Penyaev
2016-09-28  3:01         ` Fam Zheng
2016-09-28  9:14           ` Roman Penyaev
2016-09-28  9:34             ` Fam Zheng
2016-09-28  9:38               ` Roman Penyaev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.