All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb
@ 2018-09-05  9:33 Sergio Lopez
  2018-09-12  7:35 ` Fam Zheng
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Sergio Lopez @ 2018-09-05  9:33 UTC (permalink / raw)
  To: kwolf, famz, stefanha, qemu-block; +Cc: qemu-devel, Sergio Lopez

AIO Coroutines shouldn't by managed by an AioContext different than the
one assigned when they are created. aio_co_enter avoids entering a
coroutine from a different AioContext, calling aio_co_schedule instead.

Scheduled coroutines are then entered by co_schedule_bh_cb using
qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
current AioContext obtained with qemu_get_current_aio_context.
Eventually, co->ctx will be set to the AioContext passed as an argument
to qemu_aio_coroutine_enter.

This means that, if an IO Thread's AioConext is being processed by the
Main Thread (due to aio_poll being called with a BDS AioContext, as it
happens in AIO_WAIT_WHILE among other places), the AioContext from some
coroutines may be wrongly replaced with the one from the Main Thread.

This is the root cause behind some crashes, mainly triggered by the
drain code at block/io.c. The most common are these abort and failed
assertion:

util/async.c:aio_co_schedule
456     if (scheduled) {
457         fprintf(stderr,
458                 "%s: Co-routine was already scheduled in '%s'\n",
459                 __func__, scheduled);
460         abort();
461     }

util/qemu-coroutine-lock.c:
286     assert(mutex->holder == self);

But it's also known to cause random errors at different locations, and
even SIGSEGV with broken coroutine backtraces.

By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
pass the correct AioContext as an argument, making sure co->ctx is not
wrongly altered.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 util/async.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/async.c b/util/async.c
index 05979f8014..c10642a385 100644
--- a/util/async.c
+++ b/util/async.c
@@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
 
         /* Protected by write barrier in qemu_aio_coroutine_enter */
         atomic_set(&co->scheduled, NULL);
-        qemu_coroutine_enter(co);
+        qemu_aio_coroutine_enter(ctx, co);
         aio_context_release(ctx);
     }
 }
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb
  2018-09-05  9:33 [Qemu-devel] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb Sergio Lopez
@ 2018-09-12  7:35 ` Fam Zheng
  2018-09-12  7:41 ` Fam Zheng
  2018-09-13 14:58 ` [Qemu-devel] [Qemu-block] " Paolo Bonzini
  2 siblings, 0 replies; 5+ messages in thread
From: Fam Zheng @ 2018-09-12  7:35 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: kwolf, stefanha, qemu-block, qemu-devel

On Wed, 09/05 11:33, Sergio Lopez wrote:
> AIO Coroutines shouldn't by managed by an AioContext different than the
> one assigned when they are created. aio_co_enter avoids entering a
> coroutine from a different AioContext, calling aio_co_schedule instead.
> 
> Scheduled coroutines are then entered by co_schedule_bh_cb using
> qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
> current AioContext obtained with qemu_get_current_aio_context.
> Eventually, co->ctx will be set to the AioContext passed as an argument
> to qemu_aio_coroutine_enter.
> 
> This means that, if an IO Thread's AioConext is being processed by the
> Main Thread (due to aio_poll being called with a BDS AioContext, as it
> happens in AIO_WAIT_WHILE among other places), the AioContext from some
> coroutines may be wrongly replaced with the one from the Main Thread.
> 
> This is the root cause behind some crashes, mainly triggered by the
> drain code at block/io.c. The most common are these abort and failed
> assertion:
> 
> util/async.c:aio_co_schedule
> 456     if (scheduled) {
> 457         fprintf(stderr,
> 458                 "%s: Co-routine was already scheduled in '%s'\n",
> 459                 __func__, scheduled);
> 460         abort();
> 461     }
> 
> util/qemu-coroutine-lock.c:
> 286     assert(mutex->holder == self);
> 
> But it's also known to cause random errors at different locations, and
> even SIGSEGV with broken coroutine backtraces.
> 
> By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
> pass the correct AioContext as an argument, making sure co->ctx is not
> wrongly altered.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  util/async.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 05979f8014..c10642a385 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
>  
>          /* Protected by write barrier in qemu_aio_coroutine_enter */
>          atomic_set(&co->scheduled, NULL);
> -        qemu_coroutine_enter(co);
> +        qemu_aio_coroutine_enter(ctx, co);
>          aio_context_release(ctx);
>      }
>  }
> -- 
> 2.17.0
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb
  2018-09-05  9:33 [Qemu-devel] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb Sergio Lopez
  2018-09-12  7:35 ` Fam Zheng
@ 2018-09-12  7:41 ` Fam Zheng
  2018-09-12 10:28   ` Kevin Wolf
  2018-09-13 14:58 ` [Qemu-devel] [Qemu-block] " Paolo Bonzini
  2 siblings, 1 reply; 5+ messages in thread
From: Fam Zheng @ 2018-09-12  7:41 UTC (permalink / raw)
  To: kwolf, Sergio Lopez; +Cc: stefanha, qemu-block, qemu-devel

On Wed, 09/05 11:33, Sergio Lopez wrote:
> AIO Coroutines shouldn't by managed by an AioContext different than the
> one assigned when they are created. aio_co_enter avoids entering a
> coroutine from a different AioContext, calling aio_co_schedule instead.
> 
> Scheduled coroutines are then entered by co_schedule_bh_cb using
> qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
> current AioContext obtained with qemu_get_current_aio_context.
> Eventually, co->ctx will be set to the AioContext passed as an argument
> to qemu_aio_coroutine_enter.
> 
> This means that, if an IO Thread's AioConext is being processed by the
> Main Thread (due to aio_poll being called with a BDS AioContext, as it
> happens in AIO_WAIT_WHILE among other places), the AioContext from some
> coroutines may be wrongly replaced with the one from the Main Thread.
> 
> This is the root cause behind some crashes, mainly triggered by the
> drain code at block/io.c. The most common are these abort and failed
> assertion:
> 
> util/async.c:aio_co_schedule
> 456     if (scheduled) {
> 457         fprintf(stderr,
> 458                 "%s: Co-routine was already scheduled in '%s'\n",
> 459                 __func__, scheduled);
> 460         abort();
> 461     }
> 
> util/qemu-coroutine-lock.c:
> 286     assert(mutex->holder == self);
> 
> But it's also known to cause random errors at different locations, and
> even SIGSEGV with broken coroutine backtraces.
> 
> By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
> pass the correct AioContext as an argument, making sure co->ctx is not
> wrongly altered.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  util/async.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 05979f8014..c10642a385 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
>  
>          /* Protected by write barrier in qemu_aio_coroutine_enter */
>          atomic_set(&co->scheduled, NULL);
> -        qemu_coroutine_enter(co);
> +        qemu_aio_coroutine_enter(ctx, co);
>          aio_context_release(ctx);
>      }
>  }

Kevin, could you test this patch together with your next version of the drain
fix series? Since they are related, it's better if you could include it in your
series or even apply it yourself. Peter is not processing pull requests, so
scattering fixes in various trees will do no good.

Fam

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb
  2018-09-12  7:41 ` Fam Zheng
@ 2018-09-12 10:28   ` Kevin Wolf
  0 siblings, 0 replies; 5+ messages in thread
From: Kevin Wolf @ 2018-09-12 10:28 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Sergio Lopez, stefanha, qemu-block, qemu-devel

Am 12.09.2018 um 09:41 hat Fam Zheng geschrieben:
> On Wed, 09/05 11:33, Sergio Lopez wrote:
> > AIO Coroutines shouldn't by managed by an AioContext different than the
> > one assigned when they are created. aio_co_enter avoids entering a
> > coroutine from a different AioContext, calling aio_co_schedule instead.
> > 
> > Scheduled coroutines are then entered by co_schedule_bh_cb using
> > qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
> > current AioContext obtained with qemu_get_current_aio_context.
> > Eventually, co->ctx will be set to the AioContext passed as an argument
> > to qemu_aio_coroutine_enter.
> > 
> > This means that, if an IO Thread's AioConext is being processed by the
> > Main Thread (due to aio_poll being called with a BDS AioContext, as it
> > happens in AIO_WAIT_WHILE among other places), the AioContext from some
> > coroutines may be wrongly replaced with the one from the Main Thread.
> > 
> > This is the root cause behind some crashes, mainly triggered by the
> > drain code at block/io.c. The most common are these abort and failed
> > assertion:
> > 
> > util/async.c:aio_co_schedule
> > 456     if (scheduled) {
> > 457         fprintf(stderr,
> > 458                 "%s: Co-routine was already scheduled in '%s'\n",
> > 459                 __func__, scheduled);
> > 460         abort();
> > 461     }
> > 
> > util/qemu-coroutine-lock.c:
> > 286     assert(mutex->holder == self);
> > 
> > But it's also known to cause random errors at different locations, and
> > even SIGSEGV with broken coroutine backtraces.
> > 
> > By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
> > pass the correct AioContext as an argument, making sure co->ctx is not
> > wrongly altered.
> > 
> > Signed-off-by: Sergio Lopez <slp@redhat.com>
> > ---
> >  util/async.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/util/async.c b/util/async.c
> > index 05979f8014..c10642a385 100644
> > --- a/util/async.c
> > +++ b/util/async.c
> > @@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
> >  
> >          /* Protected by write barrier in qemu_aio_coroutine_enter */
> >          atomic_set(&co->scheduled, NULL);
> > -        qemu_coroutine_enter(co);
> > +        qemu_aio_coroutine_enter(ctx, co);
> >          aio_context_release(ctx);
> >      }
> >  }
> 
> Kevin, could you test this patch together with your next version of the drain
> fix series? Since they are related, it's better if you could include it in your
> series or even apply it yourself. Peter is not processing pull requests, so
> scattering fixes in various trees will do no good.

Apparently I forgot to send an email, but I already applied this to my
block branch.

Kevin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb
  2018-09-05  9:33 [Qemu-devel] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb Sergio Lopez
  2018-09-12  7:35 ` Fam Zheng
  2018-09-12  7:41 ` Fam Zheng
@ 2018-09-13 14:58 ` Paolo Bonzini
  2 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2018-09-13 14:58 UTC (permalink / raw)
  To: Sergio Lopez, kwolf, famz, stefanha, qemu-block; +Cc: qemu-devel

On 05/09/2018 11:33, Sergio Lopez wrote:
> AIO Coroutines shouldn't by managed by an AioContext different than the
> one assigned when they are created. aio_co_enter avoids entering a
> coroutine from a different AioContext, calling aio_co_schedule instead.
> 
> Scheduled coroutines are then entered by co_schedule_bh_cb using
> qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
> current AioContext obtained with qemu_get_current_aio_context.
> Eventually, co->ctx will be set to the AioContext passed as an argument
> to qemu_aio_coroutine_enter.
> 
> This means that, if an IO Thread's AioConext is being processed by the
> Main Thread (due to aio_poll being called with a BDS AioContext, as it
> happens in AIO_WAIT_WHILE among other places), the AioContext from some
> coroutines may be wrongly replaced with the one from the Main Thread.
> 
> This is the root cause behind some crashes, mainly triggered by the
> drain code at block/io.c. The most common are these abort and failed
> assertion:
> 
> util/async.c:aio_co_schedule
> 456     if (scheduled) {
> 457         fprintf(stderr,
> 458                 "%s: Co-routine was already scheduled in '%s'\n",
> 459                 __func__, scheduled);
> 460         abort();
> 461     }
> 
> util/qemu-coroutine-lock.c:
> 286     assert(mutex->holder == self);
> 
> But it's also known to cause random errors at different locations, and
> even SIGSEGV with broken coroutine backtraces.
> 
> By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
> pass the correct AioContext as an argument, making sure co->ctx is not
> wrongly altered.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  util/async.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 05979f8014..c10642a385 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
>  
>          /* Protected by write barrier in qemu_aio_coroutine_enter */
>          atomic_set(&co->scheduled, NULL);
> -        qemu_coroutine_enter(co);
> +        qemu_aio_coroutine_enter(ctx, co);
>          aio_context_release(ctx);
>      }
>  }
> 

Ouch.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-09-13 14:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-05  9:33 [Qemu-devel] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb Sergio Lopez
2018-09-12  7:35 ` Fam Zheng
2018-09-12  7:41 ` Fam Zheng
2018-09-12 10:28   ` Kevin Wolf
2018-09-13 14:58 ` [Qemu-devel] [Qemu-block] " Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.