* [RFC] bdrv_flush: only use fast path when in owned AioContext
@ 2020-05-11 16:50 Stefan Reiter
  2020-05-12 10:57 ` Kevin Wolf
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Reiter @ 2020-05-11 16:50 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: fam, kwolf, t.lamprecht, stefanha, mreitz

Just because we're in a coroutine doesn't imply ownership of the context
of the flushed drive. In such a case use the slow path which explicitly
enters bdrv_flush_co_entry in the correct AioContext.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

We've experienced some lockups in this codepath when taking snapshots of VMs
with drives that have IO-Threads enabled (we have an async 'savevm'
implementation running from a coroutine).

I couldn't find a reproducer for upstream versions, but in our testing this
patch fixes all the issues we're seeing, and I think the logic checks out.

The fast path pattern is repeated a few times in this file, so if this change
makes sense, it's probably worth evaluating the other occurrences as well.

 block/io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index aba67f66b9..ee7310fa13 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2895,8 +2895,9 @@ int bdrv_flush(BlockDriverState *bs)
         .ret = NOT_DONE,
     };
 
-    if (qemu_in_coroutine()) {
-        /* Fast-path if already in coroutine context */
+    if (qemu_in_coroutine() &&
+        bdrv_get_aio_context(bs) == qemu_get_current_aio_context()) {
+        /* Fast-path if already in coroutine and we own the drive's context */
         bdrv_flush_co_entry(&flush_co);
     } else {
         co = qemu_coroutine_create(bdrv_flush_co_entry, &flush_co);
-- 
2.20.1
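The dispatch decision this patch changes can be illustrated with a minimal, self-contained sketch in C using POSIX threads. This is not QEMU code: `Context`, `run_in_context()` and `fake_flush()` are hypothetical stand-ins for an AioContext, the fast-path/slow-path check in bdrv_flush() and bdrv_flush_co_entry(), and modelling a context as a single thread draining a job queue is a simplifying assumption. The inline fast path is taken only when the caller already runs in the target context's thread; otherwise the work is queued to that thread and the caller waits for it.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: one thread draining a job queue. */
typedef struct Job {
    void (*fn)(void *opaque);
    void *opaque;
    bool done;
    struct Job *next;
} Job;

typedef struct Context {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;   /* signals both "new job" and "job done" */
    Job *head, *tail;
    bool stop;
} Context;

/* Context whose thread we are running in, or NULL (e.g. the main thread). */
static _Thread_local Context *current_ctx;

static void *context_loop(void *arg)
{
    Context *ctx = arg;
    current_ctx = ctx;
    pthread_mutex_lock(&ctx->lock);
    for (;;) {
        while (!ctx->head && !ctx->stop) {
            pthread_cond_wait(&ctx->cond, &ctx->lock);
        }
        if (!ctx->head) {
            break;                      /* stop requested and queue empty */
        }
        Job *job = ctx->head;
        ctx->head = job->next;
        if (!ctx->head) {
            ctx->tail = NULL;
        }
        pthread_mutex_unlock(&ctx->lock);
        job->fn(job->opaque);           /* run the job in ctx's own thread */
        pthread_mutex_lock(&ctx->lock);
        job->done = true;               /* caller may return once it sees this */
        pthread_cond_broadcast(&ctx->cond);
    }
    pthread_mutex_unlock(&ctx->lock);
    return NULL;
}

static void context_start(Context *ctx)
{
    pthread_mutex_init(&ctx->lock, NULL);
    pthread_cond_init(&ctx->cond, NULL);
    ctx->head = ctx->tail = NULL;
    ctx->stop = false;
    pthread_create(&ctx->thread, NULL, context_loop, ctx);
}

static void context_stop(Context *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->stop = true;
    pthread_cond_broadcast(&ctx->cond);
    pthread_mutex_unlock(&ctx->lock);
    pthread_join(ctx->thread, NULL);
    pthread_cond_destroy(&ctx->cond);
    pthread_mutex_destroy(&ctx->lock);
}

/* Fast path only if we already run in ctx; otherwise hand off and wait. */
static void run_in_context(Context *ctx, void (*fn)(void *), void *opaque)
{
    if (current_ctx == ctx) {
        fn(opaque);                     /* fast path: already the right thread */
        return;
    }
    Job job = { fn, opaque, false, NULL };
    pthread_mutex_lock(&ctx->lock);
    if (ctx->tail) {
        ctx->tail->next = &job;
    } else {
        ctx->head = &job;
    }
    ctx->tail = &job;
    pthread_cond_broadcast(&ctx->cond);
    while (!job.done) {                 /* slow path blocks this thread */
        pthread_cond_wait(&ctx->cond, &ctx->lock);
    }
    pthread_mutex_unlock(&ctx->lock);
}

/* Hypothetical stand-in for bdrv_flush_co_entry(): records where it ran. */
static void fake_flush(void *opaque)
{
    *(Context **)opaque = current_ctx;
}
```

A job that already runs in `ctx` and calls `run_in_context(ctx, ...)` takes the fast path, so it never waits on its own thread. From any other thread the job is executed by `ctx`'s thread, which is what `fake_flush()` records. Note that the slow path in this sketch blocks the calling thread until the job completes.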





* Re: [RFC] bdrv_flush: only use fast path when in owned AioContext
  2020-05-11 16:50 [RFC] bdrv_flush: only use fast path when in owned AioContext Stefan Reiter
@ 2020-05-12 10:57 ` Kevin Wolf
  2020-05-12 11:32   ` Kevin Wolf
  0 siblings, 1 reply; 4+ messages in thread
From: Kevin Wolf @ 2020-05-12 10:57 UTC (permalink / raw)
  To: Stefan Reiter; +Cc: fam, qemu-block, qemu-devel, mreitz, stefanha, t.lamprecht

On 11.05.2020 at 18:50, Stefan Reiter wrote:
> Just because we're in a coroutine doesn't imply ownership of the context
> of the flushed drive. In such a case use the slow path which explicitly
> enters bdrv_flush_co_entry in the correct AioContext.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> We've experienced some lockups in this codepath when taking snapshots of VMs
> with drives that have IO-Threads enabled (we have an async 'savevm'
> implementation running from a coroutine).
> 
> I couldn't find a reproducer for upstream versions, but in our testing this
> patch fixes all the issues we're seeing, and I think the logic checks out.
> 
> The fast path pattern is repeated a few times in this file, so if this change
> makes sense, it's probably worth evaluating the other occurrences as well.

What do you mean by "owning" the context? If it's about taking the
AioContext lock, isn't the problem more with calling bdrv_flush() from
code that doesn't take the locks?

That said, I think we have some code that relies not only on holding the
AioContext locks but actually on running in the right thread, so the
change looks right anyway.

Kevin





* Re: [RFC] bdrv_flush: only use fast path when in owned AioContext
  2020-05-12 10:57 ` Kevin Wolf
@ 2020-05-12 11:32   ` Kevin Wolf
  2020-05-12 12:22     ` Stefan Reiter
  0 siblings, 1 reply; 4+ messages in thread
From: Kevin Wolf @ 2020-05-12 11:32 UTC (permalink / raw)
  To: Stefan Reiter; +Cc: fam, qemu-block, qemu-devel, mreitz, stefanha, t.lamprecht

On 12.05.2020 at 12:57, Kevin Wolf wrote:
> On 11.05.2020 at 18:50, Stefan Reiter wrote:
> > Just because we're in a coroutine doesn't imply ownership of the context
> > of the flushed drive. In such a case use the slow path which explicitly
> > enters bdrv_flush_co_entry in the correct AioContext.
> > 
> > Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> > ---
> > 
> > We've experienced some lockups in this codepath when taking snapshots of VMs
> > with drives that have IO-Threads enabled (we have an async 'savevm'
> > implementation running from a coroutine).
> > 
> > I couldn't find a reproducer for upstream versions, but in our testing this
> > patch fixes all the issues we're seeing, and I think the logic checks out.
> > 
> > The fast path pattern is repeated a few times in this file, so if this change
> > makes sense, it's probably worth evaluating the other occurrences as well.
> 
> What do you mean by "owning" the context? If it's about taking the
> AioContext lock, isn't the problem more with calling bdrv_flush() from
> code that doesn't take the locks?
> 
> That said, I think we have some code that relies not only on holding the
> AioContext locks but actually on running in the right thread, so the
> change looks right anyway.

Well, the idea is right, but the change itself isn't, of course. If
we're already in coroutine context, we must not busy wait with
BDRV_POLL_WHILE(). I'll see if I can put something together after lunch.

Kevin
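The objection to busy-waiting from coroutine context can be illustrated by a loose analogy, sketched in C with POSIX threads and the `ucontext` functions (removed from POSIX.1-2008 but still provided by glibc). All names are hypothetical, and this is not how QEMU implements coroutines or BDRV_POLL_WHILE(): the coroutine hands the work to another thread and then yields to its event loop, which waits for the completion event and re-enters the coroutine. If the coroutine spun on the completion flag instead of yielding, the loop that sets that flag would never get to run.

```c
#include <pthread.h>
#include <stdbool.h>
#include <ucontext.h>

/* Hypothetical model: one event-loop thread running one coroutine, plus a
 * worker thread standing in for the drive's own AioContext. */
static ucontext_t loop_uc, co_uc;
static char co_stack[256 * 1024];

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool event_posted;    /* set by the worker, consumed by the loop */
static int flush_ret = -1;   /* hypothetical flush result from the worker */
static bool flush_done;      /* set only by the loop, when it handles the event */
static int co_result = -1;   /* what the coroutine saw after being resumed */
static pthread_t worker_thread;

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    flush_ret = 0;           /* pretend the flush succeeded */
    event_posted = true;     /* notify the event loop, not the coroutine */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void co_flush(void)
{
    pthread_create(&worker_thread, NULL, worker, NULL);
    /* Spinning on flush_done without yielding would hang: only the loop sets
     * it, and the loop cannot run while this coroutine occupies its thread. */
    while (!flush_done) {
        swapcontext(&co_uc, &loop_uc);   /* yield back to the event loop */
    }
    co_result = flush_ret;
}                                        /* returning resumes uc_link */

/* The event loop is the only place that blocks waiting for events. */
static int run_loop(void)
{
    getcontext(&co_uc);
    co_uc.uc_stack.ss_sp = co_stack;
    co_uc.uc_stack.ss_size = sizeof(co_stack);
    co_uc.uc_link = &loop_uc;
    makecontext(&co_uc, co_flush, 0);

    swapcontext(&loop_uc, &co_uc);       /* run the coroutine until it yields */

    pthread_mutex_lock(&lock);
    while (!event_posted) {
        pthread_cond_wait(&cond, &lock);
    }
    event_posted = false;
    pthread_mutex_unlock(&lock);

    flush_done = true;                   /* handle the completion event ... */
    swapcontext(&loop_uc, &co_uc);       /* ... and re-enter the coroutine */

    pthread_join(worker_thread, NULL);
    return co_result;
}
```

This model handles a single event; a real event loop would keep dispatching other handlers while the coroutine is suspended.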




* Re: [RFC] bdrv_flush: only use fast path when in owned AioContext
  2020-05-12 11:32   ` Kevin Wolf
@ 2020-05-12 12:22     ` Stefan Reiter
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Reiter @ 2020-05-12 12:22 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: fam, qemu-block, qemu-devel, mreitz, stefanha, t.lamprecht

On 5/12/20 1:32 PM, Kevin Wolf wrote:
> On 12.05.2020 at 12:57, Kevin Wolf wrote:
>> On 11.05.2020 at 18:50, Stefan Reiter wrote:
>>> Just because we're in a coroutine doesn't imply ownership of the context
>>> of the flushed drive. In such a case use the slow path which explicitly
>>> enters bdrv_flush_co_entry in the correct AioContext.
>>>
>>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>>> ---
>>>
>>> We've experienced some lockups in this codepath when taking snapshots of VMs
>>> with drives that have IO-Threads enabled (we have an async 'savevm'
>>> implementation running from a coroutine).
>>>
>>> I couldn't find a reproducer for upstream versions, but in our testing this
>>> patch fixes all the issues we're seeing, and I think the logic checks out.
>>>
>>> The fast path pattern is repeated a few times in this file, so if this change
>>> makes sense, it's probably worth evaluating the other occurrences as well.
>>
>> What do you mean by "owning" the context? If it's about taking the
>> AioContext lock, isn't the problem more with calling bdrv_flush() from
>> code that doesn't take the locks?
>>
>> That said, I think we have some code that relies not only on holding the
>> AioContext locks but actually on running in the right thread, so the
>> change looks right anyway.

"Owning" in the sense that it only works (i.e. doesn't hang) when
bdrv_flush_co_entry runs in the same AioContext as the BlockDriverState
it is flushing.

Our code holds the locks for all AioContexts we want to flush (in this
case we're called from do_vm_stop/bdrv_flush_all, so we're even in a
drained section).

> 
> Well, the idea is right, but the change itself isn't, of course. If
> we're already in coroutine context, we must not busy wait with
> BDRV_POLL_WHILE(). I'll see if I can put something together after lunch.
> 
> Kevin
> 
> 

Thanks for taking a look!



