* [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept()
@ 2018-08-09 13:22 Fam Zheng
  2018-08-09 13:22 ` [Qemu-devel] [PATCH v3 1/2] aio-posix: Don't count ctx->notifier as progress when polling Fam Zheng
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Fam Zheng @ 2018-08-09 13:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Stefan Weil, Fam Zheng, qemu-stable, pbonzini,
	Stefan Hajnoczi, lersek

v3: Fix commit message's bug description. [Paolo]

v2: Implement the fix following Paolo's idea.
    Testing is still in progress.

Calling aio_notify_accept(iothread->ctx) from the main loop when it does
aio_poll(iothread->ctx, false) is a bug because it may steal the event needed
by aio_poll(iothread->ctx, true) in the IOThread. This can cause the IOThread
to hang.

Fam Zheng (2):
  aio-posix: Don't count ctx->notifier as progress when polling
  aio: Do aio_notify_accept only during blocking aio_poll

 util/aio-posix.c | 7 ++++---
 util/aio-win32.c | 3 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

-- 
2.17.1


* [Qemu-devel] [PATCH v3 1/2] aio-posix: Don't count ctx->notifier as progress when polling
  2018-08-09 13:22 [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept() Fam Zheng
@ 2018-08-09 13:22 ` Fam Zheng
  2018-08-09 13:22 ` [Qemu-devel] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll Fam Zheng
  2018-08-14  2:50 ` [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept() Fam Zheng
  2 siblings, 0 replies; 7+ messages in thread
From: Fam Zheng @ 2018-08-09 13:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Stefan Weil, Fam Zheng, qemu-stable, pbonzini,
	Stefan Hajnoczi, lersek

The same logic exists in fd polling. This change is especially important
to avoid a busy loop once we limit aio_notify_accept() to blocking
aio_poll().
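
As a rough illustration of the check this patch adds, here is a toy model
in C. It is not the QEMU code: ToyNotifier, ToyHandler, ToyCtx,
toy_add_handler(), toy_run_poll_handlers_once() and always_ready() are
hypothetical names made up for this sketch. It only shows that a handler
whose opaque pointer is the context's own notifier is still polled but
does not count as progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for EventNotifier, AioHandler and AioContext. */
typedef struct ToyNotifier { int unused; } ToyNotifier;

typedef struct ToyHandler {
    bool (*io_poll)(void *opaque);   /* returns true when the source is ready */
    void *opaque;
} ToyHandler;

typedef struct ToyCtx {
    ToyNotifier notifier;            /* the context's own wakeup event */
    ToyHandler handlers[4];
    size_t nhandlers;
} ToyCtx;

/* A poll callback that always reports readiness. */
static bool always_ready(void *opaque)
{
    (void)opaque;
    return true;
}

static void toy_add_handler(ToyCtx *ctx, bool (*io_poll)(void *), void *opaque)
{
    assert(ctx->nhandlers < 4);
    ctx->handlers[ctx->nhandlers].io_poll = io_poll;
    ctx->handlers[ctx->nhandlers].opaque = opaque;
    ctx->nhandlers++;
}

/* Poll every handler once; the notifier itself never counts as progress. */
static bool toy_run_poll_handlers_once(ToyCtx *ctx)
{
    bool progress = false;

    for (size_t i = 0; i < ctx->nhandlers; i++) {
        ToyHandler *h = &ctx->handlers[i];
        if (h->io_poll && h->io_poll(h->opaque) &&
            h->opaque != (void *)&ctx->notifier) {
            progress = true;
        }
    }
    return progress;
}
```

With only the notifier registered, toy_run_poll_handlers_once() reports no
progress, so a caller built on it would stop spinning and fall through to
the blocking path; adding any other ready handler makes it report progress
again.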

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 util/aio-posix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 118bf5784b..b5c7f463aa 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -494,7 +494,8 @@ static bool run_poll_handlers_once(AioContext *ctx)
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
         if (!node->deleted && node->io_poll &&
             aio_node_check(ctx, node->is_external) &&
-            node->io_poll(node->opaque)) {
+            node->io_poll(node->opaque) &&
+            node->opaque != &ctx->notifier) {
             progress = true;
         }
 
-- 
2.17.1


* [Qemu-devel] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll
  2018-08-09 13:22 [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept() Fam Zheng
  2018-08-09 13:22 ` [Qemu-devel] [PATCH v3 1/2] aio-posix: Don't count ctx->notifier as progress when polling Fam Zheng
@ 2018-08-09 13:22 ` Fam Zheng
  2018-09-07 15:51   ` [Qemu-devel] [Qemu-block] " Kevin Wolf
  2018-08-14  2:50 ` [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept() Fam Zheng
  2 siblings, 1 reply; 7+ messages in thread
From: Fam Zheng @ 2018-08-09 13:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Stefan Weil, Fam Zheng, qemu-stable, pbonzini,
	Stefan Hajnoczi, lersek

An aio_notify() pairs with an aio_notify_accept(). The former should
happen in the main thread or a vCPU thread, and the latter should be
done in the IOThread.

There is one rare case in which the main thread or vCPU thread may "steal"
the aio_notify() event just raised by itself, in bdrv_set_aio_context()
[1]. The sequence is like this:

    main thread                     IO Thread
    ===============================================================
    bdrv_drained_begin()
      aio_disable_external(ctx)
                                    aio_poll(ctx, true)
                                      ctx->notify_me += 2
    ...
    bdrv_drained_end()
      ...
        aio_notify()
    ...
    bdrv_set_aio_context()
      aio_poll(ctx, false)
[1]     aio_notify_accept(ctx)
                                      ppoll() /* Hang! */

[1] is problematic. It will clear the ctx->notifier event so that
the blocked ppoll() will not return.

(For the curious, this bug was noticed when booting a number of VMs
simultaneously in RHV.  One or two of the VMs will hit this race
condition, making the VIRTIO device unresponsive to I/O commands. When
it hangs, SeaBIOS is busy waiting for a read request to complete (read
MBR), right after initializing the virtio-blk-pci device, using 100%
guest CPU. See also https://bugzilla.redhat.com/show_bug.cgi?id=1562750
for the original bug analysis.)

aio_notify() only injects an event when ctx->notify_me is set;
correspondingly, aio_notify_accept() is only useful when ctx->notify_me
_was_ set. Move the call to it into the "blocking" branch. This will
effectively skip [1] and fix the hang.
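
The interleaving can be replayed in a small single-threaded toy model,
sketched below in C. This is an assumption-laden simplification, not the
QEMU implementation: ToyCtx and all toy_*() functions are hypothetical,
the notifier is reduced to a boolean, and the two threads are modelled as
consecutive steps rather than real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of the AioContext wakeup state. */
typedef struct ToyCtx {
    int notify_me;        /* non-zero while a blocking poller is about to sleep */
    bool notifier_set;    /* stands in for the ctx->notifier event */
} ToyCtx;

/* Like aio_notify(): only raise the event if somebody asked to be woken. */
static void toy_notify(ToyCtx *ctx)
{
    if (ctx->notify_me) {
        ctx->notifier_set = true;
    }
}

/* Like aio_notify_accept(): consume the event. */
static void toy_notify_accept(ToyCtx *ctx)
{
    ctx->notifier_set = false;
}

/* Non-blocking poll from a foreign thread, before and after the patch. */
static void toy_poll_nonblocking(ToyCtx *ctx, bool patched)
{
    if (!patched) {
        toy_notify_accept(ctx);   /* old code: accepted unconditionally */
    }
    /* patched code: the accept happens only in the blocking branch */
}

/* Replay the sequence above; true if the IOThread's ppoll() would wake up. */
static bool toy_iothread_wakes(bool patched)
{
    ToyCtx ctx = { 0, false };

    ctx.notify_me += 2;                  /* IOThread: aio_poll(ctx, true) */
    toy_notify(&ctx);                    /* main: bdrv_drained_end() -> aio_notify() */
    toy_poll_nonblocking(&ctx, patched); /* main: aio_poll(ctx, false) */
    return ctx.notifier_set;             /* IOThread: is the event still pending? */
}
```

In this model toy_iothread_wakes(false) returns false (the event was
stolen, so the sleeper would hang) and toy_iothread_wakes(true) returns
true.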

Furthermore, blocking aio_poll() is only allowed in the home thread
(in_aio_context_home_thread), because otherwise two blocking
aio_poll()'s can steal each other's ctx->notifier event and cause
a hang just as described above.

Cc: qemu-stable@nongnu.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 util/aio-posix.c | 4 ++--
 util/aio-win32.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index b5c7f463aa..b5c609b68b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
      * so disable the optimization now.
      */
     if (blocking) {
+        assert(in_aio_context_home_thread(ctx));
         atomic_add(&ctx->notify_me, 2);
     }
 
@@ -633,6 +634,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     if (blocking) {
         atomic_sub(&ctx->notify_me, 2);
+        aio_notify_accept(ctx);
     }
 
     /* Adjust polling time */
@@ -676,8 +678,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
         }
     }
 
-    aio_notify_accept(ctx);
-
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
         for (i = 0; i < npfd; i++) {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index e676a8d9b2..c58957cc4b 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -373,11 +373,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
         ret = WaitForMultipleObjects(count, events, FALSE, timeout);
         if (blocking) {
             assert(first);
+            assert(in_aio_context_home_thread(ctx));
             atomic_sub(&ctx->notify_me, 2);
+            aio_notify_accept(ctx);
         }
 
         if (first) {
-            aio_notify_accept(ctx);
             progress |= aio_bh_poll(ctx);
             first = false;
         }
-- 
2.17.1


* Re: [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept()
  2018-08-09 13:22 [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept() Fam Zheng
  2018-08-09 13:22 ` [Qemu-devel] [PATCH v3 1/2] aio-posix: Don't count ctx->notifier as progress when polling Fam Zheng
  2018-08-09 13:22 ` [Qemu-devel] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll Fam Zheng
@ 2018-08-14  2:50 ` Fam Zheng
  2018-08-14  6:27   ` Paolo Bonzini
  2 siblings, 1 reply; 7+ messages in thread
From: Fam Zheng @ 2018-08-14  2:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Stefan Weil, qemu-stable, pbonzini, Stefan Hajnoczi, lersek

On Thu, 08/09 21:22, Fam Zheng wrote:
> v3: Fix commit message's bug description. [Paolo]

If there's no objection, I'm queuing this for 3.1.

Fam


* Re: [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept()
  2018-08-14  2:50 ` [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept() Fam Zheng
@ 2018-08-14  6:27   ` Paolo Bonzini
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2018-08-14  6:27 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: qemu-block, Stefan Weil, qemu-stable, Stefan Hajnoczi, lersek

On 14/08/2018 04:50, Fam Zheng wrote:
> On Thu, 08/09 21:22, Fam Zheng wrote:
>> v3: Fix commit message's bug description. [Paolo]
> 
> If there's no objection, I'm queuing this for 3.1.

Sure, thanks.

Paolo


* Re: [Qemu-devel] [Qemu-block] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll
  2018-08-09 13:22 ` [Qemu-devel] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll Fam Zheng
@ 2018-09-07 15:51   ` Kevin Wolf
  2018-09-10  3:59     ` Fam Zheng
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2018-09-07 15:51 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Weil, qemu-stable,
	Stefan Hajnoczi, pbonzini, lersek, slp

On 09.08.2018 at 15:22, Fam Zheng wrote:
> Furthermore, blocking aio_poll is only allowed on home thread
> (in_aio_context_home_thread), because otherwise two blocking
> aio_poll()'s can steal each other's ctx->notifier event and cause
> hanging just like described above.

It's good to have this assertion now at least, but after digging into
some bugs, I think in fact that any aio_poll() (even non-blocking) is
only allowed in the home thread: At least one reason is that if you run
it from a different thread, qemu_get_current_aio_context() returns the
wrong AioContext in any callbacks called by aio_poll(). Anything else
using TLS can have similar problems.
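
To make the TLS point concrete, here is a minimal sketch in C. It assumes
only that the "current context" lookup is backed by thread-local state;
ToyAioContext, toy_current_ctx, toy_get_current_ctx(), toy_callback() and
toy_poll() are hypothetical names, not QEMU APIs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyAioContext { const char *name; } ToyAioContext;

/* Per-thread "current context", standing in for the thread-local state
 * that a lookup like qemu_get_current_aio_context() would consult. */
static _Thread_local ToyAioContext *toy_current_ctx;

static ToyAioContext *toy_get_current_ctx(void)
{
    return toy_current_ctx;
}

/* Records which context a callback believes it is running in. */
static ToyAioContext *toy_seen_by_callback;

static void toy_callback(void)
{
    toy_seen_by_callback = toy_get_current_ctx();
}

/* Dispatches a callback belonging to ctx on whichever thread calls us. */
static void toy_poll(ToyAioContext *ctx)
{
    (void)ctx;   /* the callback only sees the caller's thread-local state */
    toy_callback();
}
```

If a thread whose toy_current_ctx points at the main context calls
toy_poll() on the IOThread's context, toy_seen_by_callback ends up being
the main context, which is the mismatch described above.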

One instance where this matters is fixed/worked around by Sergio's
"util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb". We
wouldn't even need that patch if we could make sure that aio_poll() is
never called from the wrong thread. This would feel more robust.

I'll fix the aio_poll() calls in drain (the AIO_WAIT_WHILE() ones are
already fine, the rest by removing them). After that,
bdrv_set_aio_context() is still problematic, but the rest should be
okay. Hopefully we can use the tighter assertion then.

Kevin


* Re: [Qemu-devel] [Qemu-block] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll
  2018-09-07 15:51   ` [Qemu-devel] [Qemu-block] " Kevin Wolf
@ 2018-09-10  3:59     ` Fam Zheng
  0 siblings, 0 replies; 7+ messages in thread
From: Fam Zheng @ 2018-09-10  3:59 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Stefan Weil, qemu-stable,
	Stefan Hajnoczi, pbonzini, lersek, slp

On Fri, 09/07 17:51, Kevin Wolf wrote:
> On 09.08.2018 at 15:22, Fam Zheng wrote:
> > Furthermore, blocking aio_poll is only allowed on home thread
> > (in_aio_context_home_thread), because otherwise two blocking
> > aio_poll()'s can steal each other's ctx->notifier event and cause
> > hanging just like described above.
> 
> It's good to have this assertion now at least, but after digging into
> some bugs, I think in fact that any aio_poll() (even non-blocking) is
> only allowed in the home thread: At least one reason is that if you run
> it from a different thread, qemu_get_current_aio_context() returns the
> wrong AioContext in any callbacks called by aio_poll(). Anything else
> using TLS can have similar problems.
> 
> One instance where this matters is fixed/worked around by Sergio's
> "util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb". We
> wouldn't even need that patch if we could make sure that aio_poll() is
> never called from the wrong thread. This would feel more robust.
> 
> I'll fix the aio_poll() calls in drain (the AIO_WAIT_WHILE() ones are
> already fine, the rest by removing them). After that,
> bdrv_set_aio_context() is still problematic, but the rest should be
> okay. Hopefully we can use the tighter assertion then.

Fully agree with you.

Fam
