* [Qemu-devel] [PATCH v2 0/2] Fix aio_notify_accept()
@ 2018-08-07 9:16 Fam Zheng
2018-08-07 9:16 ` [Qemu-devel] [PATCH v2 1/2] aio-posix: Don't count ctx->notifier as progress when polling Fam Zheng
2018-08-07 9:16 ` [Qemu-devel] [PATCH v2 2/2] aio: Do aio_notify_accept only during blocking aio_poll Fam Zheng
0 siblings, 2 replies; 5+ messages in thread
From: Fam Zheng @ 2018-08-07 9:16 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, Fam Zheng, Stefan Hajnoczi, lersek, qemu-block
v2: Implement the fix following Paolo's idea.
Testing is still in progress.
Calling aio_notify_accept(iothread->ctx) from the main loop when it does
aio_poll(iothread->ctx, false) is a bug because it may steal the event needed
by aio_poll(iothread->ctx, true) in the IOThread. This can cause the IOThread
to hang.
Fam Zheng (2):
aio-posix: Don't count ctx->notifier as progress when polling
aio: Do aio_notify_accept only during blocking aio_poll
util/aio-posix.c | 7 ++++---
util/aio-win32.c | 3 ++-
2 files changed, 6 insertions(+), 4 deletions(-)
--
2.17.1
* [Qemu-devel] [PATCH v2 1/2] aio-posix: Don't count ctx->notifier as progress when polling
2018-08-07 9:16 [Qemu-devel] [PATCH v2 0/2] Fix aio_notify_accept() Fam Zheng
@ 2018-08-07 9:16 ` Fam Zheng
2018-08-07 9:16 ` [Qemu-devel] [PATCH v2 2/2] aio: Do aio_notify_accept only during blocking aio_poll Fam Zheng
1 sibling, 0 replies; 5+ messages in thread
From: Fam Zheng @ 2018-08-07 9:16 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, Fam Zheng, Stefan Hajnoczi, lersek, qemu-block
The same logic exists in fd polling. This change is especially important
to avoid a busy loop once we limit aio_notify_accept() to blocking
aio_poll().
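
As a rough illustration of the rule above (a minimal sketch, not QEMU's
code), the following toy poll loop treats a hit on the notifier handler as
"someone called notify" rather than as real progress. ToyHandler,
toy_poll_flag() and toy_run_poll_once() are hypothetical names introduced
only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioHandler node; not QEMU's real struct. */
typedef struct ToyHandler {
    bool (*io_poll)(void *opaque);  /* returns true if an event was seen */
    void *opaque;
} ToyHandler;

/* Toy poll callback: the opaque pointer is a bool saying "event pending". */
static bool toy_poll_flag(void *opaque)
{
    return *(bool *)opaque;
}

/*
 * Run every poll callback once. A hit on the handler whose opaque pointer
 * is the notifier is deliberately not counted as progress, so a pending
 * notification alone cannot keep the caller spinning.
 */
static bool toy_run_poll_once(ToyHandler *handlers, size_t n, void *notifier)
{
    bool progress = false;

    for (size_t i = 0; i < n; i++) {
        if (handlers[i].io_poll(handlers[i].opaque) &&
            handlers[i].opaque != notifier) {
            progress = true;
        }
    }
    return progress;
}
```

With only the notifier flag set, toy_run_poll_once() reports no progress;
it reports progress once a non-notifier handler has an event.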
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
---
util/aio-posix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 118bf5784b..b5c7f463aa 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -494,7 +494,8 @@ static bool run_poll_handlers_once(AioContext *ctx)
QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
if (!node->deleted && node->io_poll &&
aio_node_check(ctx, node->is_external) &&
- node->io_poll(node->opaque)) {
+ node->io_poll(node->opaque) &&
+ node->opaque != &ctx->notifier) {
progress = true;
}
--
2.17.1
* [Qemu-devel] [PATCH v2 2/2] aio: Do aio_notify_accept only during blocking aio_poll
2018-08-07 9:16 [Qemu-devel] [PATCH v2 0/2] Fix aio_notify_accept() Fam Zheng
2018-08-07 9:16 ` [Qemu-devel] [PATCH v2 1/2] aio-posix: Don't count ctx->notifier as progress when polling Fam Zheng
@ 2018-08-07 9:16 ` Fam Zheng
2018-08-07 10:15 ` Paolo Bonzini
1 sibling, 1 reply; 5+ messages in thread
From: Fam Zheng @ 2018-08-07 9:16 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, Fam Zheng, Stefan Hajnoczi, lersek, qemu-block
An aio_notify() pairs with an aio_notify_accept(). The former should
happen in the main thread or a vCPU thread, and the latter should be
done in the IOThread.
There is one rare case in which the main thread or a vCPU thread may
"steal" the aio_notify() event it has just raised itself, in
bdrv_set_aio_context() [1]. The sequence is like this:
    main thread                         IO Thread
    ===============================================================
    bdrv_drained_begin()
      aio_disable_external(ctx)
                                        aio_poll(ctx, true)
                                          ctx->notify_me += 2
                                          ppoll() /* blocked */
    ...
    bdrv_drained_end()
      ...
        aio_notify()
    ...
    bdrv_set_aio_context()
      aio_poll(ctx, false)
[1]     aio_notify_accept(ctx)
                                          /* Hang! */
[1] is problematic. It will clear the ctx->notifier event so that
the blocked ppoll() will not return.
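
The lost wakeup can be reproduced in miniature with a plain pipe standing
in for ctx->notifier. This is only a sketch under that assumption:
ToyNotifier and the toy_* helpers are hypothetical and much simpler than
QEMU's EventNotifier, and the "threads" are simulated by calling the
helpers in the problematic order.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Toy notifier built on a pipe; an assumption made for illustration. */
typedef struct ToyNotifier {
    int rfd;    /* read end, watched by the poller */
    int wfd;    /* write end, used to raise the event */
} ToyNotifier;

static int toy_init(ToyNotifier *n)
{
    int fds[2];
    int flags;

    if (pipe(fds) < 0) {
        return -1;
    }
    /* Non-blocking reads so that draining an empty pipe returns at once. */
    flags = fcntl(fds[0], F_GETFL);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

/* Raise the event (the role aio_notify() plays in the sequence above). */
static void toy_notify(ToyNotifier *n)
{
    char c = 1;
    ssize_t r = write(n->wfd, &c, 1);
    (void)r;
}

/* Consume the event (the role aio_notify_accept() plays at [1]). */
static void toy_accept(ToyNotifier *n)
{
    char buf[16];

    while (read(n->rfd, buf, sizeof(buf)) > 0) {
        /* keep draining until the pipe is empty */
    }
}

/* Would a poller blocked on the notifier wake up right now? */
static bool toy_would_wake(ToyNotifier *n)
{
    struct pollfd pfd = { .fd = n->rfd, .events = POLLIN };

    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
}
```

Calling toy_notify() followed by toy_accept() from the "wrong" side leaves
toy_would_wake() false, which corresponds to the blocked ppoll() never
returning.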
(For the curious, this bug was noticed when booting a number of VMs
simultaneously in RHV. One or two of the VMs will hit this race
condition, making the VIRTIO device unresponsive to I/O commands. When
it hangs, SeaBIOS is busy-waiting for a read request to complete (read
MBR), right after initializing the virtio-blk-pci device, using 100%
guest CPU. See also https://bugzilla.redhat.com/show_bug.cgi?id=1562750
for the original bug analysis.)
aio_notify() only injects an event when ctx->notify_me is set;
correspondingly, aio_notify_accept() is only useful when ctx->notify_me
_was_ set. Move the call to it into the "blocking" branch. This will
effectively skip [1] and fix the hang.
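
The pairing can be modelled with C11 atomics. This is a minimal sketch
under simplifying assumptions, not the code in util/aio-posix.c: ToyCtx and
the toy_* functions are hypothetical, the notifier is reduced to a boolean,
and the sleep in ppoll() is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical miniature of the two AioContext fields discussed above. */
typedef struct ToyCtx {
    atomic_int notify_me;   /* nonzero while a blocking poller may sleep */
    atomic_bool notified;   /* stands in for the ctx->notifier event */
} ToyCtx;

static void toy_ctx_init(ToyCtx *ctx)
{
    atomic_init(&ctx->notify_me, 0);
    atomic_init(&ctx->notified, false);
}

/* Inject an event only if somebody announced that they may block. */
static void toy_aio_notify(ToyCtx *ctx)
{
    if (atomic_load(&ctx->notify_me)) {
        atomic_store(&ctx->notified, true);
    }
}

/* Blocking poll, first half: announce that we may sleep. */
static void toy_blocking_enter(ToyCtx *ctx)
{
    atomic_fetch_add(&ctx->notify_me, 2);
}

/*
 * Blocking poll, second half: withdraw the announcement and accept the
 * event. Only this path clears the event, mirroring the idea of calling
 * aio_notify_accept() inside the "blocking" branch.
 */
static bool toy_blocking_leave(ToyCtx *ctx)
{
    atomic_fetch_sub(&ctx->notify_me, 2);
    return atomic_exchange(&ctx->notified, false);
}

/* Non-blocking poll: may observe the event but never consumes it. */
static bool toy_nonblocking_poll(ToyCtx *ctx)
{
    return atomic_load(&ctx->notified);
}
```

In this model a non-blocking poll from another thread leaves the event in
place, so the blocking poller still sees it when it leaves.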
Furthermore, blocking aio_poll() is only allowed on the home thread
(in_aio_context_home_thread), because otherwise two blocking
aio_poll()'s can steal each other's ctx->notifier event and cause
a hang just as described above.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
util/aio-posix.c | 4 ++--
util/aio-win32.c | 3 ++-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/util/aio-posix.c b/util/aio-posix.c
index b5c7f463aa..b5c609b68b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
* so disable the optimization now.
*/
if (blocking) {
+ assert(in_aio_context_home_thread(ctx));
atomic_add(&ctx->notify_me, 2);
}
@@ -633,6 +634,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
if (blocking) {
atomic_sub(&ctx->notify_me, 2);
+ aio_notify_accept(ctx);
}
/* Adjust polling time */
@@ -676,8 +678,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
}
}
- aio_notify_accept(ctx);
-
/* if we have any readable fds, dispatch event */
if (ret > 0) {
for (i = 0; i < npfd; i++) {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index e676a8d9b2..c58957cc4b 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -373,11 +373,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
ret = WaitForMultipleObjects(count, events, FALSE, timeout);
if (blocking) {
assert(first);
+ assert(in_aio_context_home_thread(ctx));
atomic_sub(&ctx->notify_me, 2);
+ aio_notify_accept(ctx);
}
if (first) {
- aio_notify_accept(ctx);
progress |= aio_bh_poll(ctx);
first = false;
}
--
2.17.1
* Re: [Qemu-devel] [PATCH v2 2/2] aio: Do aio_notify_accept only during blocking aio_poll
2018-08-07 9:16 ` [Qemu-devel] [PATCH v2 2/2] aio: Do aio_notify_accept only during blocking aio_poll Fam Zheng
@ 2018-08-07 10:15 ` Paolo Bonzini
2018-08-07 14:11 ` Fam Zheng
0 siblings, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2018-08-07 10:15 UTC (permalink / raw)
To: Fam Zheng, qemu-devel; +Cc: Stefan Hajnoczi, lersek, qemu-block
On 07/08/2018 11:16, Fam Zheng wrote:
>     main thread                         IO Thread
>     ===============================================================
>     bdrv_drained_begin()
>       aio_disable_external(ctx)
>                                         aio_poll(ctx, true)
>                                           ctx->notify_me += 2
>                                           ppoll() /* blocked */
>     ...
>     bdrv_drained_end()
>       ...
>         aio_notify()
>     ...
>     bdrv_set_aio_context()
>       aio_poll(ctx, false)
> [1]     aio_notify_accept(ctx)
>                                           /* Hang! */
Should ppoll() rather be after [1]? Otherwise the new commit message
and patches look great.
> aio_notify() only injects an event when ctx->notify_me is set,
> correspondingly aio_notify_accept() is only useful when ctx->notify_me
> _was_ set.
Very good point.
(Please Cc qemu-stable on the second patch too).
Paolo
* Re: [Qemu-devel] [PATCH v2 2/2] aio: Do aio_notify_accept only during blocking aio_poll
2018-08-07 10:15 ` Paolo Bonzini
@ 2018-08-07 14:11 ` Fam Zheng
0 siblings, 0 replies; 5+ messages in thread
From: Fam Zheng @ 2018-08-07 14:11 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi, lersek, qemu-block
On Tue, 08/07 12:15, Paolo Bonzini wrote:
> On 07/08/2018 11:16, Fam Zheng wrote:
> >     main thread                         IO Thread
> >     ===============================================================
> >     bdrv_drained_begin()
> >       aio_disable_external(ctx)
> >                                         aio_poll(ctx, true)
> >                                           ctx->notify_me += 2
> >                                           ppoll() /* blocked */
> >     ...
> >     bdrv_drained_end()
> >       ...
> >         aio_notify()
[2]         ^^^^^
> >     ...
> >     bdrv_set_aio_context()
> >       aio_poll(ctx, false)
> > [1]     aio_notify_accept(ctx)
> >                                           /* Hang! */
>
> Should ppoll() rather be after [1]? Otherwise the new commit message
> and patches look great.
Good point. They race, and I think aio_notify_accept() is indeed done before
ppoll() starts waiting. I will finish testing and send v3.
Fam
>
> > aio_notify() only injects an event when ctx->notify_me is set,
> > correspondingly aio_notify_accept() is only useful when ctx->notify_me
> > _was_ set.
>
> Very good point.
>
> (Please Cc qemu-stable on the second patch too).
>
> Paolo