* [Qemu-devel] [PATCH v2 1/2] throttle-groups: fix restart coroutine iothread race
2019-01-14 13:32 [Qemu-devel] [PATCH v2 0/2] throttle-groups: fix restart coroutine iothread race Stefan Hajnoczi
@ 2019-01-14 13:32 ` Stefan Hajnoczi
2019-01-14 14:42 ` Alberto Garcia
2019-01-14 13:32 ` [Qemu-devel] [PATCH v2 2/2] iotests: add 238 for throttling tgm unregister iothread segfault Stefan Hajnoczi
2019-01-16 13:55 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/2] throttle-groups: fix restart coroutine iothread race Stefan Hajnoczi
2 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2019-01-14 13:32 UTC (permalink / raw)
To: qemu-devel
Cc: Max Reitz, Paolo Bonzini, qemu-block, Alberto Garcia, Kevin Wolf,
Stefan Hajnoczi
The following QMP command leads to a crash when iothreads are used:
{ 'execute': 'device_del', 'arguments': {'id': 'data'} }
The backtrace involves the queue restart coroutine where
tgm->throttle_state is a NULL pointer because
throttle_group_unregister_tgm() has already been called:
(gdb) bt full
#0 0x00005585a7a3b378 in qemu_mutex_lock_impl (mutex=0xffffffffffffffd0, file=0x5585a7bb3d54 "block/throttle-groups.c", line=412) at util/qemu-thread-posix.c:64
err = <optimized out>
__PRETTY_FUNCTION__ = "qemu_mutex_lock_impl"
__func__ = "qemu_mutex_lock_impl"
#1 0x00005585a79be074 in throttle_group_restart_queue_entry (opaque=0x5585a9de4eb0) at block/throttle-groups.c:412
_f = <optimized out>
data = 0x5585a9de4eb0
tgm = 0x5585a9079440
ts = 0x0
tg = 0xffffffffffffff98
is_write = false
empty_queue = 255
This coroutine should not execute in the iothread after the throttle
group member has been unregistered!
The root cause is that the device_del code path schedules the restart
coroutine in the iothread while holding the AioContext lock. Therefore
the iothread cannot execute the coroutine until after device_del
releases the lock - by this time it's too late.
This patch adds a reference count to ThrottleGroupMember so we can
synchronously wait for restart coroutines to complete. Once they are
done it is safe to unregister the ThrottleGroupMember.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v2:
* Use atomic operations instead of AioContext lock [Paolo]
---
include/block/throttle-groups.h | 5 +++++
block/throttle-groups.c | 9 +++++++++
2 files changed, 14 insertions(+)
diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index e2fd0513c4..712a8e64b4 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -43,6 +43,11 @@ typedef struct ThrottleGroupMember {
*/
unsigned int io_limits_disabled;
+ /* Number of pending throttle_group_restart_queue_entry() coroutines.
+ * Accessed with atomic operations.
+ */
+ unsigned int restart_pending;
+
/* The following fields are protected by the ThrottleGroup lock.
* See the ThrottleGroup documentation for details.
* throttle_state tells us if I/O limits are configured. */
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 5d8213a443..a5a2037924 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -415,6 +415,9 @@ static void coroutine_fn throttle_group_restart_queue_entry(void *opaque)
}
g_free(data);
+
+ atomic_dec(&tgm->restart_pending);
+ aio_wait_kick();
}
static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write)
@@ -430,6 +433,8 @@ static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write
* be no timer pending on this tgm at this point */
assert(!timer_pending(tgm->throttle_timers.timers[is_write]));
+ atomic_inc(&tgm->restart_pending);
+
co = qemu_coroutine_create(throttle_group_restart_queue_entry, rd);
aio_co_enter(tgm->aio_context, co);
}
@@ -538,6 +543,7 @@ void throttle_group_register_tgm(ThrottleGroupMember *tgm,
tgm->throttle_state = ts;
tgm->aio_context = ctx;
+ atomic_set(&tgm->restart_pending, 0);
qemu_mutex_lock(&tg->lock);
/* If the ThrottleGroup is new set this ThrottleGroupMember as the token */
@@ -584,6 +590,9 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
return;
}
+ /* Wait for throttle_group_restart_queue_entry() coroutines to finish */
+ AIO_WAIT_WHILE(tgm->aio_context, atomic_read(&tgm->restart_pending) > 0);
+
qemu_mutex_lock(&tg->lock);
for (i = 0; i < 2; i++) {
assert(tgm->pending_reqs[i] == 0);
--
2.20.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH v2 1/2] throttle-groups: fix restart coroutine iothread race
2019-01-14 13:32 ` [Qemu-devel] [PATCH v2 1/2] " Stefan Hajnoczi
@ 2019-01-14 14:42 ` Alberto Garcia
0 siblings, 0 replies; 6+ messages in thread
From: Alberto Garcia @ 2019-01-14 14:42 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel
Cc: Max Reitz, Paolo Bonzini, qemu-block, Kevin Wolf
On Mon 14 Jan 2019 02:32:56 PM CET, Stefan Hajnoczi wrote:
> The following QMP command leads to a crash when iothreads are used:
>
> { 'execute': 'device_del', 'arguments': {'id': 'data'} }
>
> The backtrace involves the queue restart coroutine where
> tgm->throttle_state is a NULL pointer because
> throttle_group_unregister_tgm() has already been called:
>
> (gdb) bt full
> #0 0x00005585a7a3b378 in qemu_mutex_lock_impl (mutex=0xffffffffffffffd0, file=0x5585a7bb3d54 "block/throttle-groups.c", line=412) at util/qemu-thread-posix.c:64
> err = <optimized out>
> __PRETTY_FUNCTION__ = "qemu_mutex_lock_impl"
> __func__ = "qemu_mutex_lock_impl"
> #1 0x00005585a79be074 in throttle_group_restart_queue_entry (opaque=0x5585a9de4eb0) at block/throttle-groups.c:412
> _f = <optimized out>
> data = 0x5585a9de4eb0
> tgm = 0x5585a9079440
> ts = 0x0
> tg = 0xffffffffffffff98
> is_write = false
> empty_queue = 255
>
> This coroutine should not execute in the iothread after the throttle
> group member has been unregistered!
>
> The root cause is that the device_del code path schedules the restart
> coroutine in the iothread while holding the AioContext lock. Therefore
> the iothread cannot execute the coroutine until after device_del
> releases the lock - by this time it's too late.
>
> This patch adds a reference count to ThrottleGroupMember so we can
> synchronously wait for restart coroutines to complete. Once they are
> done it is safe to unregister the ThrottleGroupMember.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Berto
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH v2 2/2] iotests: add 238 for throttling tgm unregister iothread segfault
2019-01-14 13:32 [Qemu-devel] [PATCH v2 0/2] throttle-groups: fix restart coroutine iothread race Stefan Hajnoczi
2019-01-14 13:32 ` [Qemu-devel] [PATCH v2 1/2] " Stefan Hajnoczi
@ 2019-01-14 13:32 ` Stefan Hajnoczi
2019-01-14 20:57 ` Alberto Garcia
2019-01-16 13:55 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/2] throttle-groups: fix restart coroutine iothread race Stefan Hajnoczi
2 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2019-01-14 13:32 UTC (permalink / raw)
To: qemu-devel
Cc: Max Reitz, Paolo Bonzini, qemu-block, Alberto Garcia, Kevin Wolf,
Stefan Hajnoczi
Hot-unplug a scsi-hd using an iothread. The previous patch fixes a
segfault in this scenario.
This patch adds a regression test.
Suggested-by: Alberto Garcia <berto@igalia.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v2:
* Added qemu-iotests test case [Kevin, Alberto]
---
tests/qemu-iotests/238 | 47 ++++++++++++++++++++++++++++++++++++++
tests/qemu-iotests/238.out | 6 +++++
tests/qemu-iotests/group | 1 +
3 files changed, 54 insertions(+)
create mode 100755 tests/qemu-iotests/238
create mode 100644 tests/qemu-iotests/238.out
diff --git a/tests/qemu-iotests/238 b/tests/qemu-iotests/238
new file mode 100755
index 0000000000..f81ee1112f
--- /dev/null
+++ b/tests/qemu-iotests/238
@@ -0,0 +1,47 @@
+#!/usr/bin/env python
+#
+# Regression test for throttle group member unregister segfault with iothread
+#
+# Copyright (c) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+import sys
+import os
+import iotests
+from iotests import log
+
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'scripts'))
+
+from qemu import QEMUMachine
+
+if iotests.qemu_default_machine == 's390-ccw-virtio':
+ virtio_scsi_device = 'virtio-scsi-ccw'
+else:
+ virtio_scsi_device = 'virtio-scsi-pci'
+
+vm = QEMUMachine(iotests.qemu_prog)
+vm.add_args('-machine', 'accel=kvm')
+vm.launch()
+
+log(vm.qmp('blockdev-add', node_name='hd0', driver='null-co'))
+log(vm.qmp('object-add', qom_type='iothread', id='iothread0'))
+log(vm.qmp('device_add', id='scsi0', driver=virtio_scsi_device, iothread='iothread0'))
+log(vm.qmp('device_add', id='scsi-hd0', driver='scsi-hd', drive='hd0'))
+log(vm.qmp('block_set_io_throttle', id='scsi-hd0', bps=0, bps_rd=0, bps_wr=0,
+ iops=1000, iops_rd=0, iops_wr=0, conv_keys=False))
+log(vm.qmp('device_del', id='scsi-hd0'))
+
+vm.shutdown()
diff --git a/tests/qemu-iotests/238.out b/tests/qemu-iotests/238.out
new file mode 100644
index 0000000000..4de840ba8c
--- /dev/null
+++ b/tests/qemu-iotests/238.out
@@ -0,0 +1,6 @@
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 61a6d98ebd..1f55bcf4e6 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -233,3 +233,4 @@
233 auto quick
234 auto quick migration
235 auto quick
+238 auto quick
--
2.20.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/2] throttle-groups: fix restart coroutine iothread race
2019-01-14 13:32 [Qemu-devel] [PATCH v2 0/2] throttle-groups: fix restart coroutine iothread race Stefan Hajnoczi
2019-01-14 13:32 ` [Qemu-devel] [PATCH v2 1/2] " Stefan Hajnoczi
2019-01-14 13:32 ` [Qemu-devel] [PATCH v2 2/2] iotests: add 238 for throttling tgm unregister iothread segfault Stefan Hajnoczi
@ 2019-01-16 13:55 ` Stefan Hajnoczi
2 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2019-01-16 13:55 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Kevin Wolf, qemu-block, Max Reitz, Paolo Bonzini
[-- Attachment #1: Type: text/plain, Size: 968 bytes --]
On Mon, Jan 14, 2019 at 01:32:55PM +0000, Stefan Hajnoczi wrote:
> v2:
> * Added qemu-iotests test case [Kevin, Alberto]
> * Use atomic operations instead of AioContext lock [Paolo]
>
> This patch series fixes a segfault upon device_del when virtio-scsi is used
> with iothreads.
>
> Stefan Hajnoczi (2):
> throttle-groups: fix restart coroutine iothread race
> iotests: add 238 for throttling tgm unregister iothread segfault
>
> include/block/throttle-groups.h | 5 ++++
> block/throttle-groups.c | 9 +++++++
> tests/qemu-iotests/238 | 47 +++++++++++++++++++++++++++++++++
> tests/qemu-iotests/238.out | 6 +++++
> tests/qemu-iotests/group | 1 +
> 5 files changed, 68 insertions(+)
> create mode 100755 tests/qemu-iotests/238
> create mode 100644 tests/qemu-iotests/238.out
>
> --
> 2.20.1
>
>
Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread