All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
@ 2019-01-09 11:01 Stefan Hajnoczi
  2019-01-09 13:49 ` Alberto Garcia
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2019-01-09 11:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Max Reitz, qemu-block, Alberto Garcia, Paolo Bonzini,
	Stefan Hajnoczi

The following QMP command leads to a crash when iothreads are used:

  { 'execute': 'device_del', 'arguments': {'id': 'data'} }

The backtrace involves the queue restart coroutine where
tgm->throttle_state is a NULL pointer because
throttle_group_unregister_tgm() has already been called:

  (gdb) bt full
  #0  0x00005585a7a3b378 in qemu_mutex_lock_impl (mutex=0xffffffffffffffd0, file=0x5585a7bb3d54 "block/throttle-groups.c", line=412) at util/qemu-thread-posix.c:64
        err = <optimized out>
        __PRETTY_FUNCTION__ = "qemu_mutex_lock_impl"
        __func__ = "qemu_mutex_lock_impl"
  #1  0x00005585a79be074 in throttle_group_restart_queue_entry (opaque=0x5585a9de4eb0) at block/throttle-groups.c:412
        _f = <optimized out>
        data = 0x5585a9de4eb0
        tgm = 0x5585a9079440
        ts = 0x0
        tg = 0xffffffffffffff98
        is_write = false
        empty_queue = 255

This coroutine should not execute in the iothread after the throttle
group member has been unregistered!

The root cause is that the device_del code path schedules the restart
coroutine in the iothread while holding the AioContext lock.  Therefore
the iothread cannot execute the coroutine until after device_del
releases the lock - by this time it's too late.

This patch adds a reference count to ThrottleGroupMember so we can
synchronously wait for restart coroutines to complete.  Once they are
done it is safe to unregister the ThrottleGroupMember.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
I'm unhappy about adding another synchronous point but haven't found a
nicer solution.  Only the main loop calls
throttle_group_unregister_tgm() and the AioContext lock is held, so
using AIO_WAIT_WHILE() is safe.

Both iothread and non-iothread mode have been tested.  I also tested 2
scsi-hd instances in the same throttling group to make sure that hot
unplugging one drive doesn't interfere with the remaining drive in the
throttling group.

 include/block/throttle-groups.h | 5 +++++
 block/throttle-groups.c         | 9 +++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index e2fd0513c4..7492bbc8a8 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -43,6 +43,11 @@ typedef struct ThrottleGroupMember {
      */
     unsigned int io_limits_disabled;
 
+    /* Number of pending throttle_group_restart_queue_entry() coroutines.
+     * Accessed under aio_context lock.
+     */
+    unsigned int restart_pending;
+
     /* The following fields are protected by the ThrottleGroup lock.
      * See the ThrottleGroup documentation for details.
      * throttle_state tells us if I/O limits are configured. */
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 5d8213a443..a4b15ea428 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -415,6 +415,9 @@ static void coroutine_fn throttle_group_restart_queue_entry(void *opaque)
     }
 
     g_free(data);
+
+    tgm->restart_pending--;
+    aio_wait_kick();
 }
 
 static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write)
@@ -430,6 +433,8 @@ static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write
      * be no timer pending on this tgm at this point */
     assert(!timer_pending(tgm->throttle_timers.timers[is_write]));
 
+    tgm->restart_pending++;
+
     co = qemu_coroutine_create(throttle_group_restart_queue_entry, rd);
     aio_co_enter(tgm->aio_context, co);
 }
@@ -538,6 +543,7 @@ void throttle_group_register_tgm(ThrottleGroupMember *tgm,
 
     tgm->throttle_state = ts;
     tgm->aio_context = ctx;
+    tgm->restart_pending = 0;
 
     qemu_mutex_lock(&tg->lock);
     /* If the ThrottleGroup is new set this ThrottleGroupMember as the token */
@@ -584,6 +590,9 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
         return;
     }
 
+    /* Wait for throttle_group_restart_queue_entry() coroutines to finish */
+    AIO_WAIT_WHILE(tgm->aio_context, tgm->restart_pending > 0);
+
     qemu_mutex_lock(&tg->lock);
     for (i = 0; i < 2; i++) {
         assert(tgm->pending_reqs[i] == 0);
-- 
2.20.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-09 11:01 [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race Stefan Hajnoczi
@ 2019-01-09 13:49 ` Alberto Garcia
  2019-01-09 15:34 ` Alberto Garcia
  2019-01-11 14:36 ` [Qemu-devel] " Paolo Bonzini
  2 siblings, 0 replies; 15+ messages in thread
From: Alberto Garcia @ 2019-01-09 13:49 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, Max Reitz, qemu-block, Paolo Bonzini

On Wed 09 Jan 2019 12:01:44 PM CET, Stefan Hajnoczi wrote:
> The following QMP command leads to a crash when iothreads are used:
>
>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }

I was trying to reproduce this and I found this crashing in master:

$ qemu-system-x86_64 -enable-kvm -qmp stdio -display none
{ "execute": "qmp_capabilities" }
{ "execute": "blockdev-add", "arguments": {"driver": "qcow2", "file": {"driver": "file", "filename": "hd.qcow2"}, "node-name": "hd0"}}
{ "execute": "object-add", "arguments": {"qom-type": "iothread", "id": "iothread0"}}
{ "execute": "x-blockdev-set-iothread", "arguments": {"node-name": "hd0", "iothread": "iothread0"}}
{ "execute": "device_add", "arguments": {"id": "virtio0", "driver": "virtio-blk-pci", "drive": "hd0"}}

Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51         ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  [...] in __GI_abort () at abort.c:89
#2  [...] in error_exit () at util/qemu-thread-posix.c:36
#3  [...] in qemu_mutex_unlock_impl () at util/qemu-thread-posix.c:96
#4  [...] in aio_context_release () at util/async.c:516
#5  [...] in blk_prw () at block/block-backend.c:1262
#6  [...] in blk_pread () at block/block-backend.c:1424
#7  [...] in blk_pread_unthrottled () at block/block-backend.c:1279
#8  [...] in guess_disk_lchs () at hw/block/hd-geometry.c:71
#9  [...] in hd_geometry_guess () at hw/block/hd-geometry.c:136
#10 [...] in blkconf_geometry () at hw/block/block.c:99
#11 [...] in virtio_blk_device_realize () at hw/block/virtio-blk.c:944
#12 [...] in virtio_device_realize () at hw/virtio/virtio.c:2538
[...]

Berto

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-09 11:01 [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race Stefan Hajnoczi
  2019-01-09 13:49 ` Alberto Garcia
@ 2019-01-09 15:34 ` Alberto Garcia
  2019-01-11 13:19   ` Alberto Garcia
  2019-01-11 14:36 ` [Qemu-devel] " Paolo Bonzini
  2 siblings, 1 reply; 15+ messages in thread
From: Alberto Garcia @ 2019-01-09 15:34 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, Max Reitz, qemu-block, Paolo Bonzini

On Wed 09 Jan 2019 12:01:44 PM CET, Stefan Hajnoczi wrote:
> The following QMP command leads to a crash when iothreads are used:
>
>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }

How did you reproduce this? Do you have a test case?

Berto

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-09 15:34 ` Alberto Garcia
@ 2019-01-11 13:19   ` Alberto Garcia
  2019-01-11 13:24     ` Kevin Wolf
  0 siblings, 1 reply; 15+ messages in thread
From: Alberto Garcia @ 2019-01-11 13:19 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, qemu-block, Max Reitz

On Wed 09 Jan 2019 04:34:10 PM CET, Alberto Garcia wrote:
> On Wed 09 Jan 2019 12:01:44 PM CET, Stefan Hajnoczi wrote:
>> The following QMP command leads to a crash when iothreads are used:
>>
>>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }
>
> How did you reproduce this? Do you have a test case?

Ok, I finally reproduced it, the patch looks good to me.

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-11 13:19   ` Alberto Garcia
@ 2019-01-11 13:24     ` Kevin Wolf
  2019-01-11 14:14       ` Alberto Garcia
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Wolf @ 2019-01-11 13:24 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, qemu-block, Max Reitz

Am 11.01.2019 um 14:19 hat Alberto Garcia geschrieben:
> On Wed 09 Jan 2019 04:34:10 PM CET, Alberto Garcia wrote:
> > On Wed 09 Jan 2019 12:01:44 PM CET, Stefan Hajnoczi wrote:
> >> The following QMP command leads to a crash when iothreads are used:
> >>
> >>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }
> >
> > How did you reproduce this? Do you have a test case?
> 
> Ok, I finally reproduced it, the patch looks good to me.

Can it be turned into a qemu-iotests case or is it too complicated for
that?

Kevin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-11 13:24     ` Kevin Wolf
@ 2019-01-11 14:14       ` Alberto Garcia
  2019-01-14 13:35         ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Alberto Garcia @ 2019-01-11 14:14 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, qemu-block, Max Reitz

On Fri 11 Jan 2019 02:24:16 PM CET, Kevin Wolf wrote:
>> >> The following QMP command leads to a crash when iothreads are used:
>> >>
>> >>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }
>> >
>> > How did you reproduce this? Do you have a test case?
>> 
>> Ok, I finally reproduced it, the patch looks good to me.
>
> Can it be turned into a qemu-iotests case or is it too complicated for
> that?

I can reproduce the problem reliably with this:

{ "execute": "qmp_capabilities" }
{ "execute": "blockdev-add",
  "arguments": {"driver": "null-co", "node-name": "hd0"}}
{ "execute": "object-add",
  "arguments": {"qom-type": "iothread", "id": "iothread0"}}
{ "execute": "device_add",
  "arguments": {"id": "scsi0", "driver": "virtio-scsi-pci",
                "iothread": "iothread0"}}
{ "execute": "device_add",
  "arguments": {"id": "scsi-hd0", "driver": "scsi-hd", "drive": "hd0"}}
{ "execute": "block_set_io_throttle",
  "arguments": {"id": "scsi-hd0", "bps": 0, "bps_rd": 0, "bps_wr": 0,
                "iops": 1000, "iops_rd": 0, "iops_wr": 0}}
{ "execute": "device_del", "arguments": {"id": "scsi-hd0"}}

But this doesn't crash if I put it in an iotest.

Berto

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-09 11:01 [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race Stefan Hajnoczi
  2019-01-09 13:49 ` Alberto Garcia
  2019-01-09 15:34 ` Alberto Garcia
@ 2019-01-11 14:36 ` Paolo Bonzini
  2 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2019-01-11 14:36 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, Max Reitz, qemu-block, Alberto Garcia

On 09/01/19 12:01, Stefan Hajnoczi wrote:
>      g_free(data);
> +
> +    tgm->restart_pending--;
> +    aio_wait_kick();
>  }
>  
>  static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write)
> @@ -430,6 +433,8 @@ static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write
>       * be no timer pending on this tgm at this point */
>      assert(!timer_pending(tgm->throttle_timers.timers[is_write]));
>  
> +    tgm->restart_pending++;
> +
>      co = qemu_coroutine_create(throttle_group_restart_queue_entry, rd);
>      aio_co_enter(tgm->aio_context, co);
>  }
> @@ -538,6 +543,7 @@ void throttle_group_register_tgm(ThrottleGroupMember *tgm,
>  
>      tgm->throttle_state = ts;
>      tgm->aio_context = ctx;
> +    tgm->restart_pending = 0;
>  
>      qemu_mutex_lock(&tg->lock);
>      /* If the ThrottleGroup is new set this ThrottleGroupMember as the token */
> @@ -584,6 +590,9 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
>          return;
>      }
>  
> +    /* Wait for throttle_group_restart_queue_entry() coroutines to finish */
> +    AIO_WAIT_WHILE(tgm->aio_context, tgm->restart_pending > 0);
> +

Could you change this to atomic_inc/dec, and atomic_read here?  It would
be nice to avoid more uses of the AioContext lock.

Paolo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-11 14:14       ` Alberto Garcia
@ 2019-01-14 13:35         ` Stefan Hajnoczi
  2019-01-14 14:40           ` Alberto Garcia
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2019-01-14 13:35 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Kevin Wolf, qemu-devel, Paolo Bonzini, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1418 bytes --]

On Fri, Jan 11, 2019 at 03:14:08PM +0100, Alberto Garcia wrote:
> On Fri 11 Jan 2019 02:24:16 PM CET, Kevin Wolf wrote:
> >> >> The following QMP command leads to a crash when iothreads are used:
> >> >>
> >> >>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }
> >> >
> >> > How did you reproduce this? Do you have a test case?
> >> 
> >> Ok, I finally reproduced it, the patch looks good to me.
> >
> > Can it be turned into a qemu-iotests case or is it too complicated for
> > that?
> 
> I can reproduce the problem reliably with this:
> 
> { "execute": "qmp_capabilities" }
> { "execute": "blockdev-add",
>   "arguments": {"driver": "null-co", "node-name": "hd0"}}
> { "execute": "object-add",
>   "arguments": {"qom-type": "iothread", "id": "iothread0"}}
> { "execute": "device_add",
>   "arguments": {"id": "scsi0", "driver": "virtio-scsi-pci",
>                 "iothread": "iothread0"}}
> { "execute": "device_add",
>   "arguments": {"id": "scsi-hd0", "driver": "scsi-hd", "drive": "hd0"}}
> { "execute": "block_set_io_throttle",
>   "arguments": {"id": "scsi-hd0", "bps": 0, "bps_rd": 0, "bps_wr": 0,
>                 "iops": 1000, "iops_rd": 0, "iops_wr": 0}}
> { "execute": "device_del", "arguments": {"id": "scsi-hd0"}}
> 
> But this doesn't crash if I put it in an iotest.

I've been able to reproduce this in an iotest, please see v2 of this
series.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-14 13:35         ` Stefan Hajnoczi
@ 2019-01-14 14:40           ` Alberto Garcia
  2019-01-14 16:15             ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Alberto Garcia @ 2019-01-14 14:40 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-devel, Paolo Bonzini, qemu-block, Max Reitz

On Mon 14 Jan 2019 02:35:53 PM CET, Stefan Hajnoczi wrote:
> On Fri, Jan 11, 2019 at 03:14:08PM +0100, Alberto Garcia wrote:
>> On Fri 11 Jan 2019 02:24:16 PM CET, Kevin Wolf wrote:
>> >> >> The following QMP command leads to a crash when iothreads are used:
>> >> >>
>> >> >>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }
>> >> >
>> >> > How did you reproduce this? Do you have a test case?
>> >> 
>> >> Ok, I finally reproduced it, the patch looks good to me.
>> >
>> > Can it be turned into a qemu-iotests case or is it too complicated for
>> > that?
>> 
>> I can reproduce the problem reliably with this:
>> 
>> { "execute": "qmp_capabilities" }
>> { "execute": "blockdev-add",
>>   "arguments": {"driver": "null-co", "node-name": "hd0"}}
>> { "execute": "object-add",
>>   "arguments": {"qom-type": "iothread", "id": "iothread0"}}
>> { "execute": "device_add",
>>   "arguments": {"id": "scsi0", "driver": "virtio-scsi-pci",
>>                 "iothread": "iothread0"}}
>> { "execute": "device_add",
>>   "arguments": {"id": "scsi-hd0", "driver": "scsi-hd", "drive": "hd0"}}
>> { "execute": "block_set_io_throttle",
>>   "arguments": {"id": "scsi-hd0", "bps": 0, "bps_rd": 0, "bps_wr": 0,
>>                 "iops": 1000, "iops_rd": 0, "iops_wr": 0}}
>> { "execute": "device_del", "arguments": {"id": "scsi-hd0"}}
>> 
>> But this doesn't crash if I put it in an iotest.
>
> I've been able to reproduce this in an iotest, please see v2 of this
> series.

That iotest doesn't crash for me :-?

Berto

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-14 14:40           ` Alberto Garcia
@ 2019-01-14 16:15             ` Stefan Hajnoczi
  2019-01-14 16:26               ` Alberto Garcia
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2019-01-14 16:15 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Kevin Wolf, qemu-devel, Paolo Bonzini, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1837 bytes --]

On Mon, Jan 14, 2019 at 03:40:39PM +0100, Alberto Garcia wrote:
> On Mon 14 Jan 2019 02:35:53 PM CET, Stefan Hajnoczi wrote:
> > On Fri, Jan 11, 2019 at 03:14:08PM +0100, Alberto Garcia wrote:
> >> On Fri 11 Jan 2019 02:24:16 PM CET, Kevin Wolf wrote:
> >> >> >> The following QMP command leads to a crash when iothreads are used:
> >> >> >>
> >> >> >>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }
> >> >> >
> >> >> > How did you reproduce this? Do you have a test case?
> >> >> 
> >> >> Ok, I finally reproduced it, the patch looks good to me.
> >> >
> >> > Can it be turned into a qemu-iotests case or is it too complicated for
> >> > that?
> >> 
> >> I can reproduce the problem reliably with this:
> >> 
> >> { "execute": "qmp_capabilities" }
> >> { "execute": "blockdev-add",
> >>   "arguments": {"driver": "null-co", "node-name": "hd0"}}
> >> { "execute": "object-add",
> >>   "arguments": {"qom-type": "iothread", "id": "iothread0"}}
> >> { "execute": "device_add",
> >>   "arguments": {"id": "scsi0", "driver": "virtio-scsi-pci",
> >>                 "iothread": "iothread0"}}
> >> { "execute": "device_add",
> >>   "arguments": {"id": "scsi-hd0", "driver": "scsi-hd", "drive": "hd0"}}
> >> { "execute": "block_set_io_throttle",
> >>   "arguments": {"id": "scsi-hd0", "bps": 0, "bps_rd": 0, "bps_wr": 0,
> >>                 "iops": 1000, "iops_rd": 0, "iops_wr": 0}}
> >> { "execute": "device_del", "arguments": {"id": "scsi-hd0"}}
> >> 
> >> But this doesn't crash if I put it in an iotest.
> >
> > I've been able to reproduce this in an iotest, please see v2 of this
> > series.
> 
> That iotest doesn't crash for me :-?

Does my iotest pass for you?

On my machine the diff fails because signal 11 killed QEMU (and this
prints a warning to the test output).

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-14 16:15             ` Stefan Hajnoczi
@ 2019-01-14 16:26               ` Alberto Garcia
  2019-01-14 16:31                 ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Alberto Garcia @ 2019-01-14 16:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-devel, Paolo Bonzini, qemu-block, Max Reitz

On Mon 14 Jan 2019 05:15:25 PM CET, Stefan Hajnoczi wrote:
>> > I've been able to reproduce this in an iotest, please see v2 of this
>> > series.
>> 
>> That iotest doesn't crash for me :-?
>
> Does my iotest pass for you?

Yes, it does. I'm trying to figure out why because if I run the QMP
commands by hand then it does crash.

Berto

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-14 16:26               ` Alberto Garcia
@ 2019-01-14 16:31                 ` Stefan Hajnoczi
  2019-01-14 20:56                   ` Alberto Garcia
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2019-01-14 16:31 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Kevin Wolf, qemu-devel, Paolo Bonzini, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 539 bytes --]

On Mon, Jan 14, 2019 at 05:26:48PM +0100, Alberto Garcia wrote:
> On Mon 14 Jan 2019 05:15:25 PM CET, Stefan Hajnoczi wrote:
> >> > I've been able to reproduce this in an iotest, please see v2 of this
> >> > series.
> >> 
> >> That iotest doesn't crash for me :-?
> >
> > Does my iotest pass for you?
> 
> Yes, it does. I'm trying to figure out why because if I run the QMP
> commands by hand then it does crash.

I ran the iotest 20 times on my machine and it segfaulted every time
(with the fix not yet applied).

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-14 16:31                 ` Stefan Hajnoczi
@ 2019-01-14 20:56                   ` Alberto Garcia
  2019-01-15 14:18                     ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Alberto Garcia @ 2019-01-14 20:56 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-devel, Paolo Bonzini, qemu-block, Max Reitz

On Mon 14 Jan 2019 05:31:17 PM CET, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Mon, Jan 14, 2019 at 05:26:48PM +0100, Alberto Garcia wrote:
>> On Mon 14 Jan 2019 05:15:25 PM CET, Stefan Hajnoczi wrote:
>> >> > I've been able to reproduce this in an iotest, please see v2 of this
>> >> > series.
>> >> 
>> >> That iotest doesn't crash for me :-?
>> >
>> > Does my iotest pass for you?
>> 
>> Yes, it does. I'm trying to figure out why because if I run the QMP
>> commands by hand then it does crash.
>
> I ran the iotest 20 times on my machine and it segfaulted every time
> (with the fix not yet applied).

Yeah I can also reproduce it all the time if I run it by hand...

I was debugging it and although I don't know why this is different when
I run it through tests/qemu-iotests/check, here's why it doesn't crash:

After the ThrottleGroupMember is unregistered and its BlockBackend is
destroyed, the throttle_group_co_restart_queue() coroutine takes
control.

The first thing that it does is lock tgm->throttled_reqs_lock. It turns
out that although this memory has been freed (it's part of the
BlockBackend struct) it is still accessible but contains pure
gargabe. 'Garbage' here means that the mutex counter contains some
random value != 0, so the thread waits, it doesn't have a chance to
crash the process, and QEMU shuts down cleanly.

So if my understanding is correct QEMU can be shut down when there are
iothreads waiting for a mutex. Is that something that we should be
worried about?

Berto

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-14 20:56                   ` Alberto Garcia
@ 2019-01-15 14:18                     ` Stefan Hajnoczi
  2019-01-15 14:49                       ` Alberto Garcia
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2019-01-15 14:18 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Stefan Hajnoczi, Kevin Wolf, Paolo Bonzini, qemu-devel,
	qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2045 bytes --]

On Mon, Jan 14, 2019 at 09:56:28PM +0100, Alberto Garcia wrote:
> On Mon 14 Jan 2019 05:31:17 PM CET, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Mon, Jan 14, 2019 at 05:26:48PM +0100, Alberto Garcia wrote:
> >> On Mon 14 Jan 2019 05:15:25 PM CET, Stefan Hajnoczi wrote:
> >> >> > I've been able to reproduce this in an iotest, please see v2 of this
> >> >> > series.
> >> >> 
> >> >> That iotest doesn't crash for me :-?
> >> >
> >> > Does my iotest pass for you?
> >> 
> >> Yes, it does. I'm trying to figure out why because if I run the QMP
> >> commands by hand then it does crash.
> >
> > I ran the iotest 20 times on my machine and it segfaulted every time
> > (with the fix not yet applied).
> 
> Yeah I can also reproduce it all the time if I run it by hand...
> 
> I was debugging it and although I don't know why this is different when
> I run it through tests/qemu-iotests/check, here's why it doesn't crash:
> 
> After the ThrottleGroupMember is unregistered and its BlockBackend is
> destroyed, the throttle_group_co_restart_queue() coroutine takes
> control.
> 
> The first thing that it does is lock tgm->throttled_reqs_lock. It turns
> out that although this memory has been freed (it's part of the
> BlockBackend struct) it is still accessible but contains pure
> gargabe. 'Garbage' here means that the mutex counter contains some
> random value != 0, so the thread waits, it doesn't have a chance to
> crash the process, and QEMU shuts down cleanly.
> 
> So if my understanding is correct QEMU can be shut down when there are
> iothreads waiting for a mutex. Is that something that we should be
> worried about?

Nothing joins the iothreads in vl.c:main().

The assumption is that anything using iothreads will detach from them.
For example, the vm runstate changes during shutdown so devices can
disable the iothread code path (and this involves draining in-flight
requests).

My fix effectively does this by waiting for in-flight throttling restart
coroutines.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] throttle-groups: fix restart coroutine iothread race
  2019-01-15 14:18                     ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2019-01-15 14:49                       ` Alberto Garcia
  0 siblings, 0 replies; 15+ messages in thread
From: Alberto Garcia @ 2019-01-15 14:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Kevin Wolf, Paolo Bonzini, qemu-devel,
	qemu-block, Max Reitz

On Tue 15 Jan 2019 03:18:00 PM CET, Stefan Hajnoczi wrote:
>> So if my understanding is correct QEMU can be shut down when there
>> are iothreads waiting for a mutex. Is that something that we should
>> be worried about?
>
> Nothing joins the iothreads in vl.c:main().
>
> The assumption is that anything using iothreads will detach from them.
> For example, the vm runstate changes during shutdown so devices can
> disable the iothread code path (and this involves draining in-flight
> requests).
>
> My fix effectively does this by waiting for in-flight throttling
> restart coroutines.

Yeah, it's clear in the case of your fix. Thanks!

Berto

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-01-15 14:49 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-09 11:01 [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race Stefan Hajnoczi
2019-01-09 13:49 ` Alberto Garcia
2019-01-09 15:34 ` Alberto Garcia
2019-01-11 13:19   ` Alberto Garcia
2019-01-11 13:24     ` Kevin Wolf
2019-01-11 14:14       ` Alberto Garcia
2019-01-14 13:35         ` Stefan Hajnoczi
2019-01-14 14:40           ` Alberto Garcia
2019-01-14 16:15             ` Stefan Hajnoczi
2019-01-14 16:26               ` Alberto Garcia
2019-01-14 16:31                 ` Stefan Hajnoczi
2019-01-14 20:56                   ` Alberto Garcia
2019-01-15 14:18                     ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2019-01-15 14:49                       ` Alberto Garcia
2019-01-11 14:36 ` [Qemu-devel] " Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.