All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC for 2.9 1/1] block: add missed aio_context_acquire into blk_unref
@ 2017-03-27 15:35 Denis V. Lunev
  2017-03-27 15:45 ` Max Reitz
  0 siblings, 1 reply; 3+ messages in thread
From: Denis V. Lunev @ 2017-03-27 15:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Denis V. Lunev, Kevin Wolf, Max Reitz, Eric Blake, Markus Armbruster

Recently we expirience hang with iothreads enabled with the following
call trace:
Thread 1 (Thread 0x7fa95efebc80 (LWP 177117)):
0  ppoll () from /lib64/libc.so.6
2  qemu_poll_ns () at qemu-timer.c:313
3  aio_poll () at aio-posix.c:457
4  bdrv_flush () at block/io.c:2641
5  bdrv_close () at block.c:2143
6  bdrv_delete () at block.c:2352
7  bdrv_unref () at block.c:3429
8  blk_remove_bs () at block/block-backend.c:427
9  blk_delete () at block/block-backend.c:178
10 blk_unref () at block/block-backend.c:226
11 object_property_del_all () at qom/object.c:399
12 object_finalize () at qom/object.c:461
13 object_unref () at qom/object.c:898
14 object_property_del_child () at qom/object.c:422
15 qmp_marshal_device_del () at qmp-marshal.c:1145
16 handle_qmp_command () at /usr/src/debug/qemu-2.6.0/monitor.c:3929

Technically bdrv_flush() stucks in
    while (rwco.ret == NOT_DONE) {
        aio_poll(aio_context, true);
    }
but rwco.ret is equal to 0 thus we have missed wakeup. Code investigation
reveals that we do not have performed aio_context_acquire() on this call
stack.

This patch adds missed lock.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
---
 block/block-backend.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 5742c09..65d5da9 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -273,7 +273,11 @@ void blk_unref(BlockBackend *blk)
     if (blk) {
         assert(blk->refcnt > 0);
         if (!--blk->refcnt) {
+            AioContext *ctx = blk_get_aio_context(blk);
+
+            aio_context_acquire(ctx);
             blk_delete(blk);
+            aio_context_release(ctx);
         }
     }
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [Qemu-devel] [RFC for 2.9 1/1] block: add missed aio_context_acquire into blk_unref
  2017-03-27 15:35 [Qemu-devel] [RFC for 2.9 1/1] block: add missed aio_context_acquire into blk_unref Denis V. Lunev
@ 2017-03-27 15:45 ` Max Reitz
  2017-03-27 16:17   ` Denis V. Lunev
  0 siblings, 1 reply; 3+ messages in thread
From: Max Reitz @ 2017-03-27 15:45 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-devel; +Cc: Kevin Wolf, Eric Blake, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 2752 bytes --]

@Subject: Do you mean "PATCH for-2.9?"? Because "RFC" to me means
"please do not merge". ;-)

I wouldn't mind a change like this to go into 2.9.

On 27.03.2017 17:35, Denis V. Lunev wrote:
> Recently we expirience hang with iothreads enabled with the following
> call trace:
> Thread 1 (Thread 0x7fa95efebc80 (LWP 177117)):
> 0  ppoll () from /lib64/libc.so.6
> 2  qemu_poll_ns () at qemu-timer.c:313
> 3  aio_poll () at aio-posix.c:457
> 4  bdrv_flush () at block/io.c:2641
> 5  bdrv_close () at block.c:2143
> 6  bdrv_delete () at block.c:2352
> 7  bdrv_unref () at block.c:3429
> 8  blk_remove_bs () at block/block-backend.c:427
> 9  blk_delete () at block/block-backend.c:178
> 10 blk_unref () at block/block-backend.c:226
> 11 object_property_del_all () at qom/object.c:399
> 12 object_finalize () at qom/object.c:461
> 13 object_unref () at qom/object.c:898
> 14 object_property_del_child () at qom/object.c:422
> 15 qmp_marshal_device_del () at qmp-marshal.c:1145
> 16 handle_qmp_command () at /usr/src/debug/qemu-2.6.0/monitor.c:3929
> 
> Technically bdrv_flush() stucks in
>     while (rwco.ret == NOT_DONE) {
>         aio_poll(aio_context, true);
>     }
> but rwco.ret is equal to 0 thus we have missed wakeup. Code investigation
> reveals that we do not have performed aio_context_acquire() on this call
> stack.
> 
> This patch adds missed lock.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: Markus Armbruster <armbru@redhat.com>
> ---
>  block/block-backend.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 5742c09..65d5da9 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -273,7 +273,11 @@ void blk_unref(BlockBackend *blk)
>      if (blk) {
>          assert(blk->refcnt > 0);
>          if (!--blk->refcnt) {
> +            AioContext *ctx = blk_get_aio_context(blk);
> +
> +            aio_context_acquire(ctx);
>              blk_delete(blk);
> +            aio_context_release(ctx);

But I don't think this is quite the correct place. The caller of
blk_unref() should have acquired the AioContext already. As far as I can
tell in this case that would be originally release_drive() in
hw/core/qdev-properties-system.c and then blk_detach_dev().

I think the former would be the somehow more correct place but I can
imagine the latter to be more useful in reality. I'll leave it to you.

(As an alternative, you may of course convince me that this patch is
indeed correct and should be taken as-is. :-))

Max

>          }
>      }
>  }
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Qemu-devel] [RFC for 2.9 1/1] block: add missed aio_context_acquire into blk_unref
  2017-03-27 15:45 ` Max Reitz
@ 2017-03-27 16:17   ` Denis V. Lunev
  0 siblings, 0 replies; 3+ messages in thread
From: Denis V. Lunev @ 2017-03-27 16:17 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Eric Blake, Markus Armbruster

On 03/27/2017 06:45 PM, Max Reitz wrote:
> @Subject: Do you mean "PATCH for-2.9?"? Because "RFC" to me means
> "please do not merge". ;-)
>
> I wouldn't mind a change like this to go into 2.9.
I am quite sure that problem is here but unsure about the place.
Here I use term 'RFC' as 'Can we start the discussion'?


> On 27.03.2017 17:35, Denis V. Lunev wrote:
>> Recently we expirience hang with iothreads enabled with the following
>> call trace:
>> Thread 1 (Thread 0x7fa95efebc80 (LWP 177117)):
>> 0  ppoll () from /lib64/libc.so.6
>> 2  qemu_poll_ns () at qemu-timer.c:313
>> 3  aio_poll () at aio-posix.c:457
>> 4  bdrv_flush () at block/io.c:2641
>> 5  bdrv_close () at block.c:2143
>> 6  bdrv_delete () at block.c:2352
>> 7  bdrv_unref () at block.c:3429
>> 8  blk_remove_bs () at block/block-backend.c:427
>> 9  blk_delete () at block/block-backend.c:178
>> 10 blk_unref () at block/block-backend.c:226
>> 11 object_property_del_all () at qom/object.c:399
>> 12 object_finalize () at qom/object.c:461
>> 13 object_unref () at qom/object.c:898
>> 14 object_property_del_child () at qom/object.c:422
>> 15 qmp_marshal_device_del () at qmp-marshal.c:1145
>> 16 handle_qmp_command () at /usr/src/debug/qemu-2.6.0/monitor.c:3929
>>
>> Technically bdrv_flush() stucks in
>>     while (rwco.ret == NOT_DONE) {
>>         aio_poll(aio_context, true);
>>     }
>> but rwco.ret is equal to 0 thus we have missed wakeup. Code investigation
>> reveals that we do not have performed aio_context_acquire() on this call
>> stack.
>>
>> This patch adds missed lock.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Eric Blake <eblake@redhat.com>
>> CC: Markus Armbruster <armbru@redhat.com>
>> ---
>>  block/block-backend.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 5742c09..65d5da9 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -273,7 +273,11 @@ void blk_unref(BlockBackend *blk)
>>      if (blk) {
>>          assert(blk->refcnt > 0);
>>          if (!--blk->refcnt) {
>> +            AioContext *ctx = blk_get_aio_context(blk);
>> +
>> +            aio_context_acquire(ctx);
>>              blk_delete(blk);
>> +            aio_context_release(ctx);
> But I don't think this is quite the correct place. The caller of
> blk_unref() should have acquired the AioContext already. As far as I can
> tell in this case that would be originally release_drive() in
> hw/core/qdev-properties-system.c and then blk_detach_dev().
>
> I think the former would be the somehow more correct place but I can
> imagine the latter to be more useful in reality. I'll leave it to you.
>
> (As an alternative, you may of course convince me that this patch is
> indeed correct and should be taken as-is. :-))
ha-ha ;) no, I not going to try to convince you and that is why the
patch was
marked as RFC. I like release_drive() place. This should solve the problem
for me.

Thank you for the quick answer and good proposal.

Den

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-03-27 16:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-27 15:35 [Qemu-devel] [RFC for 2.9 1/1] block: add missed aio_context_acquire into blk_unref Denis V. Lunev
2017-03-27 15:45 ` Max Reitz
2017-03-27 16:17   ` Denis V. Lunev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.