* [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
@ 2015-05-13 15:18 Alexander Yarygin
  2015-05-13 15:23 ` Paolo Bonzini
  2015-05-13 16:02 ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  0 siblings, 2 replies; 10+ messages in thread
From: Alexander Yarygin @ 2015-05-13 15:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Alexander Yarygin, Christian Borntraeger,
	Stefan Hajnoczi, Cornelia Huck, Paolo Bonzini

After commit 9b536adc ("block: acquire AioContext in
bdrv_drain_all()") the aio_poll() function is called for every
BlockDriverState, on the assumption that every device may have its own
AioContext. The bdrv_drain_all() function is called from each
virtio_reset() call, which in turn is called for every virtio-blk
device on initialization, so aio_poll() ends up being called
'length(device_list)^2' times.

If we have thousands of disks attached, there are a lot of
BlockDriverStates but only a few AioContexts, leading to many
unnecessary aio_poll() calls. For example, startup with 1000 disks
takes over 13 minutes.

This patch changes the bdrv_drain_all() function so that it finds
shared AioContexts and calls aio_poll() only for unique ones. This
results in much better startup times; e.g. 1000 disks come up within
5 seconds.

Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
---
 block.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index f2f8ae7..7414815 100644
--- a/block.c
+++ b/block.c
@@ -1994,7 +1994,6 @@ static bool bdrv_drain_one(BlockDriverState *bs)
     bdrv_flush_io_queue(bs);
     bdrv_start_throttled_reqs(bs);
     bs_busy = bdrv_requests_pending(bs);
-    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
     return bs_busy;
 }
 
@@ -2010,8 +2009,12 @@ static bool bdrv_drain_one(BlockDriverState *bs)
  */
 void bdrv_drain(BlockDriverState *bs)
 {
-    while (bdrv_drain_one(bs)) {
+    bool busy = true;
+
+    while (busy) {
         /* Keep iterating */
+        busy = bdrv_drain_one(bs);
+        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
     }
 }
 
@@ -2032,6 +2035,7 @@ void bdrv_drain_all(void)
     /* Always run first iteration so any pending completion BHs run */
     bool busy = true;
     BlockDriverState *bs;
+    GList *aio_ctxs = NULL;
 
     while (busy) {
         busy = false;
@@ -2041,9 +2045,14 @@ void bdrv_drain_all(void)
 
             aio_context_acquire(aio_context);
             busy |= bdrv_drain_one(bs);
+            if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context)) {
+                busy |= aio_poll(aio_context, busy);
+                aio_ctxs = g_list_append(aio_ctxs, aio_context);
+            }
             aio_context_release(aio_context);
         }
     }
+    g_list_free(aio_ctxs);
 }
 
 /* make a BlockDriverState anonymous by removing from bdrv_state and
-- 
1.9.1


* Re: [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
  2015-05-13 15:18 [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext Alexander Yarygin
@ 2015-05-13 15:23 ` Paolo Bonzini
  2015-05-13 16:34   ` Alexander Yarygin
  2015-05-13 16:02 ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2015-05-13 15:23 UTC (permalink / raw)
  To: Alexander Yarygin, qemu-devel
  Cc: Cornelia Huck, Christian Borntraeger, Kevin Wolf,
	Stefan Hajnoczi, qemu-block



On 13/05/2015 17:18, Alexander Yarygin wrote:
> After commit 9b536adc ("block: acquire AioContext in
> bdrv_drain_all()") the aio_poll() function is called for every
> BlockDriverState, on the assumption that every device may have its own
> AioContext. The bdrv_drain_all() function is called from each
> virtio_reset() call,

... which should actually call bdrv_drain().  Can you fix that?

> which in turn is called for every virtio-blk
> device on initialization, so aio_poll() ends up being called
> 'length(device_list)^2' times.
> 
> If we have thousands of disks attached, there are a lot of
> BlockDriverStates but only a few AioContexts, leading to many
> unnecessary aio_poll() calls. For example, startup with 1000 disks
> takes over 13 minutes.
> 
> This patch changes the bdrv_drain_all() function so that it finds
> shared AioContexts and calls aio_poll() only for unique ones. This
> results in much better startup times; e.g. 1000 disks come up within
> 5 seconds.

I'm not sure this patch is correct.  You may have to call aio_poll
multiple times before a BlockDriverState is drained.
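
In other words (a conceptual sketch of what draining has to guarantee,
not the actual code):

/* One completion can submit further I/O, so a single aio_poll() pass
 * is not enough; keep polling until nothing is pending on this BDS. */
while (bdrv_requests_pending(bs)) {
    aio_poll(bdrv_get_aio_context(bs), true);  /* block until progress */
}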

Paolo

> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
> ---
>  block.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index f2f8ae7..7414815 100644
> --- a/block.c
> +++ b/block.c
> @@ -1994,7 +1994,6 @@ static bool bdrv_drain_one(BlockDriverState *bs)
>      bdrv_flush_io_queue(bs);
>      bdrv_start_throttled_reqs(bs);
>      bs_busy = bdrv_requests_pending(bs);
> -    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
>      return bs_busy;
>  }
>  
> @@ -2010,8 +2009,12 @@ static bool bdrv_drain_one(BlockDriverState *bs)
>   */
>  void bdrv_drain(BlockDriverState *bs)
>  {
> -    while (bdrv_drain_one(bs)) {
> +    bool busy = true;
> +
> +    while (busy) {
>          /* Keep iterating */
> +        busy = bdrv_drain_one(bs);
> +        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
>      }
>  }
>  
> @@ -2032,6 +2035,7 @@ void bdrv_drain_all(void)
>      /* Always run first iteration so any pending completion BHs run */
>      bool busy = true;
>      BlockDriverState *bs;
> +    GList *aio_ctxs = NULL;
>  
>      while (busy) {
>          busy = false;
> @@ -2041,9 +2045,14 @@ void bdrv_drain_all(void)
>  
>              aio_context_acquire(aio_context);
>              busy |= bdrv_drain_one(bs);
> +            if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context)) {
> +                busy |= aio_poll(aio_context, busy);
> +                aio_ctxs = g_list_append(aio_ctxs, aio_context);
> +            }
>              aio_context_release(aio_context);
>          }
>      }
> +    g_list_free(aio_ctxs);
>  }
>  
>  /* make a BlockDriverState anonymous by removing from bdrv_state and
> 


* Re: [Qemu-devel] [Qemu-block] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
  2015-05-13 15:18 [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext Alexander Yarygin
  2015-05-13 15:23 ` Paolo Bonzini
@ 2015-05-13 16:02 ` Alberto Garcia
  2015-05-13 16:37   ` Alexander Yarygin
  1 sibling, 1 reply; 10+ messages in thread
From: Alberto Garcia @ 2015-05-13 16:02 UTC (permalink / raw)
  To: Alexander Yarygin, qemu-devel
  Cc: qemu-block, Christian Borntraeger, Stefan Hajnoczi,
	Cornelia Huck, Paolo Bonzini

On Wed 13 May 2015 05:18:31 PM CEST, Alexander Yarygin <yarygin@linux.vnet.ibm.com> wrote:

> +            if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context)) {
> +                busy |= aio_poll(aio_context, busy);
> +                aio_ctxs = g_list_append(aio_ctxs, aio_context);
> +            }

g_list_append() walks the whole list in order to append an element; I
think you should use _prepend() instead.

And since that's the only operation you're doing, you can use a GSList
instead.
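
For illustration, the pattern I mean (an untested sketch of the same
hunk; note that g_slist_find() treats NULL as the empty list, so the
extra !aio_ctxs check can go too):

GSList *aio_ctxs = NULL;
...
if (!g_slist_find(aio_ctxs, aio_context)) {
    busy |= aio_poll(aio_context, busy);
    aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);  /* O(1) insert */
}
...
g_slist_free(aio_ctxs);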

Berto


* Re: [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
  2015-05-13 15:23 ` Paolo Bonzini
@ 2015-05-13 16:34   ` Alexander Yarygin
  2015-05-14  2:25     ` Fam Zheng
  2015-05-14 12:05     ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Alexander Yarygin @ 2015-05-13 16:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, qemu-block, qemu-devel, Christian Borntraeger,
	Stefan Hajnoczi, Cornelia Huck

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 13/05/2015 17:18, Alexander Yarygin wrote:
>> After commit 9b536adc ("block: acquire AioContext in
>> bdrv_drain_all()") the aio_poll() function is called for every
>> BlockDriverState, on the assumption that every device may have its own
>> AioContext. The bdrv_drain_all() function is called from each
>> virtio_reset() call,
>
> ... which should actually call bdrv_drain().  Can you fix that?
>

I thought about it, but couldn't convince myself that it's safe. The
comment above bdrv_drain_all() states "... it is not possible to have a
function to drain a single device's I/O queue." Besides that, what if we
have several virtual disks that share a host file?
Or am I wrong and it's OK to do that?

>> which in turn is called for every virtio-blk
>> device on initialization, so aio_poll() ends up being called
>> 'length(device_list)^2' times.
>> 
>> If we have thousands of disks attached, there are a lot of
>> BlockDriverStates but only a few AioContexts, leading to many
>> unnecessary aio_poll() calls. For example, startup with 1000 disks
>> takes over 13 minutes.
>> 
>> This patch changes the bdrv_drain_all() function so that it finds
>> shared AioContexts and calls aio_poll() only for unique ones. This
>> results in much better startup times; e.g. 1000 disks come up within
>> 5 seconds.
>
> I'm not sure this patch is correct.  You may have to call aio_poll
> multiple times before a BlockDriverState is drained.
>
> Paolo
>


Ah, right. We need a second loop, something like this:

@@ -2030,20 +2033,33 @@ void bdrv_drain(BlockDriverState *bs)
 void bdrv_drain_all(void)
 {
     /* Always run first iteration so any pending completion BHs run */
-    bool busy = true;
+    bool busy = true, pending = false;
     BlockDriverState *bs;
+    GList *aio_ctxs = NULL, *ctx;
+    AioContext *aio_context;

     while (busy) {
         busy = false;

         QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
-            AioContext *aio_context = bdrv_get_aio_context(bs);
+            aio_context = bdrv_get_aio_context(bs);

             aio_context_acquire(aio_context);
             busy |= bdrv_drain_one(bs);
             aio_context_release(aio_context);
+            if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context))
+                aio_ctxs = g_list_append(aio_ctxs, aio_context);
+        }
+        pending = busy;
+
+        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
+            aio_context = ctx->data;
+            aio_context_acquire(aio_context);
+            busy |= aio_poll(aio_context, pending);
+            aio_context_release(aio_context);
         }
     }
+    g_list_free(aio_ctxs);
 }

That looks quite ugly to me and breaks the consistency of
bdrv_drain_one(), since it no longer calls aio_poll()...



* Re: [Qemu-devel] [Qemu-block] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
  2015-05-13 16:02 ` [Qemu-devel] [Qemu-block] " Alberto Garcia
@ 2015-05-13 16:37   ` Alexander Yarygin
  0 siblings, 0 replies; 10+ messages in thread
From: Alexander Yarygin @ 2015-05-13 16:37 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-block, qemu-devel, Christian Borntraeger, Stefan Hajnoczi,
	Cornelia Huck, Paolo Bonzini

Alberto Garcia <berto@igalia.com> writes:

> On Wed 13 May 2015 05:18:31 PM CEST, Alexander Yarygin <yarygin@linux.vnet.ibm.com> wrote:
>
>> +            if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context)) {
>> +                busy |= aio_poll(aio_context, busy);
>> +                aio_ctxs = g_list_append(aio_ctxs, aio_context);
>> +            }
>
> g_list_append() walks the whole list in order to append an element; I
> think you should use _prepend() instead.
>
> And since that's the only operation you're doing, you can use a GSList
> instead.
>
> Berto

This seems reasonable, thanks.


* Re: [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
  2015-05-13 16:34   ` Alexander Yarygin
@ 2015-05-14  2:25     ` Fam Zheng
  2015-05-14 10:57       ` Alexander Yarygin
  2015-05-14 12:05     ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Fam Zheng @ 2015-05-14  2:25 UTC (permalink / raw)
  To: Alexander Yarygin
  Cc: Kevin Wolf, qemu-block, qemu-devel, Christian Borntraeger,
	Stefan Hajnoczi, Cornelia Huck, Paolo Bonzini

On Wed, 05/13 19:34, Alexander Yarygin wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 13/05/2015 17:18, Alexander Yarygin wrote:
> >> After commit 9b536adc ("block: acquire AioContext in
> >> bdrv_drain_all()") the aio_poll() function is called for every
> >> BlockDriverState, on the assumption that every device may have its own
> >> AioContext. The bdrv_drain_all() function is called from each
> >> virtio_reset() call,
> >
> > ... which should actually call bdrv_drain().  Can you fix that?
> >
> 
> I thought about it, but couldn't convince myself that it's safe. The
> comment above bdrv_drain_all() states "... it is not possible to have a
> function to drain a single device's I/O queue."

I think that comment is stale - it predates the introduction of per-BDS
request tracking and bdrv_drain.
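
(For reference, a rough sketch of what the per-BDS check looks like -
approximate, not a verbatim copy of block.c:)

static bool bdrv_requests_pending(BlockDriverState *bs)
{
    if (!QLIST_EMPTY(&bs->tracked_requests)) {
        return true;
    }
    if (!qemu_co_queue_empty(&bs->throttled_reqs[0]) ||
        !qemu_co_queue_empty(&bs->throttled_reqs[1])) {
        return true;
    }
    /* recurse into the protocol and backing files */
    if (bs->file && bdrv_requests_pending(bs->file)) {
        return true;
    }
    if (bs->backing_hd && bdrv_requests_pending(bs->backing_hd)) {
        return true;
    }
    return false;
}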

> Besides that, what if we
> have several virtual disks that share a host file?

I'm not sure what you mean; bdrv_drain works on a BDS, and each virtual
disk has one.
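
For illustration, Paolo's suggestion would look roughly like this in
the virtio-blk reset path (a hypothetical sketch - I haven't checked
the exact field names):

static void virtio_blk_reset(VirtIODevice *vdev)
{
    VirtIOBlock *s = VIRTIO_BLK(vdev);

    /* drain only this device's BDS instead of every device */
    bdrv_drain(blk_bs(s->blk));
}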

> Or am I wrong and it's OK to do that?
> 
> >> which in turn is called for every virtio-blk
> >> device on initialization, so aio_poll() ends up being called
> >> 'length(device_list)^2' times.
> >> 
> >> If we have thousands of disks attached, there are a lot of
> >> BlockDriverStates but only a few AioContexts, leading to many
> >> unnecessary aio_poll() calls. For example, startup with 1000 disks
> >> takes over 13 minutes.
> >> 
> >> This patch changes the bdrv_drain_all() function so that it finds
> >> shared AioContexts and calls aio_poll() only for unique ones. This
> >> results in much better startup times; e.g. 1000 disks come up within
> >> 5 seconds.
> >
> > I'm not sure this patch is correct.  You may have to call aio_poll
> > multiple times before a BlockDriverState is drained.
> >
> > Paolo
> >
> 
> 
> Ah, right. We need a second loop, something like this:
> 
> @@ -2030,20 +2033,33 @@ void bdrv_drain(BlockDriverState *bs)
>  void bdrv_drain_all(void)
>  {
>      /* Always run first iteration so any pending completion BHs run */
> -    bool busy = true;
> +    bool busy = true, pending = false;
>      BlockDriverState *bs;
> +    GList *aio_ctxs = NULL, *ctx;
> +    AioContext *aio_context;
> 
>      while (busy) {
>          busy = false;
> 
>          QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> -            AioContext *aio_context = bdrv_get_aio_context(bs);
> +            aio_context = bdrv_get_aio_context(bs);
> 
>              aio_context_acquire(aio_context);
>              busy |= bdrv_drain_one(bs);
>              aio_context_release(aio_context);
> +            if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context))
> +                aio_ctxs = g_list_append(aio_ctxs, aio_context);

Braces are required even for a single-line if. Moreover, I don't
understand this - aio_ctxs is a duplicate of bdrv_states.
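
I.e., the quoted condition in QEMU coding style (illustrative only):

if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context)) {
    aio_ctxs = g_list_append(aio_ctxs, aio_context);
}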

Fam



* Re: [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
  2015-05-14  2:25     ` Fam Zheng
@ 2015-05-14 10:57       ` Alexander Yarygin
  0 siblings, 0 replies; 10+ messages in thread
From: Alexander Yarygin @ 2015-05-14 10:57 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, qemu-block, qemu-devel, Christian Borntraeger,
	Stefan Hajnoczi, Cornelia Huck, Paolo Bonzini

Fam Zheng <famz@redhat.com> writes:

> On Wed, 05/13 19:34, Alexander Yarygin wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>> > On 13/05/2015 17:18, Alexander Yarygin wrote:
>> >> After commit 9b536adc ("block: acquire AioContext in
>> >> bdrv_drain_all()") the aio_poll() function is called for every
>> >> BlockDriverState, on the assumption that every device may have its own
>> >> AioContext. The bdrv_drain_all() function is called from each
>> >> virtio_reset() call,
>> >
>> > ... which should actually call bdrv_drain().  Can you fix that?
>> >
>> 
>> I thought about it, but couldn't convince myself that it's safe. The
>> comment above bdrv_drain_all() states "... it is not possible to have a
>> function to drain a single device's I/O queue."
>
> I think that comment is stale - it predates the introduction of per-BDS
> request tracking and bdrv_drain.
>

It says "completion of an asynchronous I/O operation can trigger any
number of other I/O operations on other devices". If this is no longer
the case, then I agree :). But I think it doesn't exclude this
patch anyway: bdrv_drain_all() is called in other places as well,
e.g. in do_vm_stop().

>> Besides that, what if we
>> have several virtual disks that share a host file?
>
> I'm not sure what you mean; bdrv_drain works on a BDS, and each virtual
> disk has one.
>
>> Or am I wrong and it's OK to do that?
>> 
>> >> which in turn is called for every virtio-blk
>> >> device on initialization, so aio_poll() ends up being called
>> >> 'length(device_list)^2' times.
>> >> 
>> >> If we have thousands of disks attached, there are a lot of
>> >> BlockDriverStates but only a few AioContexts, leading to many
>> >> unnecessary aio_poll() calls. For example, startup with 1000 disks
>> >> takes over 13 minutes.
>> >> 
>> >> This patch changes the bdrv_drain_all() function so that it finds
>> >> shared AioContexts and calls aio_poll() only for unique ones. This
>> >> results in much better startup times; e.g. 1000 disks come up within
>> >> 5 seconds.
>> >
>> > I'm not sure this patch is correct.  You may have to call aio_poll
>> > multiple times before a BlockDriverState is drained.
>> >
>> > Paolo
>> >
>> 
>> 
>> Ah, right. We need a second loop, something like this:
>> 
>> @@ -2030,20 +2033,33 @@ void bdrv_drain(BlockDriverState *bs)
>>  void bdrv_drain_all(void)
>>  {
>>      /* Always run first iteration so any pending completion BHs run */
>> -    bool busy = true;
>> +    bool busy = true, pending = false;
>>      BlockDriverState *bs;
>> +    GList *aio_ctxs = NULL, *ctx;
>> +    AioContext *aio_context;
>> 
>>      while (busy) {
>>          busy = false;
>> 
>>          QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
>> -            AioContext *aio_context = bdrv_get_aio_context(bs);
>> +            aio_context = bdrv_get_aio_context(bs);
>> 
>>              aio_context_acquire(aio_context);
>>              busy |= bdrv_drain_one(bs);
>>              aio_context_release(aio_context);
>> +            if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context))
>> +                aio_ctxs = g_list_append(aio_ctxs, aio_context);
>
> Braces are required even for a single-line if. Moreover, I don't
> understand this - aio_ctxs is a duplicate of bdrv_states.
>
> Fam
>
>

length(bdrv_states) == number of virtual disks
length(aio_ctxs) == number of threads

We can have as many disks as we want, while the number of threads is
limited. In my case there were 1024 disks sharing one AioContext, which
means at least 1023 redundant aio_poll() calls on every pass.
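
To put rough numbers on it (assuming a single AioContext shared by all
1024 disks): each of the 1024 virtio_reset() calls at startup triggers
a bdrv_drain_all() pass over all 1024 BlockDriverStates, so the old
code makes on the order of 1024 * 1024 ~= a million aio_poll() calls,
while deduplicating by AioContext brings that down to roughly 1024.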

[.. skipped ..]


* Re: [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
  2015-05-13 16:34   ` Alexander Yarygin
  2015-05-14  2:25     ` Fam Zheng
@ 2015-05-14 12:05     ` Paolo Bonzini
  2015-05-14 14:29       ` Alexander Yarygin
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2015-05-14 12:05 UTC (permalink / raw)
  To: Alexander Yarygin
  Cc: Cornelia Huck, Christian Borntraeger, Stefan Hajnoczi,
	qemu-devel, qemu-block



On 13/05/2015 18:34, Alexander Yarygin wrote:
> Ah, right. We need a second loop, something like this:
> 
[.. skipped ..]
> 
> That looks quite ugly to me and breaks the consistency of
> bdrv_drain_one(), since it no longer calls aio_poll()...

It's not ugly.  After your patch bdrv_drain_one doesn't call aio_poll,
while bdrv_drain and bdrv_drain_all call bdrv_drain_one + aio_poll.  All
callers of bdrv_drain_one are consistent.

Perhaps you can rename bdrv_drain_one to bdrv_flush_io_queue (inlining
the existing bdrv_flush_io_queue into it)?  That would work very well
for me.

Paolo


* Re: [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
  2015-05-14 12:05     ` Paolo Bonzini
@ 2015-05-14 14:29       ` Alexander Yarygin
  2015-05-14 14:34         ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Yarygin @ 2015-05-14 14:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Cornelia Huck, Christian Borntraeger, Stefan Hajnoczi,
	qemu-devel, qemu-block

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 13/05/2015 18:34, Alexander Yarygin wrote:
>> Ah, right. We need a second loop, something like this:
>> 
[.. skipped ..]
>> 
>> That looks quite ugly to me and breaks the consistency of
>> bdrv_drain_one(), since it no longer calls aio_poll()...
>
> It's not ugly.  After your patch bdrv_drain_one doesn't call aio_poll,
> while bdrv_drain and bdrv_drain_all call bdrv_drain_one + aio_poll.  All
> callers of bdrv_drain_one are consistent.
>
> Perhaps you can rename bdrv_drain_one to bdrv_flush_io_queue (inlining
> the existing bdrv_flush_io_queue into it)?  That would work very well
> for me.
>
> Paolo

Hmm, bdrv_flush_io_queue() is public, but has no users. How about a
different name, maybe something like "bdrv_drain_requests_one"?

Otherwise, here is a patch for the rename to bdrv_flush_io_queue()
below. If it's OK, I will respin the whole patch.

Thanks.

--- a/block.c
+++ b/block.c
@@ -1987,11 +1987,17 @@ static bool bdrv_requests_pending(BlockDriverState *bs)
     return false;
 }
 
-static bool bdrv_drain_one(BlockDriverState *bs)
+static bool bdrv_flush_io_queue(BlockDriverState *bs)
 {
     bool bs_busy;
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_flush_io_queue) {
+        drv->bdrv_flush_io_queue(bs);
+    } else if (bs->file) {
+        bdrv_flush_io_queue(bs->file);
+    }
 
-    bdrv_flush_io_queue(bs);
     bdrv_start_throttled_reqs(bs);
     bs_busy = bdrv_requests_pending(bs);
     return bs_busy;
@@ -2013,7 +2019,7 @@ void bdrv_drain(BlockDriverState *bs)
 
     while (busy) {
         /* Keep iterating */
-        busy = bdrv_drain_one(bs);
+        busy = bdrv_flush_io_queue(bs);
         busy |= aio_poll(bdrv_get_aio_context(bs), busy);
     }
 }
@@ -2044,7 +2050,7 @@ void bdrv_drain_all(void)
             AioContext *aio_context = bdrv_get_aio_context(bs);
 
             aio_context_acquire(aio_context);
-            busy |= bdrv_drain_one(bs);
+            busy |= bdrv_flush_io_queue(bs);
             if (!aio_ctxs || !g_slist_find(aio_ctxs, aio_context)) {
                 busy |= aio_poll(aio_context, busy);
                 aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
@@ -6088,16 +6094,6 @@ void bdrv_io_unplug(BlockDriverState *bs)
     }
 }
 
-void bdrv_flush_io_queue(BlockDriverState *bs)
-{
-    BlockDriver *drv = bs->drv;
-    if (drv && drv->bdrv_flush_io_queue) {
-        drv->bdrv_flush_io_queue(bs);
-    } else if (bs->file) {
-        bdrv_flush_io_queue(bs->file);
-    }
-}
-
 static bool append_open_options(QDict *d, BlockDriverState *bs)
 {
     const QDictEntry *entry;
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -565,7 +565,6 @@ int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
 
 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
-void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 


* Re: [Qemu-devel] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext
  2015-05-14 14:29       ` Alexander Yarygin
@ 2015-05-14 14:34         ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2015-05-14 14:34 UTC (permalink / raw)
  To: Alexander Yarygin
  Cc: Cornelia Huck, Christian Borntraeger, Stefan Hajnoczi,
	qemu-devel, qemu-block



On 14/05/2015 16:29, Alexander Yarygin wrote:
> > Perhaps you can rename bdrv_drain_one to bdrv_flush_io_queue (inlining
> > the existing bdrv_flush_io_queue into it)?  That would work very well
> > for me.
>
> Hmm, bdrv_flush_io_queue() is public, but has no users. How about a
> different name, maybe something like "bdrv_drain_requests_one"?

It's common for functions to call a driver hook, and then follow up with
generic code.  See bdrv_truncate for an example.  I would just keep
bdrv_flush_io_queue(); bdrv_start_throttled_reqs is really the generic
code to flush the I/O queue.

Perhaps, if you prefer, move bdrv_requests_pending(bs) to the callers so
that bdrv_flush_io_queue() keeps returning void?
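
A sketch of that variant (illustrative only, not the respun patch):

void bdrv_flush_io_queue(BlockDriverState *bs)
{
    BlockDriver *drv = bs->drv;

    if (drv && drv->bdrv_flush_io_queue) {
        drv->bdrv_flush_io_queue(bs);
    } else if (bs->file) {
        bdrv_flush_io_queue(bs->file);
    }
    bdrv_start_throttled_reqs(bs);
}

/* callers then query the pending state themselves, e.g. in bdrv_drain(): */
while (busy) {
    bdrv_flush_io_queue(bs);
    busy = bdrv_requests_pending(bs);
    busy |= aio_poll(bdrv_get_aio_context(bs), busy);
}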

Paolo



This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.