* [Qemu-devel] [PATCH v2] block: Let bdrv_drain_all() call aio_poll() for each AioContext
From: Alexander Yarygin @ 2015-05-14 16:03 UTC
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Alexander Yarygin, Ekaterina Tumanova,
	Christian Borntraeger, Stefan Hajnoczi, Cornelia Huck,
	Paolo Bonzini

After commit 9b536adc ("block: acquire AioContext in
bdrv_drain_all()"), the aio_poll() function is called for every
BlockDriverState, on the assumption that every device may have its
own AioContext. The bdrv_drain_all() function is called in each
virtio_reset() call, which in turn is called for every virtio-blk
device on initialization, so aio_poll() ends up being called
'length(device_list)^2' times.

If we have thousands of disks attached, there are a lot of
BlockDriverStates but only a few AioContexts, leading to tons of
unnecessary aio_poll() calls. For example, startup with 1000 disks
takes over 13 minutes.

This patch changes bdrv_drain_all() to find the shared AioContexts
and call aio_poll() only once for each unique one. This results in
much better startup times, e.g. 1000 disks come up within 5 seconds.
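
For reference, this is the resulting bdrv_drain_all() after the patch,
consolidated from the diff hunks below for readability; the diff is the
authoritative change, and the "pass" comments are added here only for
orientation:

    void bdrv_drain_all(void)
    {
        /* Always run first iteration so any pending completion BHs run */
        bool busy = true, pending = false;
        BlockDriverState *bs;
        GSList *aio_ctxs = NULL, *ctx;
        AioContext *aio_context;

        while (busy) {
            busy = false;

            /* Pass 1: flush each BlockDriverState and collect its
             * AioContext, deduplicated via the aio_ctxs list. */
            QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
                aio_context = bdrv_get_aio_context(bs);

                aio_context_acquire(aio_context);
                bdrv_flush_io_queue(bs);
                busy |= bdrv_requests_pending(bs);
                aio_context_release(aio_context);
                if (!aio_ctxs || !g_slist_find(aio_ctxs, aio_context)) {
                    aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
                }
            }
            pending = busy;

            /* Pass 2: poll each unique AioContext once, instead of
             * once per BlockDriverState. */
            for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
                aio_context = ctx->data;
                aio_context_acquire(aio_context);
                busy |= aio_poll(aio_context, pending);
                aio_context_release(aio_context);
            }
        }
        g_slist_free(aio_ctxs);
    }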

Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
---
 block.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index f2f8ae7..bdfb1ce 100644
--- a/block.c
+++ b/block.c
@@ -1987,17 +1987,6 @@ static bool bdrv_requests_pending(BlockDriverState *bs)
     return false;
 }
 
-static bool bdrv_drain_one(BlockDriverState *bs)
-{
-    bool bs_busy;
-
-    bdrv_flush_io_queue(bs);
-    bdrv_start_throttled_reqs(bs);
-    bs_busy = bdrv_requests_pending(bs);
-    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
-    return bs_busy;
-}
-
 /*
  * Wait for pending requests to complete on a single BlockDriverState subtree
  *
@@ -2010,8 +1999,13 @@ static bool bdrv_drain_one(BlockDriverState *bs)
  */
 void bdrv_drain(BlockDriverState *bs)
 {
-    while (bdrv_drain_one(bs)) {
+    bool busy = true;
+
+    while (busy) {
         /* Keep iterating */
+        bdrv_flush_io_queue(bs);
+        busy = bdrv_requests_pending(bs);
+        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
     }
 }
 
@@ -2030,20 +2024,35 @@ void bdrv_drain(BlockDriverState *bs)
 void bdrv_drain_all(void)
 {
     /* Always run first iteration so any pending completion BHs run */
-    bool busy = true;
+    bool busy = true, pending = false;
     BlockDriverState *bs;
+    GSList *aio_ctxs = NULL, *ctx;
+    AioContext *aio_context;
 
     while (busy) {
         busy = false;
 
         QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
-            AioContext *aio_context = bdrv_get_aio_context(bs);
+            aio_context = bdrv_get_aio_context(bs);
+
+            aio_context_acquire(aio_context);
+            bdrv_flush_io_queue(bs);
+            busy |= bdrv_requests_pending(bs);
+            aio_context_release(aio_context);
+            if (!aio_ctxs || !g_slist_find(aio_ctxs, aio_context)) {
+                aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
+            }
+        }
+        pending = busy;
 
+        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
+            aio_context = ctx->data;
             aio_context_acquire(aio_context);
-            busy |= bdrv_drain_one(bs);
+            busy |= aio_poll(aio_context, pending);
             aio_context_release(aio_context);
         }
     }
+    g_slist_free(aio_ctxs);
 }
 
 /* make a BlockDriverState anonymous by removing from bdrv_state and
@@ -6087,6 +6096,7 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
     } else if (bs->file) {
         bdrv_flush_io_queue(bs->file);
     }
+    bdrv_start_throttled_reqs(bs);
 }
 
 static bool append_open_options(QDict *d, BlockDriverState *bs)
-- 
1.9.1


* Re: [Qemu-devel] [PATCH v2] block: Let bdrv_drain_all() call aio_poll() for each AioContext
From: Fam Zheng @ 2015-05-15  2:04 UTC
  To: Alexander Yarygin
  Cc: Kevin Wolf, qemu-block, Ekaterina Tumanova, qemu-devel,
	Christian Borntraeger, Stefan Hajnoczi, Cornelia Huck,
	Paolo Bonzini

On Thu, 05/14 19:03, Alexander Yarygin wrote:
> After commit 9b536adc ("block: acquire AioContext in
> bdrv_drain_all()"), the aio_poll() function is called for every
> BlockDriverState, on the assumption that every device may have its
> own AioContext. The bdrv_drain_all() function is called in each
> virtio_reset() call, which in turn is called for every virtio-blk
> device on initialization, so aio_poll() ends up being called
> 'length(device_list)^2' times.
> 
> If we have thousands of disks attached, there are a lot of
> BlockDriverStates but only a few AioContexts, leading to tons of
> unnecessary aio_poll() calls. For example, startup with 1000 disks
> takes over 13 minutes.
> 
> This patch changes bdrv_drain_all() to find the shared AioContexts
> and call aio_poll() only once for each unique one. This results in
> much better startup times, e.g. 1000 disks come up within 5 seconds.
> 
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
> ---
>  block.c | 40 +++++++++++++++++++++++++---------------

This doesn't apply to current master; the function has already changed and is
now in block/io.c. Could you rebase it?

>  1 file changed, 25 insertions(+), 15 deletions(-)
> 
> diff --git a/block.c b/block.c
> index f2f8ae7..bdfb1ce 100644
> --- a/block.c
> +++ b/block.c
> @@ -1987,17 +1987,6 @@ static bool bdrv_requests_pending(BlockDriverState *bs)
>      return false;
>  }
>  
> -static bool bdrv_drain_one(BlockDriverState *bs)
> -{
> -    bool bs_busy;
> -
> -    bdrv_flush_io_queue(bs);
> -    bdrv_start_throttled_reqs(bs);
> -    bs_busy = bdrv_requests_pending(bs);
> -    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
> -    return bs_busy;
> -}
> -
>  /*
>   * Wait for pending requests to complete on a single BlockDriverState subtree
>   *
> @@ -2010,8 +1999,13 @@ static bool bdrv_drain_one(BlockDriverState *bs)
>   */
>  void bdrv_drain(BlockDriverState *bs)
>  {
> -    while (bdrv_drain_one(bs)) {
> +    bool busy = true;
> +
> +    while (busy) {
>          /* Keep iterating */
> +        bdrv_flush_io_queue(bs);
> +        busy = bdrv_requests_pending(bs);
> +        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
>      }
>  }
>  
> @@ -2030,20 +2024,35 @@ void bdrv_drain(BlockDriverState *bs)
>  void bdrv_drain_all(void)
>  {
>      /* Always run first iteration so any pending completion BHs run */
> -    bool busy = true;
> +    bool busy = true, pending = false;
>      BlockDriverState *bs;
> +    GSList *aio_ctxs = NULL, *ctx;
> +    AioContext *aio_context;
>  
>      while (busy) {
>          busy = false;
>  
>          QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> -            AioContext *aio_context = bdrv_get_aio_context(bs);
> +            aio_context = bdrv_get_aio_context(bs);
> +
> +            aio_context_acquire(aio_context);
> +            bdrv_flush_io_queue(bs);
> +            busy |= bdrv_requests_pending(bs);
> +            aio_context_release(aio_context);
> +            if (!aio_ctxs || !g_slist_find(aio_ctxs, aio_context)) {
> +                aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
> +            }
> +        }
> +        pending = busy;
>  
> +        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
> +            aio_context = ctx->data;
>              aio_context_acquire(aio_context);
> -            busy |= bdrv_drain_one(bs);
> +            busy |= aio_poll(aio_context, pending);
>              aio_context_release(aio_context);
>          }
>      }
> +    g_slist_free(aio_ctxs);
>  }

How do you make sure that the second loop doesn't queue any request onto
bs->throttled_reqs?

Fam


* Re: [Qemu-devel] [PATCH v2] block: Let bdrv_drain_all() call aio_poll() for each AioContext
From: Christian Borntraeger @ 2015-05-15  6:59 UTC
  To: Alexander Yarygin, qemu-devel
  Cc: Kevin Wolf, qemu-block, Ekaterina Tumanova, Stefan Hajnoczi,
	Cornelia Huck, Paolo Bonzini

On 14.05.2015 at 18:03, Alexander Yarygin wrote:
> After commit 9b536adc ("block: acquire AioContext in
> bdrv_drain_all()"), the aio_poll() function is called for every
> BlockDriverState, on the assumption that every device may have its
> own AioContext. The bdrv_drain_all() function is called in each
> virtio_reset() call, which in turn is called for every virtio-blk
> device on initialization, so aio_poll() ends up being called
> 'length(device_list)^2' times.
> 
> If we have thousands of disks attached, there are a lot of
> BlockDriverStates but only a few AioContexts, leading to tons of
> unnecessary aio_poll() calls. For example, startup with 1000 disks
> takes over 13 minutes.
> 
> This patch changes bdrv_drain_all() to find the shared AioContexts
> and call aio_poll() only once for each unique one. This results in
> much better startup times, e.g. 1000 disks come up within 5 seconds.
> 
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>

Applying this on top of 2.3, I can verify the speedup.
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

PS: There is another, independent issue in the kernel when exiting QEMU,
caused by Linux kernel commit 6098b45b32e6baeacc04790773ced9340601d511:
Author:     Gu Zheng <guz.fnst@cn.fujitsu.com>
AuthorDate: Wed Sep 3 17:45:44 2014 +0800
Commit:     Benjamin LaHaise <bcrl@kvack.org>
CommitDate: Thu Sep 4 16:54:47 2014 -0400

    aio: block exit_aio() until all context requests are completed

A QEMU with 1000 devices will sleep a long time in exit_aio() with no obvious
sign of activity, as a zombie process. I will take care of that.

Christian



> ---
>  block.c | 40 +++++++++++++++++++++++++---------------
>  1 file changed, 25 insertions(+), 15 deletions(-)
> 
> diff --git a/block.c b/block.c
> index f2f8ae7..bdfb1ce 100644
> --- a/block.c
> +++ b/block.c
> @@ -1987,17 +1987,6 @@ static bool bdrv_requests_pending(BlockDriverState *bs)
>      return false;
>  }
> 
> -static bool bdrv_drain_one(BlockDriverState *bs)
> -{
> -    bool bs_busy;
> -
> -    bdrv_flush_io_queue(bs);
> -    bdrv_start_throttled_reqs(bs);
> -    bs_busy = bdrv_requests_pending(bs);
> -    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
> -    return bs_busy;
> -}
> -
>  /*
>   * Wait for pending requests to complete on a single BlockDriverState subtree
>   *
> @@ -2010,8 +1999,13 @@ static bool bdrv_drain_one(BlockDriverState *bs)
>   */
>  void bdrv_drain(BlockDriverState *bs)
>  {
> -    while (bdrv_drain_one(bs)) {
> +    bool busy = true;
> +
> +    while (busy) {
>          /* Keep iterating */
> +        bdrv_flush_io_queue(bs);
> +        busy = bdrv_requests_pending(bs);
> +        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
>      }
>  }
> 
> @@ -2030,20 +2024,35 @@ void bdrv_drain(BlockDriverState *bs)
>  void bdrv_drain_all(void)
>  {
>      /* Always run first iteration so any pending completion BHs run */
> -    bool busy = true;
> +    bool busy = true, pending = false;
>      BlockDriverState *bs;
> +    GSList *aio_ctxs = NULL, *ctx;
> +    AioContext *aio_context;
> 
>      while (busy) {
>          busy = false;
> 
>          QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> -            AioContext *aio_context = bdrv_get_aio_context(bs);
> +            aio_context = bdrv_get_aio_context(bs);
> +
> +            aio_context_acquire(aio_context);
> +            bdrv_flush_io_queue(bs);
> +            busy |= bdrv_requests_pending(bs);
> +            aio_context_release(aio_context);
> +            if (!aio_ctxs || !g_slist_find(aio_ctxs, aio_context)) {
> +                aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
> +            }
> +        }
> +        pending = busy;
> 
> +        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
> +            aio_context = ctx->data;
>              aio_context_acquire(aio_context);
> -            busy |= bdrv_drain_one(bs);
> +            busy |= aio_poll(aio_context, pending);
>              aio_context_release(aio_context);
>          }
>      }
> +    g_slist_free(aio_ctxs);
>  }
> 
>  /* make a BlockDriverState anonymous by removing from bdrv_state and
> @@ -6087,6 +6096,7 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
>      } else if (bs->file) {
>          bdrv_flush_io_queue(bs->file);
>      }
> +    bdrv_start_throttled_reqs(bs);
>  }
> 
>  static bool append_open_options(QDict *d, BlockDriverState *bs)
> 


* Re: [Qemu-devel] [PATCH v2] block: Let bdrv_drain_all() call aio_poll() for each AioContext
From: Christian Borntraeger @ 2015-05-15  7:00 UTC
  To: Alexander Yarygin, qemu-devel
  Cc: Kevin Wolf, qemu-block, Ekaterina Tumanova, Stefan Hajnoczi,
	Cornelia Huck, Paolo Bonzini

On 15.05.2015 at 08:59, Christian Borntraeger wrote:
> PS: There is another, independent issue in the kernel when exiting QEMU,
> caused by Linux kernel commit 6098b45b32e6baeacc04790773ced9340601d511:
> Author:     Gu Zheng <guz.fnst@cn.fujitsu.com>
> AuthorDate: Wed Sep 3 17:45:44 2014 +0800
> Commit:     Benjamin LaHaise <bcrl@kvack.org>
> CommitDate: Thu Sep 4 16:54:47 2014 -0400
> 
>     aio: block exit_aio() until all context requests are completed
> 
> A QEMU with 1000 devices will sleep a long time in exit_aio() with no obvious
> sign of activity, as a zombie process. I will take care of that.
> 
> Christian

To make it clear: this kernel wait time happens with and without Alexander's patch.


* Re: [Qemu-devel] [PATCH v2] block: Let bdrv_drain_all() call aio_poll() for each AioContext
From: Christian Borntraeger @ 2015-05-15  8:16 UTC
  To: Alexander Yarygin, qemu-devel
  Cc: Kevin Wolf, qemu-block, Ekaterina Tumanova, Stefan Hajnoczi,
	Cornelia Huck, Paolo Bonzini

On 15.05.2015 at 08:59, Christian Borntraeger wrote:
> On 14.05.2015 at 18:03, Alexander Yarygin wrote:
>> After commit 9b536adc ("block: acquire AioContext in
>> bdrv_drain_all()"), the aio_poll() function is called for every
>> BlockDriverState, on the assumption that every device may have its
>> own AioContext. The bdrv_drain_all() function is called in each
>> virtio_reset() call, which in turn is called for every virtio-blk
>> device on initialization, so aio_poll() ends up being called
>> 'length(device_list)^2' times.
>>
>> If we have thousands of disks attached, there are a lot of
>> BlockDriverStates but only a few AioContexts, leading to tons of
>> unnecessary aio_poll() calls. For example, startup with 1000 disks
>> takes over 13 minutes.
>>
>> This patch changes bdrv_drain_all() to find the shared AioContexts
>> and call aio_poll() only once for each unique one. This results in
>> much better startup times, e.g. 1000 disks come up within 5 seconds.
>>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
>> Cc: Kevin Wolf <kwolf@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
> 
> Applying on top of 2.3 I can verify the speedup.
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

Hmmm. When I enable iothreads for all of these devices I get hangs, so
let's defer my Tested-by until I understand that :-(

