* [PATCH v3 1/2] QSLIST: add atomic replace operation
@ 2020-10-16 11:26 wanghonghao
2020-10-16 11:26 ` [PATCH v3 2/2] coroutine: take exactly one batch from global pool at a time wanghonghao
0 siblings, 1 reply; 4+ messages in thread
From: wanghonghao @ 2020-10-16 11:26 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, fam, wanghonghao, stefanha
Atomically replace one queue with another. This is useful when queues need to
be transferred between threads.
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
---
include/qemu/queue.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index e029e7bf66..1f0cbdf87e 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -226,6 +226,10 @@ struct { \
(dest)->slh_first = qatomic_xchg(&(src)->slh_first, NULL); \
} while (/*CONSTCOND*/0)
+#define QSLIST_REPLACE_ATOMIC(dest, src, old) do { \
+ (old)->slh_first = qatomic_xchg(&(dest)->slh_first, (src)->slh_first); \
+} while (/*CONSTCOND*/0)
+
#define QSLIST_REMOVE_HEAD(head, field) do { \
typeof((head)->slh_first) elm = (head)->slh_first; \
(head)->slh_first = elm->field.sle_next; \
--
2.24.3 (Apple Git-128)
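For readers less familiar with the exchange-based idiom, the following is a minimal standalone sketch of the same idea, written with C11 atomics rather than QEMU's qatomic_xchg. The names struct node, struct list_head and list_replace_atomic are hypothetical and exist only for this illustration. The sketch assumes, as the macro appears to, that only dest is shared between threads while src and old are private to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for a QSLIST element and head; not QEMU types. */
struct node {
    int value;
    struct node *next;
};

struct list_head {
    _Atomic(struct node *) first;
};

/*
 * Publish the chain held by *src into *dest and hand whatever *dest held
 * before to *old, using one atomic exchange on dest->first.
 *
 * Only the access to dest is a read-modify-write; src and old are assumed
 * to be private to the calling thread. old may alias src, in which case
 * the call swaps the two lists.
 */
void list_replace_atomic(struct list_head *dest,
                         struct list_head *src,
                         struct list_head *old)
{
    struct node *incoming;
    struct node *previous;

    assert(dest != NULL && src != NULL && old != NULL);

    incoming = atomic_load_explicit(&src->first, memory_order_relaxed);
    previous = atomic_exchange(&dest->first, incoming);
    atomic_store_explicit(&old->first, previous, memory_order_relaxed);
}
```

Passing the same head as both src and old, as patch 2/2 does with alloc_pool, leaves the caller holding the previous contents of dest, so an empty result means the destination slot was free.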
* [PATCH v3 2/2] coroutine: take exactly one batch from global pool at a time
2020-10-16 11:26 [PATCH v3 1/2] QSLIST: add atomic replace operation wanghonghao
@ 2020-10-16 11:26 ` wanghonghao
2021-03-08 10:27 ` Stefan Hajnoczi
0 siblings, 1 reply; 4+ messages in thread
From: wanghonghao @ 2020-10-16 11:26 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, fam, wanghonghao, stefanha
This patch replaces the global coroutine queue with a lock-free stack whose
elements are coroutine queues. Threads can put coroutine queues onto the
stack or take queues from it, and each coroutine queue holds exactly
POOL_BATCH_SIZE coroutines. Note that the stack is not strictly LIFO, but that
is good enough for a buffer pool.
Coroutines are put into the thread-local pool first on release. The fast
paths of both allocation and release are now free of atomic operations, and
no single thread holds on to too many coroutines because POOL_BATCH_SIZE has
been reduced to 16.
In practice, I ran a VM with two block devices bound to two different
iothreads, and ran fio with iodepth 128 on each device. Without this patch, it
maintains around 400 coroutines and has about a 1% chance of calling
`qemu_coroutine_new`. With this patch, it maintains no more than 273
coroutines and doesn't call `qemu_coroutine_new` after the initial allocations.
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
---
util/qemu-coroutine.c | 63 ++++++++++++++++++++++++++++---------------
1 file changed, 42 insertions(+), 21 deletions(-)
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 38fb6d3084..46e5073796 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -21,13 +21,14 @@
#include "block/aio.h"
enum {
- POOL_BATCH_SIZE = 64,
+ POOL_BATCH_SIZE = 16,
+ POOL_MAX_BATCHES = 32,
};
-/** Free list to speed up creation */
-static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int release_pool_size;
-static __thread QSLIST_HEAD(, Coroutine) alloc_pool = QSLIST_HEAD_INITIALIZER(pool);
+/** Free stack to speed up creation */
+static QSLIST_HEAD(, Coroutine) pool[POOL_MAX_BATCHES];
+static int pool_top;
+static __thread QSLIST_HEAD(, Coroutine) alloc_pool;
static __thread unsigned int alloc_pool_size;
static __thread Notifier coroutine_pool_cleanup_notifier;
@@ -49,20 +50,26 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
if (CONFIG_COROUTINE_POOL) {
co = QSLIST_FIRST(&alloc_pool);
if (!co) {
- if (release_pool_size > POOL_BATCH_SIZE) {
- /* Slow path; a good place to register the destructor, too. */
- if (!coroutine_pool_cleanup_notifier.notify) {
- coroutine_pool_cleanup_notifier.notify = coroutine_pool_cleanup;
- qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
+ int top;
+
+ /* Slow path; a good place to register the destructor, too. */
+ if (!coroutine_pool_cleanup_notifier.notify) {
+ coroutine_pool_cleanup_notifier.notify = coroutine_pool_cleanup;
+ qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
+ }
+
+ while ((top = qatomic_read(&pool_top)) > 0) {
+ if (qatomic_cmpxchg(&pool_top, top, top - 1) != top) {
+ continue;
}
- /* This is not exact; there could be a little skew between
- * release_pool_size and the actual size of release_pool. But
- * it is just a heuristic, it does not need to be perfect.
- */
- alloc_pool_size = qatomic_xchg(&release_pool_size, 0);
- QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
+ QSLIST_MOVE_ATOMIC(&alloc_pool, &pool[top - 1]);
co = QSLIST_FIRST(&alloc_pool);
+
+ if (co) {
+ alloc_pool_size = POOL_BATCH_SIZE;
+ break;
+ }
}
}
if (co) {
@@ -86,16 +93,30 @@ static void coroutine_delete(Coroutine *co)
co->caller = NULL;
if (CONFIG_COROUTINE_POOL) {
- if (release_pool_size < POOL_BATCH_SIZE * 2) {
- QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
- qatomic_inc(&release_pool_size);
- return;
- }
+ int top, value, old;
+
if (alloc_pool_size < POOL_BATCH_SIZE) {
QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
alloc_pool_size++;
return;
}
+
+ for (top = qatomic_read(&pool_top); top < POOL_MAX_BATCHES; top++) {
+ QSLIST_REPLACE_ATOMIC(&pool[top], &alloc_pool, &alloc_pool);
+ if (!QSLIST_EMPTY(&alloc_pool)) {
+ continue;
+ }
+
+ value = top + 1;
+
+ do {
+ old = qatomic_cmpxchg(&pool_top, top, value);
+ } while (old != top && (top = old) < value);
+
+ QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
+ alloc_pool_size = 1;
+ return;
+ }
}
qemu_coroutine_delete(co);
--
2.24.3 (Apple Git-128)
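The stack-of-batches scheme described in the commit message can be sketched outside QEMU roughly as follows. This is an illustration under stated assumptions, not the patch itself: it uses C11 atomics, the names struct item, slots, top, batch_take and batch_put are hypothetical, MAX_BATCHES is shrunk to 4, and the step that raises top is written as a plain "atomic maximum" loop rather than the exact cmpxchg loop in the diff. It has only been reasoned about for single-threaded use here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { MAX_BATCHES = 4 };   /* hypothetical; the patch uses POOL_MAX_BATCHES = 32 */

/* Hypothetical element type; a batch is a chain of items linked by next. */
struct item {
    struct item *next;
};

static _Atomic(struct item *) slots[MAX_BATCHES];
static atomic_int top;      /* slots below this index may hold a batch */

/* Take one whole batch, or return NULL if none was found. */
struct item *batch_take(void)
{
    int t;

    while ((t = atomic_load(&top)) > 0) {
        struct item *batch;

        /* Claim slot t - 1 by lowering top; retry if top changed meanwhile. */
        if (!atomic_compare_exchange_strong(&top, &t, t - 1)) {
            continue;
        }
        batch = atomic_exchange(&slots[t - 1], NULL);
        if (batch) {
            return batch;
        }
        /* The claimed slot was empty, so keep looking further down. */
    }
    return NULL;
}

/*
 * Store a batch. Returns NULL on success, or the batch the caller is left
 * holding when every slot from top upward was occupied.
 */
struct item *batch_put(struct item *batch)
{
    assert(batch != NULL);

    for (int t = atomic_load(&top); t < MAX_BATCHES; t++) {
        int want = t + 1;
        int cur;

        /* Swap our batch into slot t and receive whatever was there. */
        batch = atomic_exchange(&slots[t], batch);
        if (batch) {
            continue;   /* slot was occupied: carry the displaced batch up */
        }

        /* Raise top to at least t + 1 without ever lowering it. */
        cur = atomic_load(&top);
        while (cur < want &&
               !atomic_compare_exchange_weak(&top, &cur, want)) {
            /* cur was refreshed by the failed exchange; loop re-checks it */
        }
        return NULL;
    }
    return batch;
}
```

Because a put swaps rather than pushes, a batch displaced from an occupied slot simply moves one slot higher, which is why the structure is not strictly LIFO. With the constants in the patch, the shared array can hold at most 32 batches of 16, that is 512 pooled coroutines.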
* Re: [PATCH v3 2/2] coroutine: take exactly one batch from global pool at a time
2020-10-16 11:26 ` [PATCH v3 2/2] coroutine: take exactly one batch from global pool at a time wanghonghao
@ 2021-03-08 10:27 ` Stefan Hajnoczi
2021-03-11 3:27 ` [External] " 王洪浩
0 siblings, 1 reply; 4+ messages in thread
From: Stefan Hajnoczi @ 2021-03-08 10:27 UTC (permalink / raw)
To: wanghonghao; +Cc: kwolf, pbonzini, fam, qemu-devel
On Fri, Oct 16, 2020 at 07:26:40PM +0800, wanghonghao wrote:
> This patch replaces the global coroutine queue with a lock-free stack whose
> elements are coroutine queues. Threads can put coroutine queues onto the
> stack or take queues from it, and each coroutine queue holds exactly
> POOL_BATCH_SIZE coroutines. Note that the stack is not strictly LIFO, but that
> is good enough for a buffer pool.
>
> Coroutines are put into the thread-local pool first on release. The fast
> paths of both allocation and release are now free of atomic operations, and
> no single thread holds on to too many coroutines because POOL_BATCH_SIZE has
> been reduced to 16.
>
> In practice, I ran a VM with two block devices bound to two different
> iothreads, and ran fio with iodepth 128 on each device. Without this patch, it
> maintains around 400 coroutines and has about a 1% chance of calling
> `qemu_coroutine_new`. With this patch, it maintains no more than 273
> coroutines and doesn't call `qemu_coroutine_new` after the initial allocations.
>
> Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
> ---
> util/qemu-coroutine.c | 63 ++++++++++++++++++++++++++++---------------
> 1 file changed, 42 insertions(+), 21 deletions(-)
Hi,
I noticed this patch received no reviews. If you would still like to get
it merged, please rebase to qemu.git/master and resend the patch series.
Feel free to reply to your patches to remind maintainers if they have
not reviewed it after a few days.
Thanks,
Stefan
* Re: [External] Re: [PATCH v3 2/2] coroutine: take exactly one batch from global pool at a time
2021-03-08 10:27 ` Stefan Hajnoczi
@ 2021-03-11 3:27 ` 王洪浩
0 siblings, 0 replies; 4+ messages in thread
From: 王洪浩 @ 2021-03-11 3:27 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, fam, qemu-devel
Will do, thanks!
On Mon, Mar 8, 2021 at 6:27 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Fri, Oct 16, 2020 at 07:26:40PM +0800, wanghonghao wrote:
> > This patch replaces the global coroutine queue with a lock-free stack whose
> > the elements are coroutine queues. Threads can put coroutine queues onto the
> > stack or take queues from it, and each coroutine queue holds exactly
> > POOL_BATCH_SIZE coroutines. Note that the stack is not strictly LIFO, but that
> > is good enough for a buffer pool.
> >
> > Coroutines are put into the thread-local pool first on release. The fast
> > paths of both allocation and release are now free of atomic operations, and
> > no single thread holds on to too many coroutines because POOL_BATCH_SIZE has
> > been reduced to 16.
> >
> > In practice, I ran a VM with two block devices bound to two different
> > iothreads, and ran fio with iodepth 128 on each device. Without this patch, it
> > maintains around 400 coroutines and has about a 1% chance of calling
> > `qemu_coroutine_new`. With this patch, it maintains no more than 273
> > coroutines and doesn't call `qemu_coroutine_new` after the initial allocations.
> >
> > Signed-off-by: wanghonghao <wanghonghao@bytedance.com>
> > ---
> > util/qemu-coroutine.c | 63 ++++++++++++++++++++++++++++---------------
> > 1 file changed, 42 insertions(+), 21 deletions(-)
>
> Hi,
> I noticed this patch received no reviews. If you would still like to get
> it merged, please rebase to qemu.git/master and resend the patch series.
>
> Feel free to reply to your patches to remind maintainers if they have
> not reviewed it after a few days.
>
> Thanks,
> Stefan