On Mon, Aug 24, 2020 at 12:31:21PM +0800, wanghonghao wrote:
> This patch replace the global coroutine queue with a lock-free stack of which
> the elements are coroutine queues. Threads can put coroutine queues into the
> stack or take queues from it and each coroutine queue has exactly
> POOL_BATCH_SIZE coroutines. Note that the stack is not strictly LIFO, but it's
> enough for buffer pool.
>
> Coroutines will be put into thread-local pools first while release. Now the
> fast pathes of both allocation and release are atomic-free, and there won't
> be too many coroutines remain in a single thread since POOL_BATCH_SIZE has been
> reduced to 16.
>
> In practice, I've run a VM with two block devices binding to two different
> iothreads, and run fio with iodepth 128 on each device. It maintains around
> 400 coroutines and has about 1% chance of calling to `qemu_coroutine_new`
> without this patch. And with this patch, it maintains no more than 273
> coroutines and doesn't call `qemu_coroutine_new` after initial allocations.

Does throughput or IOPS change?

Is the main purpose of this patch to reduce memory consumption?

Stefan
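
For reference, here is a rough sketch of the scheme described in the quoted commit message. It is not the patch itself. Every name (`Item`, `Batch`, `pool_get`, `pool_put`, `stack_push`, `stack_pop`, `new_calls`) is hypothetical, `Item` stands in for a coroutine, and two details are my assumptions rather than anything stated above: a thread hands a batch to the shared stack once it holds 2 * POOL_BATCH_SIZE free items, and the pop takes the whole stack with one atomic exchange and pushes the remainder back, which is one way a stack ends up "not strictly LIFO".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_BATCH_SIZE 16

/* Hypothetical stand-in for a coroutine object. */
typedef struct Item {
    struct Item *next;
} Item;

/* One element of the shared stack: a list of exactly POOL_BATCH_SIZE items. */
typedef struct Batch {
    struct Batch *next;
    Item *items;
} Batch;

static _Atomic(Batch *) global_stack;     /* shared between threads */
static _Thread_local Item *local_pool;    /* per-thread free list */
static _Thread_local unsigned local_size;
static atomic_ulong new_calls;            /* counts slow-path allocations */

/* Push one batch with a compare-and-swap loop. */
static void stack_push(Batch *b)
{
    Batch *old = atomic_load(&global_stack);
    do {
        b->next = old;
    } while (!atomic_compare_exchange_weak(&global_stack, &old, b));
}

/* Assumption: take the whole stack in one exchange, keep the first batch,
 * push the others back. Batches may be reordered by this. */
static Batch *stack_pop(void)
{
    Batch *all = atomic_exchange(&global_stack, NULL);
    if (!all) {
        return NULL;
    }
    for (Batch *rest = all->next; rest; ) {
        Batch *next = rest->next;
        stack_push(rest);
        rest = next;
    }
    return all;
}

/* Allocation: the fast path touches only thread-local state. */
static Item *pool_get(void)
{
    if (!local_pool) {
        Batch *b = stack_pop();
        if (b) {
            local_pool = b->items;
            local_size = POOL_BATCH_SIZE;
            free(b);
        }
    }
    if (local_pool) {
        Item *it = local_pool;
        local_pool = it->next;
        local_size--;
        return it;
    }
    atomic_fetch_add(&new_calls, 1);      /* nothing pooled: allocate */
    return calloc(1, sizeof(Item));
}

/* Release: items go to the thread-local pool first. */
static void pool_put(Item *it)
{
    assert(it != NULL);
    it->next = local_pool;
    local_pool = it;
    if (++local_size >= 2 * POOL_BATCH_SIZE) {   /* assumed threshold */
        Batch *b = malloc(sizeof(*b));
        if (!b) {
            return;                       /* keep the items locally */
        }
        Item *tail = local_pool;
        for (int i = 1; i < POOL_BATCH_SIZE; i++) {
            tail = tail->next;
        }
        b->items = local_pool;            /* newest POOL_BATCH_SIZE items */
        local_pool = tail->next;
        tail->next = NULL;
        local_size -= POOL_BATCH_SIZE;
        stack_push(b);
    }
}
```

In this sketch only `stack_push` and `stack_pop` use atomics on shared state, and they run once per POOL_BATCH_SIZE items at most, so `pool_get` and `pool_put` normally stay on the thread-local list. (`new_calls` is instrumentation for the slow path.)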