On Mon, Aug 24, 2020 at 12:31:21PM +0800, wanghonghao wrote:
> This patch replaces the global coroutine queue with a lock-free stack whose
> the elements are coroutine queues. Threads can put coroutine queues into the
> stack or take queues from it and each coroutine queue has exactly
> POOL_BATCH_SIZE coroutines. Note that the stack is not strictly LIFO, but it's
> good enough for a buffer pool.
> 
> Coroutines are put into thread-local pools first on release. Now the
> fast paths of both allocation and release are atomic-free, and not too many
> coroutines remain in a single thread since POOL_BATCH_SIZE has been
> reduced to 16.
> 
> In practice, I've run a VM with two block devices bound to two different
> iothreads, and run fio with iodepth 128 on each device. It maintains around
> 400 coroutines and has about a 1% chance of calling `qemu_coroutine_new`
> without this patch. And with this patch, it maintains no more than 273
> coroutines and doesn't call `qemu_coroutine_new` after initial allocations.
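
For readers following along, here is a minimal sketch of the scheme as the
description reads, not the code from the patch. Everything in it is an
assumption made for illustration: the names `Batch`, `push_batches`,
`pop_batch`, `coroutine_get` and `coroutine_put` are hypothetical, `calloc`
stands in for `qemu_coroutine_new`, and the choice to hand a batch over once
the local pool reaches 2 * POOL_BATCH_SIZE is a guess. Popping by detaching
the whole stack and pushing the remainder back is one way to end up with a
stack that is "not strictly LIFO".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_BATCH_SIZE 16

typedef struct Coroutine { struct Coroutine *next; } Coroutine;

/* One stack element: a queue of exactly POOL_BATCH_SIZE coroutines. */
typedef struct Batch {
    struct Batch *next;
    Coroutine *head;
} Batch;

static _Atomic(Batch *) global_stack;          /* shared, lock-free */
static _Thread_local Coroutine *local_pool;    /* per-thread, no atomics */
static _Thread_local unsigned local_size;

/* Push the chain first..last. The CAS loop never dereferences the old top. */
static void push_batches(Batch *first, Batch *last)
{
    Batch *old = atomic_load(&global_stack);
    do {
        last->next = old;
    } while (!atomic_compare_exchange_weak(&global_stack, &old, first));
}

/* Detach the whole stack, keep its top batch, push the rest back. */
static Batch *pop_batch(void)
{
    Batch *all = atomic_exchange(&global_stack, NULL);
    if (all && all->next) {
        Batch *rest = all->next, *last = rest;
        while (last->next) {
            last = last->next;
        }
        push_batches(rest, last);
    }
    return all;
}

static Coroutine *coroutine_get(void)
{
    if (!local_pool) {                 /* slow path: refill from the stack */
        Batch *b = pop_batch();
        if (b) {
            local_pool = b->head;
            local_size = POOL_BATCH_SIZE;
            free(b);
        }
    }
    Coroutine *co = local_pool;
    if (co) {                          /* fast path: thread-local only */
        assert(local_size > 0);
        local_pool = co->next;
        local_size--;
        return co;
    }
    return calloc(1, sizeof(*co));     /* stand-in for qemu_coroutine_new() */
}

static void coroutine_put(Coroutine *co)
{
    co->next = local_pool;             /* fast path: thread-local only */
    local_pool = co;
    if (++local_size == 2 * POOL_BATCH_SIZE) {
        /* slow path: hand the newest POOL_BATCH_SIZE coroutines over */
        Batch *b = malloc(sizeof(*b));
        if (!b) {
            abort();
        }
        Coroutine *tail = local_pool;
        for (int i = 1; i < POOL_BATCH_SIZE; i++) {
            tail = tail->next;
        }
        b->head = local_pool;
        local_pool = tail->next;
        tail->next = NULL;
        local_size -= POOL_BATCH_SIZE;
        push_batches(b, b);
    }
}
```

In this sketch a thread touches the shared stack only when its local pool is
empty or has grown to two batches, so the common allocate and release paths
use no atomic operations.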

Does throughput or IOPS change?

Is the main purpose of this patch to reduce memory consumption?

Stefan