From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Kevin Wolf" <kwolf@redhat.com>,
	"Tingting Mao" <timao@redhat.com>,
	qemu-devel@nongnu.org, "Honghao Wang" <wanghonghao@bytedance.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Daniele Buono" <dbuono@linux.vnet.ibm.com>,
	"Serge Guelton" <sguelton@redhat.com>
Subject: Re: [PATCH] coroutine: resize pool periodically instead of limiting size
Date: Thu, 9 Sep 2021 16:51:44 +0100
Message-ID: <YTotkCiuqTeDgJJ0@stefanha-x1.localdomain>
In-Reply-To: <YTnHwJ/0O4rk7M7g@redhat.com>

On Thu, Sep 09, 2021 at 09:37:20AM +0100, Daniel P. Berrangé wrote:
> On Wed, Sep 01, 2021 at 05:09:23PM +0100, Stefan Hajnoczi wrote:
> > It was reported that enabling SafeStack reduces IOPS significantly
> > (>25%) with the following fio benchmark on virtio-blk using an NVMe host
> > block device:
> > 
> >   # fio --rw=randrw --bs=4k --iodepth=64 --runtime=1m --direct=1 \
> > 	--filename=/dev/vdb --name=job1 --ioengine=libaio --thread \
> > 	--group_reporting --numjobs=16 --time_based \
> >         --output=/tmp/fio_result
> > 
> > Serge Guelton and I found that SafeStack is not really at fault; it just
> > increases the cost of coroutine creation. This fio workload exhausts the
> > coroutine pool and coroutine creation becomes a bottleneck. Previous
> > work by Honghao Wang also pointed to excessive coroutine creation.
> > 
> > Creating new coroutines is expensive due to allocating new stacks with
> > mmap(2) and mprotect(2). Currently there are thread-local and global
> > pools that recycle old Coroutine objects and their stacks but the
> > hardcoded size limit of 64 for thread-local pools and 128 for the global
> > pool is insufficient for the fio benchmark shown above.
> 
> Rather than keeping around a thread local pool of coroutine
> instances, did you ever consider keeping around a pool of
> allocated stacks? Essentially it seems like you're saying
> the stack allocation is the problem due to it using mmap()
> instead of malloc() and thus not benefiting from any of the
> performance tricks malloc() impls use to avoid repeated
> syscalls on every allocation. If qemu_alloc_stack() and
> qemu_free_stack() could be made more intelligent by caching
> stacks, then perhaps the coroutine side could be left "dumb"?

What is the advantage of doing that? Then the Coroutine struct needs to
be malloced each time. Coroutines are the only users of
qemu_alloc_stack(), so I think pooling the Coroutines is optimal.
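
For reference, the slow path the pool avoids looks roughly like this. A
simplified sketch of an mmap(2)-based guard-page stack allocator along the
lines of qemu_alloc_stack() (illustrative only; the real code differs in
details):

    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Simplified sketch of guard-page stack allocation, roughly what
     * qemu_alloc_stack() does. Two syscalls per new coroutine: mmap(2)
     * reserves the memory and mprotect(2) arms the guard page. */
    static void *alloc_stack(size_t sz)
    {
        size_t pagesz = sysconf(_SC_PAGESIZE);
        void *ptr = mmap(NULL, sz + pagesz, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (ptr == MAP_FAILED) {
            abort();
        }

        /* Make the lowest page inaccessible so a stack overflow faults
         * instead of silently corrupting adjacent memory. */
        if (mprotect(ptr, pagesz, PROT_NONE) != 0) {
            abort();
        }

        return (char *)ptr + pagesz; /* usable stack starts above the guard */
    }

Both syscalls are paid for every coroutine created outside the pool, which
is why recycling whole Coroutine objects pays off.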

> > 
> > This patch changes the coroutine pool algorithm to a simple thread-local
> > pool without a size limit. Threads periodically shrink the pool down to
> > a size sufficient for the maximum observed number of coroutines.
> > 
> > This is a very simple algorithm. Fancier things could be done like
> > keeping a minimum number of coroutines around to avoid latency when a
> > new coroutine is created after a long period of inactivity. Another
> > thought is to stop the timer when the pool size is zero for power saving
> > on threads that aren't using coroutines. However, I'd rather not add
> > bells and whistles unless they are really necessary.
> > 
> > The global pool is removed by this patch. It can help to hide the fact
> > that local pools are easily exhausted, but it doesn't fix the root
> > cause. I don't think there is a need for a global pool because QEMU's
> > threads are long-lived, so let's keep things simple.
> 
> QEMU's threads may be long-lived, but are the workloads they service
> expected to be consistent over time?
> 
> eg can we ever get a situation where Thread A has a peak of activity
> causing caching of many coroutines, and then go idle for a long time,
> while Thread B then has a peak and is unable to take advantage of the
> cache from Thread A that is now unused ?

It's possible to trigger that case, but it's probably not a real-world
scenario. It requires a storage controller that is emulated in the vCPU
thread (like AHCI) and a workload that alternates between vCPUs.
However, that configuration isn't expected to perform well anyway
(virtio-blk and virtio-scsi use IOThreads instead of running emulation
in vCPU threads because this reduces the cost of the vmexit).
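
To make the new algorithm concrete, here is a minimal sketch of the
per-thread bookkeeping (coroutine_get/coroutine_put/pool_resize and the
variable names are illustrative, not the patch's actual code):

    #include <stddef.h>

    typedef struct Coroutine Coroutine;
    struct Coroutine {
        Coroutine *pool_next;              /* free-list link */
        /* ... stack pointer, entry point, etc. ... */
    };

    Coroutine *coroutine_new(void);        /* slow path: mmap + mprotect */
    void coroutine_delete(Coroutine *co);  /* munmap stack, free struct */

    /* Thread-local pool state; no size limit on the free list. */
    static __thread Coroutine *pool;
    static __thread size_t pool_size;      /* current free-list length */
    static __thread size_t in_use;         /* live coroutines right now */
    static __thread size_t max_in_use;     /* peak since last timer tick */

    Coroutine *coroutine_get(void)
    {
        Coroutine *co = pool;

        if (co) {
            pool = co->pool_next;
            pool_size--;
        } else {
            co = coroutine_new();
        }
        if (++in_use > max_in_use) {
            max_in_use = in_use;           /* record observed maximum */
        }
        return co;
    }

    void coroutine_put(Coroutine *co)
    {
        in_use--;
        co->pool_next = pool;              /* always recycle, never free here */
        pool = co;
        pool_size++;
    }

    /* Periodic per-thread timer callback: shrink the pool down to a size
     * sufficient for the peak observed since the last tick, then restart
     * the measurement. */
    void pool_resize(void)
    {
        while (pool_size > max_in_use) {
            Coroutine *co = pool;

            pool = co->pool_next;
            pool_size--;
            coroutine_delete(co);
        }
        max_in_use = in_use;
    }

The point is that the fast paths stay lock-free and unbounded, and all
freeing happens in the timer callback.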

Stefan
