From: Stefan Hajnoczi
Date: Wed, 23 Nov 2016 09:51:31 +0000
Message-ID: <20161123095131.GE20034@stefanha-x1.localdomain>
Subject: Re: [Qemu-devel] [PATCH v3 00/10] aio: experimental virtio-blk polling mode
To: Christian Borntraeger
Cc: Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini, Fam Zheng, Karl Rister

On Tue, Nov 22, 2016 at 08:21:16PM +0100, Christian Borntraeger wrote:
> On 11/22/2016 05:31 PM, Stefan Hajnoczi wrote:
> > v3:
> >  * Avoid ppoll(2)/epoll_wait(2) if polling succeeded [Paolo]
> >  * Disable guest->host virtqueue notification during polling [Christian]
> >  * Rebased on top of my virtio-blk/scsi virtqueue notification disable patches
> >
> > v2:
> >  * Uninitialized node->deleted gone [Fam]
> >  * Removed 1024 polling loop iteration qemu_clock_get_ns() optimization which
> >    created a weird step pattern [Fam]
> >  * Unified with AioHandler, dropped AioPollHandler struct [Paolo]
> >    (actually I think Paolo had more in mind but this is the first step)
> >  * Only poll when all event loop resources support it [Paolo]
> >  * Added run_poll_handlers_begin/end trace events for perf analysis
> >  * Sorry, Christian, no virtqueue kick suppression yet
> >
> > Recent performance investigation work done by Karl Rister shows that the
> > guest->host notification takes around 20 us.  This is more than the
> > "overhead" of QEMU itself (e.g. block layer).
> >
> > One way to avoid the costly exit is to use polling instead of notification.
> > The main drawback of polling is that it consumes CPU resources.  In order
> > to benefit performance the host must have extra CPU cycles available on
> > physical CPUs that aren't used by the guest.
> >
> > This is an experimental AioContext polling implementation.  It adds a
> > polling callback into the event loop.  Polling functions are implemented
> > for virtio-blk virtqueue guest->host kick and Linux AIO completion.
> >
> > The QEMU_AIO_POLL_MAX_NS environment variable sets the number of
> > nanoseconds to poll before entering the usual blocking poll(2) syscall.
> > Try setting this variable to the time from old request completion to new
> > virtqueue kick.
> >
> > By default no polling is done.  The QEMU_AIO_POLL_MAX_NS must be set to
> > get any polling!
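
For anyone following the thread who has not read the patches, the mechanism
boils down to roughly the following.  This is a heavily simplified sketch,
not the actual aio-posix.c code from the series; wait_for_events() is a
made-up name and poll_fn stands in for the per-handler polling callbacks:

    /* Simplified illustration of the poll-then-block idea (not the real code) */
    #include <poll.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <time.h>

    static int64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    /* poll_fn checks for work without a syscall, e.g. looks at the virtqueue
     * avail ring or the Linux AIO completion ring; returns true on progress.
     */
    void wait_for_events(struct pollfd *fds, nfds_t nfds, bool (*poll_fn)(void))
    {
        const char *env = getenv("QEMU_AIO_POLL_MAX_NS");
        int64_t max_ns = env ? strtoll(env, NULL, 10) : 0;

        if (max_ns > 0) {
            int64_t deadline = now_ns() + max_ns;
            do {
                if (poll_fn()) {
                    return;     /* progress made, skip the blocking syscall */
                }
            } while (now_ns() < deadline);
        }

        poll(fds, nfds, -1);    /* nothing found by polling, block as usual */
    }

The early return is the "avoid ppoll(2)/epoll_wait(2) if polling succeeded"
item from the v3 changelog.
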

> The notification suppression alone gives me about 10% for a single disk in
> fio throughput.  (It seems that more disks make it help less???)

In a scenario with many disks there will be lots of notifications either
way.  ioeventfd offers a form of batching because it will coalesce multiple
notifications to the same virtqueue until QEMU gets around to reading the
ioeventfd.

In other words, under heavy load ioeventfd coalesces notifications so QEMU
will process the virtqueue fewer times even though the number of vmexits is
unchanged.

Maybe this plays a role?

> If I set polling to high values (e.g. 500000) then the guest->host
> notification rate basically drops to zero, so it seems to work as expected.
> Polling also seems to provide some benefit in the range of another 10
> percent (again only for a single disk?)
>
> So in general this looks promising.  We want to keep it disabled as here
> until we have some grow/shrink heuristics.  There is one thing that the
> kernel can do, which we cannot easily (check if the CPU is contended) and
> avoid polling in that case.  One wild idea can be to use clock_gettime with
> CLOCK_THREAD_CPUTIME_ID and CLOCK_REALTIME and shrink polling if we have
> been scheduled away.
>
> The case "number of iothreads > number of cpus" looks better than in v1.
> Have you fixed something?

v1 had a premature optimization (bug) where it ignored the precise
QEMU_AIO_POLL_MAX_NS value and instead ran in steps of 1024 polling
iterations.  Perhaps we're simply burning less CPU now since I removed the
1024 loop granularity.

Glad that you are seeing improvements.  Self-tuning grow/shrink heuristics
are the next step so that polling can be used with real workloads.  I'll
investigate it for the next revision.
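
To make the clock_gettime idea a bit more concrete, something along these
lines might work as the shrink half of the heuristic.  Untested sketch;
shrink_poll_ns(), the start timestamps captured just before the polling
loop, and the divide-by-two policy are all made up, not from the series:

    /* Shrink the polling window if we were scheduled away while polling:
     * compare wall-clock time against thread CPU time over the poll window.
     */
    #include <stdint.h>
    #include <time.h>

    static int64_t ts_to_ns(const struct timespec *ts)
    {
        return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
    }

    /* Call with timestamps captured (CLOCK_REALTIME and
     * CLOCK_THREAD_CPUTIME_ID) right before the polling loop started.
     * Returns the new poll window in nanoseconds.
     */
    int64_t shrink_poll_ns(int64_t poll_ns,
                           const struct timespec *wall_start,
                           const struct timespec *cpu_start)
    {
        struct timespec wall_end, cpu_end;

        clock_gettime(CLOCK_REALTIME, &wall_end);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu_end);

        int64_t wall_delta = ts_to_ns(&wall_end) - ts_to_ns(wall_start);
        int64_t cpu_delta = ts_to_ns(&cpu_end) - ts_to_ns(cpu_start);

        if (wall_delta > 2 * cpu_delta) {
            return poll_ns / 2;     /* we lost the CPU, back off */
        }
        return poll_ns;             /* growing would be handled elsewhere */
    }

CLOCK_MONOTONIC would probably be a safer choice than CLOCK_REALTIME for the
wall-clock side since it cannot jump, but that is a detail.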