Date: Mon, 20 Feb 2017 14:50:11 +0000
From: Stefan Hajnoczi
To: Peter Lieven
Cc: qemu-devel@nongnu.org, kwolf@redhat.com, famz@redhat.com, qemu-block@nongnu.org, mreitz@redhat.com
Subject: Re: [Qemu-devel] [Qemu-block] [RFC PATCH V4] qemu-img: make convert async
Message-ID: <20170220145011.GJ21255@stefanha-x1.localdomain>
In-Reply-To: <1487347224-8667-1-git-send-email-pl@kamp.de>

On Fri, Feb 17, 2017 at 05:00:24PM +0100, Peter Lieven wrote:
> this is something I have been thinking about for almost 2 years now.
> we heavily have the following two use cases when using qemu-img convert.
>
> a) reading from NFS and writing to iSCSI for deploying templates
> b) reading from iSCSI and writing to NFS for backups
>
> In both processes we use libiscsi and libnfs so we have no kernel pagecache.
> As qemu-img convert is implemented with sync operations that means we
> read one buffer and then write it. No parallelism and each sync request
> takes as long as it takes until it is completed.
>
> This is version 4 of the approach using coroutine worker "threads".
>
> So far I have the following runtimes when reading an uncompressed QCOW2 from
> NFS and writing it to iSCSI (raw):
>
> qemu-img (master)
>  nfs -> iscsi   22.8 secs
>  nfs -> ram     11.7 secs
>  ram -> iscsi   12.3 secs
>
> qemu-img-async (8 coroutines, in-order write disabled)
>  nfs -> iscsi   11.0 secs
>  nfs -> ram     10.4 secs
>  ram -> iscsi    9.0 secs
>
> The following are the runtimes found with different settings between V3 and V4.
> This is always the best runtime out of 10 runs when converting from nfs to iscsi.
> Please note that in V4 in-order write scenarios show a very high jitter. I think
> this is because the get_block_status on the NFS share is delayed by concurrent read
> requests.
>
>                        in-order       out-of-order
> V3 - 16 coroutines     12.4 seconds   11.1 seconds
>    -  8 coroutines     12.2 seconds   11.3 seconds
>    -  4 coroutines     12.5 seconds   11.1 seconds
>    -  2 coroutines     14.8 seconds   14.9 seconds
>
> V4 - 32 coroutines     15.9 seconds   11.5 seconds
>    - 16 coroutines     12.5 seconds   11.0 seconds
>    -  8 coroutines     12.9 seconds   11.0 seconds
>    -  4 coroutines     14.1 seconds   11.5 seconds
>    -  2 coroutines     16.9 seconds   13.2 seconds

Does this patch work with compressed images?  Especially the
out-of-order write mode may be problematic with a compressed qcow2
image.

How should a user decide between in-order and out-of-order?

> @@ -1651,12 +1680,117 @@ static int convert_write(ImgConvertState *s, int64_t sector_num, int nb_sectors,
>      return 0;
>  }
>
> -static int convert_do_copy(ImgConvertState *s)
> +static void convert_co_do_copy(void *opaque)

Missing coroutine_fn here and for convert_co_read()/convert_co_write().
Functions that must be called from coroutine context (because they
yield, use coroutine mutexes, etc) need to be marked as such.
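In other words, something along these lines (just a sketch; I am guessing
the convert_co_read()/convert_co_write() parameter lists from the existing
convert_read()/convert_write() signatures, so adjust as needed):

  static void coroutine_fn convert_co_do_copy(void *opaque);
  static int coroutine_fn convert_co_read(ImgConvertState *s,
                                          int64_t sector_num,
                                          int nb_sectors, uint8_t *buf);
  static int coroutine_fn convert_co_write(ImgConvertState *s,
                                           int64_t sector_num,
                                           int nb_sectors, uint8_t *buf);

coroutine_fn is only an annotation today, but it documents that these
functions may yield and must not be called from outside coroutine context.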
> +        if (s->wr_in_order) {
> +            /* reenter the coroutine that might have waited
> +             * for this write to complete */
> +            s->wr_offs = sector_num + n;
> +            for (i = 0; i < s->num_coroutines; i++) {
> +                if (s->co[i] && s->wait_sector_num[i] == s->wr_offs) {
> +                    qemu_coroutine_enter(s->co[i]);
> +                    break;

This qemu_coroutine_enter() call relies on the yield pattern between
sibling coroutines having no recursive qemu_coroutine_enter() calls.
QEMU aborts if there is a code path where coroutine A enters B and then
B enters A again before yielding.

Paolo's new aio_co_wake() API solves this issue by deferring the
qemu_coroutine_enter() to the event loop.  It's similar to CoQueue
wakeup.  aio_co_wake() is part of my latest block pull request (should
be merged into qemu.git soon).

I *think* this patch has no A -> B -> A situation thanks to yields in
the code path, but it would be nicer to use aio_co_wake() where this
can never happen.

> +        case 'm':
> +            num_coroutines = atoi(optarg);
> +            if (num_coroutines > MAX_COROUTINES) {
> +                error_report("Maximum allowed number of coroutines is %d",
> +                             MAX_COROUTINES);
> +                ret = -1;
> +                goto fail_getopt;
> +            }

Missing input validation for the < 1 case.
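A possible shape for that check (just a sketch using the names from the
quoted hunk, untested):

  case 'm':
      num_coroutines = atoi(optarg);
      if (num_coroutines < 1 || num_coroutines > MAX_COROUTINES) {
          error_report("Number of coroutines must be between 1 and %d",
                       MAX_COROUTINES);
          ret = -1;
          goto fail_getopt;
      }
      break;

The lower bound also catches the 0 that atoi() returns for non-numeric
input.

Coming back to the reentry loop above: once aio_co_wake() has landed,
switching over should be roughly a one-line change (again untested
against this series):

  if (s->co[i] && s->wait_sector_num[i] == s->wr_offs) {
      /* queue the wakeup instead of entering the coroutine directly,
       * so a recursive enter can never happen */
      aio_co_wake(s->co[i]);
      break;
  }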