From: Peter Lieven
To: Stefan Hajnoczi
Cc: qemu-devel@nongnu.org, kwolf@redhat.com, famz@redhat.com,
 qemu-block@nongnu.org, mreitz@redhat.com
Subject: Re: [Qemu-devel] [Qemu-block] [RFC PATCH V4] qemu-img: make convert async
Date: Mon, 20 Feb 2017 15:59:42 +0100
Message-ID: <73412b92-353d-7109-cfbe-9c61bf91e7eb@kamp.de>
In-Reply-To: <20170220145011.GJ21255@stefanha-x1.localdomain>
References: <1487347224-8667-1-git-send-email-pl@kamp.de>
 <20170220145011.GJ21255@stefanha-x1.localdomain>

On 20.02.2017 15:50, Stefan Hajnoczi wrote:
> On Fri, Feb 17, 2017 at 05:00:24PM +0100, Peter Lieven wrote:
>> This is something I have been thinking about for almost two years now.
>> We frequently have the following two use cases when using qemu-img
>> convert:
>>
>> a) reading from NFS and writing to iSCSI for deploying templates
>> b) reading from iSCSI and writing to NFS for backups
>>
>> In both cases we use libiscsi and libnfs, so there is no kernel page
>> cache. Since qemu-img convert is implemented with synchronous
>> operations, we read one buffer and then write it. There is no
>> parallelism, and each synchronous request blocks until it has
>> completed.
>>
>> This is version 4 of the approach using coroutine worker "threads".
>>
>> So far I have the following runtimes when reading an uncompressed
>> QCOW2 from NFS and writing it to iSCSI (raw):
>>
>> qemu-img (master)
>>    nfs -> iscsi   22.8 secs
>>    nfs -> ram     11.7 secs
>>    ram -> iscsi   12.3 secs
>>
>> qemu-img-async (8 coroutines, in-order write disabled)
>>    nfs -> iscsi   11.0 secs
>>    nfs -> ram     10.4 secs
>>    ram -> iscsi    9.0 secs
>>
>> The following are the runtimes found with different settings between
>> V3 and V4. Each value is the best runtime out of 10 runs when
>> converting from NFS to iSCSI. Please note that the in-order write
>> scenarios in V4 show very high jitter; I think this is because the
>> get_block_status calls on the NFS share are delayed by concurrent
>> read requests.
>>
>>                          in-order      out-of-order
>> V3 - 16 coroutines    12.4 seconds    11.1 seconds
>>    -  8 coroutines    12.2 seconds    11.3 seconds
>>    -  4 coroutines    12.5 seconds    11.1 seconds
>>    -  2 coroutines    14.8 seconds    14.9 seconds
>>
>> V4 - 32 coroutines    15.9 seconds    11.5 seconds
>>    - 16 coroutines    12.5 seconds    11.0 seconds
>>    -  8 coroutines    12.9 seconds    11.0 seconds
>>    -  4 coroutines    14.1 seconds    11.5 seconds
>>    -  2 coroutines    16.9 seconds    13.2 seconds
> Does this patch work with compressed images? In particular, the
> out-of-order write mode may be problematic with a compressed qcow2
> image.

It does, but you are right: out-of-order writes and compression should
be mutually exclusive.

>
> How should a user decide between in-order and out-of-order?

In general, out-of-order writes are okay when we write to a
preallocated device (e.g. all raw devices or a qcow2 with
preallocation=full). The default is in-order writes; the user has to
enable out-of-order writes manually.

>
>> @@ -1651,12 +1680,117 @@ static int convert_write(ImgConvertState *s, int64_t sector_num, int nb_sectors,
>>      return 0;
>>  }
>>
>> -static int convert_do_copy(ImgConvertState *s)
>> +static void convert_co_do_copy(void *opaque)
> Missing coroutine_fn here and for convert_co_read()/convert_co_write().
> Functions that must be called from coroutine context (because they
> yield, use coroutine mutexes, etc.) need to be marked as such.

Okay.

>
>> +        if (s->wr_in_order) {
>> +            /* reenter the coroutine that might have waited
>> +             * for this write to complete */
>> +            s->wr_offs = sector_num + n;
>> +            for (i = 0; i < s->num_coroutines; i++) {
>> +                if (s->co[i] && s->wait_sector_num[i] == s->wr_offs) {
>> +                    qemu_coroutine_enter(s->co[i]);
>> +                    break;
> This qemu_coroutine_enter() call relies on the yield pattern between
> sibling coroutines having no recursive qemu_coroutine_enter() calls.
> QEMU aborts if there is a code path where coroutine A enters B and then
> B enters A again before yielding.
>
> Paolo's new aio_co_wake() API solves this issue by deferring the
> qemu_coroutine_enter() to the event loop. It's similar to CoQueue
> wakeup. aio_co_wake() is part of my latest block pull request (should
> be merged into qemu.git soon).
>
> I *think* this patch has no A -> B -> A situation thanks to yields in
> the code path, but it would be nicer to use aio_co_wake() where this
> can never happen.

I also believe this can't happen. Would it also be okay to use
qemu_coroutine_enter_if_inactive() here?
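To illustrate what I mean, an untested sketch on top of the hunk quoted
above (same fields and variables as in the V4 patch):

        if (s->wr_in_order) {
            /* reenter the coroutine that might have waited
             * for this write to complete */
            s->wr_offs = sector_num + n;
            for (i = 0; i < s->num_coroutines; i++) {
                if (s->co[i] && s->wait_sector_num[i] == s->wr_offs) {
                    /* a no-op if the coroutine is already active,
                     * so it can never be reentered recursively */
                    qemu_coroutine_enter_if_inactive(s->co[i]);
                    break;
                }
            }
        }

If the target coroutine is still active, it has not yielded yet and
re-checks s->wr_offs before it yields, so the skipped enter should not
lose a wakeup.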
>
>> +        case 'm':
>> +            num_coroutines = atoi(optarg);
>> +            if (num_coroutines > MAX_COROUTINES) {
>> +                error_report("Maximum allowed number of coroutines is %d",
>> +                             MAX_COROUTINES);
>> +                ret = -1;
>> +                goto fail_getopt;
>> +            }
> Missing input validation for the < 1 case.

Right.
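Something like the following should cover it (untested sketch, reusing
MAX_COROUTINES from the patch):

        case 'm':
            num_coroutines = atoi(optarg);
            if (num_coroutines < 1 || num_coroutines > MAX_COROUTINES) {
                error_report("Invalid number of coroutines. Allowed "
                             "number of coroutines is between 1 and %d",
                             MAX_COROUTINES);
                ret = -1;
                goto fail_getopt;
            }
            break;

Thank you,
Peter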