From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40842) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Zvp9s-00078s-1G for qemu-devel@nongnu.org; Mon, 09 Nov 2015 11:19:04 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Zvp9r-0002dK-2M for qemu-devel@nongnu.org; Mon, 09 Nov 2015 11:19:04 -0500 References: <1446805353-3047-1-git-send-email-famz@redhat.com> <20151109160409.GC3621@noname.redhat.com> From: Paolo Bonzini Message-ID: <5640C768.50704@redhat.com> Date: Mon, 9 Nov 2015 17:18:48 +0100 MIME-Version: 1.0 In-Reply-To: <20151109160409.GC3621@noname.redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf , Fam Zheng Cc: Jeff Cody , qemu-devel@nongnu.org, qemu-block@nongnu.org On 09/11/2015 17:04, Kevin Wolf wrote: > Am 06.11.2015 um 11:22 hat Fam Zheng geschrieben: >> The "pnum < nb_sectors" condition in deciding whether to actually copy >> data is unnecessarily strict, and the qiov initialization is >> unnecessarily too, for both bdrv_aio_write_zeroes and bdrv_aio_discard >> branches. >> >> Reorganize mirror_iteration flow so that we: >> >> 1) Find the contiguous zero/discarded sectors with >> bdrv_get_block_status_above() before deciding what to do. We query >> s->buf_size sized blocks at a time. >> >> 2) If the sectors in question are zeroed/discarded and aligned to >> target cluster, issue zero write or discard accordingly. It's done >> in mirror_do_zero_or_discard, where we don't add buffer to qiov. >> >> 3) Otherwise, do the same loop as before in mirror_do_read. >> >> Signed-off-by: Fam Zheng >=20 > I'm not sure where in the patch to comment on this, so I'll just do it > here right in the beginning. >=20 > I'm concerned that we need to be more careful about races in this patch= , > in particular regarding the bitmaps. I think the conditions for the two > bitmaps are: >=20 > * Dirty bitmap: We must clear the bit after finding the next piece of > data to be mirrored, but before we yield after getting information > that we use for the decision which kind of operation we need. >=20 > In other words, we need to clear the dirty bitmap bit before calling > bdrv_get_block_status_above(), because that's both the function that > retrieves information about the next chunk and also a function that > can yield. >=20 > If after this point the data is written to, we need to mirror it > again. With Fam's patch, that's not trivial for two reasons: 1) bdrv_get_block_status_above() can return a smaller amount than what is asked. 2) the "read and write" case can handle s->granularity sectors per iteration (many of them can be coalesced, but still that's how the iteration works). The simplest solution is to perform the query with s->granularity size rather than s->buf_size. Paolo > * In-flight bitmap: We need to make sure that we never mirror the same > data twice at the same time as older data could overwrite newer data > then. >=20 > Strictly speaking, it looks to me as if this meant we can delay > setting the bit until before we issue an AIO operation. It might be > more obviously correct to set it at the same time as the dirty bitmap > is cleared.