Date: Fri, 14 Apr 2017 10:40:41 +0300
From: Roman Kagan
To: John Snow
Cc: "Denis V. Lunev", Kevin Wolf, Stefan Hajnoczi, qemu-devel@nongnu.org, qemu-block@nongnu.org, Max Reitz
Subject: Re: [Qemu-devel] [Qemu-block] [RFC] Proposed qcow2 extension: subcluster allocation
Message-ID: <20170414074040.GA30735@rkaganb.sw.ru>
References: <20170406150148.zwjpozqtale44jfh@perseus.local> <20170413094454.GB5095@noname.redhat.com> <1cc754f6-6718-edbc-96ef-ab0e0e10fd56@openvz.org>

On Thu, Apr 13, 2017 at 09:06:19PM -0400, John Snow wrote:
> So if we have a 1MB cluster with 64k subclusters as a hypothetical, if
> we write just the first subcluster, we'll have a map like:
>
> X---------------
>
> Whatever actually happens to exist in this space, whether it be a hole
> we punched via fallocate or literal zeroes, this space is known to the
> filesystem to be contiguous.
>
> If we write to the last subcluster, we'll get:
>
> X--------------X
>
> And again, maybe the dashes are a fallocate hole, maybe they're zeroes.
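The map is effectively one "written" bit per subcluster. A minimal Python sketch, assuming a plain 16-bit mask per 1MB cluster, reproduces the two states. The constants and the functions `mark_written` and `render` are hypothetical names, and this is not the on-disk encoding proposed in the RFC:

```python
# Hypothetical model: one 1 MiB cluster split into 64 KiB subclusters, with
# one "written" bit per subcluster. Not the qcow2 on-disk format.
CLUSTER_SIZE = 1 << 20                          # 1 MiB
SUBCLUSTER_SIZE = 64 << 10                      # 64 KiB
SUBCLUSTERS = CLUSTER_SIZE // SUBCLUSTER_SIZE   # 16


def mark_written(bitmap: int, offset: int, length: int) -> int:
    """Set the bit of every subcluster touched by [offset, offset + length)."""
    if length <= 0 or offset < 0 or offset + length > CLUSTER_SIZE:
        raise ValueError("write must lie inside the cluster")
    first = offset // SUBCLUSTER_SIZE
    last = (offset + length - 1) // SUBCLUSTER_SIZE
    for i in range(first, last + 1):
        bitmap |= 1 << i
    return bitmap


def render(bitmap: int) -> str:
    """Draw the map: 'X' = written subcluster, '-' = not written yet."""
    return "".join("X" if (bitmap >> i) & 1 else "-" for i in range(SUBCLUSTERS))


bitmap = mark_written(0, 0, SUBCLUSTER_SIZE)    # write the first subcluster
print(render(bitmap))                           # X---------------
bitmap = mark_written(bitmap, CLUSTER_SIZE - SUBCLUSTER_SIZE, SUBCLUSTER_SIZE)
print(render(bitmap))                           # X--------------X
```

With 64k subclusters the last one starts 15 * 64k = 960k after the first, which is the gap discussed next.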
> But the last subcluster is located virtually exactly 15 subclusters
> behind the first; they're not physically contiguous. We've saved the
> space between them. Future out-of-order writes won't contribute to any
> fragmentation, at least at this level.

Yeah, I think this is where the confusion lies. You apparently assume
that the filesystem is smart enough to compensate for the subclusters
being sparse within a cluster, and will eventually make them contiguous
on the *media* once they are all written. Denis is claiming the
opposite. I posted a simple experiment with a 64kB sparse file written
out of order, which ended up as 16 disparate blocks on the platters
(ext4; with xfs this may be different), and this is obviously
detrimental to performance on rotating disks.

Note also that if the filesystem actually is smart enough to keep the
subclusters contiguous even when they are written out of order,
presumably by not allowing blocks from other files to take the
yet-unused space between sparse subclusters, the disk space savings
become less obvious.

> You might be able to reduce COW from 5 IOPs to 3 IOPs, but if we tune
> the subclusters right, we'll have *zero*, won't we?

Right, this is an attractive advantage. We need to test that later
access to such interleaved clusters is not degraded, though.

Roman.
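The "zero COW" case can be illustrated with a small model. The sketch below is hypothetical Python, not QEMU code: it assumes that a write into a not-yet-allocated area must fill every allocation unit it touches (a whole 1MB cluster without subclusters, one 64k subcluster with them) by copying the untouched head and tail, and it counts copied bytes, not the I/O operations behind the 5-vs-3 IOPs estimate. `cow_bytes` is a made-up name:

```python
# Hypothetical model of copy-on-write for a guest write landing in an area
# that is not yet allocated in the image (e.g. data still in a backing file).
# The allocation unit is a full cluster, or one subcluster if subclusters exist.
CLUSTER_SIZE = 1 << 20      # 1 MiB
SUBCLUSTER_SIZE = 64 << 10  # 64 KiB


def cow_bytes(offset: int, length: int, unit: int) -> tuple[int, int]:
    """Return (head, tail): bytes to copy before and after the write so that
    every allocation unit it touches ends up fully populated."""
    start = offset - offset % unit                  # round down to a unit boundary
    end = -(-(offset + length) // unit) * unit      # round up to a unit boundary
    return offset - start, end - (offset + length)


# A 64k write at offset 128k: the whole rest of the 1MB cluster vs. nothing.
print(cow_bytes(128 << 10, 64 << 10, CLUSTER_SIZE))
print(cow_bytes(128 << 10, 64 << 10, SUBCLUSTER_SIZE))
```

In this model a subcluster-aligned write needs no copy at all, while an unaligned 4k write still copies the remainder of its 64k subcluster, so the gain depends on how well the subcluster size matches the guest's write pattern.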