From mboxrd@z Thu Jan 1 00:00:00 1970
From: Theodore Ts'o
Subject: Re: fs compression
Date: Wed, 20 May 2015 17:36:41 -0400
Message-ID: <20150520213641.GM2871@thunk.org>
References: <1431145253-2019-1-git-send-email-jaegeuk@kernel.org>
 <1431145253-2019-3-git-send-email-jaegeuk@kernel.org>
 <20150513020208.GK15721@dastard>
 <20150513064802.GA48682@jaegeuk-mac02.hsd1.ca.comcast.net>
 <20150514003721.GN15721@dastard>
 <20150516132403.GA2998@thunk.org>
 <20150516171326.GA24795@eden.sea.cyngn.com>
 <20150520174635.GA17651@eden.sea.cyngn.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Jaegeuk Kim, linux-fsdevel@vger.kernel.org
To: Tom Marshall
Return-path:
Received: from imap.thunk.org ([74.207.234.97]:48831 "EHLO imap.thunk.org"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1754685AbbETVgp (ORCPT ); Wed, 20 May 2015 17:36:45 -0400
Content-Disposition: inline
In-Reply-To: <20150520174635.GA17651@eden.sea.cyngn.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Wed, May 20, 2015 at 10:46:35AM -0700, Tom Marshall wrote:
> So I've been playing around a bit and I have a basic strategy laid out.
> Please let me know if I'm on the right track.
>
> Compressed file attributes
> ==========================
>
> The filesystem is responsible for detecting whether a file is compressed and
> hooking into the compression lib.  This may be done with an inode flag,
> xattr, or any other applicable method.  No other special attributes are
> necessary.

So I assume what you are implementing is read-only compression; that
is, once the file is written, and the attribute set indicating that
this is a compressed file, it is now immutable.

> Compressed file format
> ======================
>
> Compressed files shall have header, block map, and data sections.
>
> Header:
>
>   byte[4]   magic      'zzzz' (not strictly needed)
>   byte      param1     method and flags
>     bits 0..3 = compression method (1=zlib, 2=lz4, etc.)
>     bits 4..7 = flags (none defined yet)
>   byte      blocksize  log2 of blocksize (max 31)

I suggest using the term "compression cluster" to distinguish this
from the file system block size.

>   le48      orig_size  original uncompressed file size
>
>
> Block map:
>
> Vector of le16 (if blocksize <= 16) or le32 (if blocksize > 16).  Each entry
> is the compressed size of the block.  Zero indicates that the block is
> stored uncompressed, in case compression expanded the block.

What I would store instead is a list of 32- or 64-bit offsets, where
the nth entry in the array indicates the starting offset of the nth
compression cluster.

> Questions and issues
> ====================
>
> Should there be any padding for the data blocks?  For example, if writing is
> to be supported, padding the compressed data to the filesystem block size
> would allow for easy rewriting of individual blocks without disturbing the
> surrounding blocks.  Perhaps padding could be indicated by a flag.

If you add padding then you defeat the whole point of adding
compression.  What if the initial contents of a 64k cluster was all
zeros, so it trivially compresses down to a few dozen bytes; but then
it gets replaced by completely uncompressible data?  If you add 64k
worth of padding to each block, then you're not saving any space, so
what's the point?

> The compression code must be able to read pages from the underlying
> filesystem.  This involves using the pagecache.  But the uncompressed data
> is what ultimately should end up in the pagecache.  This is where I'm
> currently stuck.  How do I implement the code such that the underlying
> compressed data may be read (using the pagecache or not) while not
> disturbing the pagecache for the uncompressed data?  I'm wondering if I need
> to create an internal address_space to pass down into the underlying
> readpage?  Or is there another way to do this?

So I would *not* reference the compressed data via the page cache.
If you do that, then you end up wasting space in the page cache, since
the page cache will contain both the compressed and decompressed data
--- and once the data has been decompressed, the compressed version is
completely useless.  So it's better to have the file system supply the
physical location on disk, and then to read in the compressed data to
a scratch set of pages which is freed immediately after you are done
decompressing things.

This is why compression is so very different from encryption.  The
constraints make it quite different.

Regards,

						- Ted
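
To make the offset-list suggestion above concrete, here is a minimal
sketch in C.  It is not code from this thread: struct cluster_map,
cluster_lookup() and size_vector_start() are hypothetical names, and
two things are assumptions made for illustration only: the table
carries one extra trailing entry marking the end of the data, and a
cluster whose stored length equals the cluster size is one that was
stored uncompressed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of an offset-based cluster map. */
struct cluster_map {
	unsigned int cluster_bits;	/* log2 of the compression cluster size */
	uint64_t nr_clusters;		/* number of compression clusters */
	const uint64_t *offsets;	/* ASSUMPTION: nr_clusters + 1 entries,
					 * the last one marks the end of data */
};

/*
 * Find the stored bytes backing the cluster that contains byte
 * position pos of the uncompressed file.  One shift and two loads,
 * regardless of how far into the file pos is.
 * Returns 0 on success, -1 if pos is past the last cluster.
 */
static int cluster_lookup(const struct cluster_map *map, uint64_t pos,
			  uint64_t *start, uint64_t *len)
{
	uint64_t n = pos >> map->cluster_bits;

	if (n >= map->nr_clusters)
		return -1;
	*start = map->offsets[n];
	*len = map->offsets[n + 1] - map->offsets[n];
	return 0;
}

/*
 * For comparison: with a vector of compressed sizes (zero meaning
 * "stored uncompressed", as in the quoted proposal), locating
 * cluster n means summing every entry that comes before it.
 */
static uint64_t size_vector_start(const uint32_t *sizes, uint64_t n,
				  uint64_t data_start,
				  unsigned int cluster_bits)
{
	uint64_t off = data_start;
	uint64_t i;

	for (i = 0; i < n; i++)
		off += sizes[i] ? sizes[i] : (uint64_t)1 << cluster_bits;
	return off;
}
```

With 64k clusters (cluster_bits = 16), a read at position 70000 falls
in cluster 1, so cluster_lookup() only touches offsets[1] and
offsets[2].  The size-vector variant has to walk the map from the
start, or keep a separately computed prefix sum, to answer the same
question.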