Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level

From: Chao Yu <yuchao0@huawei.com>
To: Gao Xiang <hsiangkao@redhat.com>
Cc: Eric Biggers <ebiggers@kernel.org>, <jaegeuk@kernel.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-f2fs-devel@lists.sourceforge.net>
Subject: Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level
Date: Fri, 4 Dec 2020 09:56:27 +0800	[thread overview]
Message-ID: <7b975d1a-a06c-4e14-067e-064afc200934@huawei.com> (raw)
In-Reply-To: <20201204003119.GA1957051@xiangao.remote.csb>

Hi Xiang,

On 2020/12/4 8:31, Gao Xiang wrote:
> Hi Chao,
> 
> On Thu, Dec 03, 2020 at 11:32:34AM -0800, Eric Biggers wrote:
> 
> ...
> 
>>
>> What is the use case for storing the compression level on-disk?
>>
>> Keep in mind that compression levels are an implementation detail; the exact
>> compressed data that is produced by a particular algorithm at a particular
>> compression level is *not* a stable interface.  It can change when the
>> compressor is updated, as long as the output continues to be compatible with the
>> decompressor.
>>
>> So does compression level really belong in the on-disk format?
>>
> 
> Curious about this, since f2fs compression uses 16k f2fs compress cluster
> by default (doesn't do sub-block compression by design as what btrfs did),
> so is there significant CR difference between lz4 and lz4hc on 16k
> configuration (I guess using zstd or lz4hc for 128k cluster like btrfs
> could make more sense), could you leave some CR numbers about these
> algorithms on typical datasets (enwik9, silisia.tar or else.) with 16k
> cluster size?

Yup, I can figure out some numbers later. :)

> 
> As you may noticed, lz4hc is much slower than lz4, so if it's used online,
> it's a good way to keep all CPUs busy (under writeback) with unprivileged
> users. I'm not sure if it does matter. (Ok, it'll give users more options
> at least, yet I'm not sure end users are quite understand what these
> algorithms really mean, I guess it spends more CPU time but without much
> more storage saving by the default 16k configuration.)
> 
> from https://github.com/lz4/lz4    Core i7-9700K CPU @ 4.9GHz
> Silesia Corpus
> 
> Compressor              Ratio   Compression     Decompression
> memcpy                  1.000   13700 MB/s      13700 MB/s
> Zstandard 1.4.0 -1      2.883   515 MB/s	1380 MB/s
> LZ4 HC -9 (v1.9.0)      2.721   41 MB/s         4900 MB/s

There is one solutions now, Daeho has submitted two patches:

f2fs: add compress_mode mount option
f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE

Which allows to specify all files in data partition be compressible, by default,
all files are written as non-compressed one, at free time of system, we can use
ioctl to reload and compress data for specific files.

> 
> Also a minor thing is lzo-rle, initially it was only used for in-memory
> anonymous pages and it won't be kept on-disk so that's fine. I'm not sure
> if lzo original author want to support it or not. It'd be better to get

Hmm.. that's a problem, as there may be existed potential users who are
using lzo-rle, remove lzo-rle support will cause compatibility issue...

IMO, the condition "f2fs may has persisted lzo-rle compress format data already"
may affect the decision of not supporting that algorithm from author.

> some opinion before keeping it on-disk.

Yes, I can try to ask... :)

Thanks,

> 
> Thanks,
> Gao Xiang
> 
>> - Eric
>>
>>
>> _______________________________________________
>> Linux-f2fs-devel mailing list
>> Linux-f2fs-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> 
> .
>