Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level

From: Chao Yu <yuchao0@huawei.com>
To: Gao Xiang <hsiangkao@redhat.com>
Cc: Eric Biggers <ebiggers@kernel.org>, <jaegeuk@kernel.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-f2fs-devel@lists.sourceforge.net>
Subject: Re: [f2fs-dev] [PATCH v6] f2fs: compress: support compress level
Date: Fri, 4 Dec 2020 10:38:08 +0800	[thread overview]
Message-ID: <3041968d-87d0-d2dc-822b-0bb4a94a365b@huawei.com> (raw)
In-Reply-To: <20201204020659.GB1963435@xiangao.remote.csb>

On 2020/12/4 10:06, Gao Xiang wrote:
> On Fri, Dec 04, 2020 at 09:56:27AM +0800, Chao Yu wrote:
>> Hi Xiang,
>>
>> On 2020/12/4 8:31, Gao Xiang wrote:
>>> Hi Chao,
>>>
>>> On Thu, Dec 03, 2020 at 11:32:34AM -0800, Eric Biggers wrote:
>>>
>>> ...
>>>
>>>>
>>>> What is the use case for storing the compression level on-disk?
>>>>
>>>> Keep in mind that compression levels are an implementation detail; the exact
>>>> compressed data that is produced by a particular algorithm at a particular
>>>> compression level is *not* a stable interface.  It can change when the
>>>> compressor is updated, as long as the output continues to be compatible with the
>>>> decompressor.
>>>>
>>>> So does compression level really belong in the on-disk format?
>>>>
>>>
>>> Curious about this, since f2fs compression uses 16k f2fs compress cluster
>>> by default (doesn't do sub-block compression by design as what btrfs did),
>>> so is there significant CR difference between lz4 and lz4hc on 16k
>>> configuration (I guess using zstd or lz4hc for 128k cluster like btrfs
>>> could make more sense), could you leave some CR numbers about these
>>> algorithms on typical datasets (enwik9, silisia.tar or else.) with 16k
>>> cluster size?
>>
>> Yup, I can figure out some numbers later. :)
>>
>>>
>>> As you may noticed, lz4hc is much slower than lz4, so if it's used online,
>>> it's a good way to keep all CPUs busy (under writeback) with unprivileged
>>> users. I'm not sure if it does matter. (Ok, it'll give users more options
>>> at least, yet I'm not sure end users are quite understand what these
>>> algorithms really mean, I guess it spends more CPU time but without much
>>> more storage saving by the default 16k configuration.)
>>>
>>> from https://github.com/lz4/lz4    Core i7-9700K CPU @ 4.9GHz
>>> Silesia Corpus
>>>
>>> Compressor              Ratio   Compression     Decompression
>>> memcpy                  1.000   13700 MB/s      13700 MB/s
>>> Zstandard 1.4.0 -1      2.883   515 MB/s	1380 MB/s
>>> LZ4 HC -9 (v1.9.0)      2.721   41 MB/s         4900 MB/s
>>
>> There is one solutions now, Daeho has submitted two patches:
>>
>> f2fs: add compress_mode mount option
>> f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE
>>
>> Which allows to specify all files in data partition be compressible, by default,
>> all files are written as non-compressed one, at free time of system, we can use
>> ioctl to reload and compress data for specific files.
>>
> 
> Yeah, my own premature suggestion is there are many compression options
> exist in f2fs compression, but end users are not compression experts.
> So it'd better to leave advantage options to users (or users might be
> confused or select wrong algorithm or make potential complaint...)

Yes, I agree.

> 
> Keep lz4hc dirty data under writeback could block writeback, make kswapd
> busy, and direct memory reclaim path, I guess that's why rare online
> compression chooses it. My own premature suggestion is that it'd better
> to show the CR or performance benefits in advance, and prevent unprivileged
> users from using high-level lz4hc algorithm (to avoid potential system attack.)
> either from mount options or ioctl.

Yes, I guess you are worry about destop/server scenario, as for android scenario,
all compression related flow can be customized, and I don't think we will use
online lz4hc compress; for other scenario, except the numbers, I need to add the
risk of using lz4hc algorithm in document.

Thanks,

> 
>>>
>>> Also a minor thing is lzo-rle, initially it was only used for in-memory
>>> anonymous pages and it won't be kept on-disk so that's fine. I'm not sure
>>> if lzo original author want to support it or not. It'd be better to get
>>
>>
>> Hmm.. that's a problem, as there may be existed potential users who are
>> using lzo-rle, remove lzo-rle support will cause compatibility issue...
>>
>> IMO, the condition "f2fs may has persisted lzo-rle compress format data already"
>> may affect the decision of not supporting that algorithm from author.
>>
>>> some opinion before keeping it on-disk.
>>
>> Yes, I can try to ask... :)
> 
> Yeah, it'd be better to ask the author first, or it may have to maintain
> a private lz4-rle folk...
> 
> Thanks,
> Gao Xiang
> 
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>> Gao Xiang
>>>
>>>> - Eric
>>>>
>>>>
>>>> _______________________________________________
>>>> Linux-f2fs-devel mailing list
>>>> Linux-f2fs-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>>>
>>> .
>>>
>>
> 
> .
>