Re: [PATCH] btrfs: zlib: refactor how we prepare the input buffer

From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org,
	"Fabio M . De Francesco" <fmdefrancesco@gmail.com>
Subject: Re: [PATCH] btrfs: zlib: refactor how we prepare the input buffer
Date: Tue, 21 Jun 2022 08:40:06 +0800	[thread overview]
Message-ID: <7bd3daa0-234d-4843-0f04-2b020f1c7b0e@gmx.com> (raw)
In-Reply-To: <20220620160806.GS20633@twin.jikos.cz>

On 2022/6/21 00:08, David Sterba wrote:
> On Sat, Jun 18, 2022 at 10:39:28AM +0800, Qu Wenruo wrote:
>> Inspired by recent kmap() change from Fabio M. De Francesco.
>>
>> There are some weird behavior in zlib_compress_pages(), mostly around how
>> we prepare the input buffer.
>>
>> [BEFORE]
>> - We hold a page mapped for a long time
>>    This is making it much harder to convert kmap() to kmap_local_page(),
>>    as such long mapped page can lead to nested mapped page.
>>
>> - Different paths in the name of "optimization"
>>    When we ran out of input buffer, we will grab the new input with two
>>    different paths:
>>
>>    * If there are more than one pages left, we copy the content into the
>>      input buffer.
>>      This behavior is introduced mostly for S390, as that arch needs
>>      multiple pages as input buffer for hardware decompression.
>>
>>    * If there is only one page left, we use that page from page cache
>>      directly without copying the content.
>>
>>    This is making page map/unmap much harder, especially due the latter
>>    case.
>>
>> [AFTER]
>> This patch will change the behavior by introducing a new helper, to
>> fulfill the input buffer:
>>
>> - Only map one page when we do the content copy
>>
>> - Unified path, by always copying the page content into workspace
>>    input buffer
>>    Yes, we're doing extra page copying. But the original optimization
>>    only work for the last page of the input range.
>>
>>    Thus I'd say the sacrifice is already not that big.
>
> I don't like that the performance may drop and that there's extra memory
> copyiing when not absolutely needed, OTOH it's in zlib code and I think
> though it's in use today, the zstd is a sufficient replacement so the
> perf drop should have low impact.

My bad, I didn't check buf_size which is conditionally assigned to
PAGE_SIZE or 4 * PAGE_SIZE.

So changing it to memcpy() is going affect all archs other than S390.

I'll change the mapping start and end part to re-enable the old behavior.

Thanks,
Qu

>
>> - Use kmap_local_page() and kunmap_local() instead
>>    Now the lifespan for the mapped page is only during memcpy() call,
>>    we're definitely fine to use kmap_local_page()/kunmap_local().
>>
>> Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Only tested on x86_64 for the correctness of the new helper.
>>
>> But considering how small the window we need the page to be mapped, I
>> think it should also work for x86 without any problem.
>> ---
>>   fs/btrfs/zlib.c | 85 ++++++++++++++++++++++++-------------------------
>>   1 file changed, 41 insertions(+), 44 deletions(-)
>>
>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>> index 767a0c6c9694..2cd4f6fb1537 100644
>> --- a/fs/btrfs/zlib.c
>> +++ b/fs/btrfs/zlib.c
>> @@ -91,20 +91,54 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
>>   	return ERR_PTR(-ENOMEM);
>>   }
>>
>> +/*
>> + * Copy the content from page cache into @workspace->buf.
>> + *
>> + * @total_in:		The original total input length.
>> + * @fileoff_ret:	The file offset.
>> + *			Will be increased by the number of bytes we read.
>> + */
>> +static void fill_input_buffer(struct workspace *workspace,
>> +			      struct address_space *mapping,
>> +			      unsigned long total_in, u64 *fileoff_ret)
>> +{
>> +	unsigned long bytes_left = total_in - workspace->strm.total_in;
>> +	const int input_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
>> +				    workspace->buf_size / PAGE_SIZE);
>> +	u64 file_offset = *fileoff_ret;
>> +	int i;
>> +
>> +	/* Copy the content of each page into the input buffer. */
>> +	for (i = 0; i < input_pages; i++) {
>> +		struct page *in_page;
>> +		void *addr;
>> +
>> +		in_page = find_get_page(mapping, file_offset >> PAGE_SHIFT);
>> +
>> +		addr = kmap_local_page(in_page);
>> +		memcpy(workspace->buf + i * PAGE_SIZE, addr, PAGE_SIZE);
>
> The workspace->buf is 1 page or 4 pages for x390, so here it looks
> confusing that it could overflow and no bounds are explicitly checked
> wherether the i * PAGE_SIZE offset is still OK. This would at least
> deserve a comment and some runtime checks.

The check is just above, @input_pages is calculated using
workspace->buf_size / PAGE_SIZE.

So it should be OK.

Although the real problem is subpage support, we should not just copy
the full page.

But thankfully, btrfs subpage only support full page compression for now.

Thanks,
Qu