patch

* Re: [PATCH] btrfs: zlib: do not do unnecessary page copying for compression
  2024-05-27 16:25  1% ` Zaslonko Mikhail
@ 2024-05-27 22:09  1%   ` Qu Wenruo
  0 siblings, 0 replies; 200+ results
From: Qu Wenruo @ 2024-05-27 22:09 UTC (permalink / raw)
  To: Zaslonko Mikhail, Qu Wenruo, linux-btrfs; +Cc: linux-s390, Vasily Gorbik

在 2024/5/28 01:55, Zaslonko Mikhail 写道:
> Hello Qu,
>
> I remember implementing btrfs zlib changes related to s390 dfltcc compression support a while ago:
> https://lwn.net/Articles/808809/
>
> The workspace buffer size was indeed enlarged for performance reasons.
>
> Please see my comments below.
>
> On 27.05.2024 11:24, Qu Wenruo wrote:
>> [BUG]
>> In function zlib_compress_folios(), we handle the input by:
>>
>> - If there are multiple pages left
>>    We copy the page content into workspace->buf, and use workspace->buf
>>    as input for compression.
>>
>>    But on x86_64 (which doesn't support dfltcc), that buffer size is just
>>    one page, so we're wasting our CPU time copying the page for no
>>    benefit.
>>
>> - If there is only one page left
>>    We use the mapped page address as input for compression.
>>
>> The problem is, this means we will copy the whole input range except the
>> last page (can be as large as 124K), without much obvious benefit.
>>
>> Meanwhile the cost is pretty obvious.
>
> Actually, the behavior for kernels w/o dfltcc support (currently available on s390
> only) should not be affected.
> We copy input pages to the workspace->buf only if the buffer size is larger than 1 page.
> At least it worked this way after my original btrfs zlib patch:
> https://lwn.net/ml/linux-kernel/20200108105103.29028-1-zaslonko@linux.ibm.com/
>
> Has this behavior somehow changed after your page->folio conversion performed for btrfs?
> https://lore.kernel.org/all/cover.1706521511.git.wqu@suse.com/

My bad, I forgot that the buf_size for non-S390 systems is fixed to one
page thus the page copy is not utilized for x86_64.

But I'm still wondering if we do not go 4 pages as buffer, how much
performance penalty would there be?

One of the objective is to prepare for the incoming sector perfect
subpage compression support, thus I'm re-checking the existing
compression code, preparing to change them to be subpage compatible.

If we can simplify the behavior without too large performance penalty,
can we consider just using one single page as buffer?

Thanks,
Qu

^ permalink raw reply	[relevance 1%]