From: Forza <forza@tnonline.net>
To: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Subject: Re: Support for compressed inline extents
Date: Mon, 23 Aug 2021 21:34:27 +0200
Message-ID: <ca2452a6-3f5d-76df-e91b-dff2dcb53052@tnonline.net>
In-Reply-To: <20210822083356.GE29026@hungrycats.org>



On 2021-08-22 10:33, Zygo Blaxell wrote:
> On Sun, Aug 22, 2021 at 09:09:10AM +0200, Forza wrote:
>> On 2021-08-22 07:45, Zygo Blaxell wrote:
>>> On Sun, Aug 22, 2021 at 01:25:48AM +0200, Forza wrote:
>>>> I'd like to see the option to allow compressed extents to be inlined. It
>>>> could greatly reduce disk usage and speed up small files by avoiding
>>>> extra seeks.
>>>>
>>>> I tried to understand why we don't allow
>>>> it but could only find this reference
>>>> https://btrfs.wiki.kernel.org/index.php/On-disk_Format#EXTENT_DATA_.286c.29
>>>>
>>>> "the extent is inline, the remaining item bytes are the data bytes
>>>> (n bytes in case no compression/encryption/other encoding is used)."
>>>>
>>>> Is the limitation in the disk format or perhaps in the compression
>>>> heuristics?
>>>
>>> A far better question is "when did we _stop_ compressing inlined extents",
>>> and the answer is in v5.14-rc1: f2165627319f "btrfs: compression: don't
>>> try to compress if we don't have enough pages".  This check affects
>>> inlined extents, so they are never compressed after 5.14.  Oops.
>>
>> I don't understand this comment as you say below we do not allow compressed
>> (encoded) data inline? Do you mean we only used to compress data inline if
>> the original uncompressed data would fit inline too?
> 
> The old code had two conditions and both must be met:
> 
> 	1.  encoded data size <= max_inline mount parameter (default
> 	page_size / 2)
> 
> 	2.  unencoded data size < page_size (must fit in a single page
> 	without filling it).

So this means we already have code that can read encoded inline data?
More on that further below.

> 
> So with compression you can fit up to 4095 bytes in an inline extent on
> a page-size-4096 machine; however, the data must compress to 2048 bytes
> or less (or whatever the max_inline limit is set to).  If the compressed
> data doesn't fit inline, it gets stored uncompressed in a normal data
> block, since the compression can't save any space.
> 
> Without compression, it's much simpler:  only the extent's length matters,
> it's inline or not inline.
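
The pre-5.14 rules described above can be modelled roughly as below. This
is only a sketch of the two conditions as stated in this thread, not the
kernel code: PAGE_SIZE, MAX_INLINE and placement() are names made up for
illustration, and zlib stands in for zstd because it ships with Python.

```python
import random
import zlib

PAGE_SIZE = 4096              # assumption: 4 KiB page size
MAX_INLINE = PAGE_SIZE // 2   # assumption: default max_inline from the thread


def placement(data: bytes, compress: bool) -> str:
    """Classify an extent using the two pre-5.14 conditions (toy model)."""
    if compress:
        encoded = zlib.compress(data)  # stand-in for zstd
        # Condition 1: encoded size <= max_inline
        # Condition 2: unencoded size < page_size
        if len(encoded) <= MAX_INLINE and len(data) < PAGE_SIZE:
            return "inline, compressed"
    # Without compression (or when it does not qualify) only length matters.
    return "inline" if len(data) <= MAX_INLINE else "regular"


rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(3000))  # barely compressible

samples = [
    ("4095 bytes, compressible", b"a" * 4095),
    ("4096 bytes, compressible", b"a" * 4096),
    ("3000 bytes, random", noisy),
]
for name, data in samples:
    print(f"{name}: {placement(data, compress=True)}")
```

In this model only the first sample ends up inline: the second fails the
"must not fill the page" condition and the third does not compress below
max_inline.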
> 
> The new code adds a third condition which must also be met:
> 
> 	3.  unencoded data size > page_size
> 
> Condition 3 and condition 2 can never be true at the same time, so new
> kernel code cannot compress any extent that could possibly be inline.
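
To convince myself that conditions 2 and 3 really exclude each other, here
is a brute-force check over every unencoded size up to two pages. It is a
sketch of the conditions exactly as listed above, not the kernel logic;
the function names and the 4096/2048 values are assumptions for
illustration.

```python
PAGE_SIZE = 4096              # assumption: 4 KiB page size
MAX_INLINE = PAGE_SIZE // 2   # assumption: default max_inline


def old_can_inline_compressed(encoded: int, unencoded: int) -> bool:
    # Conditions 1 and 2 from the list above.
    return encoded <= MAX_INLINE and unencoded < PAGE_SIZE


def new_can_inline_compressed(encoded: int, unencoded: int) -> bool:
    # Same as before, plus condition 3 (unencoded size > page_size).
    return old_can_inline_compressed(encoded, unencoded) and unencoded > PAGE_SIZE


sizes = range(0, 2 * PAGE_SIZE + 1)
# Best case for inlining: pretend everything compresses down to 1 byte.
old_ok = [n for n in sizes if old_can_inline_compressed(1, n)]
new_ok = [n for n in sizes if new_can_inline_compressed(1, n)]

print("old rules, largest inlinable unencoded size:", max(old_ok))
print("new rules, number of inlinable sizes:", len(new_ok))
```

Under the old rules the largest size is 4095; under the new rules the list
is empty, which matches the explanation above.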

Ouch
> 
>>>> Not all use cases would benefit, and we'd have more metadata, which
>>>> increases the risk of ENOSPC. But I think it could be very valuable
>>>> nonetheless. For example mail servers, source code/CI, webservers, and
>>>> others that commonly deal with many small but highly compressible files.
>>>>
>>>>
>>>> I did a quick test by copying all files smaller than 8192 bytes from
>>>> my home server. The filesystem has 90GiB used.
>>>
>>> An 8192 byte file cannot currently be inline (on a 4K page size system)
>>> because the read code in btrfs assumes inline extents always fit inside
>>> one page after decoding.
>>>
>>> What you're really asking here is "can we have an arbitrary length
>>> of compressed inline extent, as long as the encoded version fits in a
>>> metadata block" and the short answer is "not with this on-disk format,"
>>> because existing readers cannot cope with it.  If we are to consider this,
>>> it requires an incompatible format change.
>>
>> Yes, this is what I meant. As long as the resulting data after compression
>> would fit inline, we should allow it to be inlined.
> 
> There's nothing about the disk format that would prevent this, except that
> no implementations exist that could read it correctly.
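
As I understand the reader-side assumption, it behaves something like the
toy model below: the decoder only ever expects one page of output. This is
not the btrfs read path; read_inline() and PAGE_SIZE are hypothetical
names, and zlib is used only because it is in the Python standard library.

```python
import zlib

PAGE_SIZE = 4096  # assumption: 4 KiB page size


def read_inline(encoded: bytes) -> bytes:
    """Decode an inline extent, refusing anything larger than one page."""
    d = zlib.decompressobj()
    # Ask for at most one byte more than a page so overflow is detectable.
    out = d.decompress(encoded, PAGE_SIZE + 1)
    if len(out) > PAGE_SIZE:
        raise ValueError("decoded inline data does not fit in one page")
    return out


small = zlib.compress(b"a" * 4095)
large = zlib.compress(b"a" * 8192)  # encodes to a few dozen bytes

print(len(read_inline(small)))
try:
    read_inline(large)
except ValueError as err:
    print("rejected:", err)
```

The 8192-byte file would easily fit inline once encoded, but a reader
built around a single page has nowhere to put the decoded result.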

Further up you showed that we can read encoded inline data. What is
missing for us to be able to read encoded inline data that decodes to
more than page_size?

> 
>>>> The result was 357129 files, 207605 inline. 792MiB disk usage, 1.0GiB
>>>> data size, or 1.1% of fs usage.
>>>>
>>>> Zstd compressed them, which gave 295419 files inline. Total data size
>>>> 500MiB. The size of the inlined files is 208MiB.
>>>>
>>>> I then uncompressed the inlined files to see how much of the original data
>>>> could have been compressed and inlined. 599MiB total data with 501MiB
>>>> disk usage and 207576 inlined files.
>>>>
>>>> All in all we would save 501-208=293MiB, which is very good. On top of
>>>> this we'd have savings because we avoid padding up to 4KiB block size
>>>> due to inlining. Also my test only included files less than 8KiB. It
>>>> is possible that many files larger than this could be compressed to
>>>> less than max_inline size.
>>>>
>>>>
>>>> Thanks
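
For reference, the arithmetic behind the estimate quoted above, using only
the figures from my test. The variable names are made up, and the sizes
are the rounded MiB/GiB values from the message, so the results are
approximate.

```python
# Figures quoted from the test (rounded sizes).
files_total = 357129
inline_uncompressed = 207605   # files inline without compression
inline_zstd = 295419           # files inline after zstd compression
inlined_zstd_mib = 208         # size of the zstd-inlined files
same_files_plain_mib = 501     # disk usage of those files uncompressed
data_size_gib = 1.0            # data size of the copied small files
fs_used_gib = 90               # used space on the source filesystem

saved_mib = same_files_plain_mib - inlined_zstd_mib
extra_inline_files = inline_zstd - inline_uncompressed
share_of_fs = 100 * data_size_gib / fs_used_gib

print(f"saved: {saved_mib} MiB")
print(f"additional files inlined with zstd: {extra_inline_files}")
print(f"inline share: {100 * inline_uncompressed / files_total:.1f}% "
      f"-> {100 * inline_zstd / files_total:.1f}%")
print(f"small files as share of fs usage: {share_of_fs:.1f}%")
```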
