From: David Sterba <dsterba@suse.cz>
To: Eric Biggers <ebiggers@kernel.org>
Cc: dsterba@suse.cz, Neal Gompa <ngompa13@gmail.com>,
	Amy Parker <enbyamy@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Adding LZ4 compression support to Btrfs
Date: Fri, 26 Feb 2021 10:36:53 +0100
Message-ID: <20210226093653.GI7604@twin.jikos.cz>
In-Reply-To: <YDfxkGkWnLEfsDwZ@gmail.com>

On Thu, Feb 25, 2021 at 10:50:56AM -0800, Eric Biggers wrote:
> On Thu, Feb 25, 2021 at 02:26:47PM +0100, David Sterba wrote:
> > 
> > LZ4 support has been asked for so many times that it has its own FAQ
> > entry:
> > https://btrfs.wiki.kernel.org/index.php/FAQ#Will_btrfs_support_LZ4.3F
> > 
> > Decompression speed is not the only thing that should be evaluated.
> > The way compression works in btrfs (in 4k blocks) does not allow good
> > compression ratios, and overall LZ4 does not do much better than LZO, so
> > it is not worth the additional compatibility costs. With ZSTD we got
> > high compression, and real-time compression levels were added recently
> > that we'll use in btrfs eventually.
> 
> When ZSTD support was being added to btrfs, it was claimed that btrfs compresses
> up to 128KB at a time
> (https://lore.kernel.org/r/5a7c09dd-3415-0c00-c0f2-a605a0656499@fb.com).
> So which is it -- 4KB or 128KB?

Logical extent ranges are sliced into 128K pieces, and each piece is
submitted to the compression routine. The whole range is then fed to the
compressor in 4K chunks (more exactly, in page-sized chunks). Depending
on the capabilities of the compression algorithm, the 4K chunks are
either independent or can reuse some internal state of the algorithm.
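
To illustrate the two levels of slicing described above, here is a
minimal userspace sketch. It is not btrfs code: the names slice_extent
and pages_of are hypothetical, and PAGE_SIZE is assumed to be 4K.

```python
PAGE_SIZE = 4096          # assumption: 4K pages
MAX_RANGE = 128 * 1024    # 128K range handed to the compression routine


def slice_extent(start, length, max_range=MAX_RANGE):
    """Yield (offset, size) ranges of at most max_range bytes (hypothetical helper)."""
    end = start + length
    while start < end:
        size = min(max_range, end - start)
        yield start, size
        start += size


def pages_of(offset, size, page_size=PAGE_SIZE):
    """Yield the page-sized (offset, size) chunks that make up one range."""
    end = offset + size
    while offset < end:
        n = min(page_size, end - offset)
        yield offset, n
        offset += n


# A 300K logical extent becomes two full 128K ranges plus a 44K tail,
# and each full range is fed to the compressor as 32 chunks of 4K.
for rng in slice_extent(0, 300 * 1024):
    print(rng, len(list(pages_of(*rng))))
```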

LZO and LZ4 use some kind of embedded dictionary in the same buffer and
reference that dictionary directly, i.e. they assume the whole input
range is contiguous. That is not trivial to achieve in the kernel,
because pages are not contiguous in general.

Thus, LZO and LZ4 compress 4K at a time, and each chunk is independent.
This results in a worse compression ratio because there are fewer
opportunities for data reuse. OTOH this allows decompression in place.
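
The ratio penalty of independent 4K chunks can be reproduced in
userspace. The sketch below is only an illustration: Python's zlib is
used as a stand-in because LZO and LZ4 are not in the standard library,
and make_sample is a hypothetical generator of text-like data. The
point is the per-chunk restart, not the particular algorithm.

```python
import zlib

PAGE_SIZE = 4096  # assumption: 4K pages


def make_sample(size=128 * 1024):
    """Hypothetical text-like test data whose repeats span page boundaries."""
    lines = []
    total = 0
    i = 0
    while total < size:
        line = b"inode %06d: extent item, compressed, refs 1\n" % i
        lines.append(line)
        total += len(line)
        i += 1
    return b"".join(lines)[:size]


def compress_independent(data, page_size=PAGE_SIZE):
    """Compress every page-sized chunk on its own, with no shared history."""
    return [zlib.compress(data[off:off + page_size])
            for off in range(0, len(data), page_size)]


def decompress_independent(chunks):
    """Each chunk can be decompressed without looking at any other chunk."""
    return b"".join(zlib.decompress(c) for c in chunks)


data = make_sample()
chunks = compress_independent(data)
independent_total = sum(len(c) for c in chunks)
whole_total = len(zlib.compress(data))  # same data as one contiguous input

# The independent chunks add up to more bytes than the single-input result.
print(independent_total, whole_total)
```

Because no chunk refers to data outside itself, any single chunk can be
decompressed without touching its neighbours, at the price of starting
from an empty history 32 times per 128K range.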

ZLIB and ZSTD can have a separate dictionary and don't need the input
chunks to be contiguous. This brings some additional overhead, like
copying parts of the input to the dictionary and additional memory for
temporary structures, but gives higher compression ratios.
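
A rough userspace analogue of this mode is a streaming compressor
object that keeps its own history while being fed separate page-sized
buffers. The sketch below uses Python's zlib.compressobj; it shows the
general idea only and is not how the kernel code is structured.
sample_pages is a hypothetical helper that builds separate buffers.

```python
import zlib

PAGE_SIZE = 4096  # assumption: 4K pages


def sample_pages(count=32, page_size=PAGE_SIZE):
    """Hypothetical input: 'count' separate page-sized buffers of text-like data."""
    data = b"".join(b"inode %06d: extent item, compressed, refs 1\n" % i
                    for i in range(count * page_size // 40 + 1))
    return [data[i * page_size:(i + 1) * page_size] for i in range(count)]


def compress_stream(pages, level=6):
    """Feed separate buffers to one compressor that carries state across them."""
    comp = zlib.compressobj(level)
    out = [comp.compress(page) for page in pages]  # history is kept inside comp
    out.append(comp.flush())
    return b"".join(out)


pages = sample_pages()
stream = compress_stream(pages)
independent_total = sum(len(zlib.compress(p)) for p in pages)

# One stream with shared state is smaller than 32 independent results.
print(len(stream), independent_total)
```

The compressor object copies what it needs from each buffer into its
own window, so the pages never have to be adjacent in memory. That copy
and the extra state are the overhead mentioned above.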

IIRC the biggest problem for LZ4 was the cost of setting up each 4K
chunk: the work memory had to be zeroed. The size of the work memory is
tunable, but at the cost of compression ratio. Either way it was either
too slow or compressed too badly.
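
A back-of-the-envelope sketch of that setup cost, under stated
assumptions: zeroing_overhead is a hypothetical helper, and the 16K and
4K work memory sizes are illustrative figures only, not values measured
in the kernel.

```python
def zeroing_overhead(range_size, chunk_size, workmem_size):
    """Return (chunks, bytes zeroed, bytes zeroed per input byte) for one range."""
    chunks = -(-range_size // chunk_size)  # ceiling division
    zeroed = chunks * workmem_size         # work memory is cleared once per chunk
    return chunks, zeroed, zeroed / range_size


# Assumed 16K of work memory, cleared before each 4K chunk of a 128K range:
print(zeroing_overhead(128 * 1024, 4096, 16 * 1024))  # (32, 524288, 4.0)

# Shrinking the (assumed) work memory to 4K cuts the zeroing to 1 byte per
# input byte, but a smaller table generally finds fewer matches.
print(zeroing_overhead(128 * 1024, 4096, 4 * 1024))   # (32, 131072, 1.0)
```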
