From: David Sterba <dsterba@suse.cz>
To: Gao Xiang <hsiangkao@redhat.com>
Cc: Eric Biggers <ebiggers@kernel.org>,
	Neal Gompa <ngompa13@gmail.com>, Amy Parker <enbyamy@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Adding LZ4 compression support to Btrfs
Date: Fri, 26 Feb 2021 15:12:03 +0100
Message-ID: <20210226141203.GJ7604@suse.cz>
In-Reply-To: <20210226112854.GA1890271@xiangao.remote.csb>

On Fri, Feb 26, 2021 at 07:28:54PM +0800, Gao Xiang wrote:
> On Fri, Feb 26, 2021 at 10:36:53AM +0100, David Sterba wrote:
> > On Thu, Feb 25, 2021 at 10:50:56AM -0800, Eric Biggers wrote:
> > 
> > ZLIB and ZSTD can have a separate dictionary and don't need the input
> > chunks to be contiguous. This brings some additional overhead, like
> > copying parts of the input to the dictionary and extra memory for
> > temporary structures, but yields higher compression ratios.
> > 
> > IIRC the biggest problem for LZ4 was the cost of setting up each 4K
> > chunk: the work memory had to be zeroed. The size of the work memory is
> > tunable, but at the cost of compression ratio. Either way, it was either
> > too slow or compressed too poorly.
> 
> May I ask why LZ4 needs to zero the work memory (if you mean the dest
> buffer and LZ4_decompress_safe), just out of curiosity? I haven't
> seen that restriction before. Thanks!

Not the destination buffer, but the work memory, or state as it can
also be called. This is from my initial interest in LZ4 in 2012, and I
got it from Yann himself. There was a tradeoff: either expect zeroed
work memory or add more conditionals.

At the time he had some benchmark results, and the conditionals came
out worse. And I see the memset is still there (see below), so nothing
has changed.
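
To make that tradeoff concrete, here is a toy sketch. It is not LZ4's
actual code, and every name in it is hypothetical. A hash table of input
positions is handled in the two ways described: either require zeroed
memory and pay a full memset per independent input, or skip the memset
by tagging each slot with a generation number and pay an extra
conditional on every lookup.

```c
#include <stdint.h>
#include <string.h>

/* Toy table of input positions; a hypothetical stand-in, NOT the real
 * LZ4 state layout. A lookup result of 0 means "no match candidate". */
#define TOY_HASH_BITS	12
#define TOY_HASH_SIZE	(1u << TOY_HASH_BITS)
#define TOY_SLOT(h)	((h) & (TOY_HASH_SIZE - 1))

/* Strategy 1: require zeroed memory. Reset is a full memset, but a
 * lookup can use any slot directly because stale data cannot exist. */
struct toy_zeroed {
	uint32_t pos[TOY_HASH_SIZE];
};

static void toy_zeroed_reset(struct toy_zeroed *t)
{
	memset(t->pos, 0, sizeof(t->pos));	/* paid once per input */
}

static void toy_zeroed_store(struct toy_zeroed *t, uint32_t h, uint32_t pos)
{
	t->pos[TOY_SLOT(h)] = pos;
}

static uint32_t toy_zeroed_lookup(const struct toy_zeroed *t, uint32_t h)
{
	return t->pos[TOY_SLOT(h)];		/* no extra branch */
}

/* Strategy 2: tag each slot with a generation. Reset is O(1), but every
 * lookup has to check whether the slot belongs to the current input. */
struct toy_tagged {
	uint32_t pos[TOY_HASH_SIZE];
	uint32_t gen[TOY_HASH_SIZE];
	uint32_t cur_gen;
};

static void toy_tagged_init(struct toy_tagged *t)
{
	memset(t, 0, sizeof(*t));		/* one-time full clear */
	t->cur_gen = 1;
}

static void toy_tagged_reset(struct toy_tagged *t)
{
	if (++t->cur_gen == 0)			/* generation wrapped around */
		toy_tagged_init(t);
}

static void toy_tagged_store(struct toy_tagged *t, uint32_t h, uint32_t pos)
{
	t->pos[TOY_SLOT(h)] = pos;
	t->gen[TOY_SLOT(h)] = t->cur_gen;
}

static uint32_t toy_tagged_lookup(const struct toy_tagged *t, uint32_t h)
{
	/* the extra conditional on the hot path */
	return t->gen[TOY_SLOT(h)] == t->cur_gen ? t->pos[TOY_SLOT(h)] : 0;
}
```

Both variants return 0 for a slot written before the last reset; they
differ only in where the cost lands: a bulk memset up front versus a
compare on every lookup.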

For example, in the f2fs sources there is this call chain:
lz4_compress_pages
  LZ4_compress_default (cc->private is the work memory)
    LZ4_compress_fast
      LZ4_compress_fast_extState
        LZ4_resetStream
          memset

where the state size LZ4_MEM_COMPRESS is hidden in a maze of defines:

#define LZ4_MEM_COMPRESS	LZ4_STREAMSIZE
#define LZ4_STREAMSIZE	(LZ4_STREAMSIZE_U64 * sizeof(unsigned long long))
#define LZ4_STREAMSIZE_U64 ((1 << (LZ4_MEMORY_USAGE - 3)) + 4)
/*
 * LZ4_MEMORY_USAGE :
 * Memory usage formula : N->2^N Bytes
 * (examples : 10 -> 1KB; 12 -> 4KB ; 16 -> 64KB; 20 -> 1MB; etc.)
 * Increasing memory usage improves compression ratio
 * Reduced memory usage can improve speed, due to cache effect
 * Default value is 14, for 16KB, which nicely fits into Intel x86 L1 cache
 */
#define LZ4_MEMORY_USAGE 14
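
The same arithmetic can be checked with a small sketch that turns
LZ4_MEMORY_USAGE into a runtime parameter, so several settings can be
compared. lz4_state_size() is a hypothetical helper, not a kernel API,
and the sketch assumes sizeof(unsigned long long) == 8, as on Linux.

```c
#include <stddef.h>

/* Mirrors the quoted defines with LZ4_MEMORY_USAGE as a parameter.
 * Only meaningful for memory_usage >= 3. */
static size_t lz4_state_size(unsigned int memory_usage)
{
	/* LZ4_STREAMSIZE_U64 = (1 << (LZ4_MEMORY_USAGE - 3)) + 4 */
	size_t u64_words = ((size_t)1 << (memory_usage - 3)) + 4;

	/* LZ4_STREAMSIZE = LZ4_STREAMSIZE_U64 * sizeof(unsigned long long) */
	return u64_words * sizeof(unsigned long long);
}
```

With the default of 14 this gives 2052 * 8 = 16416 bytes: the 2^14-byte
table plus a fixed 32 bytes. A value of 12 gives 4128 bytes and 16 gives
65568 bytes, which is the size-versus-ratio knob mentioned earlier.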

So it's 16K by default in Linux (16416 bytes, to be exact). Now imagine
doing memset(16K) just to compress 4K, and doing that 32 times to
compress a whole 128K chunk. That's not negligible overhead.
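
A rough model of that per-chunk pattern is sketched below. It is
hypothetical: every 4K piece is compressed independently and each call
starts by zeroing the whole work memory, as LZ4_resetStream does via
memset in the chain above. The compression step itself is left out, and
only the reset cost is counted.

```c
#include <stddef.h>
#include <string.h>

#define TOY_STATE_SIZE	16416u		/* LZ4_STREAMSIZE for LZ4_MEMORY_USAGE = 14 */
#define TOY_CHUNK_SIZE	4096u		/* 4K input per compression call */
#define TOY_EXTENT_SIZE	(128u * 1024u)	/* 128K chunk to compress */

static unsigned char toy_state[TOY_STATE_SIZE];

/* Stand-in for the reset done at the start of every compress call. */
static size_t toy_reset_state(void)
{
	memset(toy_state, 0, sizeof(toy_state));
	return sizeof(toy_state);
}

/* Bytes of work memory zeroed to compress extent_size bytes in
 * independent chunk_size pieces (chunk_size must be nonzero). */
static size_t toy_zeroed_bytes(size_t extent_size, size_t chunk_size)
{
	size_t zeroed = 0;

	for (size_t off = 0; off < extent_size; off += chunk_size)
		zeroed += toy_reset_state();	/* one full reset per chunk */
	return zeroed;
}
```

For a 128K chunk in 4K pieces this zeroes 32 * 16416 = 525312 bytes,
slightly more than four times the data being compressed, while a single
reset for the whole 128K would zero only 16416 bytes. The overhead scales
with the number of independent pieces, not with the input size.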
