From: David Sterba <dsterba@suse.cz>
To: Timofey Titovets <nefelim4ag@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH v4 0/2] Btrfs: add compression heuristic
Date: Mon, 3 Jul 2017 19:09:00 +0200
Message-ID: <20170703170900.GA2866@twin.jikos.cz>
In-Reply-To: <20170701165602.31189-1-nefelim4ag@gmail.com>

On Sat, Jul 01, 2017 at 07:56:00PM +0300, Timofey Titovets wrote:
> Today btrfs uses simple logic to decide
> whether to compress data or not:
> the selected compression algorithm tries to compress
> the data, and if this saves some space,
> the extent is stored as compressed.
>
> It's a reliable way to detect uncompressible data,
> but it wastes CPU time on
> bad/uncompressible data and adds latency.
>
> This approach also adds pressure on
> the memory subsystem, as for every compressed write
> btrfs needs to allocate pages and
> reuse a compression workspace.
>
> This is quite efficient, but not free.
>
> So let's implement a heuristic.
> The heuristic will analyze data on the fly
> before calling the compression code,
> detect uncompressible data, and advise skipping it.

So let me recap the heuristic flow:

* before compression, map all the pages to be compressed
* grab samples, calculate entropy or look for known patterns
* unmap pages
* decide whether compression is worth doing
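The recap above can be modelled in plain userspace C. This is only an illustrative sketch under stated assumptions, not the code from the patch series: the sampling layout (16 bytes out of every 256), the 7.5 bits/byte cut-off and all function names are hypothetical, and page mapping/unmapping is left out because the input is an ordinary buffer. Entropy is computed in fixed point, since kernel code generally avoids floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All constants below are hypothetical tuning values, not from the patches. */
#define FRAC_BITS     8    /* fixed-point fraction bits */
#define SAMPLE_STRIDE 256  /* take one chunk every 256 bytes */
#define SAMPLE_CHUNK  16   /* bytes per sampled chunk */
/* Skip compression above 7.5 bits/byte (scaled by 2^FRAC_BITS). */
#define ENTROPY_SKIP_THRESHOLD ((15u << FRAC_BITS) / 2)

/* log2(x) in fixed point with FRAC_BITS fractional bits; requires x >= 1. */
static uint32_t log2_fixed(uint32_t x)
{
	uint32_t ipart = 0, frac = 0;
	uint64_t y;

	assert(x >= 1);
	while ((x >> ipart) > 1)          /* integer part: floor(log2(x)) */
		ipart++;
	y = ((uint64_t)x << 16) >> ipart; /* x / 2^ipart in [1, 2), Q16 */
	for (int bit = FRAC_BITS - 1; bit >= 0; bit--) {
		y = (y * y) >> 16;        /* squaring doubles the logarithm */
		if (y >= (2u << 16)) {    /* crossed 2: this fraction bit is 1 */
			y >>= 1;
			frac |= 1u << bit;
		}
	}
	return (ipart << FRAC_BITS) | frac;
}

/* Shannon entropy of the sampled bytes, in bits per byte (fixed point). */
static uint32_t sample_entropy(const uint8_t *data, size_t len)
{
	uint32_t counts[256] = {0};
	uint32_t n = 0, lg_n, avg;
	uint64_t sum = 0;

	for (size_t off = 0; off < len; off += SAMPLE_STRIDE) {
		size_t end = off + SAMPLE_CHUNK < len ? off + SAMPLE_CHUNK : len;

		for (size_t i = off; i < end; i++) {
			counts[data[i]]++;
			n++;
		}
	}
	if (n == 0)
		return 0;
	/* H = log2(n) - (1/n) * sum(c * log2(c)) over non-zero counts */
	for (int b = 0; b < 256; b++)
		if (counts[b])
			sum += (uint64_t)counts[b] * log2_fixed(counts[b]);
	lg_n = log2_fixed(n);
	avg = (uint32_t)(sum / n);
	return avg < lg_n ? lg_n - avg : 0;
}

/* true = worth trying compression, false = likely incompressible */
static bool heuristic_should_compress(const uint8_t *data, size_t len)
{
	return sample_entropy(data, len) < ENTROPY_SKIP_THRESHOLD;
}
```

On a zero-filled buffer the sampled entropy is 0 and the sketch says "compress"; on pseudo-random bytes it comes out close to 8 bits/byte and the sketch says "skip".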

From that, it's quite easy to start by extending the code logic, where
the heuristic would be a naive and very optimistic 'return true'.
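A minimal userspace model of that starting point, assuming nothing about the real btrfs call sites: `compress_heuristic()` is the optimistic stub, and the caller keeps today's rule of storing the extent compressed only when compression actually saves space. The function names, the compressor signature and the toy run-length encoder standing in for a real algorithm are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compressor signature: returns compressed size, 0 on failure. */
typedef size_t (*compress_fn)(const uint8_t *in, size_t len,
			      uint8_t *out, size_t out_len);

/* Skeleton heuristic: naive and optimistic, always says "try it". */
static bool compress_heuristic(const uint8_t *data, size_t len)
{
	(void)data;
	(void)len;
	return true;
}

/* Toy run-length encoder (count, byte) standing in for a real algorithm. */
static size_t toy_rle_compress(const uint8_t *in, size_t len,
			       uint8_t *out, size_t out_len)
{
	size_t o = 0;

	for (size_t i = 0; i < len;) {
		size_t run = 1;

		while (i + run < len && in[i + run] == in[i] && run < 255)
			run++;
		if (o + 2 > out_len)
			return 0;         /* output would not fit: give up */
		out[o++] = (uint8_t)run;
		out[o++] = in[i];
		i += run;
	}
	return o;
}

/* Returns true if the extent should be stored compressed in 'out'. */
static bool try_compress_extent(const uint8_t *data, size_t len,
				uint8_t *out, compress_fn compress)
{
	size_t clen;

	if (!compress_heuristic(data, len))
		return false;             /* skip: store uncompressed */
	clen = compress(data, len, out, len);
	return clen > 0 && clen < len;    /* keep only if it saves space */
}
```

Because the stub never says no, behaviour matches the current code exactly; later heuristics only need to replace the body of `compress_heuristic()`.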

From there on, we can extend and fine-tune the heuristic itself, in
whatever direction we decide. There are likely some decisions that we'd
have to make after we see the effects in the wild on real data. Getting
the skeleton code merged independently would hopefully make things
easier to test.

The cost of the heuristic must be low, so this could lead to further
optimizations. Allocating extra memory for the sample might also not be
the best choice, but we could preallocate the bytes within the
workspaces so there's no allocation cost at actual compression time.
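One way to read the preallocation idea, sketched under assumptions: the sample buffer lives inside a per-workspace structure that is allocated once, so collecting a sample on the write path is only a copy. `struct heuristic_workspace`, its 8 KiB sample budget and `heuristic_collect_sample()` are hypothetical names, not the actual btrfs workspace layout; userspace `calloc()` stands in for kernel allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEURISTIC_SAMPLE_MAX 8192   /* hypothetical sample budget */

/* Hypothetical workspace: the sample buffer is allocated together with it. */
struct heuristic_workspace {
	uint8_t sample[HEURISTIC_SAMPLE_MAX];
	size_t sample_len;
};

/* Allocated once (e.g. when workspaces are set up), not per write. */
static struct heuristic_workspace *heuristic_ws_alloc(void)
{
	return calloc(1, sizeof(struct heuristic_workspace));
}

static void heuristic_ws_free(struct heuristic_workspace *ws)
{
	free(ws);
}

/* Copy 'chunk' bytes every 'stride' bytes into the preallocated buffer. */
static size_t heuristic_collect_sample(struct heuristic_workspace *ws,
				       const uint8_t *data, size_t len,
				       size_t stride, size_t chunk)
{
	size_t n = 0;

	assert(stride > 0);
	for (size_t off = 0; off < len && n < HEURISTIC_SAMPLE_MAX;
	     off += stride) {
		size_t take = chunk;

		if (take > len - off)
			take = len - off;
		if (take > HEURISTIC_SAMPLE_MAX - n)
			take = HEURISTIC_SAMPLE_MAX - n;   /* never overflow */
		memcpy(ws->sample + n, data + off, take);
		n += take;
	}
	ws->sample_len = n;
	return n;
}
```

Sized this way, the sample costs a fixed 8 KiB per workspace no matter how many writes go through it, and nothing is allocated while a write is in flight.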

Incremental updates to the heuristic should help us determine whether
we're making things worse, compared to the current code as a baseline.

So, if you agree, let's start with the heuristic skeleton code first.
I'll comment on the patches.
