From: Stefan Beller <sbeller@google.com>
To: Jeff King <peff@peff.net>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH 2/2] use zstd zlib wrapper
Date: Wed, 14 Sep 2016 18:22:17 -0700
Message-ID: <CAGZ79kYcB-x40_w1fcWL3NSp8JU9SPrTAEiru-6Jpb7fDM1Y0w@mail.gmail.com>
In-Reply-To: <20160914235843.nacr54ekvl6rjipk@sigill.intra.peff.net>

On Wed, Sep 14, 2016 at 4:58 PM, Jeff King <peff@peff.net> wrote:
> There's a fancy new compression algorithm called "zstd". The
> idea is that it's supposed to get similar compression ratios
> to zlib, but with much faster compression and decompression
> times. And on top of that, a nice sliding scale to trade off
> size versus time on the compression side.
>
> The zstd site at https://facebook.github.io/zstd/ claims
> close to 3x speedup for both compression and decompression
> versus zlib, with similar compression ratios. There are
> other fast algorithms (like lz4), but they usually compress
> much worse (follow the link above for a nice table of
> results).
>
> Since any git operations that have to access objects need to
> do a zlib inflate, in theory we can speed up everything by
> using zstd. And then on the packing side, use higher
> compression levels when making on-disk packfiles (which will
> be accessed many times) and lower ones when making loose
> objects, or deflating packed objects on the fly when serving
> fetches.
>
> The catch, of course, is that it's a new incompatible
> format. This would be a pretty huge change and totally break
> backwards compatibility for git, not just on disk but
> on-the-wire as well. So my goal here was not a finished
> product but just a quick experiment to see if it did indeed
> bring the promised speedups.
>
> Disappointingly, the answer seems to be "no".

After having looked at the data, I disagree with the conclusion.
To explain why, I think we need to reason about how often each
operation actually happens.

* As an end user, happily hacking away at one repository,
  I probably do not care about the pack size on disk as much
  as I care about the timing of local operations. And I assume
  that for each repack we have about 1000 reads (log/rev-list).
  The 1000 is wild speculation without any data to back it up.
  So as an end user I'd be happy with [zstd, ~5].
  For the end user, LZ4 would seem to be the best solution if it were available.

* As a service provider, I know we have a lot more reads than
  writes, and repacking is annoying. Also, at that scale disk
  isn't negligibly cheap. So we need to weigh the numbers differently,
  but how? I suspect that, depending on the weighting, it could still be
  considered beneficial to go with zstd5. (No hard numbers here.)
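
The weighting argument in the two points above can be sketched as a
small cost model: one repack plus N reads per cycle. This is only an
illustration. The Codec class, the function names, and every timing in
it are hypothetical placeholders, not measurements from this thread,
and disk size (which matters to the service provider) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Codec:
    """Hypothetical per-codec timings; fill in real measurements."""
    name: str
    repack_seconds: float  # cost of one repack (compression side)
    read_seconds: float    # cost of one read such as log/rev-list


def total_seconds(codec: Codec, reads_per_repack: int) -> float:
    """Time spent in one cycle: a single repack plus the reads after it."""
    return codec.repack_seconds + reads_per_repack * codec.read_seconds


def break_even_reads(a: Codec, b: Codec) -> Optional[float]:
    """Reads per repack at which a and b cost the same.

    Solves a.repack + n * a.read == b.repack + n * b.read for n.
    Returns None when both codecs read at the same speed, since the
    two cost lines then never cross.
    """
    read_delta = a.read_seconds - b.read_seconds
    if read_delta == 0:
        return None
    return (b.repack_seconds - a.repack_seconds) / read_delta


# Placeholder numbers, chosen only to show the shape of the trade-off.
slow_pack = Codec("slow-pack-fast-read", repack_seconds=100.0, read_seconds=1.0)
fast_pack = Codec("fast-pack-slow-read", repack_seconds=40.0, read_seconds=1.5)

for n in (10, 1000):
    best = min((slow_pack, fast_pack), key=lambda c: total_seconds(c, n))
    print(f"{n} reads per repack: {best.name} wins")
print("break-even at", break_even_reads(slow_pack, fast_pack), "reads")
```

With a guess like 1000 reads per repack the read time dominates, so
the codec with the cheaper reads wins even if its repack is slower.
With very few reads per repack the ordering can flip.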
