git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Jeff King <peff@peff.net>
Cc: "Derrick Stolee" <stolee@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git@vger.kernel.org, dstolee@microsoft.com,
	git@jeffhostetler.com, gitster@pobox.com, jrnieder@gmail.com
Subject: Re: [RFC PATCH 00/18] Multi-pack index (MIDX)
Date: Mon, 8 Jan 2018 14:43:00 +0100 (STD)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1801081438470.31@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz> (raw)
In-Reply-To: <20180108102029.GA21232@sigill.intra.peff.net>

Hi Peff,

On Mon, 8 Jan 2018, Jeff King wrote:

> On Sun, Jan 07, 2018 at 07:08:54PM -0500, Derrick Stolee wrote:
> 
> > > (Not a critique of this, just a (stupid) question)
> > > 
> > > What's the practical use-case for this feature? Since it doesn't
> > > help with --abbrev=40 the speedup is all in the part that ensures we
> > > don't show an ambiguous SHA-1.
> > 
> > The point of including the --abbrev=40 is to point out that object
> > lookups do not get slower with the MIDX feature. Using these "git log"
> > options is a good way to balance object lookups and abbreviations with
> > object parsing and diff machinery. And while the public data shape I
> > shared did not show a difference, our private testing of the Windows
> > repository did show a valuable improvement when isolating to object
> > lookups and ignoring abbreviation calculations.
> 
> Just to make sure I'm parsing this correctly: normal lookups do get
> faster when you have a single index, given the right setup?
> 
> I'm curious what that setup looked like. Is it just tons and tons of
> packs? Is it ones where the packs do not follow the mru patterns very
> well?
> 
> I think it's worth thinking a bit about, because...
> 
> > > If something cares about both throughput and e.g. is saving the
> > > abbreviated SHA-1s isn't it better off picking some arbitrary size
> > > (e.g. --abbrev=20), after all the default abbreviation is going to show
> > > something as small as possible, which may soon become ambigous after the
> > > next commit.
> > 
> > Unfortunately, with the way the abbreviation algorithms work, using
> > --abbrev=20 will have similar performance problems because you still need to
> > inspect all packfiles to ensure there isn't a collision in the first 20 hex
> > characters.
> 
> ...if what we primarily care about speeding up is abbreviations, is it
> crazy to consider disabling the disambiguation step entirely?

Not crazy. But it would break stuff. Because...

> The results of find_unique_abbrev are already a bit of a probability
> game. They're guaranteed at the moment of generation, but as more
> objects are added, ambiguities may be introduced. Likewise, what's
> unambiguous for you may not be for somebody else you're communicating
> with, if they have their own clone.

... this is only a probability game in the long term, when you consider
new objects to enter from *somewhere*.

But in purely local settings, when we expect no new objects to be
introduced, we do use known-unambiguous abbreviations.

Take the interactive rebase for example. It generates todo lists with
abbreviated commit names, for readability (and it is *really* important to
keep this readable). As we expect new objects to be introduced by the
interactive rebase, we convert that todo list to unabbreviated commit
names before executing the interactive rebase.

Your idea (to not care about unambiguous abbreviations) would break that.

Ciao,
Dscho

  parent reply	other threads:[~2018-01-08 13:43 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-07 18:14 [RFC PATCH 00/18] Multi-pack index (MIDX) Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes Derrick Stolee
2018-01-08 19:32   ` Jonathan Tan
2018-01-08 20:35     ` Derrick Stolee
2018-01-08 22:06       ` Jonathan Tan
2018-01-07 18:14 ` [RFC PATCH 02/18] midx: specify midx file format Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 03/18] midx: create core.midx config setting Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 04/18] midx: write multi-pack indexes for an object list Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 05/18] midx: create midx builtin with --write mode Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 06/18] midx: add t5318-midx.sh test script Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 07/18] midx: teach midx --write to update midx-head Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 08/18] midx: teach git-midx to read midx file details Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 09/18] midx: find details of nth object in midx Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 10/18] midx: use existing midx when writing Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 11/18] midx: teach git-midx to clear midx files Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 12/18] midx: teach git-midx to delete expired files Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 13/18] t5318-midx.h: confirm git actions are stable Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 14/18] midx: load midx files when loading packs Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 15/18] midx: use midx for approximate object count Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 16/18] midx: nth_midxed_object_oid() and bsearch_midx() Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 17/18] sha1_name: use midx for abbreviations Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 18/18] packfile: use midx for object loads Derrick Stolee
2018-01-07 22:42 ` [RFC PATCH 00/18] Multi-pack index (MIDX) Ævar Arnfjörð Bjarmason
2018-01-08  0:08   ` Derrick Stolee
2018-01-08 10:20     ` Jeff King
2018-01-08 10:27       ` Jeff King
2018-01-08 12:28         ` Ævar Arnfjörð Bjarmason
2018-01-08 13:43       ` Johannes Schindelin [this message]
2018-01-09  6:50         ` Jeff King
2018-01-09 13:05           ` Johannes Schindelin
2018-01-09 19:51             ` Stefan Beller
2018-01-09 20:12               ` Junio C Hamano
2018-01-09 20:16                 ` Stefan Beller
2018-01-09 21:31                   ` Junio C Hamano
2018-01-10 17:05               ` Johannes Schindelin
2018-01-10 10:57             ` Jeff King
2018-01-08 13:43       ` Derrick Stolee
2018-01-09  7:12         ` Jeff King
2018-01-08 11:43     ` Ævar Arnfjörð Bjarmason
2018-06-06  8:13     ` Ævar Arnfjörð Bjarmason
2018-06-06 10:27       ` [RFC PATCH 0/2] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
2018-06-06 10:27       ` [RFC PATCH 1/2] config.c: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
2018-06-06 10:27       ` [RFC PATCH 2/2] sha1-name: add core.validateAbbrev & relative core.abbrev Ævar Arnfjörð Bjarmason
2018-06-06 12:04         ` Christian Couder
2018-06-06 11:24       ` [RFC PATCH 00/18] Multi-pack index (MIDX) Derrick Stolee
2018-01-10 18:25 ` Martin Fick
2018-01-10 19:39   ` Derrick Stolee
2018-01-10 21:01     ` Martin Fick

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1801081438470.31@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz \
    --to=johannes.schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).