All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Junio C Hamano <gitster@pobox.com>, Johannes Sixt <j6t@kdbg.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
Date: Thu, 29 Sep 2016 15:16:09 -0400	[thread overview]
Message-ID: <20160929191609.maxggcli76472t4g@sigill.intra.peff.net> (raw)
In-Reply-To: <CA+55aFwbCNiF0nDppZ5SuRcZwc9kNvKYzgyd_bR8Ut8XRW_p4Q@mail.gmail.com>

On Thu, Sep 29, 2016 at 11:55:46AM -0700, Linus Torvalds wrote:

> I think the patch can speak for itself, but the basic core is this
> section in get_short_sha1():
> 
>   +       if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
>   +               unsigned int expect_collision = 1 << (len * 2);
>   +               if (ds.nrobjects > expect_collision)
>   +                       return SHORT_NAME_AMBIGUOUS;
>   +       }

Hmm. So at length 7, we expect collisions at 2^14, which is 16384. That
seems really low. I mean, by the birthday paradox that's where expect
a 50% chance of a collision. But that's a single collision. We
definitely don't expect them to be common at that size.

So I suspect this could be a bit looser. The real number we care about
is probably something like "there is probability 'p' of a collision when
we add a new object", but I'm not sure what that 'p' would be. Or
perhaps "we accept collisions in 'n' percent of objects". But again, I
don't know that 'n'.

I dunno. I suppose being overly conservative with this number leaves
room for growth. Repositories generally get bigger, not smaller. :)

> What do you think? It's actually a fairly simple patch and I really do
> think it makes sense and it seems to just DTRT automatically.

I like the general idea.

As far as the implementation, I was surprised to see it touch
get_short_sha1() at all. That's, after all, for lookups, and we would
never want to require more characters on the reading side.

I see you worked around it with a flag so that this behavior only kicks
in when called via find_unique_abbrev(). But if you look at the caller:

> @@ -458,14 +472,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
>  int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
>  {
>  	int status, exists;
> +	int flags = GET_SHA1_QUIETLY;
>  
> +	if (len < 0) {
> +		flags |= GET_SHA1_AUTOMATIC;
> +		len = 7;
> +	}
>  	sha1_to_hex_r(hex, sha1);
>  	if (len == 40 || !len)
>  		return 40;
>  	exists = has_sha1_file(sha1);
>  	while (len < 40) {
>  		unsigned char sha1_ret[20];
> -		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
> +		status = get_short_sha1(hex, len, sha1_ret, flags);
>  		if (exists
>  		    ? !status
>  		    : status == SHORT_NAME_NOT_FOUND) {

You can see that we're going to do more work than we would otherwise
need to. Because we start at 7, and ask get_short_sha1() "is this unique
enough?", and looping. But if we _know_ we won't accept any answer
shorter than some N based on the number of objects in the repository,
then we should start at that N.

IOW, something like:

  if (len < 0)
	len = ceil(log_base_2(repository_object_count()));

here, and then you don't have to touch get_short_sha1() at all.

I suspect you pushed it down into get_short_sha1() because it kind-of
does the repository_object_count() step for "free" as it's looking at
the object anyway. But that step is really not very expensive. And I'd
even say you could just ignore loose objects entirely, and treat them
like a rounding error (the way that duplicate objects in packs are
treated).

That leaves you with just an O(# of packs) loop over a linked list. You
could even just keep a global object count up to date in
add_packed_git(), and then it's O(1).

-Peff

  parent reply	other threads:[~2016-09-29 19:16 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-26  1:39 Changing the default for "core.abbrev"? Linus Torvalds
2016-09-26  3:46 ` Junio C Hamano
2016-09-26  4:34   ` Jeff King
2016-09-26  4:45     ` Junio C Hamano
2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
2016-09-26 11:59         ` [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators Jeff King
2016-09-26 16:37           ` Junio C Hamano
2016-09-26 17:21             ` Jeff King
2016-09-26 17:50               ` Junio C Hamano
2016-09-26 11:59         ` [PATCH 02/10] get_sha1: avoid repeating ourselves via ONLY_TO_DIE Jeff King
2016-09-26 11:59         ` [PATCH 03/10] get_sha1: propagate flags to child functions Jeff King
2016-09-26 11:59         ` [PATCH 04/10] get_short_sha1: peel tags when looking for treeish Jeff King
2016-09-26 12:11           ` Jeff King
2016-09-26 16:55           ` Junio C Hamano
2016-09-26 17:23             ` Jeff King
2016-09-26 12:00         ` [PATCH 05/10] get_short_sha1: refactor init of disambiguation code Jeff King
2016-09-26 12:00         ` [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix Jeff King
2016-09-26 17:10           ` Junio C Hamano
2016-09-26 17:25             ` Jeff King
2016-09-26 17:36               ` Junio C Hamano
2016-09-26 12:00         ` [PATCH 07/10] get_short_sha1: mark ambiguity error for translation Jeff King
2016-09-26 12:00         ` [PATCH 08/10] sha1_array: let callbacks interrupt iteration Jeff King
2016-09-26 12:00         ` [PATCH 09/10] for_each_abbrev: drop duplicate objects Jeff King
2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
2016-09-26 16:36           ` Linus Torvalds
2016-09-27  5:42             ` Jacob Keller
2016-09-27 12:38             ` Jeff King
2016-09-29 13:01             ` Kyle J. McKay
2016-09-29 13:24               ` Jeff King
2016-09-29 14:36                 ` Kyle J. McKay
2016-09-29 14:55                   ` Jeff King
2016-09-26 17:30           ` Junio C Hamano
2016-09-26 17:34             ` Jeff King
2016-09-26 17:39               ` Junio C Hamano
2016-09-29 11:46           ` Kyle J. McKay
2016-09-29 13:03             ` Jeff King
2016-09-29 17:19               ` Junio C Hamano
2016-09-30  5:51                 ` Jacob Keller
2019-02-04 16:12     ` [RFC/PATCH] core.abbrev doc: document and test the abbreviation length Ævar Arnfjörð Bjarmason
2019-02-04 19:13       ` Junio C Hamano
2019-02-04 20:04       ` Junio C Hamano
2019-02-04 21:36         ` Ævar Arnfjörð Bjarmason
2019-02-04 23:32         ` Jeff King
2019-02-04 23:50           ` Ævar Arnfjörð Bjarmason
2019-02-06 18:29             ` Jeff King
2019-02-06 18:36               ` Ævar Arnfjörð Bjarmason
2016-09-26  6:33   ` Changing the default for "core.abbrev"? Matthieu Moy
2016-09-26 12:09     ` Jeff King
2016-09-29 13:01   ` Kyle J. McKay
2016-09-26  7:13 ` Christian Couder
2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
2016-09-28 23:30   ` [PATCH 1/4] config: allow customizing /etc/gitconfig location Junio C Hamano
2016-09-29  9:53     ` Jakub Narębski
2016-09-29 17:20       ` Junio C Hamano
2016-09-29 17:45         ` Matthieu Moy
2016-09-28 23:30   ` [PATCH 2/4] t13xx: do not assume system config is empty Junio C Hamano
2016-09-29  9:01     ` Jeff King
2016-09-29 18:13       ` Junio C Hamano
2016-09-29 18:26         ` Jeff King
2016-09-29 18:57           ` Junio C Hamano
2016-09-29 19:18             ` Jeff King
2016-09-29 19:57               ` Junio C Hamano
2016-09-29 19:06           ` Junio C Hamano
2016-09-29 19:26             ` Jeff King
2016-09-29 21:03               ` Junio C Hamano
2016-09-29 21:08                 ` Jeff King
2016-09-28 23:30   ` [PATCH 3/4] worktree: honor configuration variables Junio C Hamano
2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
2016-09-29  2:44     ` SZEDER Gábor
2016-09-29  5:27       ` Lukas Fleischer
2016-09-29  9:22         ` Jeff King
2016-09-29  9:15       ` Jeff King
2016-09-29 10:03         ` Matthieu Moy
2016-09-29 12:52         ` SZEDER Gábor
2016-09-29  5:58     ` Johannes Sixt
2016-09-29 18:05       ` Junio C Hamano
2016-09-29 18:37         ` Linus Torvalds
2016-09-29 18:55           ` Linus Torvalds
2016-09-29 19:06             ` Linus Torvalds
2016-09-29 19:42               ` Junio C Hamano
2016-09-30  0:56               ` Mike Hommey
2016-09-30  1:01                 ` Linus Torvalds
2016-09-30 19:41                   ` Ævar Arnfjörð Bjarmason
2016-09-29 19:16             ` Jeff King [this message]
2016-09-29 19:40               ` Linus Torvalds
2016-09-29 19:45                 ` Junio C Hamano
2016-09-29 21:53                   ` Linus Torvalds
2016-09-29 23:13                     ` Junio C Hamano
2016-09-29 23:20                       ` Junio C Hamano
2016-09-30  0:20                       ` Linus Torvalds
2016-09-30  0:28                         ` Linus Torvalds
2016-09-30  0:57                           ` Linus Torvalds
2016-09-30  1:18                             ` Linus Torvalds
2016-09-30  3:54                               ` Junio C Hamano
2016-09-30  4:10                                 ` Junio C Hamano
2016-09-30  4:18                                   ` Linus Torvalds
2016-09-30  4:29                                     ` Linus Torvalds
2016-09-30  4:27                                   ` Junio C Hamano
2016-09-30  4:35                                     ` Junio C Hamano
2016-09-30 18:40                                     ` Junio C Hamano
2016-09-30 18:51                                       ` Linus Torvalds
2016-09-30 19:00                                         ` Junio C Hamano
2016-09-30  4:11                                 ` Linus Torvalds
2016-09-30  8:06                               ` Jeff King
2016-09-30 17:54                                 ` Linus Torvalds
2016-09-30 18:05                                   ` Jeff King
2016-09-30 18:21                                   ` Linus Torvalds
2016-09-30 20:01                                     ` Junio C Hamano
2016-09-30 17:56                                 ` Junio C Hamano
2016-09-30  7:47                       ` Jeff King
2016-09-29  9:25     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160929191609.maxggcli76472t4g@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.