From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: Nico Schottelius <nico-linuxsetlocalversion@schottelius.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Brian Norris <briannorris@chromium.org>,
	Bhaskar Chowdhury <unixbhaskar@gmail.com>,
	Guenter Roeck <linux@roeck-us.net>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] scripts/setlocalversion: make git describe output more reliable
Date: Thu, 17 Sep 2020 14:58:32 +0200
Message-ID: <73cb82c5-37fd-7fa3-5778-723337934a2b@rasmusvillemoes.dk>
In-Reply-To: <87pn6k384e.fsf@ungleich.ch>

On 17/09/2020 14.22, Nico Schottelius wrote:
> 
> Thanks for the patch Rasmus. Overall it looks good to me; being aligned with
> the stable patch submission rules makes sense. A tiny thing though:
> 
> I did not calculate the exact collision probability with 12 characters

For reference, the math is something like this: Consider a repo with N+1
objects. We look at one specific object (for setlocalversion it's the
head commit being built, for the stable rules it's whatever particular
commit one is interested in for backporting), and want to know the
probability that its sha1 collides with some other object in the first b
bits (here b=48). Assuming the sha1s are independent and uniformly
distributed, the probability of not colliding with one specific other
object is x=1-1/2^b, and the probability of not colliding with any of
the other N objects is x^N, making the probability of a collision 1-x^N
= (1-x)(1+x+x^2+...+x^{N-1}). Now the N terms in the second factor are
very-close-to-but-slightly-smaller-than 1, so an upper bound for this
probability is (1-x)N = N/2^b, which is also what one would naively
expect. [This estimate is always valid, but it becomes a void statement
of "the probability is less than 1" when N is >= 2^b].

So, assuming some vendor kernel repo that has all of Greg's stable.git
(around 10M objects I think) and another 10M objects because random
vendor, that works out to 20e6/2^48 = 7.1e-8, 71 ppb.
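
As a quick sanity check on that figure, here is a small Python sketch of
the estimate. The function names are made up for illustration and are not
part of setlocalversion or git; the 20M object count is the rough guess
from above, and the hashes are assumed independent and uniform as in the
derivation.

```python
import math


def collision_upper_bound(n_other: int, bits: int) -> float:
    """Upper bound N/2^b on the collision probability, capped at 1."""
    return min(1.0, n_other / 2.0**bits)


def collision_exact(n_other: int, bits: int) -> float:
    """1 - x^N with x = 1 - 1/2^b, assuming independent, uniform hashes.

    log1p/expm1 avoid the rounding loss of computing 1 - x**N directly.
    Requires bits >= 1.
    """
    return -math.expm1(n_other * math.log1p(-(2.0**-bits)))


# Assumed figures from the estimate above: 20M other objects, 12 hex digits.
n_other = 20_000_000
bits = 12 * 4  # each hex digit carries 4 bits, so b = 48

print(f"upper bound: {collision_upper_bound(n_other, bits):.3g}")
print(f"exact:       {collision_exact(n_other, bits):.3g}")
```

Both values come out at about 7.1e-8, so the N/2^b bound is tight as long
as N is far below 2^b.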

> So I suggest you introduce something along the lines of:
> 
> ...
> num_chars=12
> ...
> --abbrev=$num_chars

I considered that, but it becomes quite ugly since it needs to get into
the awk script (as a 13, though perhaps we could get awk to do the +1, I
don't really speak awk), where we'd then need to use " instead of ' and
then escape the $ that are to be interpreted by awk and not the shell.
So I think it's more readable with hardcoding and comments explaining
why they are there, should anyone ever want to change 12.

Rasmus

