All of lore.kernel.org
 help / color / mirror / Atom feed
From: Masahiro Yamada <masahiroy@kernel.org>
To: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Brian Norris <briannorris@chromium.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Guenter Roeck <linux@roeck-us.net>,
	Bhaskar Chowdhury <unixbhaskar@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] scripts/setlocalversion: make git describe output more reliable
Date: Wed, 16 Sep 2020 23:28:16 +0900	[thread overview]
Message-ID: <CAK7LNATJ2seAuYpi-WPdLNFn2C9Vwm494nk-SWvgBJB3CmCJrw@mail.gmail.com> (raw)
In-Reply-To: <f3ce9b3e-d3c1-a96b-e14b-a8d3c4600b09@rasmusvillemoes.dk>

On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> On 10/09/2020 21.05, Brian Norris wrote:
> > On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> >> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes
> >> <linux@rasmusvillemoes.dk> wrote:
> >>> So in order to avoid `uname -a` output relying on such random details
> >>> of the build environment which are rather hard to ensure are
> >>> consistent between developers and buildbots, use an explicit
> >>> --abbrev=15 option (and for consistency, also use rev-parse --short=15
> >>> for the unlikely case of no signed tags being usable).
> >
> > For the patch:
> >
> > Reviewed-by: Brian Norris <briannorris@chromium.org>
> >
> >> I agree that any randomness should be avoided.
> >>
> >> My question is, do we need 15-digits?
> > ...
> >> So, I think the conflict happens
> >> only when we have two commits that start with the same 7-digits
> >> in the _same_ release. Is this correct?
>
> No.
>
> > For git-describe (the case where we have a tag to base off):
> > "use <n> digits, or as many digits as needed to form a unique object name"
>
> Yes, the abbreviated hash that `git describe` produces is unique among
> all objects (and objects are more than just commits) in the current
> repo, so what matters for probability-of-collision is the total number
> of objects - linus.git itself probably grows ~60000 objects per release.
>
> As for "do we need 15 digits", well, in theory the next time I merge the
> next rt-stable tag into our kernel I could end up with a commit that
> matches some existing object in the first 33 hex chars at which point I
> should have chosen 34 - but of course that's so unlikely that it's
> irrelevant.
>
> But the upshot of that is that there really is no objective answer to
> "how many digits do we need", so it becomes a tradeoff between "low
> enough probability that anyone anywhere in the next few years would have
> needed more than X when building their own kernel" and readability of
> `uname -r` etc. So I decided somewhat arbitrarily that each time one
> rolls a new release, there should be less than 1e-9 probability that
> HEAD collides with some other object when abbreviated to X hexchars.
> X=12 doesn't pass that criteria even when the repo has only 10M objects
> (and, current linus.git already has objects that need 12 to be unique,
> so such collisions do happen in practice, though of course very rarely).
> 13 and 14 are just weird numbers, so I ended with 15, which also allows
> many many more objects in the repo before the probability crosses that
> 1e-9 threshold.
>
> Rasmus
>
> Side note 1: Note that there really isn't any such thing as "last
> tag/previous tag" in a DAG; I think what git does is a best-effort
> search for the eligible tag that minimizes #({objects reachable from
> commit-to-be-described} - {objects reachable from tag}), where eligible
> tag means at least reachable from commit-to-be-described (so that set
> difference is a proper one), but there can be additional constraints
> (e.g. --match=...).
>
> Side note 2: Linus or Greg releases are actually not interesting here
> (see the logic in setlocalversion that stops early if we're exactly at
> an annotated tag) - the whole raison d'etre for setlocalversion is that
> people do maintain and build non-mainline kernels, and it's extremely
> useful for `uname -a` to accurately tell just which commit is running on
> the target.



This is because you use the output from git as-is.

Why are you still trying to rely on such obscure behavior of git?


It is pretty easy to get the fixed number of digits reliably.

For example,
git rev-parse --verify HEAD 2>/dev/null | cut -c1-7


It always returns a 7-digits hash,
and two different people will get the same result for sure.

Your solution, --short=15, means "at least 15",
which still contains ambiguity.



As I already noted, the kernelrelease string is
constructed in this format:

<kernel-version>-<number-of-commits-on-top>-<abbreviated-commit-hash>


For example, if I enable CONFIG_LOCALVERSION_AUTO=y
in today's Linus tree, I got this:

5.9.0-rc5-00005-gfc4f28bb3daf


What if the number of digits were 7?

5.9.0-rc5-00005-gfc4f28b

I see no problem here.

Even if we have another object that starts with "fc4f28b",
the kernelrelease string is still unique thanks to the
<kernel-version>-<number-of-commits-on-top> prefix.

Why do we care about the uniqueness of the abbreviated
hash in the whole git history?



Note:
We prepend $(KERNELVERSION) to the result
of setlocalversion. [1]

[1] https://github.com/torvalds/linux/blob/v5.9-rc4/Makefile#L1175


-- 
Best Regards
Masahiro Yamada

  reply	other threads:[~2020-09-16 15:55 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-10 11:26 [PATCH] scripts/setlocalversion: make git describe output more reliable Rasmus Villemoes
2020-09-10 14:28 ` Guenter Roeck
2020-09-10 14:34 ` Masahiro Yamada
2020-09-10 19:05   ` Brian Norris
2020-09-11  8:28     ` Rasmus Villemoes
2020-09-16 14:28       ` Masahiro Yamada [this message]
2020-09-16 15:23         ` Rasmus Villemoes
2020-09-16 18:01           ` Masahiro Yamada
2020-09-16 19:31             ` Rasmus Villemoes
2020-09-17  0:48               ` Masahiro Yamada
2020-09-10 22:56   ` Bhaskar Chowdhury
2020-09-16  8:48   ` Rasmus Villemoes
2020-09-17  6:56 ` [PATCH v2] " Rasmus Villemoes
2020-09-17 12:22   ` Nico Schottelius
2020-09-17 12:58     ` Rasmus Villemoes
2020-09-21  9:35       ` Nico Schottelius
2020-09-24 17:27   ` Masahiro Yamada

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAK7LNATJ2seAuYpi-WPdLNFn2C9Vwm494nk-SWvgBJB3CmCJrw@mail.gmail.com \
    --to=masahiroy@kernel.org \
    --cc=briannorris@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=linux@roeck-us.net \
    --cc=rdunlap@infradead.org \
    --cc=unixbhaskar@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.