* [PATCH] scripts/setlocalversion: make git describe output more reliable @ 2020-09-10 11:26 Rasmus Villemoes 2020-09-10 14:28 ` Guenter Roeck ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Rasmus Villemoes @ 2020-09-10 11:26 UTC (permalink / raw) To: Masahiro Yamada, Randy Dunlap, Brian Norris, Guenter Roeck, Bhaskar Chowdhury Cc: Rasmus Villemoes, linux-kernel When building for an embedded target using Yocto, we're sometimes observing that the version string that gets built into vmlinux (and thus what uname -a reports) differs from the path under /lib/modules/ where modules get installed in the rootfs, but only in the length of the -gabc123def suffix. Hence modprobe always fails. The problem is that Yocto has the concept of "sstate" (shared state), which allows different developers/buildbots/etc. to share build artifacts, based on a hash of all the metadata that went into building that artifact - and that metadata includes all dependencies (e.g. the compiler used etc.). That normally works quite well; usually a clean build (without using any sstate cache) done by one developer ends up being binary identical to a build done on another host. However, one thing that can cause two developers to end up with different builds [and thus make one's vmlinux package incompatible with the other's kernel-dev package], which is not captured by the metadata hashing, is this `git describe`: The output of that can be affected by (1) git version: before 2.11 git defaulted to a minimum of 7, since 2.11 (git.git commit e6c587) the default is dynamic based on the number of objects in the repo (2) hence even if both run the same git version, the output can differ based on how many remotes are being tracked (or just lots of local development branches or plain old garbage) (3) and of course somebody could have a core.abbrev config setting in ~/.gitconfig So in order to avoid `uname -a` output relying on such random details of the build environment which are rather hard to ensure are consistent between developers and buildbots, use an explicit --abbrev=15 option (and for consistency, also use rev-parse --short=15 for the unlikely case of no signed tags being usable). Now, why is 60 bits enough for everyone? It's not mathematically guaranteed that git won't have to use 16 in some git repo, but it is beyond unlikely: Even in a repo with 100M objects, the probability that any given commit (i.e. the one being described) clashes with some other object in the first 15 hex chars is less than 1e-10, and currently a git repo tracking Linus', -stable and -rt only has around 10M objects. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> --- I could probably fix things by adding a 'git config --local core.abbrev 15' step to the Yocto build process after the repo to build from has been cloned but before building has started. But in the interest of binary reproducibility outside of just Yocto builds, I think it's better if this lives in the kernel. scripts/setlocalversion | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/setlocalversion b/scripts/setlocalversion index 20f2efd57b11..c5262f0d953d 100755 --- a/scripts/setlocalversion +++ b/scripts/setlocalversion @@ -45,7 +45,7 @@ scm_version() # Check for git and a git repo. if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then + head=$(git rev-parse --verify --short=15 HEAD 2>/dev/null); then # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore # it, because this version is defined in the top level Makefile. @@ -59,7 +59,7 @@ scm_version() fi # If we are past a tagged commit (like # "v2.6.30-rc5-302-g72357d5"), we pretty print it. - if atag="$(git describe 2>/dev/null)"; then + if atag="$(git describe --abbrev=15 2>/dev/null)"; then echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' # If we don't have a tag at all we print -g{commitish}. -- 2.23.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-10 11:26 [PATCH] scripts/setlocalversion: make git describe output more reliable Rasmus Villemoes @ 2020-09-10 14:28 ` Guenter Roeck 2020-09-10 14:34 ` Masahiro Yamada 2020-09-17 6:56 ` [PATCH v2] " Rasmus Villemoes 2 siblings, 0 replies; 17+ messages in thread From: Guenter Roeck @ 2020-09-10 14:28 UTC (permalink / raw) To: Rasmus Villemoes, Masahiro Yamada, Randy Dunlap, Brian Norris, Bhaskar Chowdhury Cc: linux-kernel On 9/10/20 4:26 AM, Rasmus Villemoes wrote: > When building for an embedded target using Yocto, we're sometimes > observing that the version string that gets built into vmlinux (and > thus what uname -a reports) differs from the path under /lib/modules/ > where modules get installed in the rootfs, but only in the length of > the -gabc123def suffix. Hence modprobe always fails. > > The problem is that Yocto has the concept of "sstate" (shared state), > which allows different developers/buildbots/etc. to share build > artifacts, based on a hash of all the metadata that went into building > that artifact - and that metadata includes all dependencies (e.g. the > compiler used etc.). That normally works quite well; usually a clean > build (without using any sstate cache) done by one developer ends up > being binary identical to a build done on another host. However, one > thing that can cause two developers to end up with different builds > [and thus make one's vmlinux package incompatible with the other's > kernel-dev package], which is not captured by the metadata hashing, is > this `git describe`: The output of that can be affected by > > (1) git version: before 2.11 git defaulted to a minimum of 7, since > 2.11 (git.git commit e6c587) the default is dynamic based on the > number of objects in the repo > (2) hence even if both run the same git version, the output can differ > based on how many remotes are being tracked (or just lots of local > development branches or plain old garbage) > (3) and of course somebody could have a core.abbrev config setting in > ~/.gitconfig > > So in order to avoid `uname -a` output relying on such random details > of the build environment which are rather hard to ensure are > consistent between developers and buildbots, use an explicit > --abbrev=15 option (and for consistency, also use rev-parse --short=15 > for the unlikely case of no signed tags being usable). > > Now, why is 60 bits enough for everyone? It's not mathematically > guaranteed that git won't have to use 16 in some git repo, but it is > beyond unlikely: Even in a repo with 100M objects, the probability > that any given commit (i.e. the one being described) clashes with some > other object in the first 15 hex chars is less than 1e-10, and > currently a git repo tracking Linus', -stable and -rt only has around > 10M objects. > > Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Makes sense to me. Acked-by: Guenter Roeck <linux@roeck-us.net> > --- > I could probably fix things by adding a 'git config --local > core.abbrev 15' step to the Yocto build process after the repo to > build from has been cloned but before building has started. But in the > interest of binary reproducibility outside of just Yocto builds, I > think it's better if this lives in the kernel. > > scripts/setlocalversion | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/scripts/setlocalversion b/scripts/setlocalversion > index 20f2efd57b11..c5262f0d953d 100755 > --- a/scripts/setlocalversion > +++ b/scripts/setlocalversion > @@ -45,7 +45,7 @@ scm_version() > > # Check for git and a git repo. > if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && > - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then > + head=$(git rev-parse --verify --short=15 HEAD 2>/dev/null); then > > # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore > # it, because this version is defined in the top level Makefile. > @@ -59,7 +59,7 @@ scm_version() > fi > # If we are past a tagged commit (like > # "v2.6.30-rc5-302-g72357d5"), we pretty print it. > - if atag="$(git describe 2>/dev/null)"; then > + if atag="$(git describe --abbrev=15 2>/dev/null)"; then > echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' > > # If we don't have a tag at all we print -g{commitish}. > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-10 11:26 [PATCH] scripts/setlocalversion: make git describe output more reliable Rasmus Villemoes 2020-09-10 14:28 ` Guenter Roeck @ 2020-09-10 14:34 ` Masahiro Yamada 2020-09-10 19:05 ` Brian Norris ` (2 more replies) 2020-09-17 6:56 ` [PATCH v2] " Rasmus Villemoes 2 siblings, 3 replies; 17+ messages in thread From: Masahiro Yamada @ 2020-09-10 14:34 UTC (permalink / raw) To: Rasmus Villemoes Cc: Randy Dunlap, Brian Norris, Guenter Roeck, Bhaskar Chowdhury, Linux Kernel Mailing List On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > When building for an embedded target using Yocto, we're sometimes > observing that the version string that gets built into vmlinux (and > thus what uname -a reports) differs from the path under /lib/modules/ > where modules get installed in the rootfs, but only in the length of > the -gabc123def suffix. Hence modprobe always fails. > > The problem is that Yocto has the concept of "sstate" (shared state), > which allows different developers/buildbots/etc. to share build > artifacts, based on a hash of all the metadata that went into building > that artifact - and that metadata includes all dependencies (e.g. the > compiler used etc.). That normally works quite well; usually a clean > build (without using any sstate cache) done by one developer ends up > being binary identical to a build done on another host. However, one > thing that can cause two developers to end up with different builds > [and thus make one's vmlinux package incompatible with the other's > kernel-dev package], which is not captured by the metadata hashing, is > this `git describe`: The output of that can be affected by > > (1) git version: before 2.11 git defaulted to a minimum of 7, since > 2.11 (git.git commit e6c587) the default is dynamic based on the > number of objects in the repo > (2) hence even if both run the same git version, the output can differ > based on how many remotes are being tracked (or just lots of local > development branches or plain old garbage) > (3) and of course somebody could have a core.abbrev config setting in > ~/.gitconfig > > So in order to avoid `uname -a` output relying on such random details > of the build environment which are rather hard to ensure are > consistent between developers and buildbots, use an explicit > --abbrev=15 option (and for consistency, also use rev-parse --short=15 > for the unlikely case of no signed tags being usable). > > Now, why is 60 bits enough for everyone? It's not mathematically > guaranteed that git won't have to use 16 in some git repo, but it is > beyond unlikely: Even in a repo with 100M objects, the probability > that any given commit (i.e. the one being described) clashes with some > other object in the first 15 hex chars is less than 1e-10, and > currently a git repo tracking Linus', -stable and -rt only has around > 10M objects. I agree that any randomness should be avoided. My question is, do we need 15-digits? The kernelrelease is formed by [kernel version] + [some digits of git hash]. For example, "git describe" shows as follows: v5.9.0-rc4-00034-g7fe10096c150 Linus gives a new tag every week (or every two week). So, I think the conflict happens only when we have two commits that start with the same 7-digits in the _same_ release. Is this correct? We have 14000 - 15000 commits in each release, not 100M. > > Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> > --- > I could probably fix things by adding a 'git config --local > core.abbrev 15' step to the Yocto build process after the repo to > build from has been cloned but before building has started. But in the > interest of binary reproducibility outside of just Yocto builds, I > think it's better if this lives in the kernel. > > scripts/setlocalversion | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/scripts/setlocalversion b/scripts/setlocalversion > index 20f2efd57b11..c5262f0d953d 100755 > --- a/scripts/setlocalversion > +++ b/scripts/setlocalversion > @@ -45,7 +45,7 @@ scm_version() > > # Check for git and a git repo. > if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && > - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then > + head=$(git rev-parse --verify --short=15 HEAD 2>/dev/null); then > > # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore > # it, because this version is defined in the top level Makefile. > @@ -59,7 +59,7 @@ scm_version() > fi > # If we are past a tagged commit (like > # "v2.6.30-rc5-302-g72357d5"), we pretty print it. > - if atag="$(git describe 2>/dev/null)"; then > + if atag="$(git describe --abbrev=15 2>/dev/null)"; then > echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' > > # If we don't have a tag at all we print -g{commitish}. > -- > 2.23.0 > -- Best Regards Masahiro Yamada ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-10 14:34 ` Masahiro Yamada @ 2020-09-10 19:05 ` Brian Norris 2020-09-11 8:28 ` Rasmus Villemoes 2020-09-10 22:56 ` Bhaskar Chowdhury 2020-09-16 8:48 ` Rasmus Villemoes 2 siblings, 1 reply; 17+ messages in thread From: Brian Norris @ 2020-09-10 19:05 UTC (permalink / raw) To: Masahiro Yamada Cc: Rasmus Villemoes, Randy Dunlap, Guenter Roeck, Bhaskar Chowdhury, Linux Kernel Mailing List On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada <masahiroy@kernel.org> wrote: > On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes > <linux@rasmusvillemoes.dk> wrote: > > So in order to avoid `uname -a` output relying on such random details > > of the build environment which are rather hard to ensure are > > consistent between developers and buildbots, use an explicit > > --abbrev=15 option (and for consistency, also use rev-parse --short=15 > > for the unlikely case of no signed tags being usable). For the patch: Reviewed-by: Brian Norris <briannorris@chromium.org> > I agree that any randomness should be avoided. > > My question is, do we need 15-digits? ... > So, I think the conflict happens > only when we have two commits that start with the same 7-digits > in the _same_ release. Is this correct? For the rev-parse usage ("unlikely case where we have no signed tag"), the total number of objects is definitely relevant, and the man-page says: "unique prefix with at least length characters" i.e., it might be longer, if needed for uniqueness. For git-describe (the case where we have a tag to base off): "use <n> digits, or as many digits as needed to form a unique object name" I'm not quite sure whether the uniqueness is including anything about the tag-relative prefix, but if not, then we have the same problem. At least, that's my reading of the situation. Brian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-10 19:05 ` Brian Norris @ 2020-09-11 8:28 ` Rasmus Villemoes 2020-09-16 14:28 ` Masahiro Yamada 0 siblings, 1 reply; 17+ messages in thread From: Rasmus Villemoes @ 2020-09-11 8:28 UTC (permalink / raw) To: Brian Norris, Masahiro Yamada Cc: Randy Dunlap, Guenter Roeck, Bhaskar Chowdhury, Linux Kernel Mailing List On 10/09/2020 21.05, Brian Norris wrote: > On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada <masahiroy@kernel.org> wrote: >> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes >> <linux@rasmusvillemoes.dk> wrote: >>> So in order to avoid `uname -a` output relying on such random details >>> of the build environment which are rather hard to ensure are >>> consistent between developers and buildbots, use an explicit >>> --abbrev=15 option (and for consistency, also use rev-parse --short=15 >>> for the unlikely case of no signed tags being usable). > > For the patch: > > Reviewed-by: Brian Norris <briannorris@chromium.org> > >> I agree that any randomness should be avoided. >> >> My question is, do we need 15-digits? > ... >> So, I think the conflict happens >> only when we have two commits that start with the same 7-digits >> in the _same_ release. Is this correct? No. > For git-describe (the case where we have a tag to base off): > "use <n> digits, or as many digits as needed to form a unique object name" Yes, the abbreviated hash that `git describe` produces is unique among all objects (and objects are more than just commits) in the current repo, so what matters for probability-of-collision is the total number of objects - linus.git itself probably grows ~60000 objects per release. As for "do we need 15 digits", well, in theory the next time I merge the next rt-stable tag into our kernel I could end up with a commit that matches some existing object in the first 33 hex chars at which point I should have chosen 34 - but of course that's so unlikely that it's irrelevant. But the upshot of that is that there really is no objective answer to "how many digits do we need", so it becomes a tradeoff between "low enough probability that anyone anywhere in the next few years would have needed more than X when building their own kernel" and readability of `uname -r` etc. So I decided somewhat arbitrarily that each time one rolls a new release, there should be less than 1e-9 probability that HEAD collides with some other object when abbreviated to X hexchars. X=12 doesn't pass that criteria even when the repo has only 10M objects (and, current linus.git already has objects that need 12 to be unique, so such collisions do happen in practice, though of course very rarely). 13 and 14 are just weird numbers, so I ended with 15, which also allows many many more objects in the repo before the probability crosses that 1e-9 threshold. Rasmus Side note 1: Note that there really isn't any such thing as "last tag/previous tag" in a DAG; I think what git does is a best-effort search for the eligible tag that minimizes #({objects reachable from commit-to-be-described} - {objects reachable from tag}), where eligible tag means at least reachable from commit-to-be-described (so that set difference is a proper one), but there can be additional constraints (e.g. --match=...). Side note 2: Linus or Greg releases are actually not interesting here (see the logic in setlocalversion that stops early if we're exactly at an annotated tag) - the whole raison d'etre for setlocalversion is that people do maintain and build non-mainline kernels, and it's extremely useful for `uname -a` to accurately tell just which commit is running on the target. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-11 8:28 ` Rasmus Villemoes @ 2020-09-16 14:28 ` Masahiro Yamada 2020-09-16 15:23 ` Rasmus Villemoes 0 siblings, 1 reply; 17+ messages in thread From: Masahiro Yamada @ 2020-09-16 14:28 UTC (permalink / raw) To: Rasmus Villemoes Cc: Brian Norris, Randy Dunlap, Guenter Roeck, Bhaskar Chowdhury, Linux Kernel Mailing List On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > On 10/09/2020 21.05, Brian Norris wrote: > > On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada <masahiroy@kernel.org> wrote: > >> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes > >> <linux@rasmusvillemoes.dk> wrote: > >>> So in order to avoid `uname -a` output relying on such random details > >>> of the build environment which are rather hard to ensure are > >>> consistent between developers and buildbots, use an explicit > >>> --abbrev=15 option (and for consistency, also use rev-parse --short=15 > >>> for the unlikely case of no signed tags being usable). > > > > For the patch: > > > > Reviewed-by: Brian Norris <briannorris@chromium.org> > > > >> I agree that any randomness should be avoided. > >> > >> My question is, do we need 15-digits? > > ... > >> So, I think the conflict happens > >> only when we have two commits that start with the same 7-digits > >> in the _same_ release. Is this correct? > > No. > > > For git-describe (the case where we have a tag to base off): > > "use <n> digits, or as many digits as needed to form a unique object name" > > Yes, the abbreviated hash that `git describe` produces is unique among > all objects (and objects are more than just commits) in the current > repo, so what matters for probability-of-collision is the total number > of objects - linus.git itself probably grows ~60000 objects per release. > > As for "do we need 15 digits", well, in theory the next time I merge the > next rt-stable tag into our kernel I could end up with a commit that > matches some existing object in the first 33 hex chars at which point I > should have chosen 34 - but of course that's so unlikely that it's > irrelevant. > > But the upshot of that is that there really is no objective answer to > "how many digits do we need", so it becomes a tradeoff between "low > enough probability that anyone anywhere in the next few years would have > needed more than X when building their own kernel" and readability of > `uname -r` etc. So I decided somewhat arbitrarily that each time one > rolls a new release, there should be less than 1e-9 probability that > HEAD collides with some other object when abbreviated to X hexchars. > X=12 doesn't pass that criteria even when the repo has only 10M objects > (and, current linus.git already has objects that need 12 to be unique, > so such collisions do happen in practice, though of course very rarely). > 13 and 14 are just weird numbers, so I ended with 15, which also allows > many many more objects in the repo before the probability crosses that > 1e-9 threshold. > > Rasmus > > Side note 1: Note that there really isn't any such thing as "last > tag/previous tag" in a DAG; I think what git does is a best-effort > search for the eligible tag that minimizes #({objects reachable from > commit-to-be-described} - {objects reachable from tag}), where eligible > tag means at least reachable from commit-to-be-described (so that set > difference is a proper one), but there can be additional constraints > (e.g. --match=...). > > Side note 2: Linus or Greg releases are actually not interesting here > (see the logic in setlocalversion that stops early if we're exactly at > an annotated tag) - the whole raison d'etre for setlocalversion is that > people do maintain and build non-mainline kernels, and it's extremely > useful for `uname -a` to accurately tell just which commit is running on > the target. This is because you use the output from git as-is. Why are you still trying to rely on such obscure behavior of git? It is pretty easy to get the fixed number of digits reliably. For example, git rev-parse --verify HEAD 2>/dev/null | cut -c1-7 It always returns a 7-digits hash, and two different people will get the same result for sure. Your solution, --short=15, means "at least 15", which still contains ambiguity. As I already noted, the kernelrelease string is constructed in this format: <kernel-version>-<number-of-commits-on-top>-<abbreviated-commit-hash> For example, if I enable CONFIG_LOCALVERSION_AUTO=y in today's Linus tree, I got this: 5.9.0-rc5-00005-gfc4f28bb3daf What if the number of digits were 7? 5.9.0-rc5-00005-gfc4f28b I see no problem here. Even if we have another object that starts with "fc4f28b", the kernelrelease string is still unique thanks to the <kernel-version>-<number-of-commits-on-top> prefix. Why do we care about the uniqueness of the abbreviated hash in the whole git history? Note: We prepend $(KERNELVERSION) to the result of setlocalversion. [1] [1] https://github.com/torvalds/linux/blob/v5.9-rc4/Makefile#L1175 -- Best Regards Masahiro Yamada ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-16 14:28 ` Masahiro Yamada @ 2020-09-16 15:23 ` Rasmus Villemoes 2020-09-16 18:01 ` Masahiro Yamada 0 siblings, 1 reply; 17+ messages in thread From: Rasmus Villemoes @ 2020-09-16 15:23 UTC (permalink / raw) To: Masahiro Yamada Cc: Brian Norris, Randy Dunlap, Guenter Roeck, Bhaskar Chowdhury, Linux Kernel Mailing List On 16/09/2020 16.28, Masahiro Yamada wrote: > On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes > <linux@rasmusvillemoes.dk> wrote: >> >> On 10/09/2020 21.05, Brian Norris wrote: >>> On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada <masahiroy@kernel.org> wrote: >>>> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes >>>> <linux@rasmusvillemoes.dk> wrote: >>>>> So in order to avoid `uname -a` output relying on such random details >>>>> of the build environment which are rather hard to ensure are >>>>> consistent between developers and buildbots, use an explicit >>>>> --abbrev=15 option (and for consistency, also use rev-parse --short=15 >>>>> for the unlikely case of no signed tags being usable). >>> >>> For the patch: >>> >>> Reviewed-by: Brian Norris <briannorris@chromium.org> >>> >>>> I agree that any randomness should be avoided. >>>> >>>> My question is, do we need 15-digits? >>> ... >>>> So, I think the conflict happens >>>> only when we have two commits that start with the same 7-digits >>>> in the _same_ release. Is this correct? >> >> No. >> >>> For git-describe (the case where we have a tag to base off): >>> "use <n> digits, or as many digits as needed to form a unique object name" >> >> Yes, the abbreviated hash that `git describe` produces is unique among >> all objects (and objects are more than just commits) in the current >> repo, so what matters for probability-of-collision is the total number >> of objects - linus.git itself probably grows ~60000 objects per release. >> >> As for "do we need 15 digits", well, in theory the next time I merge the >> next rt-stable tag into our kernel I could end up with a commit that >> matches some existing object in the first 33 hex chars at which point I >> should have chosen 34 - but of course that's so unlikely that it's >> irrelevant. >> >> But the upshot of that is that there really is no objective answer to >> "how many digits do we need", so it becomes a tradeoff between "low >> enough probability that anyone anywhere in the next few years would have >> needed more than X when building their own kernel" and readability of >> `uname -r` etc. So I decided somewhat arbitrarily that each time one >> rolls a new release, there should be less than 1e-9 probability that >> HEAD collides with some other object when abbreviated to X hexchars. >> X=12 doesn't pass that criteria even when the repo has only 10M objects >> (and, current linus.git already has objects that need 12 to be unique, >> so such collisions do happen in practice, though of course very rarely). >> 13 and 14 are just weird numbers, so I ended with 15, which also allows >> many many more objects in the repo before the probability crosses that >> 1e-9 threshold. >> > > > This is because you use the output from git as-is. > > Why are you still trying to rely on such obscure behavior of git? > > > It is pretty easy to get the fixed number of digits reliably. > > For example, > git rev-parse --verify HEAD 2>/dev/null | cut -c1-7 > > > It always returns a 7-digits hash, > and two different people will get the same result for sure. > > Your solution, --short=15, means "at least 15", > which still contains ambiguity. > > > > As I already noted, the kernelrelease string is > constructed in this format: > > <kernel-version>-<number-of-commits-on-top>-<abbreviated-commit-hash> > > > For example, if I enable CONFIG_LOCALVERSION_AUTO=y > in today's Linus tree, I got this: > > 5.9.0-rc5-00005-gfc4f28bb3daf > > > What if the number of digits were 7? > > 5.9.0-rc5-00005-gfc4f28b > > I see no problem here. The problem is that currently, the build relies on things which cannot easily be controlled or reproduced between different developers: Apart from git version (which is reasonable to mandate is the same, just like one must use same compiler, binutils etc. to get binary reproducible output), it depends on whether ~/.gitconfig has a core.abbrev setting - and even worse, it depends on the _total number of objects in the source git repo_, i.e. something that has nothing to do with what is currently in the work tree at all. And that leads to real bugs when one builds external modules that end up in one directory in the rootfs, while `uname -r` tells modprobe to look in some other directory (differing in the length of the abbreviated hash). > Even if we have another object that starts with "fc4f28b", > the kernelrelease string is still unique thanks to the > <kernel-version>-<number-of-commits-on-top> prefix. > > Why do we care about the uniqueness of the abbreviated > hash in the whole git history? Because when I ask a customer "what kernel are you running", and they tell me "4.19.45-rt67-00567-123abc8", I prefer that 123abc8 uniquely identifies the right commit, instead of me having to dig around each of the commits starting with that prefix and see which one of them matches "is exactly 567 commits from 4.19.45-rt67". 7 hexchars is nowhere near enough for that today, which is why Linus himself did that "auto-tune abbrev based on repo size" patch for git.git. But what I mostly care about is getting a consistent result. And yes, you're right that the porcelain command 'git describe' could end up using something more than 15 digits (though that's extremely unlikely). So if you prefer, I can try to rewrite the logic purely in terms of plumbing commands. But that's a much more invasive patch, and one would obviously lose the guarantee of the abbreviation being unique among current git objects. Alternatively, would you consider a Kconfig knob, LOCALVERSION_ABBREV? Rasmus ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-16 15:23 ` Rasmus Villemoes @ 2020-09-16 18:01 ` Masahiro Yamada 2020-09-16 19:31 ` Rasmus Villemoes 0 siblings, 1 reply; 17+ messages in thread From: Masahiro Yamada @ 2020-09-16 18:01 UTC (permalink / raw) To: Rasmus Villemoes Cc: Brian Norris, Randy Dunlap, Guenter Roeck, Bhaskar Chowdhury, Linux Kernel Mailing List On Thu, Sep 17, 2020 at 12:23 AM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > On 16/09/2020 16.28, Masahiro Yamada wrote: > > On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes > > <linux@rasmusvillemoes.dk> wrote: > >> > >> On 10/09/2020 21.05, Brian Norris wrote: > >>> On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada <masahiroy@kernel.org> wrote: > >>>> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes > >>>> <linux@rasmusvillemoes.dk> wrote: > >>>>> So in order to avoid `uname -a` output relying on such random details > >>>>> of the build environment which are rather hard to ensure are > >>>>> consistent between developers and buildbots, use an explicit > >>>>> --abbrev=15 option (and for consistency, also use rev-parse --short=15 > >>>>> for the unlikely case of no signed tags being usable). > >>> > >>> For the patch: > >>> > >>> Reviewed-by: Brian Norris <briannorris@chromium.org> > >>> > >>>> I agree that any randomness should be avoided. > >>>> > >>>> My question is, do we need 15-digits? > >>> ... > >>>> So, I think the conflict happens > >>>> only when we have two commits that start with the same 7-digits > >>>> in the _same_ release. Is this correct? > >> > >> No. > >> > >>> For git-describe (the case where we have a tag to base off): > >>> "use <n> digits, or as many digits as needed to form a unique object name" > >> > >> Yes, the abbreviated hash that `git describe` produces is unique among > >> all objects (and objects are more than just commits) in the current > >> repo, so what matters for probability-of-collision is the total number > >> of objects - linus.git itself probably grows ~60000 objects per release. > >> > >> As for "do we need 15 digits", well, in theory the next time I merge the > >> next rt-stable tag into our kernel I could end up with a commit that > >> matches some existing object in the first 33 hex chars at which point I > >> should have chosen 34 - but of course that's so unlikely that it's > >> irrelevant. > >> > >> But the upshot of that is that there really is no objective answer to > >> "how many digits do we need", so it becomes a tradeoff between "low > >> enough probability that anyone anywhere in the next few years would have > >> needed more than X when building their own kernel" and readability of > >> `uname -r` etc. So I decided somewhat arbitrarily that each time one > >> rolls a new release, there should be less than 1e-9 probability that > >> HEAD collides with some other object when abbreviated to X hexchars. > >> X=12 doesn't pass that criteria even when the repo has only 10M objects > >> (and, current linus.git already has objects that need 12 to be unique, > >> so such collisions do happen in practice, though of course very rarely). > >> 13 and 14 are just weird numbers, so I ended with 15, which also allows > >> many many more objects in the repo before the probability crosses that > >> 1e-9 threshold. > >> > > > > > > This is because you use the output from git as-is. > > > > Why are you still trying to rely on such obscure behavior of git? > > > > > > It is pretty easy to get the fixed number of digits reliably. > > > > For example, > > git rev-parse --verify HEAD 2>/dev/null | cut -c1-7 > > > > > > It always returns a 7-digits hash, > > and two different people will get the same result for sure. > > > > Your solution, --short=15, means "at least 15", > > which still contains ambiguity. > > > > > > > > As I already noted, the kernelrelease string is > > constructed in this format: > > > > <kernel-version>-<number-of-commits-on-top>-<abbreviated-commit-hash> > > > > > > For example, if I enable CONFIG_LOCALVERSION_AUTO=y > > in today's Linus tree, I got this: > > > > 5.9.0-rc5-00005-gfc4f28bb3daf > > > > > > What if the number of digits were 7? > > > > 5.9.0-rc5-00005-gfc4f28b > > > > I see no problem here. > > The problem is that currently, the build relies on things which cannot > easily be controlled or reproduced between different developers: Apart > from git version (which is reasonable to mandate is the same, just like > one must use same compiler, binutils etc. to get binary reproducible > output), it depends on whether ~/.gitconfig has a core.abbrev setting - > and even worse, it depends on the _total number of objects in the source > git repo_, i.e. something that has nothing to do with what is currently > in the work tree at all. > > And that leads to real bugs when one builds external modules that end up > in one directory in the rootfs, while `uname -r` tells modprobe to look > in some other directory (differing in the length of the abbreviated hash). > > > Even if we have another object that starts with "fc4f28b", > > the kernelrelease string is still unique thanks to the > > <kernel-version>-<number-of-commits-on-top> prefix. > > > > Why do we care about the uniqueness of the abbreviated > > hash in the whole git history? > > Because when I ask a customer "what kernel are you running", and they > tell me "4.19.45-rt67-00567-123abc8", I prefer that 123abc8 uniquely > identifies the right commit, instead of me having to dig around each of > the commits starting with that prefix and see which one of them matches > "is exactly 567 commits from 4.19.45-rt67". 7 hexchars is nowhere near > enough for that today, which is why Linus himself did that "auto-tune > abbrev based on repo size" patch for git.git. I like: git rev-parse --verify HEAD 2>/dev/null | cut -c1-15 better than: git rev-parse --verify --short=15 HEAD 2>/dev/null The former produces the deterministic kernelrelease string. But, let's reconsider if we need as long as 15-digits. There are a couple of kinds of objects in git: commit, tree, blob. I think you came up with 15-digits to ensure the uniqueness in _all_ kinds of objects. But, when your customer says "4.19.45-rt67-00567-123abc8", 123abc8 is apparently a commit instead of a tree or a blob. In the context of the kernel version, we can exclude tree and blob objects. In other words, I think "grows ~60000 objects per release" is somewhat over-estimating. We have ~15000 commits per release. So, the difference is just 4x factor, though... BTW, I did some experiments, and I noticed "git log" accepts a shorter hash than "git show" does presumably because "git log" only takes a commit. For example, "git show 06a0d" fails, but "git log 06a0d" works. masahiro@oscar:~/ref/linux$ git show 06a0d error: short SHA1 06a0d is ambiguous hint: The candidates are: hint: 06a0df4d1b8b commit 2020-09-08 - fbcon: remove now unusued 'softback_lines' cursor() argument hint: 06a0d81911b3 tree hint: 06a0dc5a84d2 tree hint: 06a0d1947c77 blob hint: 06a0df434249 blob fatal: ambiguous argument '06a0d': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git <command> [<revision>...] -- [<file>...]' masahiro@oscar:~/ref/linux$ git log --oneline -1 06a0d 06a0df4d1b8b fbcon: remove now unusued 'softback_lines' cursor() argument What is interesting is, if you prepend <tag>-<number-of-commits>- to the abbreviated hash, "git show" accepts as short as a commit "git log" does. masahiro@oscar:~/ref/linux$ git show v5.9-rc5-00002-g06a0d | head -1 commit 06a0df4d1b8b13b551668e47b11fd7629033b7df I guess git cleverly understands it refers to a commit object since "v5.9-rc5-00002-" prefix only makes sense in the commit context. Keeping those above in mind, I believe 15-digits is too long. So, I propose two options. [1] 7 digits The abbreviated hash part may not uniquely identify the commit. In that case, you need some extra git operations to find out which one is it. As for the kernel build, <kernel-version>-<number-of-commits>-<7-digits> is enough to provide the unique kernelrelease string. [2] 12 digits This matches to the Fixes: tag policy specified in Documentation/process/submitting-patches.rst The abbreviated hash part is very likely unique in the commit object space. (Of course, it is impossible to guarantee the uniqueness) I wait for some comments. > But what I mostly care about is getting a consistent result. And yes, > you're right that the porcelain command 'git describe' could end up > using something more than 15 digits (though that's extremely unlikely). > So if you prefer, I can try to rewrite the logic purely in terms of > plumbing commands. But that's a much more invasive patch, and one would > obviously lose the guarantee of the abbreviation being unique among > current git objects. > > Alternatively, would you consider a Kconfig knob, LOCALVERSION_ABBREV? No, I do not think LOCALVERSION_ABBREV is a good idea. We should eliminate this problem irrespective of the kernel configuration. -- Best Regards Masahiro Yamada ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-16 18:01 ` Masahiro Yamada @ 2020-09-16 19:31 ` Rasmus Villemoes 2020-09-17 0:48 ` Masahiro Yamada 0 siblings, 1 reply; 17+ messages in thread From: Rasmus Villemoes @ 2020-09-16 19:31 UTC (permalink / raw) To: Masahiro Yamada Cc: Brian Norris, Randy Dunlap, Guenter Roeck, Bhaskar Chowdhury, Linux Kernel Mailing List On 16/09/2020 20.01, Masahiro Yamada wrote: > On Thu, Sep 17, 2020 at 12:23 AM Rasmus Villemoes > <linux@rasmusvillemoes.dk> wrote: >> >> On 16/09/2020 16.28, Masahiro Yamada wrote: >>> On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes >>> <linux@rasmusvillemoes.dk> wrote: >>>> >>> Why do we care about the uniqueness of the abbreviated >>> hash in the whole git history? >> >> Because when I ask a customer "what kernel are you running", and they >> tell me "4.19.45-rt67-00567-123abc8", I prefer that 123abc8 uniquely >> identifies the right commit, instead of me having to dig around each of >> the commits starting with that prefix and see which one of them matches >> "is exactly 567 commits from 4.19.45-rt67". 7 hexchars is nowhere near >> enough for that today, which is why Linus himself did that "auto-tune >> abbrev based on repo size" patch for git.git. > > I like: > > git rev-parse --verify HEAD 2>/dev/null | cut -c1-15 > > better than: > > git rev-parse --verify --short=15 HEAD 2>/dev/null > > The former produces the deterministic kernelrelease string. > > > But, let's reconsider if we need as long as 15-digits. > > There are a couple of kinds of objects in git: commit, tree, blob. > > I think you came up with 15-digits to ensure the uniqueness > in _all_ kinds of objects. > > But, when your customer says "4.19.45-rt67-00567-123abc8", > 123abc8 is apparently a commit instead of a tree or a blob. > > > > In the context of the kernel version, we can exclude > tree and blob objects. > > In other words, I think "grows ~60000 objects per release" > is somewhat over-estimating. > > We have ~15000 commits per release. So, the difference > is just 4x factor, though... > > > > BTW, I did some experiments, and I noticed > "git log" accepts a shorter hash > than "git show" does presumably because > "git log" only takes a commit. > > > > For example, "git show 06a0d" fails, but > "git log 06a0d" works. > > > masahiro@oscar:~/ref/linux$ git show 06a0d > error: short SHA1 06a0d is ambiguous > hint: The candidates are: > hint: 06a0df4d1b8b commit 2020-09-08 - fbcon: remove now unusued > 'softback_lines' cursor() argument > hint: 06a0d81911b3 tree > hint: 06a0dc5a84d2 tree > hint: 06a0d1947c77 blob > hint: 06a0df434249 blob > fatal: ambiguous argument '06a0d': unknown revision or path not in the > working tree. > Use '--' to separate paths from revisions, like this: > 'git <command> [<revision>...] -- [<file>...]' > masahiro@oscar:~/ref/linux$ git log --oneline -1 06a0d > 06a0df4d1b8b fbcon: remove now unusued 'softback_lines' cursor() argument > > > > > What is interesting is, if you prepend <tag>-<number-of-commits>- > to the abbreviated hash, "git show" accepts as short as a commit > "git log" does. > > masahiro@oscar:~/ref/linux$ git show v5.9-rc5-00002-g06a0d | head -1 > commit 06a0df4d1b8b13b551668e47b11fd7629033b7df > > > I guess git cleverly understands it refers to a commit object > since "v5.9-rc5-00002-" prefix only makes sense > in the commit context. > Well, yes, but unfortunately only so far as applying the same "the user means a commit object" logic; git doesn't do anything to actually use the prefix to disambiguate. In fact, looking at a semi-random commit in 4.19-stable v4.19.114-7-g236c445eb352: $ git show 236c44 error: short SHA1 236c44 is ambiguous hint: The candidates are: hint: 236c445eb352 commit 2020-03-13 - drm/bochs: downgrade pci_request_region failure from error to warning hint: 236c448cb6e7 commit 2011-09-08 - usbmon vs. tcpdump: fix dropped packet count hint: 236c445b1930 blob Now you're right that we get $ git show v4.19.114-7-g236c445 | head -n1 commit 236c445eb3529aa7c976f8812513c3cb26d77e27 so it knows we're not looking at that blob. But "git show v4.19.114-7-g236c44" still fails because there are two commits. Adding to the fun is that "git show v4.19.114-7-g236c448" actually shows the usbmon commit, which is old v3.2 stuff and certainly not a descendant of v4.19.114. I didn't actually know that git would even accept those prefixed strings as possible refspecs, and for a moment you had me hoping that git would actually do the disambiguation using that prefix, which would certainly make 7 hex chars enough; unfortunately that's not the case. > Keeping those above in mind, I believe 15-digits is too long. Well, as you indicated, commits are probably ~1/4 of all objects, i.e. half a hexchar worth of bits. So the commit/object distinction shouldn't matter very much for the choice of suitable fixed length. > So, I propose two options. > > > [1] 7 digits > > The abbreviated hash part may not uniquely identify > the commit. In that case, you need some extra git > operations to find out which one is it. > > As for the kernel build, > <kernel-version>-<number-of-commits>-<7-digits> is enough > to provide the unique kernelrelease string. > > > > [2] 12 digits > > This matches to the Fixes: tag policy specified in > Documentation/process/submitting-patches.rst > > The abbreviated hash part is very likely unique > in the commit object space. > (Of course, it is impossible to guarantee the uniqueness) I can live with 12. It would be extremely rare that I would have to do some extra git yoga to figure out which commit they actually mean. I think we can still use git describe to do the bulk of the work (the 'git rev-parse | cut ... is easy, but it's somewhat tedious to find the closest tag, then compute the #commits-on-top part), then just have the awk script used for pretty-printing (the %05d of the #commits-on-top) can probably also be used to chop the abbreviated sha1 at 12 chars, should git have needed to make it longer. Please see below for an updated patch+commit log. >> Alternatively, would you consider a Kconfig knob, LOCALVERSION_ABBREV? > > > No, I do not think LOCALVERSION_ABBREV is a good idea. Neither do I; I considered it before sending the patch but decided that yes: > We should eliminate this problem > irrespective of the kernel configuration. Rasmus ==== something like this, then? scripts/setlocalversion: make git describe output more reliable When building for an embedded target using Yocto, we're sometimes observing that the version string that gets built into vmlinux (and thus what uname -a reports) differs from the path under /lib/modules/ where modules get installed in the rootfs, but only in the length of the -gabc123def suffix. Hence modprobe always fails. The problem is that Yocto has the concept of "sstate" (shared state), which allows different developers/buildbots/etc. to share build artifacts, based on a hash of all the metadata that went into building that artifact - and that metadata includes all dependencies (e.g. the compiler used etc.). That normally works quite well; usually a clean build (without using any sstate cache) done by one developer ends up being binary identical to a build done on another host. However, one thing that can cause two developers to end up with different builds [and thus make one's vmlinux package incompatible with the other's kernel-dev package], which is not captured by the metadata hashing, is this `git describe`: The output of that can be affected by (1) git version: before 2.11 git defaulted to a minimum of 7, since 2.11 (git.git commit e6c587) the default is dynamic based on the number of objects in the repo (2) hence even if both run the same git version, the output can differ based on how many remotes are being tracked (or just lots of local development branches or plain old garbage) (3) and of course somebody could have a core.abbrev config setting in ~/.gitconfig So in order to avoid `uname -a` output relying on such random details of the build environment which are rather hard to ensure are consistent between developers and buildbots, make sure the abbreviated sha1 always consists of exactly 12 hex characters. That is consistent with the current rule for -stable patches, and is almost always enough to identify the head commit unambigously - in the few cases where it does not, the v5.4.3-00021- prefix would certainly nail it down. diff --git a/scripts/setlocalversion b/scripts/setlocalversion index 20f2efd57b11..b51072167df1 100755 --- a/scripts/setlocalversion +++ b/scripts/setlocalversion @@ -45,7 +45,7 @@ scm_version() # Check for git and a git repo. if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then + head=$(git rev-parse --verify HEAD 2>/dev/null); then # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore # it, because this version is defined in the top level Makefile. @@ -59,11 +59,22 @@ scm_version() fi # If we are past a tagged commit (like # "v2.6.30-rc5-302-g72357d5"), we pretty print it. - if atag="$(git describe 2>/dev/null)"; then - echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' - - # If we don't have a tag at all we print -g{commitish}. + # + # Ensure the abbreviated sha1 has exactly 12 + # hex characters, to make the output + # independent of git version, local + # core.abbrev settings and/or total number of + # objects in the current repository - passing + # --abbrev=12 ensures a minimum of 12, and the + # awk substr() then picks the 'g' and first 12 + # hex chars. + if atag="$(git describe --abbrev=12 2>/dev/null)"; then + echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}' + + # If we don't have a tag at all we print -g{commitish}, + # again using exactly 12 hex chars. else + head="$(echo $head | cut -c1-12)" printf '%s%s' -g $head fi fi ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-16 19:31 ` Rasmus Villemoes @ 2020-09-17 0:48 ` Masahiro Yamada 0 siblings, 0 replies; 17+ messages in thread From: Masahiro Yamada @ 2020-09-17 0:48 UTC (permalink / raw) To: Rasmus Villemoes Cc: Brian Norris, Randy Dunlap, Guenter Roeck, Bhaskar Chowdhury, Linux Kernel Mailing List On Thu, Sep 17, 2020 at 4:31 AM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > On 16/09/2020 20.01, Masahiro Yamada wrote: > > On Thu, Sep 17, 2020 at 12:23 AM Rasmus Villemoes > > <linux@rasmusvillemoes.dk> wrote: > >> > >> On 16/09/2020 16.28, Masahiro Yamada wrote: > >>> On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes > >>> <linux@rasmusvillemoes.dk> wrote: > >>>> > > >>> Why do we care about the uniqueness of the abbreviated > >>> hash in the whole git history? > >> > >> Because when I ask a customer "what kernel are you running", and they > >> tell me "4.19.45-rt67-00567-123abc8", I prefer that 123abc8 uniquely > >> identifies the right commit, instead of me having to dig around each of > >> the commits starting with that prefix and see which one of them matches > >> "is exactly 567 commits from 4.19.45-rt67". 7 hexchars is nowhere near > >> enough for that today, which is why Linus himself did that "auto-tune > >> abbrev based on repo size" patch for git.git. > > > > I like: > > > > git rev-parse --verify HEAD 2>/dev/null | cut -c1-15 > > > > better than: > > > > git rev-parse --verify --short=15 HEAD 2>/dev/null > > > > The former produces the deterministic kernelrelease string. > > > > > > But, let's reconsider if we need as long as 15-digits. > > > > There are a couple of kinds of objects in git: commit, tree, blob. > > > > I think you came up with 15-digits to ensure the uniqueness > > in _all_ kinds of objects. > > > > But, when your customer says "4.19.45-rt67-00567-123abc8", > > 123abc8 is apparently a commit instead of a tree or a blob. > > > > > > > > In the context of the kernel version, we can exclude > > tree and blob objects. > > > > In other words, I think "grows ~60000 objects per release" > > is somewhat over-estimating. > > > > We have ~15000 commits per release. So, the difference > > is just 4x factor, though... > > > > > > > > BTW, I did some experiments, and I noticed > > "git log" accepts a shorter hash > > than "git show" does presumably because > > "git log" only takes a commit. > > > > > > > > For example, "git show 06a0d" fails, but > > "git log 06a0d" works. > > > > > > masahiro@oscar:~/ref/linux$ git show 06a0d > > error: short SHA1 06a0d is ambiguous > > hint: The candidates are: > > hint: 06a0df4d1b8b commit 2020-09-08 - fbcon: remove now unusued > > 'softback_lines' cursor() argument > > hint: 06a0d81911b3 tree > > hint: 06a0dc5a84d2 tree > > hint: 06a0d1947c77 blob > > hint: 06a0df434249 blob > > fatal: ambiguous argument '06a0d': unknown revision or path not in the > > working tree. > > Use '--' to separate paths from revisions, like this: > > 'git <command> [<revision>...] -- [<file>...]' > > masahiro@oscar:~/ref/linux$ git log --oneline -1 06a0d > > 06a0df4d1b8b fbcon: remove now unusued 'softback_lines' cursor() argument > > > > > > > > > > What is interesting is, if you prepend <tag>-<number-of-commits>- > > to the abbreviated hash, "git show" accepts as short as a commit > > "git log" does. > > > > masahiro@oscar:~/ref/linux$ git show v5.9-rc5-00002-g06a0d | head -1 > > commit 06a0df4d1b8b13b551668e47b11fd7629033b7df > > > > > > I guess git cleverly understands it refers to a commit object > > since "v5.9-rc5-00002-" prefix only makes sense > > in the commit context. > > > > Well, yes, but unfortunately only so far as applying the same "the user > means a commit object" logic; git doesn't do anything to actually use > the prefix to disambiguate. In fact, looking at a semi-random commit in > 4.19-stable v4.19.114-7-g236c445eb352: > > $ git show 236c44 > error: short SHA1 236c44 is ambiguous > hint: The candidates are: > hint: 236c445eb352 commit 2020-03-13 - drm/bochs: downgrade > pci_request_region failure from error to warning > hint: 236c448cb6e7 commit 2011-09-08 - usbmon vs. tcpdump: fix dropped > packet count > hint: 236c445b1930 blob > > Now you're right that we get > > $ git show v4.19.114-7-g236c445 | head -n1 > commit 236c445eb3529aa7c976f8812513c3cb26d77e27 > > so it knows we're not looking at that blob. But "git show > v4.19.114-7-g236c44" still fails because there are two commits. Adding > to the fun is that "git show v4.19.114-7-g236c448" actually shows the > usbmon commit, which is old v3.2 stuff and certainly not a descendant of > v4.19.114. Oh, I did not notice that. Maybe git can be improved to check the validity of the 'git describe' form, but that is another story. > I didn't actually know that git would even accept those prefixed strings > as possible refspecs, and for a moment you had me hoping that git would > actually do the disambiguation using that prefix, which would certainly > make 7 hex chars enough; unfortunately that's not the case. > > > Keeping those above in mind, I believe 15-digits is too long. > > Well, as you indicated, commits are probably ~1/4 of all objects, i.e. > half a hexchar worth of bits. So the commit/object distinction shouldn't > matter very much for the choice of suitable fixed length. > > > So, I propose two options. > > > > > > [1] 7 digits > > > > The abbreviated hash part may not uniquely identify > > the commit. In that case, you need some extra git > > operations to find out which one is it. > > > > As for the kernel build, > > <kernel-version>-<number-of-commits>-<7-digits> is enough > > to provide the unique kernelrelease string. > > > > > > > > [2] 12 digits > > > > This matches to the Fixes: tag policy specified in > > Documentation/process/submitting-patches.rst > > > > The abbreviated hash part is very likely unique > > in the commit object space. > > (Of course, it is impossible to guarantee the uniqueness) > > I can live with 12. It would be extremely rare that I would have to do > some extra git yoga to figure out which commit they actually mean. I > think we can still use git describe to do the bulk of the work (the 'git > rev-parse | cut ... is easy, but it's somewhat tedious to find the > closest tag, then compute the #commits-on-top part), then just have the > awk script used for pretty-printing (the %05d of the #commits-on-top) > can probably also be used to chop the abbreviated sha1 at 12 chars, > should git have needed to make it longer. Please see below for an > updated patch+commit log. LGTM. Could you send it to ML as a patch? > >> Alternatively, would you consider a Kconfig knob, LOCALVERSION_ABBREV? > > > > > > No, I do not think LOCALVERSION_ABBREV is a good idea. > > Neither do I; I considered it before sending the patch but decided that yes: > > > We should eliminate this problem > > irrespective of the kernel configuration. > > Rasmus > > ==== > > something like this, then? > > > scripts/setlocalversion: make git describe output more reliable > > When building for an embedded target using Yocto, we're sometimes > observing that the version string that gets built into vmlinux (and > thus what uname -a reports) differs from the path under /lib/modules/ > where modules get installed in the rootfs, but only in the length of > the -gabc123def suffix. Hence modprobe always fails. > > The problem is that Yocto has the concept of "sstate" (shared state), > which allows different developers/buildbots/etc. to share build > artifacts, based on a hash of all the metadata that went into building > that artifact - and that metadata includes all dependencies (e.g. the > compiler used etc.). That normally works quite well; usually a clean > build (without using any sstate cache) done by one developer ends up > being binary identical to a build done on another host. However, one > thing that can cause two developers to end up with different builds > [and thus make one's vmlinux package incompatible with the other's > kernel-dev package], which is not captured by the metadata hashing, is > this `git describe`: The output of that can be affected by > > (1) git version: before 2.11 git defaulted to a minimum of 7, since > 2.11 (git.git commit e6c587) the default is dynamic based on the > number of objects in the repo > (2) hence even if both run the same git version, the output can differ > based on how many remotes are being tracked (or just lots of local > development branches or plain old garbage) > (3) and of course somebody could have a core.abbrev config setting in > ~/.gitconfig > > So in order to avoid `uname -a` output relying on such random details > of the build environment which are rather hard to ensure are > consistent between developers and buildbots, make sure the abbreviated > sha1 always consists of exactly 12 hex characters. That is consistent > with the current rule for -stable patches, and is almost always enough > to identify the head commit unambigously - in the few cases where it > does not, the v5.4.3-00021- prefix would certainly nail it down. > > diff --git a/scripts/setlocalversion b/scripts/setlocalversion > index 20f2efd57b11..b51072167df1 100755 > --- a/scripts/setlocalversion > +++ b/scripts/setlocalversion > @@ -45,7 +45,7 @@ scm_version() > > # Check for git and a git repo. > if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && > - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then > + head=$(git rev-parse --verify HEAD 2>/dev/null); then > > # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore > # it, because this version is defined in the top level Makefile. > @@ -59,11 +59,22 @@ scm_version() > fi > # If we are past a tagged commit (like > # "v2.6.30-rc5-302-g72357d5"), we pretty print it. > - if atag="$(git describe 2>/dev/null)"; then > - echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' > - > - # If we don't have a tag at all we print -g{commitish}. > + # > + # Ensure the abbreviated sha1 has exactly 12 > + # hex characters, to make the output > + # independent of git version, local > + # core.abbrev settings and/or total number of > + # objects in the current repository - passing > + # --abbrev=12 ensures a minimum of 12, and the > + # awk substr() then picks the 'g' and first 12 > + # hex chars. > + if atag="$(git describe --abbrev=12 2>/dev/null)"; then > + echo "$atag" | awk -F- '{printf("-%05d-%s", > $(NF-1),substr($(NF),0,13))}' > + > + # If we don't have a tag at all we print -g{commitish}, > + # again using exactly 12 hex chars. > else > + head="$(echo $head | cut -c1-12)" > printf '%s%s' -g $head > fi > fi -- Best Regards Masahiro Yamada ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-10 14:34 ` Masahiro Yamada 2020-09-10 19:05 ` Brian Norris @ 2020-09-10 22:56 ` Bhaskar Chowdhury 2020-09-16 8:48 ` Rasmus Villemoes 2 siblings, 0 replies; 17+ messages in thread From: Bhaskar Chowdhury @ 2020-09-10 22:56 UTC (permalink / raw) To: Masahiro Yamada Cc: Rasmus Villemoes, Randy Dunlap, Brian Norris, Guenter Roeck, Linux Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 5048 bytes --] On 23:34 Thu 10 Sep 2020, Masahiro Yamada wrote: >On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes ><linux@rasmusvillemoes.dk> wrote: >> >> When building for an embedded target using Yocto, we're sometimes >> observing that the version string that gets built into vmlinux (and >> thus what uname -a reports) differs from the path under /lib/modules/ >> where modules get installed in the rootfs, but only in the length of >> the -gabc123def suffix. Hence modprobe always fails. >> >> The problem is that Yocto has the concept of "sstate" (shared state), >> which allows different developers/buildbots/etc. to share build >> artifacts, based on a hash of all the metadata that went into building >> that artifact - and that metadata includes all dependencies (e.g. the >> compiler used etc.). That normally works quite well; usually a clean >> build (without using any sstate cache) done by one developer ends up >> being binary identical to a build done on another host. However, one >> thing that can cause two developers to end up with different builds >> [and thus make one's vmlinux package incompatible with the other's >> kernel-dev package], which is not captured by the metadata hashing, is >> this `git describe`: The output of that can be affected by >> >> (1) git version: before 2.11 git defaulted to a minimum of 7, since >> 2.11 (git.git commit e6c587) the default is dynamic based on the >> number of objects in the repo >> (2) hence even if both run the same git version, the output can differ >> based on how many remotes are being tracked (or just lots of local >> development branches or plain old garbage) >> (3) and of course somebody could have a core.abbrev config setting in >> ~/.gitconfig >> >> So in order to avoid `uname -a` output relying on such random details >> of the build environment which are rather hard to ensure are >> consistent between developers and buildbots, use an explicit >> --abbrev=15 option (and for consistency, also use rev-parse --short=15 >> for the unlikely case of no signed tags being usable). >> >> Now, why is 60 bits enough for everyone? It's not mathematically >> guaranteed that git won't have to use 16 in some git repo, but it is >> beyond unlikely: Even in a repo with 100M objects, the probability >> that any given commit (i.e. the one being described) clashes with some >> other object in the first 15 hex chars is less than 1e-10, and >> currently a git repo tracking Linus', -stable and -rt only has around >> 10M objects. > > >I agree that any randomness should be avoided. > >My question is, do we need 15-digits? > > >The kernelrelease is formed by >[kernel version] + [some digits of git hash]. > > >For example, "git describe" shows as follows: > >v5.9.0-rc4-00034-g7fe10096c150 > > >Linus gives a new tag every week (or every two week). > > >So, I think the conflict happens >only when we have two commits that start with the same 7-digits >in the _same_ release. Is this correct? > >We have 14000 - 15000 commits in each release, >not 100M. > > I kinda agree with this...we need to chopped down the excess bits from the information . Indeed , a 15 digits is too long to keep up. ~Bhaskar > > >> >> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> >> --- >> I could probably fix things by adding a 'git config --local >> core.abbrev 15' step to the Yocto build process after the repo to >> build from has been cloned but before building has started. But in the >> interest of binary reproducibility outside of just Yocto builds, I >> think it's better if this lives in the kernel. >> >> scripts/setlocalversion | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/scripts/setlocalversion b/scripts/setlocalversion >> index 20f2efd57b11..c5262f0d953d 100755 >> --- a/scripts/setlocalversion >> +++ b/scripts/setlocalversion >> @@ -45,7 +45,7 @@ scm_version() >> >> # Check for git and a git repo. >> if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && >> - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then >> + head=$(git rev-parse --verify --short=15 HEAD 2>/dev/null); then >> >> # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore >> # it, because this version is defined in the top level Makefile. >> @@ -59,7 +59,7 @@ scm_version() >> fi >> # If we are past a tagged commit (like >> # "v2.6.30-rc5-302-g72357d5"), we pretty print it. >> - if atag="$(git describe 2>/dev/null)"; then >> + if atag="$(git describe --abbrev=15 2>/dev/null)"; then >> echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' >> >> # If we don't have a tag at all we print -g{commitish}. >> -- >> 2.23.0 >> > > >-- >Best Regards >Masahiro Yamada [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] scripts/setlocalversion: make git describe output more reliable 2020-09-10 14:34 ` Masahiro Yamada 2020-09-10 19:05 ` Brian Norris 2020-09-10 22:56 ` Bhaskar Chowdhury @ 2020-09-16 8:48 ` Rasmus Villemoes 2 siblings, 0 replies; 17+ messages in thread From: Rasmus Villemoes @ 2020-09-16 8:48 UTC (permalink / raw) To: Masahiro Yamada Cc: Randy Dunlap, Brian Norris, Guenter Roeck, Bhaskar Chowdhury, Linux Kernel Mailing List On 10/09/2020 16.34, Masahiro Yamada wrote: > On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes > <linux@rasmusvillemoes.dk> wrote: >> ... >> >> So in order to avoid `uname -a` output relying on such random details >> of the build environment which are rather hard to ensure are >> consistent between developers and buildbots, use an explicit >> --abbrev=15 option (and for consistency, also use rev-parse --short=15 >> for the unlikely case of no signed tags being usable). >> Hi Masahiro Can I get you to carry this through the kbuild tree? Unless of course your questions/concerns haven't been addressed. Thanks, Rasmus ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2] scripts/setlocalversion: make git describe output more reliable 2020-09-10 11:26 [PATCH] scripts/setlocalversion: make git describe output more reliable Rasmus Villemoes 2020-09-10 14:28 ` Guenter Roeck 2020-09-10 14:34 ` Masahiro Yamada @ 2020-09-17 6:56 ` Rasmus Villemoes 2020-09-17 12:22 ` Nico Schottelius 2020-09-24 17:27 ` Masahiro Yamada 2 siblings, 2 replies; 17+ messages in thread From: Rasmus Villemoes @ 2020-09-17 6:56 UTC (permalink / raw) To: Masahiro Yamada, Randy Dunlap, Brian Norris, Bhaskar Chowdhury, Nico Schottelius, Guenter Roeck Cc: Rasmus Villemoes, linux-kernel When building for an embedded target using Yocto, we're sometimes observing that the version string that gets built into vmlinux (and thus what uname -a reports) differs from the path under /lib/modules/ where modules get installed in the rootfs, but only in the length of the -gabc123def suffix. Hence modprobe always fails. The problem is that Yocto has the concept of "sstate" (shared state), which allows different developers/buildbots/etc. to share build artifacts, based on a hash of all the metadata that went into building that artifact - and that metadata includes all dependencies (e.g. the compiler used etc.). That normally works quite well; usually a clean build (without using any sstate cache) done by one developer ends up being binary identical to a build done on another host. However, one thing that can cause two developers to end up with different builds [and thus make one's vmlinux package incompatible with the other's kernel-dev package], which is not captured by the metadata hashing, is this `git describe`: The output of that can be affected by (1) git version: before 2.11 git defaulted to a minimum of 7, since 2.11 (git.git commit e6c587) the default is dynamic based on the number of objects in the repo (2) hence even if both run the same git version, the output can differ based on how many remotes are being tracked (or just lots of local development branches or plain old garbage) (3) and of course somebody could have a core.abbrev config setting in ~/.gitconfig So in order to avoid `uname -a` output relying on such random details of the build environment which are rather hard to ensure are consistent between developers and buildbots, make sure the abbreviated sha1 always consists of exactly 12 hex characters. That is consistent with the current rule for -stable patches, and is almost always enough to identify the head commit unambigously - in the few cases where it does not, the v5.4.3-00021- prefix would certainly nail it down. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> --- v2: use 12 instead of 15, and ensure that the result does have exactly 12 hex chars. scripts/setlocalversion | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/scripts/setlocalversion b/scripts/setlocalversion index 20f2efd57b11..bb709eda96cd 100755 --- a/scripts/setlocalversion +++ b/scripts/setlocalversion @@ -45,7 +45,7 @@ scm_version() # Check for git and a git repo. if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then + head=$(git rev-parse --verify HEAD 2>/dev/null); then # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore # it, because this version is defined in the top level Makefile. @@ -59,11 +59,22 @@ scm_version() fi # If we are past a tagged commit (like # "v2.6.30-rc5-302-g72357d5"), we pretty print it. - if atag="$(git describe 2>/dev/null)"; then - echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' - - # If we don't have a tag at all we print -g{commitish}. + # + # Ensure the abbreviated sha1 has exactly 12 + # hex characters, to make the output + # independent of git version, local + # core.abbrev settings and/or total number of + # objects in the current repository - passing + # --abbrev=12 ensures a minimum of 12, and the + # awk substr() then picks the 'g' and first 12 + # hex chars. + if atag="$(git describe --abbrev=12 2>/dev/null)"; then + echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}' + + # If we don't have a tag at all we print -g{commitish}, + # again using exactly 12 hex chars. else + head="$(echo $head | cut -c1-12)" printf '%s%s' -g $head fi fi -- 2.23.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2] scripts/setlocalversion: make git describe output more reliable 2020-09-17 6:56 ` [PATCH v2] " Rasmus Villemoes @ 2020-09-17 12:22 ` Nico Schottelius 2020-09-17 12:58 ` Rasmus Villemoes 2020-09-24 17:27 ` Masahiro Yamada 1 sibling, 1 reply; 17+ messages in thread From: Nico Schottelius @ 2020-09-17 12:22 UTC (permalink / raw) To: Rasmus Villemoes Cc: Masahiro Yamada, Randy Dunlap, Brian Norris, Bhaskar Chowdhury, Nico Schottelius, Guenter Roeck, linux-kernel Thanks for the patch Rasmus. Overall it looks good to me, be aligned to the stable patch submission rules makes sense. A tiny thing though: I did not calculate the exact collision probability with 12 characters and it does not make sense to even discuss this, if this is a current rule for stable patches. However we have a couple of 12's scattered in the code. And if the submission rule changes in the future, we should have a single location to update it. So I suggest you introduce something on the line of: ... num_chars=12 ... --abbrev=$num_chars ... I guess you get the picture. Greetings from the sunny mountains, Nico Rasmus Villemoes <linux@rasmusvillemoes.dk> writes: > When building for an embedded target using Yocto, we're sometimes > observing that the version string that gets built into vmlinux (and > thus what uname -a reports) differs from the path under /lib/modules/ > where modules get installed in the rootfs, but only in the length of > the -gabc123def suffix. Hence modprobe always fails. > > The problem is that Yocto has the concept of "sstate" (shared state), > which allows different developers/buildbots/etc. to share build > artifacts, based on a hash of all the metadata that went into building > that artifact - and that metadata includes all dependencies (e.g. the > compiler used etc.). That normally works quite well; usually a clean > build (without using any sstate cache) done by one developer ends up > being binary identical to a build done on another host. However, one > thing that can cause two developers to end up with different builds > [and thus make one's vmlinux package incompatible with the other's > kernel-dev package], which is not captured by the metadata hashing, is > this `git describe`: The output of that can be affected by > > (1) git version: before 2.11 git defaulted to a minimum of 7, since > 2.11 (git.git commit e6c587) the default is dynamic based on the > number of objects in the repo > (2) hence even if both run the same git version, the output can differ > based on how many remotes are being tracked (or just lots of local > development branches or plain old garbage) > (3) and of course somebody could have a core.abbrev config setting in > ~/.gitconfig > > So in order to avoid `uname -a` output relying on such random details > of the build environment which are rather hard to ensure are > consistent between developers and buildbots, make sure the abbreviated > sha1 always consists of exactly 12 hex characters. That is consistent > with the current rule for -stable patches, and is almost always enough > to identify the head commit unambigously - in the few cases where it > does not, the v5.4.3-00021- prefix would certainly nail it down. > > Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> > --- > v2: use 12 instead of 15, and ensure that the result does have exactly > 12 hex chars. > > scripts/setlocalversion | 21 ++++++++++++++++----- > 1 file changed, 16 insertions(+), 5 deletions(-) > > diff --git a/scripts/setlocalversion b/scripts/setlocalversion > index 20f2efd57b11..bb709eda96cd 100755 > --- a/scripts/setlocalversion > +++ b/scripts/setlocalversion > @@ -45,7 +45,7 @@ scm_version() > > # Check for git and a git repo. > if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && > - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then > + head=$(git rev-parse --verify HEAD 2>/dev/null); then > > # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore > # it, because this version is defined in the top level Makefile. > @@ -59,11 +59,22 @@ scm_version() > fi > # If we are past a tagged commit (like > # "v2.6.30-rc5-302-g72357d5"), we pretty print it. > - if atag="$(git describe 2>/dev/null)"; then > - echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' > - > - # If we don't have a tag at all we print -g{commitish}. > + # > + # Ensure the abbreviated sha1 has exactly 12 > + # hex characters, to make the output > + # independent of git version, local > + # core.abbrev settings and/or total number of > + # objects in the current repository - passing > + # --abbrev=12 ensures a minimum of 12, and the > + # awk substr() then picks the 'g' and first 12 > + # hex chars. > + if atag="$(git describe --abbrev=12 2>/dev/null)"; then > + echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}' > + > + # If we don't have a tag at all we print -g{commitish}, > + # again using exactly 12 hex chars. > else > + head="$(echo $head | cut -c1-12)" > printf '%s%s' -g $head > fi > fi -- Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] scripts/setlocalversion: make git describe output more reliable 2020-09-17 12:22 ` Nico Schottelius @ 2020-09-17 12:58 ` Rasmus Villemoes 2020-09-21 9:35 ` Nico Schottelius 0 siblings, 1 reply; 17+ messages in thread From: Rasmus Villemoes @ 2020-09-17 12:58 UTC (permalink / raw) To: Nico Schottelius Cc: Masahiro Yamada, Randy Dunlap, Brian Norris, Bhaskar Chowdhury, Guenter Roeck, linux-kernel On 17/09/2020 14.22, Nico Schottelius wrote: > > Thanks for the patch Rasmus. Overall it looks good to me, be aligned to > the stable patch submission rules makes sense. A tiny thing though: > > I did not calculate the exact collision probability with 12 characters For reference, the math is something like this: Consider a repo with N+1 objects. We look at one specific object (for setlocalversion its the head commit being built, for the stable rules its whatever particular commit one is interested in for backporting), and want to know the probability that its sha1 collides with some other object in the first b bits (here b=48). Assuming the sha1s are independent and uniformly distributed, the probability of not colliding with one specific other commit is x=1-1/2^b, and the probability of not colliding with any of the other N commits is x^N, making the probability of a collision 1-x^N = (1-x)(1+x+x^2+...+x^{N-1}). Now the N terms in the second factor are very-close-to-but-slightly-smaller-than 1, so an upper bound for this probability is (1-x)N = N/2^b, which is also what one would naively expect. [This estimate is always valid, but it becomes a void statement of "the probability is less then 1" when N is >= 2^b]. So, assuming some vendor kernel repo that has all of Greg's stable.git (around 10M objects I think) and another 10M objects because random vendor, that works out to 20e6/2^48 = 7.1e-8, 71 ppb. > So I suggest you introduce something on the line of: > > ... > num_chars=12 > ... > --abbrev=$num_chars I considered that, but it becomes quite ugly since it needs to get into the awk script (as a 13, though perhaps we could get awk to do the +1, I don't really speak awk), where we'd then need to use " instead of ' and then escape the $ that are to be interpreted by awk and not the shell. So I think it's more readable with hardcoding and comments explaining why they are there; should anyone ever want to change 12. Rasmus ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] scripts/setlocalversion: make git describe output more reliable 2020-09-17 12:58 ` Rasmus Villemoes @ 2020-09-21 9:35 ` Nico Schottelius 0 siblings, 0 replies; 17+ messages in thread From: Nico Schottelius @ 2020-09-21 9:35 UTC (permalink / raw) To: Rasmus Villemoes Cc: Nico Schottelius, Masahiro Yamada, Randy Dunlap, Brian Norris, Bhaskar Chowdhury, Guenter Roeck, linux-kernel Hey Rasmus, Rasmus Villemoes <linux@rasmusvillemoes.dk> writes: >> So I suggest you introduce something on the line of: >> >> ... >> num_chars=12 >> ... >> --abbrev=$num_chars > > I considered that, but it becomes quite ugly since it needs to get into > the awk script (as a 13, though perhaps we could get awk to do the +1, I > don't really speak awk), where we'd then need to use " instead of ' and > then escape the $ that are to be interpreted by awk and not the shell. No need for that, awk can read (and use) environment variables... > So I think it's more readable with hardcoding and comments explaining > why they are there; should anyone ever want to change 12. ... so that in practice you only need to change 13 to var+1 and 12 to $var. Cheers, Nico -- Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] scripts/setlocalversion: make git describe output more reliable 2020-09-17 6:56 ` [PATCH v2] " Rasmus Villemoes 2020-09-17 12:22 ` Nico Schottelius @ 2020-09-24 17:27 ` Masahiro Yamada 1 sibling, 0 replies; 17+ messages in thread From: Masahiro Yamada @ 2020-09-24 17:27 UTC (permalink / raw) To: Rasmus Villemoes Cc: Randy Dunlap, Brian Norris, Bhaskar Chowdhury, Nico Schottelius, Guenter Roeck, Linux Kernel Mailing List On Thu, Sep 17, 2020 at 3:56 PM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > When building for an embedded target using Yocto, we're sometimes > observing that the version string that gets built into vmlinux (and > thus what uname -a reports) differs from the path under /lib/modules/ > where modules get installed in the rootfs, but only in the length of > the -gabc123def suffix. Hence modprobe always fails. > > The problem is that Yocto has the concept of "sstate" (shared state), > which allows different developers/buildbots/etc. to share build > artifacts, based on a hash of all the metadata that went into building > that artifact - and that metadata includes all dependencies (e.g. the > compiler used etc.). That normally works quite well; usually a clean > build (without using any sstate cache) done by one developer ends up > being binary identical to a build done on another host. However, one > thing that can cause two developers to end up with different builds > [and thus make one's vmlinux package incompatible with the other's > kernel-dev package], which is not captured by the metadata hashing, is > this `git describe`: The output of that can be affected by > > (1) git version: before 2.11 git defaulted to a minimum of 7, since > 2.11 (git.git commit e6c587) the default is dynamic based on the > number of objects in the repo > (2) hence even if both run the same git version, the output can differ > based on how many remotes are being tracked (or just lots of local > development branches or plain old garbage) > (3) and of course somebody could have a core.abbrev config setting in > ~/.gitconfig > > So in order to avoid `uname -a` output relying on such random details > of the build environment which are rather hard to ensure are > consistent between developers and buildbots, make sure the abbreviated > sha1 always consists of exactly 12 hex characters. That is consistent > with the current rule for -stable patches, and is almost always enough > to identify the head commit unambigously - in the few cases where it > does not, the v5.4.3-00021- prefix would certainly nail it down. > > Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> > --- > v2: use 12 instead of 15, and ensure that the result does have exactly > 12 hex chars. > > scripts/setlocalversion | 21 ++++++++++++++++----- > 1 file changed, 16 insertions(+), 5 deletions(-) Applied to linux-kbuild. Thanks. -- Best Regards Masahiro Yamada ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2020-09-24 17:28 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-10 11:26 [PATCH] scripts/setlocalversion: make git describe output more reliable Rasmus Villemoes 2020-09-10 14:28 ` Guenter Roeck 2020-09-10 14:34 ` Masahiro Yamada 2020-09-10 19:05 ` Brian Norris 2020-09-11 8:28 ` Rasmus Villemoes 2020-09-16 14:28 ` Masahiro Yamada 2020-09-16 15:23 ` Rasmus Villemoes 2020-09-16 18:01 ` Masahiro Yamada 2020-09-16 19:31 ` Rasmus Villemoes 2020-09-17 0:48 ` Masahiro Yamada 2020-09-10 22:56 ` Bhaskar Chowdhury 2020-09-16 8:48 ` Rasmus Villemoes 2020-09-17 6:56 ` [PATCH v2] " Rasmus Villemoes 2020-09-17 12:22 ` Nico Schottelius 2020-09-17 12:58 ` Rasmus Villemoes 2020-09-21 9:35 ` Nico Schottelius 2020-09-24 17:27 ` Masahiro Yamada
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).