* arm crypto .S_shipped files sometimes get rebuilt randomly @ 2018-03-07 19:25 Leonard Crestez 2018-03-08 5:00 ` Masahiro Yamada 2018-03-08 23:19 ` Rasmus Villemoes 0 siblings, 2 replies; 13+ messages in thread From: Leonard Crestez @ 2018-03-07 19:25 UTC (permalink / raw) To: Ard Biesheuvel, Masahiro Yamada, Herbert Xu Cc: linux-arm-kernel, linux-kernel Hello, I am using a toolchain with a broken/old version of perl which doesn't include integer.pm and I noticed it triggers occasional build failures on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but if the purpose of the .S_shipped is to avoid the need to have all dependencies on the build machine then something went wrong? This was introduced by commit 7918ecef073f ("crypto: arm64/sha2 - integrate OpenSSL implementations of SHA256/SHA512"). The makefile rules are not terribly complicated: quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl $(call cmd,perlasm) If a decision to rerun the rule is made based on their relative timestamps but both .S_shipped and sha512-armv8.pl are included in git then won't the result be essentially random, depending on file checkout order? I see random success/failure by just running something like the following multiple times: rm -rf arch/arm64/crypto git co -f arch/arm64/crypto make -- arch/arm64/crypto/ A reasonable fix might be to simply drop .S_shipped and require a functional recent version of perl. Then if it fails it will fail reliably. -- Regards, Leonard ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: arm crypto .S_shipped files sometimes get rebuilt randomly 2018-03-07 19:25 arm crypto .S_shipped files sometimes get rebuilt randomly Leonard Crestez @ 2018-03-08 5:00 ` Masahiro Yamada 2018-03-08 23:19 ` Rasmus Villemoes 1 sibling, 0 replies; 13+ messages in thread From: Masahiro Yamada @ 2018-03-08 5:00 UTC (permalink / raw) To: Leonard Crestez Cc: Ard Biesheuvel, Herbert Xu, linux-arm-kernel, linux-kernel 2018-03-08 4:25 GMT+09:00 Leonard Crestez <leonard.crestez@nxp.com>: > Hello, > > I am using a toolchain with a broken/old version of perl which doesn't > include integer.pm and I noticed it triggers occasional build failures > on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but > if the purpose of the .S_shipped is to avoid the need to have all > dependencies on the build machine then something went wrong? > > This was introduced by commit 7918ecef073f ("crypto: arm64/sha2 - > integrate OpenSSL implementations of SHA256/SHA512"). The makefile > rules are not terribly complicated: > > quiet_cmd_perlasm = PERLASM $@ > cmd_perlasm = $(PERL) $(<) void $(@) > > $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > > If a decision to rerun the rule is made based on their relative > timestamps but both .S_shipped and sha512-armv8.pl are included in git > then won't the result be essentially random, depending on file checkout > order? > > I see random success/failure by just running something like the > following multiple times: > rm -rf arch/arm64/crypto > git co -f arch/arm64/crypto > make -- arch/arm64/crypto/ > > A reasonable fix might be to simply drop .S_shipped and require a > functional recent version of perl. Then if it fails it will fail > reliably. > Indeed, this Makefile is weird. We have two choices. [1] If we intend to generate sha{256,512}-core.S from the perl script during the build, this should be: $(obj)/sha512-core.S: $(src)/sha512-armv8.pl $(call cmd,perlasm) $(obj)/sha512-core.S: $(src)/sha512-armv8.pl $(call cmd,perlasm) [2] If we want to check-in _shipped files and avoid running perl during the build, we can surround unnecessary rules with if-conditional, like if REGENERATE_ARM64_SHA $(src)/sha256-core.S_shipped: $(src)/sha512-armv8.pl $(call cmd,perlasm) $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl $(call cmd,perlasm) endif Set REGENERATE_ARM64_SHA=1 from the command line only when you need to update the _shipped files. This is what commit 7373f4f83c71d50f0aece6d94309ab7fde42180f did. Recently, Kconfig switched to [1]. So, flex and bison are required to build the kernel. -- Best Regards Masahiro Yamada ^ permalink raw reply [flat|nested] 13+ messages in thread
* arm crypto .S_shipped files sometimes get rebuilt randomly @ 2018-03-08 5:00 ` Masahiro Yamada 0 siblings, 0 replies; 13+ messages in thread From: Masahiro Yamada @ 2018-03-08 5:00 UTC (permalink / raw) To: linux-arm-kernel 2018-03-08 4:25 GMT+09:00 Leonard Crestez <leonard.crestez@nxp.com>: > Hello, > > I am using a toolchain with a broken/old version of perl which doesn't > include integer.pm and I noticed it triggers occasional build failures > on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but > if the purpose of the .S_shipped is to avoid the need to have all > dependencies on the build machine then something went wrong? > > This was introduced by commit 7918ecef073f ("crypto: arm64/sha2 - > integrate OpenSSL implementations of SHA256/SHA512"). The makefile > rules are not terribly complicated: > > quiet_cmd_perlasm = PERLASM $@ > cmd_perlasm = $(PERL) $(<) void $(@) > > $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > > If a decision to rerun the rule is made based on their relative > timestamps but both .S_shipped and sha512-armv8.pl are included in git > then won't the result be essentially random, depending on file checkout > order? > > I see random success/failure by just running something like the > following multiple times: > rm -rf arch/arm64/crypto > git co -f arch/arm64/crypto > make -- arch/arm64/crypto/ > > A reasonable fix might be to simply drop .S_shipped and require a > functional recent version of perl. Then if it fails it will fail > reliably. > Indeed, this Makefile is weird. We have two choices. [1] If we intend to generate sha{256,512}-core.S from the perl script during the build, this should be: $(obj)/sha512-core.S: $(src)/sha512-armv8.pl $(call cmd,perlasm) $(obj)/sha512-core.S: $(src)/sha512-armv8.pl $(call cmd,perlasm) [2] If we want to check-in _shipped files and avoid running perl during the build, we can surround unnecessary rules with if-conditional, like if REGENERATE_ARM64_SHA $(src)/sha256-core.S_shipped: $(src)/sha512-armv8.pl $(call cmd,perlasm) $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl $(call cmd,perlasm) endif Set REGENERATE_ARM64_SHA=1 from the command line only when you need to update the _shipped files. This is what commit 7373f4f83c71d50f0aece6d94309ab7fde42180f did. Recently, Kconfig switched to [1]. So, flex and bison are required to build the kernel. -- Best Regards Masahiro Yamada ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: arm crypto .S_shipped files sometimes get rebuilt randomly 2018-03-08 5:00 ` Masahiro Yamada @ 2018-03-08 7:02 ` Ard Biesheuvel -1 siblings, 0 replies; 13+ messages in thread From: Ard Biesheuvel @ 2018-03-08 7:02 UTC (permalink / raw) To: Masahiro Yamada Cc: Leonard Crestez, Herbert Xu, linux-arm-kernel, linux-kernel On 8 March 2018 at 05:00, Masahiro Yamada <yamada.masahiro@socionext.com> wrote: > 2018-03-08 4:25 GMT+09:00 Leonard Crestez <leonard.crestez@nxp.com>: >> Hello, >> >> I am using a toolchain with a broken/old version of perl which doesn't >> include integer.pm and I noticed it triggers occasional build failures >> on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but >> if the purpose of the .S_shipped is to avoid the need to have all >> dependencies on the build machine then something went wrong? >> >> This was introduced by commit 7918ecef073f ("crypto: arm64/sha2 - >> integrate OpenSSL implementations of SHA256/SHA512"). The makefile >> rules are not terribly complicated: >> >> quiet_cmd_perlasm = PERLASM $@ >> cmd_perlasm = $(PERL) $(<) void $(@) >> >> $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl >> $(call cmd,perlasm) >> >> If a decision to rerun the rule is made based on their relative >> timestamps but both .S_shipped and sha512-armv8.pl are included in git >> then won't the result be essentially random, depending on file checkout >> order? >> I agree with your analysis, although I never see these spurious rebuilds of these files. >> I see random success/failure by just running something like the >> following multiple times: >> rm -rf arch/arm64/crypto >> git co -f arch/arm64/crypto >> make -- arch/arm64/crypto/ >> >> A reasonable fix might be to simply drop .S_shipped and require a >> functional recent version of perl. Then if it fails it will fail >> reliably. >> > > Indeed, this Makefile is weird. > We have two choices. > > > [1] If we intend to generate > sha{256,512}-core.S from the perl script during the build, > this should be: > > $(obj)/sha512-core.S: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > > $(obj)/sha512-core.S: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > > > [2] If we want to check-in _shipped files > and avoid running perl during the build, > we can surround unnecessary rules with if-conditional, like > > > if REGENERATE_ARM64_SHA > $(src)/sha256-core.S_shipped: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > > $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > endif > > > Set REGENERATE_ARM64_SHA=1 from the command line > only when you need to update the _shipped files. > > This is what commit 7373f4f83c71d50f0aece6d94309ab7fde42180f did. > > > > > Recently, Kconfig switched to [1]. > So, flex and bison are required to build the kernel. > I would prefer option [1], but only if it is already documented somewhere that Perl is a build time dependency. ^ permalink raw reply [flat|nested] 13+ messages in thread
* arm crypto .S_shipped files sometimes get rebuilt randomly @ 2018-03-08 7:02 ` Ard Biesheuvel 0 siblings, 0 replies; 13+ messages in thread From: Ard Biesheuvel @ 2018-03-08 7:02 UTC (permalink / raw) To: linux-arm-kernel On 8 March 2018 at 05:00, Masahiro Yamada <yamada.masahiro@socionext.com> wrote: > 2018-03-08 4:25 GMT+09:00 Leonard Crestez <leonard.crestez@nxp.com>: >> Hello, >> >> I am using a toolchain with a broken/old version of perl which doesn't >> include integer.pm and I noticed it triggers occasional build failures >> on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but >> if the purpose of the .S_shipped is to avoid the need to have all >> dependencies on the build machine then something went wrong? >> >> This was introduced by commit 7918ecef073f ("crypto: arm64/sha2 - >> integrate OpenSSL implementations of SHA256/SHA512"). The makefile >> rules are not terribly complicated: >> >> quiet_cmd_perlasm = PERLASM $@ >> cmd_perlasm = $(PERL) $(<) void $(@) >> >> $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl >> $(call cmd,perlasm) >> >> If a decision to rerun the rule is made based on their relative >> timestamps but both .S_shipped and sha512-armv8.pl are included in git >> then won't the result be essentially random, depending on file checkout >> order? >> I agree with your analysis, although I never see these spurious rebuilds of these files. >> I see random success/failure by just running something like the >> following multiple times: >> rm -rf arch/arm64/crypto >> git co -f arch/arm64/crypto >> make -- arch/arm64/crypto/ >> >> A reasonable fix might be to simply drop .S_shipped and require a >> functional recent version of perl. Then if it fails it will fail >> reliably. >> > > Indeed, this Makefile is weird. > We have two choices. > > > [1] If we intend to generate > sha{256,512}-core.S from the perl script during the build, > this should be: > > $(obj)/sha512-core.S: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > > $(obj)/sha512-core.S: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > > > [2] If we want to check-in _shipped files > and avoid running perl during the build, > we can surround unnecessary rules with if-conditional, like > > > if REGENERATE_ARM64_SHA > $(src)/sha256-core.S_shipped: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > > $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl > $(call cmd,perlasm) > endif > > > Set REGENERATE_ARM64_SHA=1 from the command line > only when you need to update the _shipped files. > > This is what commit 7373f4f83c71d50f0aece6d94309ab7fde42180f did. > > > > > Recently, Kconfig switched to [1]. > So, flex and bison are required to build the kernel. > I would prefer option [1], but only if it is already documented somewhere that Perl is a build time dependency. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: arm crypto .S_shipped files sometimes get rebuilt randomly 2018-03-08 7:02 ` Ard Biesheuvel (?) @ 2018-03-08 14:11 ` Leonard Crestez -1 siblings, 0 replies; 13+ messages in thread From: Leonard Crestez @ 2018-03-08 14:11 UTC (permalink / raw) To: Ard Biesheuvel, Masahiro Yamada Cc: Herbert Xu, linux-arm-kernel, linux-kernel On Thu, 2018-03-08 at 07:02 +0000, Ard Biesheuvel wrote: > On 8 March 2018 at 05:00, Masahiro Yamada <yamada.masahiro@socionext.com> wrote: > > 2018-03-08 4:25 GMT+09:00 Leonard Crestez <leonard.crestez@nxp.com > > > If a decision to rerun the rule is made based on their relative > > > timestamps but both .S_shipped and sha512-armv8.pl are included in git > > > then won't the result be essentially random, depending on file checkout > > > order? > I agree with your analysis, although I never see these spurious > rebuilds of these files. It would happen at most once per checkout since after it happens the timestamp on .S_shipped is updated. An extra line in make output is very easy to miss. > > We have two choices. > > > > [1] If we intend to generate > > sha{256,512}-core.S from the perl script during the build, > > this should be: > > > > [2] If we want to check-in _shipped files > > and avoid running perl during the build, > > we can surround unnecessary rules with if-conditional, like > > > > Recently, Kconfig switched to [1]. > > So, flex and bison are required to build the kernel. > I would prefer option [1], but only if it is already documented > somewhere that Perl is a build time dependency. This already seems to be the case, in Documentation/process/changes.rst > You will need perl 5 and the following modules: ``Getopt::Long``, > ``Getopt::Std``, ``File::Basename``, and ``File::Find`` to build the > kernel. Maybe "integer.pm" would need to be explicitly documented as a dependency as well? I don't know anything about the perl ecosystem. Building the kernel with make PERL=/bin/false seems to mostly work though, it's used for stuff like docs and headers_check (which is not enabled by default). -- Regards, Leonard ^ permalink raw reply [flat|nested] 13+ messages in thread
* a Heisenbug tale (was: Re: arm crypto .S_shipped files sometimes get rebuilt randomly) 2018-03-07 19:25 arm crypto .S_shipped files sometimes get rebuilt randomly Leonard Crestez @ 2018-03-08 23:19 ` Rasmus Villemoes 2018-03-08 23:19 ` Rasmus Villemoes 1 sibling, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2018-03-08 23:19 UTC (permalink / raw) To: Leonard Crestez, Ard Biesheuvel, Masahiro Yamada, Herbert Xu Cc: linux-arm-kernel, linux-kernel On 2018-03-07 20:25, Leonard Crestez wrote: > Hello, > > I am using a toolchain with a broken/old version of perl which doesn't > include integer.pm and I noticed it triggers occasional build failures > on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but > if the purpose of the .S_shipped is to avoid the need to have all > dependencies on the build machine then something went wrong? About a year ago, I debugged a very similar problem for a customer. The build system (an OpenEmbedded clone) did something like (1) git clone -n ... (2) git checkout -q $somerev (3) make foo_config (4) make vmlinux (5) make LOADADDR=0x1234 uImage (6) make modules ... The symptoms, when the bug appeared, was that the kernel was built with some version string "4.5.6-01234-g0abcdef", but the modules (both the native kernel modules and those built externally using the linux-dev package) ended up in /lib/modules/4.5.6-01234-g0abcdef-dirty. Now, the truly annoying thing about this bug was that when I did the above steps manually, but did a "git status" in-between steps 4 and 5 to see where the -dirty might come from, I could _never_ reproduce the bug (and git status always reported that the tree was clean), while it usually happened at least 1 in 20 CI builds. Of course, I didn't realize right away that "git status" was hiding the bug, but without that, I could reproduce about as often as the CI, while 100 builds with that extra step never reproduced. With that as a hint, I found http://git.661346.n2.nabble.com/BUG-in-git-diff-index-td7652105.html#a7652109, and slowly began to realize what might be happening. The thing is, the setlocalversion script doesn't use "git status" to decide whether to append -dirty, but rather the plumbing command "git diff-index". During the "make vmlinux", the tree is obviously clean, so we get the 4.5.6-01234-g0abcdef string baked into vmlinux. In some cases, the two files arch/arm/crypto/aesbs-core.S_shipped arch/arm/crypto/bsaes-armv7.pl ended up with different timestamps (the latter being newest) after the git checkout, so during that same "make vmlinux", the S_shipped file would be regenerated using the perl script. Since it is updated with ">" shell redirection, the dev and ino stays the same, so does the content, but the mtime is obviously updated. If we then go on to do a "make LOADADDR=0x1234 uImage", the kernel build system again calls setlocalversion, and this time "git diff-index --name-only HEAD" actually produces output (namely, arch/arm/crypto/aesbs-core.S_shipped), so include/config/kernel.release is recreated, this time containing "4.5.6-01234-g0abcdef-dirty". The right version string was already baked into vmlinux, so the resulting kernel shows that for uname -r, but everything from now on, including internal and external module builds, ends up using the -dirty string. And this is why doing a "git status" would always hide the bug: git diff-index only looks at the stat(2) information, and the newer mtime was enough for it to print out arch/arm/crypto/aesbs-core.S_shipped ; but "git status" actually opens the file and compares the contents (which are unchanged), and then updates git's index, noting the new stat() info for the file. With that, I started looking at the timestamps of the above files immediately after "git checkout". They were either identical (=no bug), or the .pl was 4 ms (aka 1 jiffy) newer than the .S_shipped (=the bug would appear), so that explained why the bug apparently happened at random: if and only if a timer tick hit between git creating those two files. What we ended up doing was to explicitly set the mtime of every file in the repo to the same reference time after the git checkout (find ... | xargs touch --date=...). I also wanted to send a patch to change the Makefile to use the filechk mechanism to avoid updating the .S_shipped file when the script produced identical output, but never got around to it. Rasmus ^ permalink raw reply [flat|nested] 13+ messages in thread
* a Heisenbug tale (was: Re: arm crypto .S_shipped files sometimes get rebuilt randomly) @ 2018-03-08 23:19 ` Rasmus Villemoes 0 siblings, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2018-03-08 23:19 UTC (permalink / raw) To: linux-arm-kernel On 2018-03-07 20:25, Leonard Crestez wrote: > Hello, > > I am using a toolchain with a broken/old version of perl which doesn't > include integer.pm and I noticed it triggers occasional build failures > on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but > if the purpose of the .S_shipped is to avoid the need to have all > dependencies on the build machine then something went wrong? About a year ago, I debugged a very similar problem for a customer. The build system (an OpenEmbedded clone) did something like (1) git clone -n ... (2) git checkout -q $somerev (3) make foo_config (4) make vmlinux (5) make LOADADDR=0x1234 uImage (6) make modules ... The symptoms, when the bug appeared, was that the kernel was built with some version string "4.5.6-01234-g0abcdef", but the modules (both the native kernel modules and those built externally using the linux-dev package) ended up in /lib/modules/4.5.6-01234-g0abcdef-dirty. Now, the truly annoying thing about this bug was that when I did the above steps manually, but did a "git status" in-between steps 4 and 5 to see where the -dirty might come from, I could _never_ reproduce the bug (and git status always reported that the tree was clean), while it usually happened at least 1 in 20 CI builds. Of course, I didn't realize right away that "git status" was hiding the bug, but without that, I could reproduce about as often as the CI, while 100 builds with that extra step never reproduced. With that as a hint, I found http://git.661346.n2.nabble.com/BUG-in-git-diff-index-td7652105.html#a7652109, and slowly began to realize what might be happening. The thing is, the setlocalversion script doesn't use "git status" to decide whether to append -dirty, but rather the plumbing command "git diff-index". During the "make vmlinux", the tree is obviously clean, so we get the 4.5.6-01234-g0abcdef string baked into vmlinux. In some cases, the two files arch/arm/crypto/aesbs-core.S_shipped arch/arm/crypto/bsaes-armv7.pl ended up with different timestamps (the latter being newest) after the git checkout, so during that same "make vmlinux", the S_shipped file would be regenerated using the perl script. Since it is updated with ">" shell redirection, the dev and ino stays the same, so does the content, but the mtime is obviously updated. If we then go on to do a "make LOADADDR=0x1234 uImage", the kernel build system again calls setlocalversion, and this time "git diff-index --name-only HEAD" actually produces output (namely, arch/arm/crypto/aesbs-core.S_shipped), so include/config/kernel.release is recreated, this time containing "4.5.6-01234-g0abcdef-dirty". The right version string was already baked into vmlinux, so the resulting kernel shows that for uname -r, but everything from now on, including internal and external module builds, ends up using the -dirty string. And this is why doing a "git status" would always hide the bug: git diff-index only looks at the stat(2) information, and the newer mtime was enough for it to print out arch/arm/crypto/aesbs-core.S_shipped ; but "git status" actually opens the file and compares the contents (which are unchanged), and then updates git's index, noting the new stat() info for the file. With that, I started looking at the timestamps of the above files immediately after "git checkout". They were either identical (=no bug), or the .pl was 4 ms (aka 1 jiffy) newer than the .S_shipped (=the bug would appear), so that explained why the bug apparently happened at random: if and only if a timer tick hit between git creating those two files. What we ended up doing was to explicitly set the mtime of every file in the repo to the same reference time after the git checkout (find ... | xargs touch --date=...). I also wanted to send a patch to change the Makefile to use the filechk mechanism to avoid updating the .S_shipped file when the script produced identical output, but never got around to it. Rasmus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: a Heisenbug tale (was: Re: arm crypto .S_shipped files sometimes get rebuilt randomly) 2018-03-08 23:19 ` Rasmus Villemoes @ 2018-03-09 9:45 ` Ard Biesheuvel -1 siblings, 0 replies; 13+ messages in thread From: Ard Biesheuvel @ 2018-03-09 9:45 UTC (permalink / raw) To: Rasmus Villemoes, Russell King Cc: Leonard Crestez, Masahiro Yamada, Herbert Xu, linux-arm-kernel, linux-kernel On 8 March 2018 at 23:19, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > On 2018-03-07 20:25, Leonard Crestez wrote: >> Hello, >> >> I am using a toolchain with a broken/old version of perl which doesn't >> include integer.pm and I noticed it triggers occasional build failures >> on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but >> if the purpose of the .S_shipped is to avoid the need to have all >> dependencies on the build machine then something went wrong? > > About a year ago, I debugged a very similar problem for a customer. The > build system (an OpenEmbedded clone) did something like > > (1) git clone -n ... > (2) git checkout -q $somerev > (3) make foo_config > (4) make vmlinux > (5) make LOADADDR=0x1234 uImage > (6) make modules > ... > > The symptoms, when the bug appeared, was that the kernel was built with > some version string "4.5.6-01234-g0abcdef", but the modules (both the > native kernel modules and those built externally using the linux-dev > package) ended up in /lib/modules/4.5.6-01234-g0abcdef-dirty. > > Now, the truly annoying thing about this bug was that when I did the > above steps manually, but did a "git status" in-between steps 4 and 5 to > see where the -dirty might come from, I could _never_ reproduce the bug > (and git status always reported that the tree was clean), while it > usually happened at least 1 in 20 CI builds. Of course, I didn't realize > right away that "git status" was hiding the bug, but without that, I > could reproduce about as often as the CI, while 100 builds with that > extra step never reproduced. With that as a hint, I found > http://git.661346.n2.nabble.com/BUG-in-git-diff-index-td7652105.html#a7652109, > and slowly began to realize what might be happening. > > The thing is, the setlocalversion script doesn't use "git status" to > decide whether to append -dirty, but rather the plumbing command "git > diff-index". During the "make vmlinux", the tree is obviously clean, so > we get the 4.5.6-01234-g0abcdef string baked into vmlinux. In some > cases, the two files > > arch/arm/crypto/aesbs-core.S_shipped > arch/arm/crypto/bsaes-armv7.pl > > ended up with different timestamps (the latter being newest) after the > git checkout, so during that same "make vmlinux", the S_shipped file > would be regenerated using the perl script. Since it is updated with ">" > shell redirection, the dev and ino stays the same, so does the content, > but the mtime is obviously updated. > > If we then go on to do a "make LOADADDR=0x1234 uImage", the kernel build > system again calls setlocalversion, and this time "git diff-index > --name-only HEAD" actually produces output (namely, > arch/arm/crypto/aesbs-core.S_shipped), so include/config/kernel.release > is recreated, this time containing "4.5.6-01234-g0abcdef-dirty". The > right version string was already baked into vmlinux, so the resulting > kernel shows that for uname -r, but everything from now on, including > internal and external module builds, ends up using the -dirty string. > > And this is why doing a "git status" would always hide the bug: git > diff-index only looks at the stat(2) information, and the newer mtime > was enough for it to print out arch/arm/crypto/aesbs-core.S_shipped ; > but "git status" actually opens the file and compares the contents > (which are unchanged), and then updates git's index, noting the new > stat() info for the file. > > With that, I started looking at the timestamps of the above files > immediately after "git checkout". They were either identical (=no bug), > or the .pl was 4 ms (aka 1 jiffy) newer than the .S_shipped (=the bug > would appear), so that explained why the bug apparently happened at > random: if and only if a timer tick hit between git creating those two > files. > > What we ended up doing was to explicitly set the mtime of every file in > the repo to the same reference time after the git checkout (find ... | > xargs touch --date=...). I also wanted to send a patch to change the > Makefile to use the filechk mechanism to avoid updating the .S_shipped > file when the script produced identical output, but never got around to it. > I had no idea that _shipped files were causing issues like this, and AFAICT, this is not specific to this use case in arch/arm/crypto, right? Russell, would you mind if we removed the _shipped.S file here and just assume that perl is available? ^ permalink raw reply [flat|nested] 13+ messages in thread
* a Heisenbug tale (was: Re: arm crypto .S_shipped files sometimes get rebuilt randomly) @ 2018-03-09 9:45 ` Ard Biesheuvel 0 siblings, 0 replies; 13+ messages in thread From: Ard Biesheuvel @ 2018-03-09 9:45 UTC (permalink / raw) To: linux-arm-kernel On 8 March 2018 at 23:19, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > On 2018-03-07 20:25, Leonard Crestez wrote: >> Hello, >> >> I am using a toolchain with a broken/old version of perl which doesn't >> include integer.pm and I noticed it triggers occasional build failures >> on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but >> if the purpose of the .S_shipped is to avoid the need to have all >> dependencies on the build machine then something went wrong? > > About a year ago, I debugged a very similar problem for a customer. The > build system (an OpenEmbedded clone) did something like > > (1) git clone -n ... > (2) git checkout -q $somerev > (3) make foo_config > (4) make vmlinux > (5) make LOADADDR=0x1234 uImage > (6) make modules > ... > > The symptoms, when the bug appeared, was that the kernel was built with > some version string "4.5.6-01234-g0abcdef", but the modules (both the > native kernel modules and those built externally using the linux-dev > package) ended up in /lib/modules/4.5.6-01234-g0abcdef-dirty. > > Now, the truly annoying thing about this bug was that when I did the > above steps manually, but did a "git status" in-between steps 4 and 5 to > see where the -dirty might come from, I could _never_ reproduce the bug > (and git status always reported that the tree was clean), while it > usually happened at least 1 in 20 CI builds. Of course, I didn't realize > right away that "git status" was hiding the bug, but without that, I > could reproduce about as often as the CI, while 100 builds with that > extra step never reproduced. With that as a hint, I found > http://git.661346.n2.nabble.com/BUG-in-git-diff-index-td7652105.html#a7652109, > and slowly began to realize what might be happening. > > The thing is, the setlocalversion script doesn't use "git status" to > decide whether to append -dirty, but rather the plumbing command "git > diff-index". During the "make vmlinux", the tree is obviously clean, so > we get the 4.5.6-01234-g0abcdef string baked into vmlinux. In some > cases, the two files > > arch/arm/crypto/aesbs-core.S_shipped > arch/arm/crypto/bsaes-armv7.pl > > ended up with different timestamps (the latter being newest) after the > git checkout, so during that same "make vmlinux", the S_shipped file > would be regenerated using the perl script. Since it is updated with ">" > shell redirection, the dev and ino stays the same, so does the content, > but the mtime is obviously updated. > > If we then go on to do a "make LOADADDR=0x1234 uImage", the kernel build > system again calls setlocalversion, and this time "git diff-index > --name-only HEAD" actually produces output (namely, > arch/arm/crypto/aesbs-core.S_shipped), so include/config/kernel.release > is recreated, this time containing "4.5.6-01234-g0abcdef-dirty". The > right version string was already baked into vmlinux, so the resulting > kernel shows that for uname -r, but everything from now on, including > internal and external module builds, ends up using the -dirty string. > > And this is why doing a "git status" would always hide the bug: git > diff-index only looks at the stat(2) information, and the newer mtime > was enough for it to print out arch/arm/crypto/aesbs-core.S_shipped ; > but "git status" actually opens the file and compares the contents > (which are unchanged), and then updates git's index, noting the new > stat() info for the file. > > With that, I started looking at the timestamps of the above files > immediately after "git checkout". They were either identical (=no bug), > or the .pl was 4 ms (aka 1 jiffy) newer than the .S_shipped (=the bug > would appear), so that explained why the bug apparently happened at > random: if and only if a timer tick hit between git creating those two > files. > > What we ended up doing was to explicitly set the mtime of every file in > the repo to the same reference time after the git checkout (find ... | > xargs touch --date=...). I also wanted to send a patch to change the > Makefile to use the filechk mechanism to avoid updating the .S_shipped > file when the script produced identical output, but never got around to it. > I had no idea that _shipped files were causing issues like this, and AFAICT, this is not specific to this use case in arch/arm/crypto, right? Russell, would you mind if we removed the _shipped.S file here and just assume that perl is available? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: a Heisenbug tale 2018-03-09 9:45 ` Ard Biesheuvel @ 2018-03-11 0:56 ` Rasmus Villemoes -1 siblings, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2018-03-11 0:56 UTC (permalink / raw) To: Ard Biesheuvel, Rasmus Villemoes, Russell King Cc: Leonard Crestez, Masahiro Yamada, Herbert Xu, linux-arm-kernel, linux-kernel On 2018-03-09 10:45, Ard Biesheuvel wrote: > On 8 March 2018 at 23:19, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: >> On 2018-03-07 20:25, Leonard Crestez wrote: >>> Hello, >>> >> >> What we ended up doing was to explicitly set the mtime of every file in >> the repo to the same reference time after the git checkout (find ... | >> xargs touch --date=...). I also wanted to send a patch to change the >> Makefile to use the filechk mechanism to avoid updating the .S_shipped >> file when the script produced identical output, but never got around to it. >> > > I had no idea that _shipped files were causing issues like this, and > AFAICT, this is not specific to this use case in arch/arm/crypto, > right? > > Russell, would you mind if we removed the _shipped.S file here and > just assume that perl is available? > Well, in that case I won't need to write a proper changelog for the below, but this seems to work. It will of course still give the spurious build failures when perl is not available and one hits the "files got checked out at almost but not quite the same time", but it would have prevented the spurious -dirty bug. diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index 30ef8e291271..f0cec9a92fd8 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -54,13 +54,14 @@ crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o -quiet_cmd_perl = PERL $@ - cmd_perl = $(PERL) $(<) > $(@) +define filechk_perl + perl $< +endef $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl - $(call cmd,perl) + $(call filechk,perl) $(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl - $(call cmd,perl) + $(call filechk,perl) .PRECIOUS: $(obj)/sha256-core.S $(obj)/sha512-core.S ^ permalink raw reply related [flat|nested] 13+ messages in thread
* a Heisenbug tale @ 2018-03-11 0:56 ` Rasmus Villemoes 0 siblings, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2018-03-11 0:56 UTC (permalink / raw) To: linux-arm-kernel On 2018-03-09 10:45, Ard Biesheuvel wrote: > On 8 March 2018 at 23:19, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: >> On 2018-03-07 20:25, Leonard Crestez wrote: >>> Hello, >>> >> >> What we ended up doing was to explicitly set the mtime of every file in >> the repo to the same reference time after the git checkout (find ... | >> xargs touch --date=...). I also wanted to send a patch to change the >> Makefile to use the filechk mechanism to avoid updating the .S_shipped >> file when the script produced identical output, but never got around to it. >> > > I had no idea that _shipped files were causing issues like this, and > AFAICT, this is not specific to this use case in arch/arm/crypto, > right? > > Russell, would you mind if we removed the _shipped.S file here and > just assume that perl is available? > Well, in that case I won't need to write a proper changelog for the below, but this seems to work. It will of course still give the spurious build failures when perl is not available and one hits the "files got checked out at almost but not quite the same time", but it would have prevented the spurious -dirty bug. diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index 30ef8e291271..f0cec9a92fd8 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -54,13 +54,14 @@ crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o -quiet_cmd_perl = PERL $@ - cmd_perl = $(PERL) $(<) > $(@) +define filechk_perl + perl $< +endef $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl - $(call cmd,perl) + $(call filechk,perl) $(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl - $(call cmd,perl) + $(call filechk,perl) .PRECIOUS: $(obj)/sha256-core.S $(obj)/sha512-core.S ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: a Heisenbug tale (was: Re: arm crypto .S_shipped files sometimes get rebuilt randomly) 2018-03-09 9:45 ` Ard Biesheuvel (?) (?) @ 2018-03-12 16:52 ` Leonard Crestez -1 siblings, 0 replies; 13+ messages in thread From: Leonard Crestez @ 2018-03-12 16:52 UTC (permalink / raw) To: Ard Biesheuvel, Rasmus Villemoes, Russell King, Masahiro Yamada Cc: Herbert Xu, linux-arm-kernel, linux-kernel, Greg Kroah-Hartman On Fri, 2018-03-09 at 09:45 +0000, Ard Biesheuvel wrote: > On 8 March 2018 at 23:19, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > On 2018-03-07 20:25, Leonard Crestez wrote: > > > I am using a toolchain with a broken/old version of perl which doesn't > > > include integer.pm and I noticed it triggers occasional build failures > > > on arch/arm64/crypto/sha512-core.S_shipped. Workarounds are easy, but > > > if the purpose of the .S_shipped is to avoid the need to have all > > > dependencies on the build machine then something went wrong? > > About a year ago, I debugged a very similar problem for a customer. The > > build system (an OpenEmbedded clone) did something like > > > > (1) git clone -n ... > > (2) git checkout -q $somerev > > (3) make foo_config > > (4) make vmlinux > > (5) make LOADADDR=0x1234 uImage > > (6) make modules > > ... > > > > The symptoms, when the bug appeared, was that the kernel was built with > > some version string "4.5.6-01234-g0abcdef", but the modules (both the > > native kernel modules and those built externally using the linux-dev > > package) ended up in /lib/modules/4.5.6-01234-g0abcdef-dirty. > I had no idea that _shipped files were causing issues like this, and > AFAICT, this is not specific to this use case in arch/arm/crypto, > right? > > Russell, would you mind if we removed the _shipped.S file here and > just assume that perl is available? It would be good for the fix to get into stable branches so that downstreams don't each have to invent workarounds. Isn't it much easier to do this with an #ifdef REGENERATE_ARM_CRYPTO_ASM? It might even be a requirement for getting into stable. The _shipped.S stuff can be removed later just for upstream. -- Regards, Leonard ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2018-03-12 16:52 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-03-07 19:25 arm crypto .S_shipped files sometimes get rebuilt randomly Leonard Crestez 2018-03-08 5:00 ` Masahiro Yamada 2018-03-08 5:00 ` Masahiro Yamada 2018-03-08 7:02 ` Ard Biesheuvel 2018-03-08 7:02 ` Ard Biesheuvel 2018-03-08 14:11 ` Leonard Crestez 2018-03-08 23:19 ` a Heisenbug tale (was: Re: arm crypto .S_shipped files sometimes get rebuilt randomly) Rasmus Villemoes 2018-03-08 23:19 ` Rasmus Villemoes 2018-03-09 9:45 ` Ard Biesheuvel 2018-03-09 9:45 ` Ard Biesheuvel 2018-03-11 0:56 ` a Heisenbug tale Rasmus Villemoes 2018-03-11 0:56 ` Rasmus Villemoes 2018-03-12 16:52 ` a Heisenbug tale (was: Re: arm crypto .S_shipped files sometimes get rebuilt randomly) Leonard Crestez
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.