linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kbuild: remove the target in signal traps when interrupted
@ 2022-08-07  0:48 Masahiro Yamada
  2022-08-07  9:39 ` Ingo Molnar
  2022-08-19  4:37 ` Nicolas Schier
  0 siblings, 2 replies; 3+ messages in thread
From: Masahiro Yamada @ 2022-08-07  0:48 UTC (permalink / raw)
  To: linux-kbuild
  Cc: Masahiro Yamada, Ingo Molnar, Rob Herring, Michal Marek,
	Nick Desaulniers, linux-kernel

When receiving some signal, GNU Make automatically deletes the target if
it has already been changed by the interrupted recipe.

If the target is possibly incomplete due to interruption, it must be
deleted so that it will be remade from scratch on the next run of make.
Otherwise, the target would remain corrupted permanently because its
timestamp had already been updated.

Thanks to this behavior of Make, you can stop the build any time by
pressing Ctrl-C, and just run 'make' to resume it.

Kbuild also relies on this feature, but it is equivalently important
for any build systems that make decisions based on timestamps (if you
want to support stop/resume reliably).

However, this does not always work as claimed; Make immediately dies
with Ctrl-C if its stderr goes into a pipe.

  [Test Makefile]

    foo:
            echo hello > $@
            sleep 3
            echo world >> $@

  [Test Result]

    $ make                         # hit Ctrl-C
    echo hello > foo
    sleep 3
    ^Cmake: *** Deleting file 'foo'
    make: *** [Makefile:3: foo] Interrupt

    $ make 2>&1 | cat              # hit Ctrl-C
    echo hello > foo
    sleep 3
    ^C$                            # 'foo' is often left-over

The reason is because SIGINT is sent to the entire process group.
In this example, SIGINT kills 'cat', and 'make' writes the message to
the closed pipe, then dies with SIGPIPE.

A typical bad scenario (as reported by [1], [2]) is to save build log
by using the 'tee' command:

    $ make 2>&1 | tee log

Again, this can be problematic for any build systems based on Make, so
I hope it will be fixed in GNU Make. The maintainer of GNU Make stated
this is a long-standing issue and difficult to fix [3]. It has not been
fixed yet as of writing.

So, we cannot rely on Make cleaning the target. We can do it by
ourselves, in signal traps.

As far as I understand, Make takes care of SIGHUP, SIGINT, SIGQUIT, and
SITERM for the target removal. I added the traps for them, and also for
SIGPIPE just in case cmd_* rule prints something to stdout or stderr
(but I did not observe an actual case where SIGPIPE was triggered).

[Note 1]

The trap handler might be worth explaining.

    rm -f $@; trap - $(sig); kill -s $(sig) $$

This lets the shell kill itself by the signal it caught, so the parent
process can tell the child has exited on the signal. Generally, this is
a proper manner for handling signals, in case the calling program (like
Bash) may monitor WIFSIGNALED() and WTERMSIG() for WCE (Wait and
Cooperative Exit) [4] although this may not be a big deal here because
GNU Make handles SIGHUP, SIGINT, SIGQUIT in WUE (Wait and Unconditional
Exit) and SIGTERM in IUE (Immediate Unconditional Exit).

[Note 2]

Reverting 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd
files") would directly address [1], but it only saves if_changed_dep.
As reported in [2], all commands that use redirection can potentially
leave an empty (i.e. broken) target.

[Note 3]

Another (even safer) approach might be to always write to a temporary
file, and rename it to $@ at the end of the recipe.

   <command>  > $(tmp-target)
   mv $(tmp-target) $@

It would require a lot of Makefile changes, and result in ugly code,
so I did not take it.

[Note 4]

A little more thoughts about a pattern rule with multiple targets (or
a grouped target).

    %.x %.y: %.z
            <recipe>

When interrupted, GNU Make deletes both %.x and %.y, while this solution
only deletes $@. Probably, this is not a big deal. The next run of make
will execute the rule again to create $@ along with the other files.

[1]: https://lore.kernel.org/all/YLeot94yAaM4xbMY@gmail.com/
[2]: https://lore.kernel.org/all/20220510221333.2770571-1-robh@kernel.org/
[3]: https://lists.gnu.org/archive/html/help-make/2021-06/msg00001.html
[4]: https://www.cons.org/cracauer/sigint.html

Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Rob Herring <robh@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

If you are happy to help test this patch, that will be appreciated.

Without applying this patch,

    $ make -j<nr-proc> 2>&1 | tee log

Then, you will see an error reported in [1].
You may need to repeat it dozen of times to reproduce it.
The more CPU cores you have, the easier you will get the error.

Apply this patch, and repeat the same.
You will no longer see that error (hopefully).


 scripts/Kbuild.include | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index ece44b735061..9432a7f33186 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -100,8 +100,29 @@ echo-cmd = $(if $($(quiet)cmd_$(1)),\
  quiet_redirect :=
 silent_redirect := exec >/dev/null;
 
+# Delete the target on interruption
+#
+# GNU Make automatically deletes the target if it has already been changed by
+# the interrupted recipe. So, you can safely stop the build by Ctrl-C (Make
+# will delete incomplete targets), and resume it later.
+#
+# However, this does not work when the stderr is piped to another program, like
+#  $ make >&2 | tee log
+# Make dies with SIGPIPE before cleaning the targets.
+#
+# To address it, we cleans the target in signal traps.
+#
+# Make deletes the target when it catches SIGHUP, SIGINT, SIGQUIT, SIGTERM.
+# So, we cover them, and also SIGPIPE just in case.
+#
+# Of course, this is unneeded for phony targets.
+delete-on-interrupt = \
+	$(if $(filter-out $(PHONY), $@), \
+		$(foreach sig, HUP INT QUIT TERM PIPE, \
+			trap 'rm -f $@; trap - $(sig); kill -s $(sig) $$$$' $(sig);))
+
 # printing commands
-cmd = @set -e; $(echo-cmd) $($(quiet)redirect) $(cmd_$(1))
+cmd = @set -e; $(echo-cmd) $($(quiet)redirect) $(delete-on-interrupt) $(cmd_$(1))
 
 ###
 # if_changed      - execute command if any prerequisite is newer than
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] kbuild: remove the target in signal traps when interrupted
  2022-08-07  0:48 [PATCH] kbuild: remove the target in signal traps when interrupted Masahiro Yamada
@ 2022-08-07  9:39 ` Ingo Molnar
  2022-08-19  4:37 ` Nicolas Schier
  1 sibling, 0 replies; 3+ messages in thread
From: Ingo Molnar @ 2022-08-07  9:39 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: linux-kbuild, Rob Herring, Michal Marek, Nick Desaulniers, linux-kernel



* Masahiro Yamada <masahiroy@kernel.org> wrote:

> Reported-by: Ingo Molnar <mingo@kernel.org>
> Reported-by: Rob Herring <robh@kernel.org>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
> 
> If you are happy to help test this patch, that will be appreciated.
> 
> Without applying this patch,
> 
>     $ make -j<nr-proc> 2>&1 | tee log
> 
> Then, you will see an error reported in [1].
> You may need to repeat it dozen of times to reproduce it.
> The more CPU cores you have, the easier you will get the error.
> 
> Apply this patch, and repeat the same.
> You will no longer see that error (hopefully).
> 
> 
>  scripts/Kbuild.include | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)

I've tested your patch on a system with 64 CPUs on the latest upstream 
kernel, and without your patch I was able to quickly reproduce the 
corrupted .cmd files within 4-5 interrupted kernel builds:

  fs/xfs/libxfs/.xfs_alloc.o.cmd:5: *** unterminated call to function 'wildcard': missing ')'.  Stop.

or:

  kernel/time/.tick-broadcast.o.cmd:5: *** unterminated call to function 'wildcard': missing ')'.  Stop.

or:

  lib/.is_single_threaded.o.cmd:5: *** unterminated call to function 'wildcard': missing ')'.  Stop.

These corrupted .cmd files blocked subsequent additive builds perpetually, 
until I manually removed the corrupted .o.cmd file or did an explicit 'make clean'.

With your patch I've yet to see a single failure, after dozens of attempts.

  Tested-by: Ingo Molnar <mingo@kernel.org>

Thanks a lot for taking care of this problem!

	Ingo

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] kbuild: remove the target in signal traps when interrupted
  2022-08-07  0:48 [PATCH] kbuild: remove the target in signal traps when interrupted Masahiro Yamada
  2022-08-07  9:39 ` Ingo Molnar
@ 2022-08-19  4:37 ` Nicolas Schier
  1 sibling, 0 replies; 3+ messages in thread
From: Nicolas Schier @ 2022-08-19  4:37 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: linux-kbuild, Ingo Molnar, Rob Herring, Michal Marek,
	Nick Desaulniers, linux-kernel

On Sun 07 Aug 2022 09:48:09 +0900 Masahiro Yamada wrote:
> When receiving some signal, GNU Make automatically deletes the target if
> it has already been changed by the interrupted recipe.
> 
> If the target is possibly incomplete due to interruption, it must be
> deleted so that it will be remade from scratch on the next run of make.
> Otherwise, the target would remain corrupted permanently because its
> timestamp had already been updated.
> 
> Thanks to this behavior of Make, you can stop the build any time by
> pressing Ctrl-C, and just run 'make' to resume it.
> 
> Kbuild also relies on this feature, but it is equivalently important
> for any build systems that make decisions based on timestamps (if you
> want to support stop/resume reliably).
> 
> However, this does not always work as claimed; Make immediately dies
> with Ctrl-C if its stderr goes into a pipe.
> 
>   [Test Makefile]
> 
>     foo:
>             echo hello > $@
>             sleep 3
>             echo world >> $@
> 
>   [Test Result]
> 
>     $ make                         # hit Ctrl-C
>     echo hello > foo
>     sleep 3
>     ^Cmake: *** Deleting file 'foo'
>     make: *** [Makefile:3: foo] Interrupt
> 
>     $ make 2>&1 | cat              # hit Ctrl-C
>     echo hello > foo
>     sleep 3
>     ^C$                            # 'foo' is often left-over
> 
> The reason is because SIGINT is sent to the entire process group.
> In this example, SIGINT kills 'cat', and 'make' writes the message to
> the closed pipe, then dies with SIGPIPE.
> 
> A typical bad scenario (as reported by [1], [2]) is to save build log
> by using the 'tee' command:
> 
>     $ make 2>&1 | tee log
> 
> Again, this can be problematic for any build systems based on Make, so
> I hope it will be fixed in GNU Make. The maintainer of GNU Make stated
> this is a long-standing issue and difficult to fix [3]. It has not been
> fixed yet as of writing.
> 
> So, we cannot rely on Make cleaning the target. We can do it by
> ourselves, in signal traps.
> 
> As far as I understand, Make takes care of SIGHUP, SIGINT, SIGQUIT, and
> SITERM for the target removal. I added the traps for them, and also for
> SIGPIPE just in case cmd_* rule prints something to stdout or stderr
> (but I did not observe an actual case where SIGPIPE was triggered).
> 
> [Note 1]
> 
> The trap handler might be worth explaining.
> 
>     rm -f $@; trap - $(sig); kill -s $(sig) $$
> 
> This lets the shell kill itself by the signal it caught, so the parent
> process can tell the child has exited on the signal. Generally, this is
> a proper manner for handling signals, in case the calling program (like
> Bash) may monitor WIFSIGNALED() and WTERMSIG() for WCE (Wait and
> Cooperative Exit) [4] although this may not be a big deal here because
> GNU Make handles SIGHUP, SIGINT, SIGQUIT in WUE (Wait and Unconditional
> Exit) and SIGTERM in IUE (Immediate Unconditional Exit).
> 
> [Note 2]
> 
> Reverting 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd
> files") would directly address [1], but it only saves if_changed_dep.
> As reported in [2], all commands that use redirection can potentially
> leave an empty (i.e. broken) target.
> 
> [Note 3]
> 
> Another (even safer) approach might be to always write to a temporary
> file, and rename it to $@ at the end of the recipe.
> 
>    <command>  > $(tmp-target)
>    mv $(tmp-target) $@
> 
> It would require a lot of Makefile changes, and result in ugly code,
> so I did not take it.
> 
> [Note 4]
> 
> A little more thoughts about a pattern rule with multiple targets (or
> a grouped target).
> 
>     %.x %.y: %.z
>             <recipe>
> 
> When interrupted, GNU Make deletes both %.x and %.y, while this solution
> only deletes $@. Probably, this is not a big deal. The next run of make
> will execute the rule again to create $@ along with the other files.
> 
> [1]: https://lore.kernel.org/all/YLeot94yAaM4xbMY@gmail.com/
> [2]: https://lore.kernel.org/all/20220510221333.2770571-1-robh@kernel.org/
> [3]: https://lists.gnu.org/archive/html/help-make/2021-06/msg00001.html
> [4]: https://www.cons.org/cracauer/sigint.html
> 
> Reported-by: Ingo Molnar <mingo@kernel.org>
> Reported-by: Rob Herring <robh@kernel.org>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
> 
> If you are happy to help test this patch, that will be appreciated.
> 
> Without applying this patch,
> 
>     $ make -j<nr-proc> 2>&1 | tee log
> 
> Then, you will see an error reported in [1].
> You may need to repeat it dozen of times to reproduce it.
> The more CPU cores you have, the easier you will get the error.
> 
> Apply this patch, and repeat the same.
> You will no longer see that error (hopefully).
> 
> 
>  scripts/Kbuild.include | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
> index ece44b735061..9432a7f33186 100644
> --- a/scripts/Kbuild.include
> +++ b/scripts/Kbuild.include
> @@ -100,8 +100,29 @@ echo-cmd = $(if $($(quiet)cmd_$(1)),\
>   quiet_redirect :=
>  silent_redirect := exec >/dev/null;
>  
> +# Delete the target on interruption
> +#
> +# GNU Make automatically deletes the target if it has already been changed by
> +# the interrupted recipe. So, you can safely stop the build by Ctrl-C (Make
> +# will delete incomplete targets), and resume it later.
> +#
> +# However, this does not work when the stderr is piped to another program, like
> +#  $ make >&2 | tee log
> +# Make dies with SIGPIPE before cleaning the targets.
> +#
> +# To address it, we cleans the target in signal traps.

s/cleans/clean/

> +#
> +# Make deletes the target when it catches SIGHUP, SIGINT, SIGQUIT, SIGTERM.
> +# So, we cover them, and also SIGPIPE just in case.
> +#
> +# Of course, this is unneeded for phony targets.
> +delete-on-interrupt = \
> +	$(if $(filter-out $(PHONY), $@), \
> +		$(foreach sig, HUP INT QUIT TERM PIPE, \
> +			trap 'rm -f $@; trap - $(sig); kill -s $(sig) $$$$' $(sig);))
> +
>  # printing commands
> -cmd = @set -e; $(echo-cmd) $($(quiet)redirect) $(cmd_$(1))
> +cmd = @set -e; $(echo-cmd) $($(quiet)redirect) $(delete-on-interrupt) $(cmd_$(1))
>  
>  ###
>  # if_changed      - execute command if any prerequisite is newer than
> -- 
> 2.34.1

Thanks for the patch and the verbose reasoning.  I would like to see 
stable@k.o added if you think it is appropriate (patch applies cleanly 
to 5.4, 5.15).

Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>

Kind regards,
Nicolas

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-08-19  4:38 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-07  0:48 [PATCH] kbuild: remove the target in signal traps when interrupted Masahiro Yamada
2022-08-07  9:39 ` Ingo Molnar
2022-08-19  4:37 ` Nicolas Schier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).