All of lore.kernel.org
 help / color / mirror / Atom feed
* [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
@ 2021-05-19  8:58 Li Wang
  2021-05-19  9:21 ` Joerg Vehlow
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Li Wang @ 2021-05-19  8:58 UTC (permalink / raw)
  To: ltp

We'd better avoid using SIGINT for process terminating becuasue,
it has different behavoir on kind of shell.

From Joerg Vehlow's test:

 - bash does not seem to care about SIGINT delivery to background
   processes, but can be blocked using trap

 - zsh ignores SIGINT for background processes by default, but can be
   allowed using trap

 - dash and busybox sh ignore the signal to background processes, and
   this cannot be changed with trap

This patch cover the below situations:

 1. SIGINT (Ctrl^C) for terminating the main process and do cleanup
    correctly before a timeout

 2. Test finish normally and retrieves the _tst_timeout_process in the
    background via SIGTERM(sending by _tst_cleanup_timer)

 3. Test timed out occurs and _tst_kill_test sending SIGTERM to
    terminating all process, and the main process do cleanup work

 4. Test timed out occurs but still have process alive after _tst_kill_test
    sending SIGTERM, then sending SIGKILL to the whole group

 5. Test terminated by SIGTERM unexpectly (e.g. system shutdown or process
    manager) and do cleanup work as well

Co-authored-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
Signed-off-by: Li Wang <liwang@redhat.com>
Reviewed-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
---
 lib/newlib_tests/shell/test_timeout.sh | 2 +-
 lib/newlib_tests/shell/timeout03.sh    | 5 +++--
 testcases/lib/tst_test.sh              | 9 +++++----
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/newlib_tests/shell/test_timeout.sh b/lib/newlib_tests/shell/test_timeout.sh
index b05680cb1..c70e22f27 100755
--- a/lib/newlib_tests/shell/test_timeout.sh
+++ b/lib/newlib_tests/shell/test_timeout.sh
@@ -28,7 +28,7 @@ timeout02.sh|  -10|0|  |2
 timeout02.sh| -0.1|0|  |0
 timeout02.sh| -1.1|0|  |2
 timeout02.sh|-10.1|0|  |2
-timeout03.sh|     |0|12|137| | | |Test kill if test does not terminate by SIGINT
+timeout03.sh|     |0|12|137| | | |Test is killed, if it does not terminate after SIGTERM
 timeout04.sh|     |0|  |  2|0|0|1|Verify that timeout is enforced
 timeout02.sh|    2|1| 2|   |1|0|0|Test termination of timeout process
 "
diff --git a/lib/newlib_tests/shell/timeout03.sh b/lib/newlib_tests/shell/timeout03.sh
index cd548d9a2..dabef32a2 100755
--- a/lib/newlib_tests/shell/timeout03.sh
+++ b/lib/newlib_tests/shell/timeout03.sh
@@ -6,8 +6,9 @@
 # expected output:
 # timeout03 1 TINFO: timeout per run is 0h 0m 1s
 # timeout03 1 TINFO: testing killing test after TST_TIMEOUT
-# timeout03 1 TBROK: Test timeouted, sending SIGINT! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
-# timeout03 1 TBROK: test interrupted or timed out
+# timeout03 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
+# Terminated
+# timeout03 1 TBROK: test terminated
 # timeout03 1 TPASS: test run cleanup after timeout
 # timeout03 1 TINFO: Test is still running, waiting 10s
 # timeout03 1 TINFO: Test is still running, waiting 9s
diff --git a/testcases/lib/tst_test.sh b/testcases/lib/tst_test.sh
index 3a5651c01..66ffde4eb 100644
--- a/testcases/lib/tst_test.sh
+++ b/testcases/lib/tst_test.sh
@@ -21,7 +21,8 @@ export TST_LIB_LOADED=1
 . tst_security.sh
 
 # default trap function
-trap "tst_brk TBROK 'test interrupted or timed out'" INT
+trap "tst_brk TBROK 'test interrupted'" INT
+trap "unset _tst_setup_timer_pid; tst_brk TBROK 'test terminated'" TERM
 
 _tst_do_exit()
 {
@@ -439,9 +440,9 @@ _tst_kill_test()
 {
 	local i=10
 
-	trap '' INT
-	tst_res TBROK "Test timeouted, sending SIGINT! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
-	kill -INT -$pid
+	trap '' TERM
+	tst_res TBROK "Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
+	kill -TERM -$pid
 	tst_sleep 100ms
 
 	while kill -0 $pid >/dev/null 2>&1 && [ $i -gt 0 ]; do
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
  2021-05-19  8:58 [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process Li Wang
@ 2021-05-19  9:21 ` Joerg Vehlow
  2021-05-27  4:11   ` Li Wang
  2021-05-31  9:25 ` Li Wang
  2021-09-15 22:40   ` Petr Vorel
  2 siblings, 1 reply; 11+ messages in thread
From: Joerg Vehlow @ 2021-05-19  9:21 UTC (permalink / raw)
  To: ltp

Hi,

On 5/19/2021 10:58 AM, Li Wang wrote:
> diff --git a/lib/newlib_tests/shell/timeout03.sh b/lib/newlib_tests/shell/timeout03.sh
> index cd548d9a2..dabef32a2 100755
> --- a/lib/newlib_tests/shell/timeout03.sh
> +++ b/lib/newlib_tests/shell/timeout03.sh
> @@ -6,8 +6,9 @@
>   # expected output:
>   # timeout03 1 TINFO: timeout per run is 0h 0m 1s
>   # timeout03 1 TINFO: testing killing test after TST_TIMEOUT
> -# timeout03 1 TBROK: Test timeouted, sending SIGINT! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> -# timeout03 1 TBROK: test interrupted or timed out
> +# timeout03 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> +# Terminated
Nit: This Message seems to be only there for bash... I tried to find why 
and if it can be suppressed, but found nothing. I guess it comes from 
the terminated sleep process.

J?rg

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
  2021-05-19  9:21 ` Joerg Vehlow
@ 2021-05-27  4:11   ` Li Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Li Wang @ 2021-05-27  4:11 UTC (permalink / raw)
  To: ltp

On Wed, May 19, 2021 at 5:19 PM Joerg Vehlow <lkml@jv-coder.de> wrote:
>
> Hi,
>
> On 5/19/2021 10:58 AM, Li Wang wrote:
> > diff --git a/lib/newlib_tests/shell/timeout03.sh b/lib/newlib_tests/shell/timeout03.sh
> > index cd548d9a2..dabef32a2 100755
> > --- a/lib/newlib_tests/shell/timeout03.sh
> > +++ b/lib/newlib_tests/shell/timeout03.sh
> > @@ -6,8 +6,9 @@
> >   # expected output:
> >   # timeout03 1 TINFO: timeout per run is 0h 0m 1s
> >   # timeout03 1 TINFO: testing killing test after TST_TIMEOUT
> > -# timeout03 1 TBROK: Test timeouted, sending SIGINT! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> > -# timeout03 1 TBROK: test interrupted or timed out
> > +# timeout03 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> > +# Terminated
> Nit: This Message seems to be only there for bash... I tried to find why
> and if it can be suppressed, but found nothing. I guess it comes from
> the terminated sleep process.

Yes, but that's not a big deal, it only appears in timed out and we
can tolerate:).

-- 
Regards,
Li Wang


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
  2021-05-19  8:58 [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process Li Wang
  2021-05-19  9:21 ` Joerg Vehlow
@ 2021-05-31  9:25 ` Li Wang
  2021-09-15 22:40   ` Petr Vorel
  2 siblings, 0 replies; 11+ messages in thread
From: Li Wang @ 2021-05-31  9:25 UTC (permalink / raw)
  To: ltp

Hi Cyril, Petr,

Any thoughts on merging this one?

-- 
Regards,
Li Wang


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
@ 2021-09-15 22:40   ` Petr Vorel
  2021-09-15 23:01       ` Petr Vorel
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Vorel @ 2021-09-15 22:40 UTC (permalink / raw)
  To: Li Wang; +Cc: Joerg Vehlow, ltp

Hi Li, all,

> We'd better avoid using SIGINT for process terminating becuasue,
> it has different behavoir on kind of shell.

> From Joerg Vehlow's test:

>  - bash does not seem to care about SIGINT delivery to background
>    processes, but can be blocked using trap

>  - zsh ignores SIGINT for background processes by default, but can be
>    allowed using trap

>  - dash and busybox sh ignore the signal to background processes, and
>    this cannot be changed with trap

> This patch cover the below situations:

>  1. SIGINT (Ctrl^C) for terminating the main process and do cleanup
>     correctly before a timeout

>  2. Test finish normally and retrieves the _tst_timeout_process in the
>     background via SIGTERM(sending by _tst_cleanup_timer)

>  3. Test timed out occurs and _tst_kill_test sending SIGTERM to
>     terminating all process, and the main process do cleanup work

>  4. Test timed out occurs but still have process alive after _tst_kill_test
>     sending SIGTERM, then sending SIGKILL to the whole group

>  5. Test terminated by SIGTERM unexpectly (e.g. system shutdown or process
>     manager) and do cleanup work as well

> Co-authored-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
> Signed-off-by: Li Wang <liwang@redhat.com>
> Reviewed-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
...

> +++ b/testcases/lib/tst_test.sh
> @@ -21,7 +21,8 @@ export TST_LIB_LOADED=1
>  . tst_security.sh

>  # default trap function
> -trap "tst_brk TBROK 'test interrupted or timed out'" INT
> +trap "tst_brk TBROK 'test interrupted'" INT
> +trap "unset _tst_setup_timer_pid; tst_brk TBROK 'test terminated'" TERM

FYI this commit (merged as 4a6b8a697 ("tst_test: using SIGTERM to terminate process"))
broke net_stress_interface tests, particularly tst_require_cmds() call (which
calls tst_brk TCONF:

# ./if-addr-adddel.sh -c ifconfig
if-addr-adddel 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface
if-addr-adddel 1 TINFO: add local addr 10.0.0.2/24
if-addr-adddel 1 TINFO: add local addr fd00:1:1:1::2/64
if-addr-adddel 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface
if-addr-adddel 1 TINFO: add remote addr 10.0.0.1/24
if-addr-adddel 1 TINFO: add remote addr fd00:1:1:1::1/64
if-addr-adddel 1 TINFO: Network config (local -- remote):
if-addr-adddel 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1
if-addr-adddel 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24
if-addr-adddel 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64
if-addr-adddel 1 TINFO: timeout per run is 0h 5m 0s
if-addr-adddel 1 TCONF: 'ifconfig' not found
=> waits till timeout
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TWARN: test terminated

Debugging it hangs in wait in _tst_cleanup_timer():

kill -TERM $_tst_setup_timer_pid 2>/dev/null
wait $_tst_setup_timer_pid 2>/dev/null

because kill does not kill the test.

The problem looks to be that unset actually does not work.
trap "unset _tst_setup_timer_pid; tst_brk TBROK 'test terminated'" TERM

It looks to be something setup specific, because I discovered this on SLES on
both bash and dash. Running it on current Debian testing it works on both bash
and dash. I checked shopt output on both, but don't see anything obvious. It
must be something else.

Kind regards,
Petr

>  _tst_do_exit()
>  {
> @@ -439,9 +440,9 @@ _tst_kill_test()
>  {
>  	local i=10

> -	trap '' INT
> -	tst_res TBROK "Test timeouted, sending SIGINT! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
> -	kill -INT -$pid
> +	trap '' TERM
> +	tst_res TBROK "Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
> +	kill -TERM -$pid
>  	tst_sleep 100ms

>  	while kill -0 $pid >/dev/null 2>&1 && [ $i -gt 0 ]; do

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
@ 2021-09-15 23:01       ` Petr Vorel
  2021-09-17  8:50           ` Cyril Hrubis
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Vorel @ 2021-09-15 23:01 UTC (permalink / raw)
  To: Li Wang, Joerg Vehlow, ltp

Hi Li, all,

> Hi Li, all,

[ Cc Cyril and Alexey ]

> > We'd better avoid using SIGINT for process terminating becuasue,
> > it has different behavoir on kind of shell.

> > From Joerg Vehlow's test:

> >  - bash does not seem to care about SIGINT delivery to background
> >    processes, but can be blocked using trap

> >  - zsh ignores SIGINT for background processes by default, but can be
> >    allowed using trap

> >  - dash and busybox sh ignore the signal to background processes, and
> >    this cannot be changed with trap

> > This patch cover the below situations:

> >  1. SIGINT (Ctrl^C) for terminating the main process and do cleanup
> >     correctly before a timeout

> >  2. Test finish normally and retrieves the _tst_timeout_process in the
> >     background via SIGTERM(sending by _tst_cleanup_timer)

> >  3. Test timed out occurs and _tst_kill_test sending SIGTERM to
> >     terminating all process, and the main process do cleanup work

> >  4. Test timed out occurs but still have process alive after _tst_kill_test
> >     sending SIGTERM, then sending SIGKILL to the whole group

> >  5. Test terminated by SIGTERM unexpectly (e.g. system shutdown or process
> >     manager) and do cleanup work as well

> > Co-authored-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
> > Signed-off-by: Li Wang <liwang@redhat.com>
> > Reviewed-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
> ...

> > +++ b/testcases/lib/tst_test.sh
> > @@ -21,7 +21,8 @@ export TST_LIB_LOADED=1
> >  . tst_security.sh

> >  # default trap function
> > -trap "tst_brk TBROK 'test interrupted or timed out'" INT
> > +trap "tst_brk TBROK 'test interrupted'" INT
> > +trap "unset _tst_setup_timer_pid; tst_brk TBROK 'test terminated'" TERM

> FYI this commit (merged as 4a6b8a697 ("tst_test: using SIGTERM to terminate process"))
> broke net_stress_interface tests, particularly tst_require_cmds() call (which
> calls tst_brk TCONF:

> # ./if-addr-adddel.sh -c ifconfig
> if-addr-adddel 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface
> if-addr-adddel 1 TINFO: add local addr 10.0.0.2/24
> if-addr-adddel 1 TINFO: add local addr fd00:1:1:1::2/64
> if-addr-adddel 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface
> if-addr-adddel 1 TINFO: add remote addr 10.0.0.1/24
> if-addr-adddel 1 TINFO: add remote addr fd00:1:1:1::1/64
> if-addr-adddel 1 TINFO: Network config (local -- remote):
> if-addr-adddel 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1
> if-addr-adddel 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24
> if-addr-adddel 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64
> if-addr-adddel 1 TINFO: timeout per run is 0h 5m 0s
> if-addr-adddel 1 TCONF: 'ifconfig' not found
> => waits till timeout
> if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> if-addr-adddel 1 TWARN: test terminated

> Debugging it hangs in wait in _tst_cleanup_timer():

> kill -TERM $_tst_setup_timer_pid 2>/dev/null
> wait $_tst_setup_timer_pid 2>/dev/null

> because kill does not kill the test.

> The problem looks to be that unset actually does not work.
> trap "unset _tst_setup_timer_pid; tst_brk TBROK 'test terminated'" TERM

> It looks to be something setup specific, because I discovered this on SLES on
> both bash and dash. Running it on current Debian testing it works on both bash
> and dash. I checked shopt output on both, but don't see anything obvious. It
> must be something else.
OK, repeatedly running on Debian with dash I managed to get hang as well:

Here it does not even quit the test:

if-addr-adddel 1 TCONF: 'ifconfig' not found
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TWARN: test terminated

Maybe not only SIGINT, but even SIGTERM is not reliable to background process?
Minimal reproducible example, on Dash needs few runs to hang:

cat > debug.sh <<EOF
#!/bin/sh

TST_SETUP="setup"
TST_TESTFUNC="do_test"
. tst_test.sh

setup()
{
	tst_brk TCONF "quit now!"
}

do_test()
{
	tst_res TPASS "pass :)"
}

tst_run
EOF

# while true; do ./debug.sh; done

Kind regards,
Petr

> Kind regards,
> Petr

> >  _tst_do_exit()
> >  {
> > @@ -439,9 +440,9 @@ _tst_kill_test()
> >  {
> >  	local i=10

> > -	trap '' INT
> > -	tst_res TBROK "Test timeouted, sending SIGINT! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
> > -	kill -INT -$pid
> > +	trap '' TERM
> > +	tst_res TBROK "Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
> > +	kill -TERM -$pid
> >  	tst_sleep 100ms

> >  	while kill -0 $pid >/dev/null 2>&1 && [ $i -gt 0 ]; do

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
@ 2021-09-17  8:50           ` Cyril Hrubis
  2021-09-17  9:17               ` Petr Vorel
  0 siblings, 1 reply; 11+ messages in thread
From: Cyril Hrubis @ 2021-09-17  8:50 UTC (permalink / raw)
  To: Petr Vorel; +Cc: ltp, Joerg Vehlow

Hi!
> setup()
> {
> 	tst_brk TCONF "quit now!"
> }
> 
> do_test()
> {
> 	tst_res TPASS "pass :)"
> }
> 
> tst_run
> EOF
> 
> # while true; do ./debug.sh; done

I managed to reproduce this in dash. I bet that this is a bug where
signal handler inside dash is temporarily disabled when we install the
trap and if we manage to hit that window the signal is discarded. At
least that is my working theory. After I've installed debug prints, in
the cases where it hangs the signal was sent just before have installed
the trap. And in some cases when the signal arrives the timer process is
killed but the trap is not invoked. So it really looks like signal
handling in dash is simply broken. Not sure what we can do about bugs
like this apart from switching to a real programming language.

-- 
Cyril Hrubis
chrubis@suse.cz

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
@ 2021-09-17  9:17               ` Petr Vorel
  2021-09-17 10:18                   ` Petr Vorel
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Vorel @ 2021-09-17  9:17 UTC (permalink / raw)
  To: Cyril Hrubis; +Cc: ltp, Joerg Vehlow

Hi Cyril,

> Hi!
> > setup()
> > {
> > 	tst_brk TCONF "quit now!"
> > }

> > do_test()
> > {
> > 	tst_res TPASS "pass :)"
> > }

> > tst_run
> > EOF

> > # while true; do ./debug.sh; done

> I managed to reproduce this in dash. I bet that this is a bug where
> signal handler inside dash is temporarily disabled when we install the
> trap and if we manage to hit that window the signal is discarded. At
> least that is my working theory. After I've installed debug prints, in
> the cases where it hangs the signal was sent just before have installed
> the trap. And in some cases when the signal arrives the timer process is
> killed but the trap is not invoked. So it really looks like signal
> handling in dash is simply broken. Not sure what we can do about bugs
> like this apart from switching to a real programming language.

Thanks for the debugging. *bash* is also affected, at least some releases.
I reproduced it also on some older SLES, with bash 4.4.

Kind regards,
Petr

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
@ 2021-09-17 10:18                   ` Petr Vorel
  2021-09-17 10:59                       ` Cyril Hrubis
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Vorel @ 2021-09-17 10:18 UTC (permalink / raw)
  To: Cyril Hrubis, Li Wang, Joerg Vehlow, ltp, Alexey Kodanev

> Hi Cyril,

> > Hi!
> > > setup()
> > > {
> > > 	tst_brk TCONF "quit now!"
> > > }

> > > do_test()
> > > {
> > > 	tst_res TPASS "pass :)"
> > > }

> > > tst_run
> > > EOF

> > > # while true; do ./debug.sh; done

> > I managed to reproduce this in dash. I bet that this is a bug where
> > signal handler inside dash is temporarily disabled when we install the
> > trap and if we manage to hit that window the signal is discarded. At
> > least that is my working theory. After I've installed debug prints, in
> > the cases where it hangs the signal was sent just before have installed
> > the trap. And in some cases when the signal arrives the timer process is
> > killed but the trap is not invoked. So it really looks like signal
> > handling in dash is simply broken. Not sure what we can do about bugs
> > like this apart from switching to a real programming language.
Which version of bash and dash are you testing on?

> Thanks for the debugging. *bash* is also affected, at least some releases.
> I reproduced it also on some older SLES, with bash 4.4.
dash 0.5.11.4 and 5.1.8 on my Tumbleweed laptop are OK.

I tested it on various my VM:

dash *failing*: 0.5.8 (SLES), 0.5.11.3 (Tumbleweed), 0.5.11+git20200708+dd9ef66-5 (Debian), 0.5.7-4+b1 (Debian)
dash *OK*: 0.5.11.2 (SLES 15), 0.5.10.2 (CentOS)

bash *failing*: 5.1.4-1.4 (Tumbleweed), 4.4-9.7.1 (SLES 15)
bash *ok*: 4.4-17 (SLES 15), 4.3-83 (SLES 12), 4.3-11+deb8u (Debian), 5.1-2+b3
(Debian), 4.2.46-34 (CentOS)

I have no idea what it causes, whether really some bash and dash versions are
buggy or it's reproducible only on certain environment.

Any tip what to search for?

Kind regards,
Petr

> Kind regards,
> Petr

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
@ 2021-09-17 10:59                       ` Cyril Hrubis
  2021-09-17 11:03                           ` Petr Vorel
  0 siblings, 1 reply; 11+ messages in thread
From: Cyril Hrubis @ 2021-09-17 10:59 UTC (permalink / raw)
  To: Petr Vorel; +Cc: ltp, Joerg Vehlow

Hi!
> > > I managed to reproduce this in dash. I bet that this is a bug where
> > > signal handler inside dash is temporarily disabled when we install the
> > > trap and if we manage to hit that window the signal is discarded. At
> > > least that is my working theory. After I've installed debug prints, in
> > > the cases where it hangs the signal was sent just before have installed
> > > the trap. And in some cases when the signal arrives the timer process is
> > > killed but the trap is not invoked. So it really looks like signal
> > > handling in dash is simply broken. Not sure what we can do about bugs
> > > like this apart from switching to a real programming language.
> Which version of bash and dash are you testing on?
> 
> > Thanks for the debugging. *bash* is also affected, at least some releases.
> > I reproduced it also on some older SLES, with bash 4.4.
> dash 0.5.11.4 and 5.1.8 on my Tumbleweed laptop are OK.
> 
> I tested it on various my VM:
> 
> dash *failing*: 0.5.8 (SLES), 0.5.11.3 (Tumbleweed), 0.5.11+git20200708+dd9ef66-5 (Debian), 0.5.7-4+b1 (Debian)
> dash *OK*: 0.5.11.2 (SLES 15), 0.5.10.2 (CentOS)
> 
> bash *failing*: 5.1.4-1.4 (Tumbleweed), 4.4-9.7.1 (SLES 15)
> bash *ok*: 4.4-17 (SLES 15), 4.3-83 (SLES 12), 4.3-11+deb8u (Debian), 5.1-2+b3
> (Debian), 4.2.46-34 (CentOS)
> 
> I have no idea what it causes, whether really some bash and dash versions are
> buggy or it's reproducible only on certain environment.

bash 5.1.8 seems to work okay, dash 0.5.11.3-r2 seems to fail here.

> Any tip what to search for?

Not really, apart from reading source code and figuring out exactly what
happens here.

I guess that we can make things more predictable and easier to read by
shifting parts of the shell library to a C process.

For instance if we wrote an utility that would implement all the
tst_kill_test and timeout process in C we would simplify things greatly.

Something as:

tst_timeout_kill

that would be used as:


tst_timeout_kill 300 12342
                  ^    ^
		  |    process group leader pid
		  timeout in seconds


That would implement both the loop for killing tests and timeout
processing as well. That way we would get rid of the trap in the
subshell and we would end up with a single pid for the whole timeout
process and avoid the recursive sigkill to begin with.

-- 
Cyril Hrubis
chrubis@suse.cz

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
@ 2021-09-17 11:03                           ` Petr Vorel
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Vorel @ 2021-09-17 11:03 UTC (permalink / raw)
  To: Cyril Hrubis; +Cc: ltp, Joerg Vehlow

> Hi!
> > > > I managed to reproduce this in dash. I bet that this is a bug where
> > > > signal handler inside dash is temporarily disabled when we install the
> > > > trap and if we manage to hit that window the signal is discarded. At
> > > > least that is my working theory. After I've installed debug prints, in
> > > > the cases where it hangs the signal was sent just before have installed
> > > > the trap. And in some cases when the signal arrives the timer process is
> > > > killed but the trap is not invoked. So it really looks like signal
> > > > handling in dash is simply broken. Not sure what we can do about bugs
> > > > like this apart from switching to a real programming language.
> > Which version of bash and dash are you testing on?

> > > Thanks for the debugging. *bash* is also affected, at least some releases.
> > > I reproduced it also on some older SLES, with bash 4.4.
> > dash 0.5.11.4 and 5.1.8 on my Tumbleweed laptop are OK.

> > I tested it on various my VM:

> > dash *failing*: 0.5.8 (SLES), 0.5.11.3 (Tumbleweed), 0.5.11+git20200708+dd9ef66-5 (Debian), 0.5.7-4+b1 (Debian)
> > dash *OK*: 0.5.11.2 (SLES 15), 0.5.10.2 (CentOS)

> > bash *failing*: 5.1.4-1.4 (Tumbleweed), 4.4-9.7.1 (SLES 15)
> > bash *ok*: 4.4-17 (SLES 15), 4.3-83 (SLES 12), 4.3-11+deb8u (Debian), 5.1-2+b3
> > (Debian), 4.2.46-34 (CentOS)

> > I have no idea what it causes, whether really some bash and dash versions are
> > buggy or it's reproducible only on certain environment.

> bash 5.1.8 seems to work okay, dash 0.5.11.3-r2 seems to fail here.

> > Any tip what to search for?

> Not really, apart from reading source code and figuring out exactly what
> happens here.

> I guess that we can make things more predictable and easier to read by
> shifting parts of the shell library to a C process.

> For instance if we wrote an utility that would implement all the
> tst_kill_test and timeout process in C we would simplify things greatly.

> Something as:

> tst_timeout_kill

> that would be used as:


> tst_timeout_kill 300 12342
>                   ^    ^
> 		  |    process group leader pid
> 		  timeout in seconds


> That would implement both the loop for killing tests and timeout
> processing as well. That way we would get rid of the trap in the
> subshell and we would end up with a single pid for the whole timeout
> process and avoid the recursive sigkill to begin with.

+1 for this idea instead of never ending story to fix various shells.
Please if you have time, wrote that.

Kind regards,
Petr

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-09-17 11:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-19  8:58 [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process Li Wang
2021-05-19  9:21 ` Joerg Vehlow
2021-05-27  4:11   ` Li Wang
2021-05-31  9:25 ` Li Wang
2021-09-15 22:40 ` Petr Vorel
2021-09-15 22:40   ` Petr Vorel
2021-09-15 23:01     ` Petr Vorel
2021-09-15 23:01       ` Petr Vorel
2021-09-17  8:50         ` Cyril Hrubis
2021-09-17  8:50           ` Cyril Hrubis
2021-09-17  9:17             ` Petr Vorel
2021-09-17  9:17               ` Petr Vorel
2021-09-17 10:18                 ` Petr Vorel
2021-09-17 10:18                   ` Petr Vorel
2021-09-17 10:59                     ` Cyril Hrubis
2021-09-17 10:59                       ` Cyril Hrubis
2021-09-17 11:03                         ` Petr Vorel
2021-09-17 11:03                           ` Petr Vorel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.