* [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes @ 2022-02-11 11:44 Martin Doucha 2022-02-11 12:55 ` Cyril Hrubis 0 siblings, 1 reply; 9+ messages in thread From: Martin Doucha @ 2022-02-11 11:44 UTC (permalink / raw) To: ltp When the main test process crashes or gets killed e.g. by OOM killer, the watchdog process currently does not clean up any remaining child processes. Fix this by sending SIGKILL to the test process group when the watchdog process gets notified that the main test process has exited for any reason. Fixes #892 Signed-off-by: Martin Doucha <mdoucha@suse.cz> --- Changes since v1: - The SIGKILL can be sent after SAFE_WAITPID() returns, removed the unnecessary SIGCHLD handler - Fixed libtest shell script, calling test_children_cleanup inside backticks would block the shell script until any leftover child exits, which renders the test useless. Temp file must be used for passing child PID. lib/newlib_tests/runtest.sh | 3 +- lib/newlib_tests/test_children_cleanup.c | 43 +++++++++++++++++++++++ lib/newlib_tests/test_children_cleanup.sh | 19 ++++++++++ lib/tst_test.c | 3 ++ 4 files changed, 67 insertions(+), 1 deletion(-) create mode 100644 lib/newlib_tests/test_children_cleanup.c create mode 100755 lib/newlib_tests/test_children_cleanup.sh diff --git a/lib/newlib_tests/runtest.sh b/lib/newlib_tests/runtest.sh index 92fd3860e..327460e7b 100755 --- a/lib/newlib_tests/runtest.sh +++ b/lib/newlib_tests/runtest.sh @@ -4,7 +4,8 @@ LTP_C_API_TESTS="${LTP_C_API_TESTS:-test05 test07 test09 test12 test15 test18 tst_needs_cmds01 tst_needs_cmds02 tst_needs_cmds03 tst_needs_cmds06 tst_needs_cmds07 tst_bool_expr test_exec test_timer tst_res_hexd tst_strstatus -tst_fuzzy_sync03 test_zero_hugepage.sh test_kconfig.sh}" +tst_fuzzy_sync03 test_zero_hugepage.sh test_kconfig.sh +test_children_cleanup.sh}" LTP_SHELL_API_TESTS="${LTP_SHELL_API_TESTS:-shell/tst_check_driver.sh shell/tst_check_kconfig0[1-5].sh shell/net/*.sh}" diff --git a/lib/newlib_tests/test_children_cleanup.c b/lib/newlib_tests/test_children_cleanup.c new file mode 100644 index 000000000..2b1ca5f9c --- /dev/null +++ b/lib/newlib_tests/test_children_cleanup.c @@ -0,0 +1,43 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2022 SUSE LLC <mdoucha@suse.cz> + */ + +/*\ + * Test whether the LTP library properly reaps any children left over when + * the main test process dies. Run using test_children_cleanup.sh. + */ + +#include <unistd.h> +#include <signal.h> + +#include "tst_test.h" + +static void run(void) +{ + pid_t child_pid, main_pid = getpid(); + + tst_res(TINFO, "Main process %d starting", main_pid); + + /* Check that normal child reaping does not disrupt the test */ + if (!SAFE_FORK()) + return; + + SAFE_WAIT(NULL); + child_pid = SAFE_FORK(); + + /* Start child that will outlive the main test process */ + if (!child_pid) { + sleep(30); + return; + } + + tst_res(TINFO, "Forked child %d", child_pid); + kill(main_pid, SIGKILL); +} + +static struct tst_test test = { + .test_all = run, + .forks_child = 1, + .timeout = 10, +}; diff --git a/lib/newlib_tests/test_children_cleanup.sh b/lib/newlib_tests/test_children_cleanup.sh new file mode 100755 index 000000000..2a84c3d85 --- /dev/null +++ b/lib/newlib_tests/test_children_cleanup.sh @@ -0,0 +1,19 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 SUSE LLC <mdoucha@suse.cz> + +TMPFILE="/tmp/ltp_children_cleanup_$$.log" +./test_children_cleanup &>"$TMPFILE" +CHILD_PID=`sed -n 's/^.*Forked child \([0-9]*\)$/\1/p' "$TMPFILE"` +rm "$TMPFILE" + +if [ "x$CHILD_PID" = "x" ]; then + echo "TFAIL: Child process was not created" + exit 1 +elif ! kill -s 0 $CHILD_PID &>/dev/null; then + echo "TPASS: Child process was cleaned up" + exit 0 +else + echo "TFAIL: Child process was left behind" + exit 1 +fi diff --git a/lib/tst_test.c b/lib/tst_test.c index 191a9deab..a84d72d53 100644 --- a/lib/tst_test.c +++ b/lib/tst_test.c @@ -1495,6 +1495,9 @@ static int fork_testrun(void) return TFAIL; } + if (tst_test->forks_child) + kill(-test_pid, SIGKILL); + if (WIFEXITED(status) && WEXITSTATUS(status)) return WEXITSTATUS(status); -- 2.34.1 -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes 2022-02-11 11:44 [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes Martin Doucha @ 2022-02-11 12:55 ` Cyril Hrubis 2022-02-11 13:29 ` Martin Doucha 0 siblings, 1 reply; 9+ messages in thread From: Cyril Hrubis @ 2022-02-11 12:55 UTC (permalink / raw) To: Martin Doucha; +Cc: ltp Hi! > --- a/lib/tst_test.c > +++ b/lib/tst_test.c > @@ -1495,6 +1495,9 @@ static int fork_testrun(void) > return TFAIL; > } > > + if (tst_test->forks_child) > + kill(-test_pid, SIGKILL); > + Maybe we can even print a message here if the kill() returns with 0, which would mean that there were any leftover child processes killed. Either way this is a nice and clean solution: Reviewed-by: Cyril Hrubis <chrubis@suse.cz> > if (WIFEXITED(status) && WEXITSTATUS(status)) > return WEXITSTATUS(status); > > -- > 2.34.1 > > > -- > Mailing list info: https://lists.linux.it/listinfo/ltp -- Cyril Hrubis chrubis@suse.cz -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes 2022-02-11 12:55 ` Cyril Hrubis @ 2022-02-11 13:29 ` Martin Doucha 2022-02-12 3:03 ` Li Wang 0 siblings, 1 reply; 9+ messages in thread From: Martin Doucha @ 2022-02-11 13:29 UTC (permalink / raw) To: Cyril Hrubis; +Cc: ltp On 11. 02. 22 13:55, Cyril Hrubis wrote: > Hi! >> --- a/lib/tst_test.c >> +++ b/lib/tst_test.c >> @@ -1495,6 +1495,9 @@ static int fork_testrun(void) >> return TFAIL; >> } >> >> + if (tst_test->forks_child) >> + kill(-test_pid, SIGKILL); >> + > > Maybe we can even print a message here if the kill() returns with 0, > which would mean that there were any leftover child processes killed. Feel free to add a message during merge. -- Martin Doucha mdoucha@suse.cz QA Engineer for Software Maintenance SUSE LINUX, s.r.o. CORSO IIa Krizikova 148/34 186 00 Prague 8 Czech Republic -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes 2022-02-11 13:29 ` Martin Doucha @ 2022-02-12 3:03 ` Li Wang 2022-02-18 12:30 ` [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] Petr Vorel 0 siblings, 1 reply; 9+ messages in thread From: Li Wang @ 2022-02-12 3:03 UTC (permalink / raw) To: Martin Doucha; +Cc: LTP List [-- Attachment #1.1: Type: text/plain, Size: 625 bytes --] On Fri, Feb 11, 2022 at 9:30 PM Martin Doucha <mdoucha@suse.cz> wrote: > On 11. 02. 22 13:55, Cyril Hrubis wrote: > > Hi! > >> --- a/lib/tst_test.c > >> +++ b/lib/tst_test.c > >> @@ -1495,6 +1495,9 @@ static int fork_testrun(void) > >> return TFAIL; > >> } > >> > >> + if (tst_test->forks_child) > >> + kill(-test_pid, SIGKILL); > >> + > > > > Maybe we can even print a message here if the kill() returns with 0, > > which would mean that there were any leftover child processes killed. > > Feel free to add a message during merge. > I added that and applied. Thanks~ -- Regards, Li Wang [-- Attachment #1.2: Type: text/html, Size: 1313 bytes --] [-- Attachment #2: Type: text/plain, Size: 60 bytes --] -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 9+ messages in thread
* [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] 2022-02-12 3:03 ` Li Wang @ 2022-02-18 12:30 ` Petr Vorel 2022-02-18 12:42 ` Cyril Hrubis 0 siblings, 1 reply; 9+ messages in thread From: Petr Vorel @ 2022-02-18 12:30 UTC (permalink / raw) To: Li Wang, Martin Doucha; +Cc: LTP List Hi all, > On Fri, Feb 11, 2022 at 9:30 PM Martin Doucha <mdoucha@suse.cz> wrote: > > On 11. 02. 22 13:55, Cyril Hrubis wrote: > > > Hi! > > >> --- a/lib/tst_test.c > > >> +++ b/lib/tst_test.c > > >> @@ -1495,6 +1495,9 @@ static int fork_testrun(void) > > >> return TFAIL; > > >> } > > >> + if (tst_test->forks_child) > > >> + kill(-test_pid, SIGKILL); FYI This broke all LTP network tests which use netstress.c binary, they now randomly fails after "tst_test.c:1499: TINFO: Killed the leftover descendant processes" I was thinking whether it's not actually kernel bug which is now visible, but the behavior is the same on various kernels: SLES 5.14, openSUSE 5.16.8, older Debian 5.3. and different VM setup (but disabled firewall, also randomly failing means it's not a firewall issue). Not sure now whether netstress.c should be altered or we should add flag to the API to not run this cleanup. DEBUGGING: The reason is hidden, because netstress.c output is redirected and printed only on error. Sometimes it's just a warning: # ./tcp_ipsec.sh -s 100:1000:65535:R65535 ... tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja' tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 1 TWARN: netstress failed, ret: 2 tcp_ipsec 1 TPASS: netstress passed, median time 4 ms, data: 4 5 4 4 tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja' tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 2 TPASS: netstress passed, median time 6 ms, data: 6 6 4 5 6 tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja' tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 3 TPASS: netstress passed, median time 9 ms, data: 11 10 9 9 9 tcp_ipsec 4 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja' tcp_ipsec 4 TINFO: run client 'netstress -l -H 10.0.0.1 -A 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 4 TPASS: netstress passed, median time 8 ms, data: 8 8 8 9 7 tcp_ipsec 5 TINFO: AppArmor enabled, this may affect test results tcp_ipsec 5 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root) tcp_ipsec 5 TINFO: loaded AppArmor profiles: none # ./tcp_ipsec.sh -s 100:1000:65535:R65535 ... tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK' tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 1 TPASS: netstress passed, median time 6 ms, data: 5 5 6 6 6 tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK' tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 2 TWARN: netstress failed, ret: 2 tcp_ipsec 2 TPASS: netstress passed, median time 5 ms, data: 4 6 5 5 tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK' tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 3 TPASS: netstress passed, median time 10 ms, data: 10 10 8 9 10 tcp_ipsec 4 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK' tcp_ipsec 4 TINFO: run client 'netstress -l -H 10.0.0.1 -A 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 4 TPASS: netstress passed, median time 11 ms, data: 12 11 11 11 11 tcp_ipsec 5 TINFO: AppArmor enabled, this may affect test results tcp_ipsec 5 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root) tcp_ipsec 5 TINFO: loaded AppArmor profiles: none Sometimes it's a hard failure, where we at least see the log: tcp_ipsec 1 TPASS: netstress passed, median time 5 ms, data: 4 7 4 8 5 tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.rEORDqdaS6' tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 2 TPASS: netstress passed, median time 6 ms, data: 4 6 6 4 6 tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.rEORDqdaS6' tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 3 TWARN: netstress failed, ret: 2 netstress.c:642: TBROK: Server closed tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s netstress.c:895: TINFO: connection: addr '10.0.0.1', port '33985' netstress.c:896: TINFO: client max req: 100 netstress.c:897: TINFO: clients num: 2 netstress.c:902: TINFO: client msg size: 65535 netstress.c:903: TINFO: server msg size: 65535 netstress.c:817: TINFO: tcp_tw_reuse is already set netstress.c:947: TINFO: TCP client is using old TCP API. netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1 netstress.c:476: TINFO: Running the test over IPv4 netstress.c:344: TBROK: connect(4, 10.0.0.1:33985, 16) failed: ECONNREFUSED (111) netstress.c:344: TBROK: connect(3, 10.0.0.1:33985, 16) failed: ECONNREFUSED (111) But with patch below it shows that server process is killed: tcp_ipsec 1 TPASS: netstress passed, median time 5 ms, data: 6 5 5 4 5 tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.DId6DBCQ2W' tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times tcp_ipsec 2 TINFO: ===== 1: remote netstress, ret: 0, cat tst_netload.log ===== tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s netstress.c:923: TINFO: max requests '10' netstress.c:947: TINFO: TCP server is using old TCP API. netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1 netstress.c:678: TINFO: assigning a name to the server socket... netstress.c:685: TINFO: bind to port 36103 netstress.c:706: TINFO: Listen on the socket '5' tst_test.c:1499: TINFO: Killed the leftover descendant processes => HERE netstress server process is killed after TPASS Summary: passed 0 failed 0 broken 0 skipped 0 warnings 0 --- tcp_ipsec 2 TWARN: netstress failed, ret: 2 => causing TWARN for client. And hard failure: tcp_ipsec 4 TINFO: ===== 5: remote netstress, ret: 0, cat tst_netload.log ===== tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s netstress.c:923: TINFO: max requests '10' netstress.c:947: TINFO: TCP server is using old TCP API. netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1 netstress.c:678: TINFO: assigning a name to the server socket... netstress.c:685: TINFO: bind to port 36709 netstress.c:706: TINFO: Listen on the socket '5' tst_test.c:1499: TINFO: Killed the leftover descendant processes Summary: passed 0 failed 0 broken 0 skipped 0 warnings 0 --- tcp_ipsec 4 TWARN: netstress failed, ret: 2 netstress.c:642: TBROK: Server closed tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s netstress.c:874: TINFO: rand start seed 0xff9e netstress.c:895: TINFO: connection: addr '10.0.0.1', port '36709' netstress.c:896: TINFO: client max req: 100 netstress.c:897: TINFO: clients num: 2 netstress.c:900: TINFO: random msg size [5 65530] netstress.c:817: TINFO: tcp_tw_reuse is already set netstress.c:947: TINFO: TCP client is using old TCP API. netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1 netstress.c:476: TINFO: Running the test over IPv4 netstress.c:344: TBROK: connect(4, 10.0.0.1:36709, 16) failed: ECONNREFUSED (111) netstress.c:344: TBROK: connect(3, 10.0.0.1:36709, 16) failed: ECONNREFUSED (111) Summary: passed 0 failed 0 broken 2 skipped 0 warnings 0 tcp_ipsec 4 TFAIL: expected 'pass' but ret: '2' Kind regards, Petr +++ testcases/lib/tst_net.sh @@ -728,6 +728,10 @@ tst_netload() for i in $(seq 1 $run_cnt); do tst_rhost_run -c "netstress $s_opts" > tst_netload.log 2>&1 + tst_res_ TINFO "===== $i: remote netstress, ret: $ret, cat tst_netload.log =====" + cat tst_netload.log + printf -- "---\n\n" + if [ $? -ne 0 ]; then cat tst_netload.log local ttype="TFAIL" -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] 2022-02-18 12:30 ` [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] Petr Vorel @ 2022-02-18 12:42 ` Cyril Hrubis 2022-02-18 14:42 ` Petr Vorel 0 siblings, 1 reply; 9+ messages in thread From: Cyril Hrubis @ 2022-02-18 12:42 UTC (permalink / raw) To: Petr Vorel; +Cc: LTP List, Martin Doucha Hi! > > > >> --- a/lib/tst_test.c > > > >> +++ b/lib/tst_test.c > > > >> @@ -1495,6 +1495,9 @@ static int fork_testrun(void) > > > >> return TFAIL; > > > >> } > > > > >> + if (tst_test->forks_child) > > > >> + kill(-test_pid, SIGKILL); > FYI This broke all LTP network tests which use netstress.c binary, > they now randomly fails after "tst_test.c:1499: TINFO: Killed the leftover descendant processes" > > I was thinking whether it's not actually kernel bug which is now visible, > but the behavior is the same on various kernels: SLES 5.14, openSUSE 5.16.8, > older Debian 5.3. and different VM setup (but disabled firewall, also randomly > failing means it's not a firewall issue). > > Not sure now whether netstress.c should be altered or we should add flag to the > API to not run this cleanup. > > DEBUGGING: > > The reason is hidden, because netstress.c output is redirected and printed only > on error. > > Sometimes it's just a warning: I guess that it's a race between the SETSID() and exit(0) in the move_to_background() function. If the main test process calls exit(0) before the newly forked child managed to do SETSID() it's killed by the test library because it's still in the old process group. Try this: diff --git a/testcases/network/netstress/netstress.c b/testcases/network/netstress/netstress.c index 0914c65bd..04a850134 100644 --- a/testcases/network/netstress/netstress.c +++ b/testcases/network/netstress/netstress.c @@ -713,11 +713,15 @@ static void server_cleanup(void) static void move_to_background(void) { - if (SAFE_FORK()) + if (SAFE_FORK()) { + pause(); exit(0); + } SAFE_SETSID(); + SAFE_KILL(getppid(), SIGUSR1); + close(STDIN_FILENO); SAFE_OPEN("/dev/null", O_RDONLY); close(STDOUT_FILENO); -- Cyril Hrubis chrubis@suse.cz -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] 2022-02-18 12:42 ` Cyril Hrubis @ 2022-02-18 14:42 ` Petr Vorel 2022-02-18 14:48 ` Cyril Hrubis 0 siblings, 1 reply; 9+ messages in thread From: Petr Vorel @ 2022-02-18 14:42 UTC (permalink / raw) To: Cyril Hrubis; +Cc: LTP List, Martin Doucha Hi Cyril, > > Sometimes it's just a warning: > I guess that it's a race between the SETSID() and exit(0) in the > move_to_background() function. If the main test process calls exit(0) > before the newly forked child managed to do SETSID() it's killed by the > test library because it's still in the old process group. Thanks! Yep, it'll probably be a race. Your patch causes server being killed: tst_test.c:1511: TBROK: Test killed by SIGUSR1! (no big surprise) Also netstress server remains running: netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.RZ6H3Adg4e Kind regards, Petr > Try this: > diff --git a/testcases/network/netstress/netstress.c b/testcases/network/netstress/netstress.c > index 0914c65bd..04a850134 100644 > --- a/testcases/network/netstress/netstress.c > +++ b/testcases/network/netstress/netstress.c > @@ -713,11 +713,15 @@ static void server_cleanup(void) > static void move_to_background(void) > { > - if (SAFE_FORK()) > + if (SAFE_FORK()) { > + pause(); > exit(0); > + } > SAFE_SETSID(); > + SAFE_KILL(getppid(), SIGUSR1); > + > close(STDIN_FILENO); > SAFE_OPEN("/dev/null", O_RDONLY); > close(STDOUT_FILENO); -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] 2022-02-18 14:42 ` Petr Vorel @ 2022-02-18 14:48 ` Cyril Hrubis 2022-02-18 15:32 ` Petr Vorel 0 siblings, 1 reply; 9+ messages in thread From: Cyril Hrubis @ 2022-02-18 14:48 UTC (permalink / raw) To: Petr Vorel; +Cc: LTP List, Martin Doucha Hi! > > I guess that it's a race between the SETSID() and exit(0) in the > > move_to_background() function. If the main test process calls exit(0) > > before the newly forked child managed to do SETSID() it's killed by the > > test library because it's still in the old process group. > > Thanks! Yep, it'll probably be a race. > > Your patch causes server being killed: > tst_test.c:1511: TBROK: Test killed by SIGUSR1! > > (no big surprise) Ah right, we have to set up a dummy handler for SIGUSR1 since default handler for SIGUSR1 will terminate the process. Or set the .needs_checkpoints in the tst_test structure and use WAKE and WAIT pair to synchronize. -- Cyril Hrubis chrubis@suse.cz -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] 2022-02-18 14:48 ` Cyril Hrubis @ 2022-02-18 15:32 ` Petr Vorel 0 siblings, 0 replies; 9+ messages in thread From: Petr Vorel @ 2022-02-18 15:32 UTC (permalink / raw) To: Cyril Hrubis; +Cc: LTP List, Martin Doucha Hi Cyril, > Hi! > > > I guess that it's a race between the SETSID() and exit(0) in the > > > move_to_background() function. If the main test process calls exit(0) > > > before the newly forked child managed to do SETSID() it's killed by the > > > test library because it's still in the old process group. > > Thanks! Yep, it'll probably be a race. > > Your patch causes server being killed: > > tst_test.c:1511: TBROK: Test killed by SIGUSR1! > > (no big surprise) > Ah right, we have to set up a dummy handler for SIGUSR1 since default > handler for SIGUSR1 will terminate the process. Yep, with with adding it it works as expected. (I wasn't sure if you plan to somehow align with the cleanup from tst_test.c, which also uses SIGUSR1 for cleanup, but right we're masking it). static void sig_handler(int sig LTP_ATTRIBUTE_UNUSED) { } and setting it in setup() SAFE_SIGNAL(SIGUSR1, sig_handler); > Or set the .needs_checkpoints in the tst_test structure and use WAKE and > WAIT pair to synchronize. As signal handler works, I'd go for it unless this preferred. Going to send proper patch. I wonder if it's just netstress.c which suffers this problem. Kind regards, Petr -- Mailing list info: https://lists.linux.it/listinfo/ltp ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-02-18 15:32 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-11 11:44 [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes Martin Doucha 2022-02-11 12:55 ` Cyril Hrubis 2022-02-11 13:29 ` Martin Doucha 2022-02-12 3:03 ` Li Wang 2022-02-18 12:30 ` [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] Petr Vorel 2022-02-18 12:42 ` Cyril Hrubis 2022-02-18 14:42 ` Petr Vorel 2022-02-18 14:48 ` Cyril Hrubis 2022-02-18 15:32 ` Petr Vorel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).