All of lore.kernel.org
 help / color / mirror / Atom feed
* [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes
@ 2022-02-11 11:44 Martin Doucha
  2022-02-11 12:55 ` Cyril Hrubis
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Doucha @ 2022-02-11 11:44 UTC (permalink / raw)
  To: ltp

When the main test process crashes or gets killed e.g. by OOM killer,
the watchdog process currently does not clean up any remaining child
processes. Fix this by sending SIGKILL to the test process group when
the watchdog process gets notified that the main test process has exited
for any reason.

Fixes #892

Signed-off-by: Martin Doucha <mdoucha@suse.cz>
---

Changes since v1:
- The SIGKILL can be sent after SAFE_WAITPID() returns, removed
  the unnecessary SIGCHLD handler
- Fixed libtest shell script, calling test_children_cleanup inside backticks
  would block the shell script until any leftover child exits, which renders
  the test useless. Temp file must be used for passing child PID.

 lib/newlib_tests/runtest.sh               |  3 +-
 lib/newlib_tests/test_children_cleanup.c  | 43 +++++++++++++++++++++++
 lib/newlib_tests/test_children_cleanup.sh | 19 ++++++++++
 lib/tst_test.c                            |  3 ++
 4 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 lib/newlib_tests/test_children_cleanup.c
 create mode 100755 lib/newlib_tests/test_children_cleanup.sh

diff --git a/lib/newlib_tests/runtest.sh b/lib/newlib_tests/runtest.sh
index 92fd3860e..327460e7b 100755
--- a/lib/newlib_tests/runtest.sh
+++ b/lib/newlib_tests/runtest.sh
@@ -4,7 +4,8 @@
 LTP_C_API_TESTS="${LTP_C_API_TESTS:-test05 test07 test09 test12 test15 test18
 tst_needs_cmds01 tst_needs_cmds02 tst_needs_cmds03 tst_needs_cmds06
 tst_needs_cmds07 tst_bool_expr test_exec test_timer tst_res_hexd tst_strstatus
-tst_fuzzy_sync03 test_zero_hugepage.sh test_kconfig.sh}"
+tst_fuzzy_sync03 test_zero_hugepage.sh test_kconfig.sh
+test_children_cleanup.sh}"
 
 LTP_SHELL_API_TESTS="${LTP_SHELL_API_TESTS:-shell/tst_check_driver.sh
 shell/tst_check_kconfig0[1-5].sh shell/net/*.sh}"
diff --git a/lib/newlib_tests/test_children_cleanup.c b/lib/newlib_tests/test_children_cleanup.c
new file mode 100644
index 000000000..2b1ca5f9c
--- /dev/null
+++ b/lib/newlib_tests/test_children_cleanup.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2022 SUSE LLC <mdoucha@suse.cz>
+ */
+
+/*\
+ * Test whether the LTP library properly reaps any children left over when
+ * the main test process dies. Run using test_children_cleanup.sh.
+ */
+
+#include <unistd.h>
+#include <signal.h>
+
+#include "tst_test.h"
+
+static void run(void)
+{
+	pid_t child_pid, main_pid = getpid();
+
+	tst_res(TINFO, "Main process %d starting", main_pid);
+
+	/* Check that normal child reaping does not disrupt the test */
+	if (!SAFE_FORK())
+		return;
+
+	SAFE_WAIT(NULL);
+	child_pid = SAFE_FORK();
+
+	/* Start child that will outlive the main test process */
+	if (!child_pid) {
+		sleep(30);
+		return;
+	}
+
+	tst_res(TINFO, "Forked child %d", child_pid);
+	kill(main_pid, SIGKILL);
+}
+
+static struct tst_test test = {
+	.test_all = run,
+	.forks_child = 1,
+	.timeout = 10,
+};
diff --git a/lib/newlib_tests/test_children_cleanup.sh b/lib/newlib_tests/test_children_cleanup.sh
new file mode 100755
index 000000000..2a84c3d85
--- /dev/null
+++ b/lib/newlib_tests/test_children_cleanup.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 SUSE LLC <mdoucha@suse.cz>
+
+TMPFILE="/tmp/ltp_children_cleanup_$$.log"
+./test_children_cleanup &>"$TMPFILE"
+CHILD_PID=`sed -n 's/^.*Forked child \([0-9]*\)$/\1/p' "$TMPFILE"`
+rm "$TMPFILE"
+
+if [ "x$CHILD_PID" = "x" ]; then
+	echo "TFAIL: Child process was not created"
+	exit 1
+elif ! kill -s 0 $CHILD_PID &>/dev/null; then
+	echo "TPASS: Child process was cleaned up"
+	exit 0
+else
+	echo "TFAIL: Child process was left behind"
+	exit 1
+fi
diff --git a/lib/tst_test.c b/lib/tst_test.c
index 191a9deab..a84d72d53 100644
--- a/lib/tst_test.c
+++ b/lib/tst_test.c
@@ -1495,6 +1495,9 @@ static int fork_testrun(void)
 		return TFAIL;
 	}
 
+	if (tst_test->forks_child)
+		kill(-test_pid, SIGKILL);
+
 	if (WIFEXITED(status) && WEXITSTATUS(status))
 		return WEXITSTATUS(status);
 
-- 
2.34.1


-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes
  2022-02-11 11:44 [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes Martin Doucha
@ 2022-02-11 12:55 ` Cyril Hrubis
  2022-02-11 13:29   ` Martin Doucha
  0 siblings, 1 reply; 9+ messages in thread
From: Cyril Hrubis @ 2022-02-11 12:55 UTC (permalink / raw)
  To: Martin Doucha; +Cc: ltp

Hi!
> --- a/lib/tst_test.c
> +++ b/lib/tst_test.c
> @@ -1495,6 +1495,9 @@ static int fork_testrun(void)
>  		return TFAIL;
>  	}
>  
> +	if (tst_test->forks_child)
> +		kill(-test_pid, SIGKILL);
> +

Maybe we can even print a message here if the kill() returns with 0,
which would mean that there were any leftover child processes killed.

Either way this is a nice and clean solution:

Reviewed-by: Cyril Hrubis <chrubis@suse.cz>

>  	if (WIFEXITED(status) && WEXITSTATUS(status))
>  		return WEXITSTATUS(status);
>  
> -- 
> 2.34.1
> 
> 
> -- 
> Mailing list info: https://lists.linux.it/listinfo/ltp

-- 
Cyril Hrubis
chrubis@suse.cz

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes
  2022-02-11 12:55 ` Cyril Hrubis
@ 2022-02-11 13:29   ` Martin Doucha
  2022-02-12  3:03     ` Li Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Doucha @ 2022-02-11 13:29 UTC (permalink / raw)
  To: Cyril Hrubis; +Cc: ltp

On 11. 02. 22 13:55, Cyril Hrubis wrote:
> Hi!
>> --- a/lib/tst_test.c
>> +++ b/lib/tst_test.c
>> @@ -1495,6 +1495,9 @@ static int fork_testrun(void)
>>  		return TFAIL;
>>  	}
>>  
>> +	if (tst_test->forks_child)
>> +		kill(-test_pid, SIGKILL);
>> +
> 
> Maybe we can even print a message here if the kill() returns with 0,
> which would mean that there were any leftover child processes killed.

Feel free to add a message during merge.

-- 
Martin Doucha   mdoucha@suse.cz
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes
  2022-02-11 13:29   ` Martin Doucha
@ 2022-02-12  3:03     ` Li Wang
  2022-02-18 12:30       ` [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] Petr Vorel
  0 siblings, 1 reply; 9+ messages in thread
From: Li Wang @ 2022-02-12  3:03 UTC (permalink / raw)
  To: Martin Doucha; +Cc: LTP List


[-- Attachment #1.1: Type: text/plain, Size: 625 bytes --]

On Fri, Feb 11, 2022 at 9:30 PM Martin Doucha <mdoucha@suse.cz> wrote:

> On 11. 02. 22 13:55, Cyril Hrubis wrote:
> > Hi!
> >> --- a/lib/tst_test.c
> >> +++ b/lib/tst_test.c
> >> @@ -1495,6 +1495,9 @@ static int fork_testrun(void)
> >>              return TFAIL;
> >>      }
> >>
> >> +    if (tst_test->forks_child)
> >> +            kill(-test_pid, SIGKILL);
> >> +
> >
> > Maybe we can even print a message here if the kill() returns with 0,
> > which would mean that there were any leftover child processes killed.
>
> Feel free to add a message during merge.
>

I added that and applied. Thanks~


-- 
Regards,
Li Wang

[-- Attachment #1.2: Type: text/html, Size: 1313 bytes --]

[-- Attachment #2: Type: text/plain, Size: 60 bytes --]


-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes]
  2022-02-12  3:03     ` Li Wang
@ 2022-02-18 12:30       ` Petr Vorel
  2022-02-18 12:42         ` Cyril Hrubis
  0 siblings, 1 reply; 9+ messages in thread
From: Petr Vorel @ 2022-02-18 12:30 UTC (permalink / raw)
  To: Li Wang, Martin Doucha; +Cc: LTP List

Hi all,

> On Fri, Feb 11, 2022 at 9:30 PM Martin Doucha <mdoucha@suse.cz> wrote:

> > On 11. 02. 22 13:55, Cyril Hrubis wrote:
> > > Hi!
> > >> --- a/lib/tst_test.c
> > >> +++ b/lib/tst_test.c
> > >> @@ -1495,6 +1495,9 @@ static int fork_testrun(void)
> > >>              return TFAIL;
> > >>      }

> > >> +    if (tst_test->forks_child)
> > >> +            kill(-test_pid, SIGKILL);
FYI This broke all LTP network tests which use netstress.c binary,
they now randomly fails after "tst_test.c:1499: TINFO: Killed the leftover descendant processes"

I was thinking whether it's not actually kernel bug which is now visible,
but the behavior is the same on various kernels: SLES 5.14, openSUSE 5.16.8,
older Debian 5.3. and different VM setup (but disabled firewall, also randomly
failing means it's not a firewall issue).

Not sure now whether netstress.c should be altered or we should add flag to the
API to not run this cleanup.

DEBUGGING:

The reason is hidden, because netstress.c output is redirected and printed only
on error.

Sometimes it's just a warning:

# ./tcp_ipsec.sh -s 100:1000:65535:R65535
...
tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 1 TWARN: netstress failed, ret: 2
tcp_ipsec 1 TPASS: netstress passed, median time 4 ms, data: 4 5 4 4
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TPASS: netstress passed, median time 6 ms, data: 6 6 4 5 6
tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 3 TPASS: netstress passed, median time 9 ms, data: 11 10 9 9 9
tcp_ipsec 4 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 4 TINFO: run client 'netstress -l -H 10.0.0.1 -A 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 4 TPASS: netstress passed, median time 8 ms, data: 8 8 8 9 7
tcp_ipsec 5 TINFO: AppArmor enabled, this may affect test results
tcp_ipsec 5 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root)
tcp_ipsec 5 TINFO: loaded AppArmor profiles: none

# ./tcp_ipsec.sh -s 100:1000:65535:R65535
...
tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 1 TPASS: netstress passed, median time 6 ms, data: 5 5 6 6 6
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TWARN: netstress failed, ret: 2
tcp_ipsec 2 TPASS: netstress passed, median time 5 ms, data: 4 6 5 5
tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 3 TPASS: netstress passed, median time 10 ms, data: 10 10 8 9 10
tcp_ipsec 4 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 4 TINFO: run client 'netstress -l -H 10.0.0.1 -A 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 4 TPASS: netstress passed, median time 11 ms, data: 12 11 11 11 11
tcp_ipsec 5 TINFO: AppArmor enabled, this may affect test results
tcp_ipsec 5 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root)
tcp_ipsec 5 TINFO: loaded AppArmor profiles: none

Sometimes it's a hard failure, where we at least see the log:
tcp_ipsec 1 TPASS: netstress passed, median time 5 ms, data: 4 7 4 8 5
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.rEORDqdaS6'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TPASS: netstress passed, median time 6 ms, data: 4 6 6 4 6
tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.rEORDqdaS6'
tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 3 TWARN: netstress failed, ret: 2
netstress.c:642: TBROK: Server closed
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:895: TINFO: connection: addr '10.0.0.1', port '33985'
netstress.c:896: TINFO: client max req: 100
netstress.c:897: TINFO: clients num: 2
netstress.c:902: TINFO: client msg size: 65535
netstress.c:903: TINFO: server msg size: 65535
netstress.c:817: TINFO: tcp_tw_reuse is already set
netstress.c:947: TINFO: TCP client is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:476: TINFO: Running the test over IPv4
netstress.c:344: TBROK: connect(4, 10.0.0.1:33985, 16) failed: ECONNREFUSED (111)
netstress.c:344: TBROK: connect(3, 10.0.0.1:33985, 16) failed: ECONNREFUSED (111)

But with patch below it shows that server process is killed:

tcp_ipsec 1 TPASS: netstress passed, median time 5 ms, data: 6 5 5 4 5
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.DId6DBCQ2W'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TINFO: ===== 1: remote netstress, ret: 0, cat tst_netload.log =====
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:923: TINFO: max requests '10'
netstress.c:947: TINFO: TCP server is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:678: TINFO: assigning a name to the server socket...
netstress.c:685: TINFO: bind to port 36103
netstress.c:706: TINFO: Listen on the socket '5'
tst_test.c:1499: TINFO: Killed the leftover descendant processes
=> HERE netstress server process is killed after TPASS

Summary:
passed   0
failed   0
broken   0
skipped  0
warnings 0
---

tcp_ipsec 2 TWARN: netstress failed, ret: 2
=> causing TWARN for client.

And hard failure:

tcp_ipsec 4 TINFO: ===== 5: remote netstress, ret: 0, cat tst_netload.log =====
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:923: TINFO: max requests '10'
netstress.c:947: TINFO: TCP server is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:678: TINFO: assigning a name to the server socket...
netstress.c:685: TINFO: bind to port 36709
netstress.c:706: TINFO: Listen on the socket '5'
tst_test.c:1499: TINFO: Killed the leftover descendant processes

Summary:
passed   0
failed   0
broken   0
skipped  0
warnings 0
---

tcp_ipsec 4 TWARN: netstress failed, ret: 2
netstress.c:642: TBROK: Server closed
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:874: TINFO: rand start seed 0xff9e
netstress.c:895: TINFO: connection: addr '10.0.0.1', port '36709'
netstress.c:896: TINFO: client max req: 100
netstress.c:897: TINFO: clients num: 2
netstress.c:900: TINFO: random msg size [5 65530]
netstress.c:817: TINFO: tcp_tw_reuse is already set
netstress.c:947: TINFO: TCP client is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:476: TINFO: Running the test over IPv4
netstress.c:344: TBROK: connect(4, 10.0.0.1:36709, 16) failed: ECONNREFUSED (111)
netstress.c:344: TBROK: connect(3, 10.0.0.1:36709, 16) failed: ECONNREFUSED (111)

Summary:
passed   0
failed   0
broken   2
skipped  0
warnings 0
tcp_ipsec 4 TFAIL: expected 'pass' but ret: '2'

Kind regards,
Petr

+++ testcases/lib/tst_net.sh
@@ -728,6 +728,10 @@ tst_netload()
 
 	for i in $(seq 1 $run_cnt); do
 		tst_rhost_run -c "netstress $s_opts" > tst_netload.log 2>&1
+		tst_res_ TINFO "===== $i: remote netstress, ret: $ret, cat tst_netload.log ====="
+		cat tst_netload.log
+		printf -- "---\n\n"
+
 		if [ $? -ne 0 ]; then
 			cat tst_netload.log
 			local ttype="TFAIL"

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes]
  2022-02-18 12:30       ` [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] Petr Vorel
@ 2022-02-18 12:42         ` Cyril Hrubis
  2022-02-18 14:42           ` Petr Vorel
  0 siblings, 1 reply; 9+ messages in thread
From: Cyril Hrubis @ 2022-02-18 12:42 UTC (permalink / raw)
  To: Petr Vorel; +Cc: LTP List, Martin Doucha

Hi!
> > > >> --- a/lib/tst_test.c
> > > >> +++ b/lib/tst_test.c
> > > >> @@ -1495,6 +1495,9 @@ static int fork_testrun(void)
> > > >>              return TFAIL;
> > > >>      }
> 
> > > >> +    if (tst_test->forks_child)
> > > >> +            kill(-test_pid, SIGKILL);
> FYI This broke all LTP network tests which use netstress.c binary,
> they now randomly fails after "tst_test.c:1499: TINFO: Killed the leftover descendant processes"
> 
> I was thinking whether it's not actually kernel bug which is now visible,
> but the behavior is the same on various kernels: SLES 5.14, openSUSE 5.16.8,
> older Debian 5.3. and different VM setup (but disabled firewall, also randomly
> failing means it's not a firewall issue).
> 
> Not sure now whether netstress.c should be altered or we should add flag to the
> API to not run this cleanup.
> 
> DEBUGGING:
> 
> The reason is hidden, because netstress.c output is redirected and printed only
> on error.
> 
> Sometimes it's just a warning:

I guess that it's a race between the SETSID() and exit(0) in the
move_to_background() function. If the main test process calls exit(0)
before the newly forked child managed to do SETSID() it's killed by the
test library because it's still in the old process group.

Try this:

diff --git a/testcases/network/netstress/netstress.c b/testcases/network/netstress/netstress.c
index 0914c65bd..04a850134 100644
--- a/testcases/network/netstress/netstress.c
+++ b/testcases/network/netstress/netstress.c
@@ -713,11 +713,15 @@ static void server_cleanup(void)

 static void move_to_background(void)
 {
-       if (SAFE_FORK())
+       if (SAFE_FORK()) {
+               pause();
                exit(0);
+       }

        SAFE_SETSID();

+       SAFE_KILL(getppid(), SIGUSR1);
+
        close(STDIN_FILENO);
        SAFE_OPEN("/dev/null", O_RDONLY);
        close(STDOUT_FILENO);


-- 
Cyril Hrubis
chrubis@suse.cz

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes]
  2022-02-18 12:42         ` Cyril Hrubis
@ 2022-02-18 14:42           ` Petr Vorel
  2022-02-18 14:48             ` Cyril Hrubis
  0 siblings, 1 reply; 9+ messages in thread
From: Petr Vorel @ 2022-02-18 14:42 UTC (permalink / raw)
  To: Cyril Hrubis; +Cc: LTP List, Martin Doucha

Hi Cyril,

> > Sometimes it's just a warning:

> I guess that it's a race between the SETSID() and exit(0) in the
> move_to_background() function. If the main test process calls exit(0)
> before the newly forked child managed to do SETSID() it's killed by the
> test library because it's still in the old process group.

Thanks!  Yep, it'll probably be a race.

Your patch causes server being killed:
tst_test.c:1511: TBROK: Test killed by SIGUSR1!

(no big surprise)

Also netstress server remains running:
netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.RZ6H3Adg4e

Kind regards,
Petr

> Try this:

> diff --git a/testcases/network/netstress/netstress.c b/testcases/network/netstress/netstress.c
> index 0914c65bd..04a850134 100644
> --- a/testcases/network/netstress/netstress.c
> +++ b/testcases/network/netstress/netstress.c
> @@ -713,11 +713,15 @@ static void server_cleanup(void)

>  static void move_to_background(void)
>  {
> -       if (SAFE_FORK())
> +       if (SAFE_FORK()) {
> +               pause();
>                 exit(0);
> +       }

>         SAFE_SETSID();

> +       SAFE_KILL(getppid(), SIGUSR1);
> +
>         close(STDIN_FILENO);
>         SAFE_OPEN("/dev/null", O_RDONLY);
>         close(STDOUT_FILENO);

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes]
  2022-02-18 14:42           ` Petr Vorel
@ 2022-02-18 14:48             ` Cyril Hrubis
  2022-02-18 15:32               ` Petr Vorel
  0 siblings, 1 reply; 9+ messages in thread
From: Cyril Hrubis @ 2022-02-18 14:48 UTC (permalink / raw)
  To: Petr Vorel; +Cc: LTP List, Martin Doucha

Hi!
> > I guess that it's a race between the SETSID() and exit(0) in the
> > move_to_background() function. If the main test process calls exit(0)
> > before the newly forked child managed to do SETSID() it's killed by the
> > test library because it's still in the old process group.
> 
> Thanks!  Yep, it'll probably be a race.
> 
> Your patch causes server being killed:
> tst_test.c:1511: TBROK: Test killed by SIGUSR1!
> 
> (no big surprise)

Ah right, we have to set up a dummy handler for SIGUSR1 since default
handler for SIGUSR1 will terminate the process.

Or set the .needs_checkpoints in the tst_test structure and use WAKE and
WAIT pair to synchronize.

-- 
Cyril Hrubis
chrubis@suse.cz

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes]
  2022-02-18 14:48             ` Cyril Hrubis
@ 2022-02-18 15:32               ` Petr Vorel
  0 siblings, 0 replies; 9+ messages in thread
From: Petr Vorel @ 2022-02-18 15:32 UTC (permalink / raw)
  To: Cyril Hrubis; +Cc: LTP List, Martin Doucha

Hi Cyril,

> Hi!
> > > I guess that it's a race between the SETSID() and exit(0) in the
> > > move_to_background() function. If the main test process calls exit(0)
> > > before the newly forked child managed to do SETSID() it's killed by the
> > > test library because it's still in the old process group.

> > Thanks!  Yep, it'll probably be a race.

> > Your patch causes server being killed:
> > tst_test.c:1511: TBROK: Test killed by SIGUSR1!

> > (no big surprise)

> Ah right, we have to set up a dummy handler for SIGUSR1 since default
> handler for SIGUSR1 will terminate the process.


Yep, with with adding it it works as expected.
(I wasn't sure if you plan to somehow align with the cleanup from tst_test.c,
which also uses SIGUSR1 for cleanup, but right we're masking it).

static void sig_handler(int sig LTP_ATTRIBUTE_UNUSED)
{
}

and setting it in setup()
SAFE_SIGNAL(SIGUSR1, sig_handler);

> Or set the .needs_checkpoints in the tst_test structure and use WAKE and
> WAIT pair to synchronize.
As signal handler works, I'd go for it unless this preferred.
Going to send proper patch.

I wonder if it's just netstress.c which suffers this problem.

Kind regards,
Petr

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-02-18 15:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-11 11:44 [LTP] [PATCH v2] Terminate leftover subprocesses when main test process crashes Martin Doucha
2022-02-11 12:55 ` Cyril Hrubis
2022-02-11 13:29   ` Martin Doucha
2022-02-12  3:03     ` Li Wang
2022-02-18 12:30       ` [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes] Petr Vorel
2022-02-18 12:42         ` Cyril Hrubis
2022-02-18 14:42           ` Petr Vorel
2022-02-18 14:48             ` Cyril Hrubis
2022-02-18 15:32               ` Petr Vorel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.