ltp.lists.linux.it archive mirror
From: Petr Vorel <pvorel@suse.cz>
To: Li Wang <liwang@redhat.com>, Martin Doucha <martin.doucha@suse.com>
Cc: LTP List <ltp@lists.linux.it>
Subject: [LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes]
Date: Fri, 18 Feb 2022 13:30:21 +0100
Message-ID: <Yg+RXbUTOxK56iZa@pevik>
In-Reply-To: <CAEemH2fqy3_t=-dbqE9Bx3VH6sZbNvM_bMon4zMukOh+rmw42Q@mail.gmail.com>

Hi all,

> On Fri, Feb 11, 2022 at 9:30 PM Martin Doucha <mdoucha@suse.cz> wrote:

> > On 11. 02. 22 13:55, Cyril Hrubis wrote:
> > > Hi!
> > >> --- a/lib/tst_test.c
> > >> +++ b/lib/tst_test.c
> > >> @@ -1495,6 +1495,9 @@ static int fork_testrun(void)
> > >>              return TFAIL;
> > >>      }

> > >> +    if (tst_test->forks_child)
> > >> +            kill(-test_pid, SIGKILL);

FYI, this broke all LTP network tests that use the netstress binary (netstress.c).
They now fail randomly after "tst_test.c:1499: TINFO: Killed the leftover descendant processes".

I wondered whether this is actually a kernel bug that has only now become visible,
but the behavior is the same on various kernels (SLES 5.14, openSUSE 5.16.8,
older Debian 5.3) and on different VM setups. The firewall is disabled, and random
failures would not point to a firewall issue anyway.

I'm not sure yet whether netstress.c should be altered or whether we should add
a flag to the API to skip this cleanup.

DEBUGGING:

The reason is hidden because netstress output is redirected and printed only
on error.
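
For context, the "print only on error" pattern looks roughly like the sketch
below. This is a minimal illustration, not the actual tst_net.sh code;
run_quiet and the log path argument are hypothetical names:

```shell
# Run a command with stdout/stderr captured in a log file and show the
# log only when the command fails.
# run_quiet is a hypothetical helper, not part of the LTP API.
run_quiet()
{
	local log="$1"
	local ret
	shift

	"$@" > "$log" 2>&1
	ret=$?

	if [ $ret -ne 0 ]; then
		cat "$log"
	fi

	return $ret
}
```

With this shape, any output from a run that returns 0 is never shown.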

Sometimes it's just a warning:

# ./tcp_ipsec.sh -s 100:1000:65535:R65535
...
tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 1 TWARN: netstress failed, ret: 2
tcp_ipsec 1 TPASS: netstress passed, median time 4 ms, data: 4 5 4 4
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TPASS: netstress passed, median time 6 ms, data: 6 6 4 5 6
tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 3 TPASS: netstress passed, median time 9 ms, data: 11 10 9 9 9
tcp_ipsec 4 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 4 TINFO: run client 'netstress -l -H 10.0.0.1 -A 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 4 TPASS: netstress passed, median time 8 ms, data: 8 8 8 9 7
tcp_ipsec 5 TINFO: AppArmor enabled, this may affect test results
tcp_ipsec 5 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root)
tcp_ipsec 5 TINFO: loaded AppArmor profiles: none

# ./tcp_ipsec.sh -s 100:1000:65535:R65535
...
tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 1 TPASS: netstress passed, median time 6 ms, data: 5 5 6 6 6
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TWARN: netstress failed, ret: 2
tcp_ipsec 2 TPASS: netstress passed, median time 5 ms, data: 4 6 5 5
tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 3 TPASS: netstress passed, median time 10 ms, data: 10 10 8 9 10
tcp_ipsec 4 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 4 TINFO: run client 'netstress -l -H 10.0.0.1 -A 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 4 TPASS: netstress passed, median time 11 ms, data: 12 11 11 11 11
tcp_ipsec 5 TINFO: AppArmor enabled, this may affect test results
tcp_ipsec 5 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root)
tcp_ipsec 5 TINFO: loaded AppArmor profiles: none

Sometimes it's a hard failure, where we at least see the log:
tcp_ipsec 1 TPASS: netstress passed, median time 5 ms, data: 4 7 4 8 5
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.rEORDqdaS6'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TPASS: netstress passed, median time 6 ms, data: 4 6 6 4 6
tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.rEORDqdaS6'
tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 3 TWARN: netstress failed, ret: 2
netstress.c:642: TBROK: Server closed
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:895: TINFO: connection: addr '10.0.0.1', port '33985'
netstress.c:896: TINFO: client max req: 100
netstress.c:897: TINFO: clients num: 2
netstress.c:902: TINFO: client msg size: 65535
netstress.c:903: TINFO: server msg size: 65535
netstress.c:817: TINFO: tcp_tw_reuse is already set
netstress.c:947: TINFO: TCP client is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:476: TINFO: Running the test over IPv4
netstress.c:344: TBROK: connect(4, 10.0.0.1:33985, 16) failed: ECONNREFUSED (111)
netstress.c:344: TBROK: connect(3, 10.0.0.1:33985, 16) failed: ECONNREFUSED (111)

But with the patch below, the log shows that the server process is killed:

tcp_ipsec 1 TPASS: netstress passed, median time 5 ms, data: 6 5 5 4 5
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.DId6DBCQ2W'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TINFO: ===== 1: remote netstress, ret: 0, cat tst_netload.log =====
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:923: TINFO: max requests '10'
netstress.c:947: TINFO: TCP server is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:678: TINFO: assigning a name to the server socket...
netstress.c:685: TINFO: bind to port 36103
netstress.c:706: TINFO: Listen on the socket '5'
tst_test.c:1499: TINFO: Killed the leftover descendant processes
=> HERE the netstress server process is killed after TPASS

Summary:
passed   0
failed   0
broken   0
skipped  0
warnings 0
---

tcp_ipsec 2 TWARN: netstress failed, ret: 2
=> which causes the TWARN on the client.

And hard failure:

tcp_ipsec 4 TINFO: ===== 5: remote netstress, ret: 0, cat tst_netload.log =====
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:923: TINFO: max requests '10'
netstress.c:947: TINFO: TCP server is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:678: TINFO: assigning a name to the server socket...
netstress.c:685: TINFO: bind to port 36709
netstress.c:706: TINFO: Listen on the socket '5'
tst_test.c:1499: TINFO: Killed the leftover descendant processes

Summary:
passed   0
failed   0
broken   0
skipped  0
warnings 0
---

tcp_ipsec 4 TWARN: netstress failed, ret: 2
netstress.c:642: TBROK: Server closed
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:874: TINFO: rand start seed 0xff9e
netstress.c:895: TINFO: connection: addr '10.0.0.1', port '36709'
netstress.c:896: TINFO: client max req: 100
netstress.c:897: TINFO: clients num: 2
netstress.c:900: TINFO: random msg size [5 65530]
netstress.c:817: TINFO: tcp_tw_reuse is already set
netstress.c:947: TINFO: TCP client is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:476: TINFO: Running the test over IPv4
netstress.c:344: TBROK: connect(4, 10.0.0.1:36709, 16) failed: ECONNREFUSED (111)
netstress.c:344: TBROK: connect(3, 10.0.0.1:36709, 16) failed: ECONNREFUSED (111)

Summary:
passed   0
failed   0
broken   2
skipped  0
warnings 0
tcp_ipsec 4 TFAIL: expected 'pass' but ret: '2'

Kind regards,
Petr

+++ testcases/lib/tst_net.sh
@@ -728,6 +728,10 @@ tst_netload()
 
 	for i in $(seq 1 $run_cnt); do
 		tst_rhost_run -c "netstress $s_opts" > tst_netload.log 2>&1
+		tst_res_ TINFO "===== $i: remote netstress, ret: $ret, cat tst_netload.log ====="
+		cat tst_netload.log
+		printf -- "---\n\n"
+
 		if [ $? -ne 0 ]; then
 			cat tst_netload.log
 			local ttype="TFAIL"
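
Note that this hunk is only meant for reading the server log: `$?` always holds
the status of the most recent command, so after the added printf the following
`[ $? -ne 0 ]` check tests printf, not netstress. A debug variant that keeps the
original check working would save the status first. A minimal sketch, where
run_and_dump is a hypothetical helper and echo stands in for tst_res_ TINFO:

```shell
# Run a command, dump its log unconditionally for debugging, and still
# return the command's own exit status.
# run_and_dump is a hypothetical helper; echo stands in for tst_res_ TINFO.
run_and_dump()
{
	local log="$1"
	local ret
	shift

	"$@" > "$log" 2>&1
	ret=$?	# save it now: every later command overwrites $?

	echo "===== ret: $ret, cat $log ====="
	cat "$log"
	printf -- "---\n\n"

	return $ret
}
```

The caller can then test the returned status as before, regardless of how much
debug output was printed in between.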

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp