* [LTP] [PATCH] test_children_cleanup.sh: Fix race condition
From: Martin Doucha @ 2022-02-14 16:51 UTC
To: ltp
Processes can stay alive for a short while even after receiving SIGKILL.
Give the child process in the subprocess cleanup libtest up to 5 seconds
to fully exit before reporting that it was left behind.
Signed-off-by: Martin Doucha <mdoucha@suse.cz>
---
lib/newlib_tests/test_children_cleanup.sh | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/lib/newlib_tests/test_children_cleanup.sh b/lib/newlib_tests/test_children_cleanup.sh
index 4b4e8b2f0..44505aa51 100755
--- a/lib/newlib_tests/test_children_cleanup.sh
+++ b/lib/newlib_tests/test_children_cleanup.sh
@@ -10,10 +10,19 @@ rm "$TMPFILE"
if [ "x$CHILD_PID" = "x" ]; then
echo "TFAIL: Child process was not created"
exit 1
-elif ! kill -s 0 $CHILD_PID &>/dev/null; then
- echo "TPASS: Child process was cleaned up"
- exit 0
-else
- echo "TFAIL: Child process was left behind"
- exit 1
fi
+
+# The child process can stay alive for a short while even after receiving
+# SIGKILL, especially if the system is under heavy load. Wait up to 5 seconds
+# for it to fully exit.
+for i in `seq 6`; do
+ if ! [ -e "/proc/$CHILD_PID" ]; then
+ echo "TPASS: Child process was cleaned up"
+ exit 0
+ fi
+
+ sleep 1
+done
+
+echo "TFAIL: Child process was left behind"
+exit 1
--
2.34.1
--
Mailing list info: https://lists.linux.it/listinfo/ltp
* Re: [LTP] [PATCH] test_children_cleanup.sh: Fix race condition
From: Li Wang @ 2022-02-15 4:30 UTC
To: Martin Doucha; +Cc: LTP List
On Tue, Feb 15, 2022 at 12:51 AM Martin Doucha <mdoucha@suse.cz> wrote:
> Processes can stay alive for a short while even after receiving SIGKILL.
> Give the child process in the subprocess cleanup libtest up to 5 seconds
> to fully exit before reporting that it was left behind.
> Signed-off-by: Martin Doucha <mdoucha@suse.cz>
> ---
> lib/newlib_tests/test_children_cleanup.sh | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/lib/newlib_tests/test_children_cleanup.sh b/lib/newlib_tests/test_children_cleanup.sh
> index 4b4e8b2f0..44505aa51 100755
> --- a/lib/newlib_tests/test_children_cleanup.sh
> +++ b/lib/newlib_tests/test_children_cleanup.sh
> @@ -10,10 +10,19 @@ rm "$TMPFILE"
> if [ "x$CHILD_PID" = "x" ]; then
> echo "TFAIL: Child process was not created"
> exit 1
> -elif ! kill -s 0 $CHILD_PID &>/dev/null; then
> - echo "TPASS: Child process was cleaned up"
> - exit 0
> -else
> - echo "TFAIL: Child process was left behind"
> - exit 1
> fi
> +
> +# The child process can stay alive for a short while even after receiving
> +# SIGKILL, especially if the system is under heavy load. Wait up to 5 seconds
> +# for it to fully exit.
>
It doesn't work on all platforms, and we cannot guarantee how long it will
take before PID 1 reaps the zombie process.
Also, I just learned that Docker does not, by default, run processes under a
special init process that properly reaps child processes, so it is
possible for the container to end up with zombie processes that cause
all sorts of trouble.
I even tried adding `kill -SIGCHLD 1`, but it did not work as expected.
See CI jobs:
https://github.com/wangli5665/ltp/runs/5194270998?check_suite_focus=true
Therefore, I suggest giving my refined patch V2 a chance :).
--- a/lib/newlib_tests/test_children_cleanup.sh
+++ b/lib/newlib_tests/test_children_cleanup.sh
@@ -10,10 +10,16 @@ rm "$TMPFILE"
if [ "x$CHILD_PID" = "x" ]; then
echo "TFAIL: Child process was not created"
exit 1
+elif grep -q "Z (zombie)" /proc/$CHILD_PID/status; then
+ echo "TPASS: Child process is in zombie state"
+ exit 0
elif ! kill -s 0 $CHILD_PID &>/dev/null; then
echo "TPASS: Child process was cleaned up"
exit 0
else
echo "TFAIL: Child process was left behind"
+ echo "cat /proc/$CHILD_PID/status"
+ echo "---------------------------"
+ cat /proc/$CHILD_PID/status
exit 1
fi
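Li's V2 checks the zombie state before the `kill -s 0` test because a zombie still looks alive to both earlier checks: `kill -s 0` succeeds for it, and `/proc/PID` stays in place until the zombie is reaped by its parent (or by PID 1 after reparenting). The sketch below demonstrates this on Linux by deliberately creating a zombie under a parent that never calls wait(). It is illustrative only: `proc_state` and the variable names are made up for this example, and a `/proc` filesystem is assumed.

```shell
#!/bin/sh
# Illustrative sketch (Linux only, assumes /proc): a zombie still passes
# both "is it alive?" checks used by the test. Names are hypothetical.

# proc_state PID: print the one-letter state from /proc/PID/status,
# or nothing if that PID no longer exists.
proc_state() {
	sed -n 's/^State:[[:space:]]*\([A-Z]\).*/\1/p' "/proc/$1/status" 2>/dev/null
}

PIDFILE=$(mktemp)

# The subshell starts a short-lived child, records its PID, then execs
# into "sleep 5". sleep never calls wait(), so once the child exits it
# stays a zombie until its parent goes away.
( sleep 1 & echo $! > "$PIDFILE"; exec sleep 5 ) &
PARENT_PID=$!

sleep 2    # give the inner child time to exit and become a zombie
ZOMBIE_PID=$(cat "$PIDFILE")
rm -f "$PIDFILE"

# Check 1: the signal-0 existence test from the original libtest.
KILL0_RESULT=gone
if kill -s 0 "$ZOMBIE_PID" 2>/dev/null; then
	KILL0_RESULT=exists
fi

# Check 2: the /proc entry test from the 5-second polling loop.
PROC_RESULT=gone
if [ -e "/proc/$ZOMBIE_PID" ]; then
	PROC_RESULT=exists
fi

STATE=$(proc_state "$ZOMBIE_PID")
echo "kill -0: $KILL0_RESULT, /proc entry: $PROC_RESULT, state: $STATE"

# Clean up: once the non-reaping parent is gone, the zombie is reparented
# and can be reaped by init (or whatever acts as PID 1).
kill "$PARENT_PID" 2>/dev/null || true
wait "$PARENT_PID" 2>/dev/null || true
```

On a typical Linux system this should report `exists` for both checks and state `Z`, which is why a loop that only waits for `/proc/$CHILD_PID` to disappear can still time out in a container whose PID 1 never reaps.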
--
Regards,
Li Wang
* Re: [LTP] [PATCH] test_children_cleanup.sh: Fix race condition
From: Petr Vorel @ 2022-02-15 7:34 UTC
To: Li Wang; +Cc: LTP List
Hi all,
...
> It doesn't work on all platforms, and we cannot guarantee how long it will
> take before PID 1 reaps the zombie process.
> Also, I just learned that Docker does not, by default, run processes under a
> special init process that properly reaps child processes, so it is
> possible for the container to end up with zombie processes that cause
> all sorts of trouble.
> I even tried adding `kill -SIGCHLD 1`, but it did not work as expected.
> See CI jobs:
> https://github.com/wangli5665/ltp/runs/5194270998?check_suite_focus=true
> Therefore, I suggest giving my refined patch V2 a chance :).
> --- a/lib/newlib_tests/test_children_cleanup.sh
> +++ b/lib/newlib_tests/test_children_cleanup.sh
> @@ -10,10 +10,16 @@ rm "$TMPFILE"
> if [ "x$CHILD_PID" = "x" ]; then
> echo "TFAIL: Child process was not created"
> exit 1
> +elif grep -q "Z (zombie)" /proc/$CHILD_PID/status; then
> + echo "TPASS: Child process is in zombie state"
> + exit 0
> elif ! kill -s 0 $CHILD_PID &>/dev/null; then
> echo "TPASS: Child process was cleaned up"
> exit 0
> else
> echo "TFAIL: Child process was left behind"
> + echo "cat /proc/$CHILD_PID/status"
> + echo "---------------------------"
> + cat /proc/$CHILD_PID/status
> exit 1
> fi
Li's approach LGTM.
Kind regards,
Petr
* Re: [LTP] [PATCH] test_children_cleanup.sh: Fix race condition
From: Martin Doucha @ 2022-02-15 9:27 UTC
To: Li Wang; +Cc: LTP List
On 15. 02. 22 5:30, Li Wang wrote:
> It doesn't work on all platforms, and we cannot guarantee how long it will
> take before PID 1 reaps the zombie process.
>
> Also, I just learned that Docker does not, by default, run processes under a
> special init process that properly reaps child processes, so it is
> possible for the container to end up with zombie processes that cause
> all sorts of trouble.
>
> I even tried adding `kill -SIGCHLD 1`, but it did not work as expected.
>
> See CI jobs:
> https://github.com/wangli5665/ltp/runs/5194270998?check_suite_focus=true
>
> Therefore, I suggest giving my refined patch V2 a chance :).
When I was testing the libtest yesterday on a moderately stressed
machine, I actually saw the child process still in the R state during
the first state check a couple of times. That's why I've added looping
with delay.
On the other hand, I did not see any zombies even after several hundred
tries. But I can add a zombie check to my patch as well.
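Combining the two approaches, a polling loop that also accepts the zombie state could look roughly like the sketch below. `wait_for_child_exit` is a hypothetical helper, not an existing LTP function; it assumes Linux `/proc`, treats a zombie as cleaned up (as Li's V2 does), and checks once per second for at most TIMEOUT seconds. Unlike the `seq 6` loop in the first patch, it does not sleep again after the final check.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP), Linux only: reads /proc.
# Usage: wait_for_child_exit PID [TIMEOUT]
# Returns 0 as soon as PID is gone or is a zombie, 1 if it is still
# alive after TIMEOUT seconds (default 5).
wait_for_child_exit() {
	# POSIX sh has no "local", so prefix names to avoid clobbering the caller.
	wfce_pid=$1
	wfce_timeout=${2:-5}
	wfce_elapsed=0

	while :; do
		# Killed and already reaped.
		if [ ! -e "/proc/$wfce_pid" ]; then
			return 0
		fi

		# Killed but not reaped yet, e.g. in a container whose PID 1
		# does not reap orphaned children.
		if grep -q '^State:[[:space:]]*Z' "/proc/$wfce_pid/status" 2>/dev/null; then
			return 0
		fi

		# Possibly still handling SIGKILL under load: retry until timeout.
		if [ "$wfce_elapsed" -ge "$wfce_timeout" ]; then
			return 1
		fi
		wfce_elapsed=$((wfce_elapsed + 1))
		sleep 1
	done
}
```

In the libtest, the branches after the `CHILD_PID` emptiness check would then reduce to one `if wait_for_child_exit "$CHILD_PID" 5; then` test printing TPASS, with Li's `cat /proc/$CHILD_PID/status` dump kept in the TFAIL path.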
--
Martin Doucha mdoucha@suse.cz
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic
* Re: [LTP] [PATCH] test_children_cleanup.sh: Fix race condition
From: Petr Vorel @ 2022-02-15 9:39 UTC
To: Martin Doucha; +Cc: LTP List
> On 15. 02. 22 5:30, Li Wang wrote:
> > It doesn't work on all platforms, and we cannot guarantee how long it will
> > take before PID 1 reaps the zombie process.
> > Also, I just learned that Docker does not, by default, run processes under a
> > special init process that properly reaps child processes, so it is
> > possible for the container to end up with zombie processes that cause
> > all sorts of trouble.
> > I even tried adding `kill -SIGCHLD 1`, but it did not work as expected.
> > See CI jobs:
> > https://github.com/wangli5665/ltp/runs/5194270998?check_suite_focus=true
> > Therefore, I suggest giving my refined patch V2 a chance :).
> When I was testing the libtest yesterday on a moderately stressed
> machine, I actually saw the child process still in the R state during
> the first state check a couple of times. That's why I've added looping
> with delay.
> On the other hand, I did not see any zombies even after several hundred
> tries. But I can add a zombie check to my patch as well.
I'd be for it. As Li noticed, containers behave really differently
(maybe it'd be faster to debug tests using podman).
Kind regards,
Petr
* Re: [LTP] [PATCH] test_children_cleanup.sh: Fix race condition
From: Li Wang @ 2022-02-15 9:50 UTC
To: Petr Vorel; +Cc: LTP List
On Tue, Feb 15, 2022 at 5:39 PM Petr Vorel <pvorel@suse.cz> wrote:
> > On 15. 02. 22 5:30, Li Wang wrote:
> > > It doesn't work on all platforms, and we cannot guarantee how long it
> > > will take before PID 1 reaps the zombie process.
>
> > > Also, I just learned that Docker does not, by default, run processes under a
> > > special init process that properly reaps child processes, so it is
> > > possible for the container to end up with zombie processes that cause
> > > all sorts of trouble.
>
> > > I even tried adding `kill -SIGCHLD 1`, but it did not work as expected.
>
> > > See CI jobs:
> > > https://github.com/wangli5665/ltp/runs/5194270998?check_suite_focus=true
>
> > > Therefore, I suggest giving my refined patch V2 a chance :).
>
> > When I was testing the libtest yesterday on a moderately stressed
> > machine, I actually saw the child process still in the R state during
>
That might be caused by asynchronous signal processing when the system is
under pressure.
> > the first state check a couple of times. That's why I've added looping
> > with delay.
> > On the other hand I did not see any zombies even after several hundred
> > tries. But I can add a zombie check to my patch as well.
> I'd be for it. As Li noticed, containers behave really differently
> (maybe it'd be faster to debug tests using podman).
>
Right, we have to consider these scenarios:
1. the signal is processed asynchronously, so the child is still in 'R'
2. the child terminates correctly on a bare-metal system
3. the child is left as a zombie when running under Docker (without a proper init process as PID 1)
Especially scenario 3 confused me for quite a while.
@Martin, feel free to send a patch that covers all of the above,
or I can help summarize these in the commit description.
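The three scenarios above map onto what `/proc/PID/status` reports for the child after SIGKILL: no entry once it has been reaped, `Z` while it waits to be reaped, and some other state while the signal has not taken effect yet. The sketch below prints that classification, which could serve as extra diagnostics in the TFAIL path. `classify_child` and its output labels are hypothetical, chosen only for illustration, and Linux `/proc` is assumed.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP), Linux only: reads /proc/PID/status.
# Prints which of the three scenarios from the thread applies to PID.
classify_child() {
	cc_state=$(sed -n 's/^State:[[:space:]]*\([A-Z]\).*/\1/p' \
		"/proc/$1/status" 2>/dev/null)

	case "$cc_state" in
	""|X)
		# Scenario 2: terminated and reaped (X is the transient "dead" state).
		echo "gone" ;;
	Z)
		# Scenario 3: terminated, but nobody has reaped it yet
		# (e.g. a container without a reaping init as PID 1).
		echo "zombie" ;;
	*)
		# Scenario 1: SIGKILL not processed yet (R, S, D, ...).
		echo "running" ;;
	esac
}
```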
--
Regards,
Li Wang