All of lore.kernel.org
 help / color / mirror / Atom feed
* [LTP] [PATCH] test_children_cleanup: allow child zombied for a while
@ 2022-02-13  4:28 Li Wang
  2022-02-13 16:57 ` Petr Vorel
  2022-02-14 10:40 ` Martin Doucha
  0 siblings, 2 replies; 5+ messages in thread
From: Li Wang @ 2022-02-13  4:28 UTC (permalink / raw)
  To: ltp

Zombie process is already dead after SIGKILL is processed by the kernel,
there's usually a kernel reason (similar to being "blocked" waiting on
a syscall to finish) for the process not terminating.

Due to this child having been moved under PID 1(init), there is no waitpid()
guarantee of reaping it anymore. It completely depends on PID 1(init) reclaims
zombie process at its own pace.

So here allow the child exist in a zombie state if PID 1(init) does not
reclaim resource and update the process table instantly.

Failed CI:
  https://github.com/linux-test-project/ltp/runs/5171298664?check_suite_focus=true

  -------------------
  runtest TINFO: * test_children_cleanup
  tst_test.c:1455: TINFO: Timeout per run is 0h 00m 10s
  test_children_cleanup.c:20: TINFO: Main process 69516 starting
  test_children_cleanup.c:35: TINFO: Forked child 69518
  tst_test.c:1500: TINFO: Killed the leftover descendant processes
  tst_test.c:1506: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
  tst_test.c:1508: TBROK: Test killed! (timeout?)

  -------------------
  runtest TINFO: * test_children_cleanup.sh
  TFAIL: Child process was left behind
  cat /proc/69523/status
  Name:	test_children_c
  State:	Z (zombie)
  Tgid:	69523
  Ngid:	0
  Pid:	69523
  PPid:	1
  ...

New test CI:
  https://github.com/wangli5665/ltp/runs/5171508466?check_suite_focus=true

Also, add some debug codes if test fails.

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Martin Doucha <mdoucha@suse.cz>
---
 lib/newlib_tests/test_children_cleanup.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/newlib_tests/test_children_cleanup.sh b/lib/newlib_tests/test_children_cleanup.sh
index 4b4e8b2f0..ec1a0d4fe 100755
--- a/lib/newlib_tests/test_children_cleanup.sh
+++ b/lib/newlib_tests/test_children_cleanup.sh
@@ -13,7 +13,13 @@ if [ "x$CHILD_PID" = "x" ]; then
 elif ! kill -s 0 $CHILD_PID &>/dev/null; then
 	echo "TPASS: Child process was cleaned up"
 	exit 0
+elif grep -q -E "Z|zombie" /proc/$CHILD_PID/status; then
+       echo "TPASS: Child process was in zombie state"
+       exit 0
 else
 	echo "TFAIL: Child process was left behind"
+	echo "cat /proc/$CHILD_PID/status"
+	echo "---------------------------"
+	cat /proc/$CHILD_PID/status
 	exit 1
 fi
-- 
2.31.1


-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [LTP] [PATCH] test_children_cleanup: allow child zombied for a while
  2022-02-13  4:28 [LTP] [PATCH] test_children_cleanup: allow child zombied for a while Li Wang
@ 2022-02-13 16:57 ` Petr Vorel
  2022-02-14 11:21   ` Li Wang
  2022-02-14 10:40 ` Martin Doucha
  1 sibling, 1 reply; 5+ messages in thread
From: Petr Vorel @ 2022-02-13 16:57 UTC (permalink / raw)
  To: Li Wang; +Cc: ltp

Hi Li,

> Zombie process is already dead after SIGKILL is processed by the kernel,
> there's usually a kernel reason (similar to being "blocked" waiting on
> a syscall to finish) for the process not terminating.

> Due to this child having been moved under PID 1(init), there is no waitpid()
> guarantee of reaping it anymore. It completely depends on PID 1(init) reclaims
> zombie process at its own pace.

> So here allow the child exist in a zombie state if PID 1(init) does not
> reclaim resource and update the process table instantly.

> Failed CI:
>   https://github.com/linux-test-project/ltp/runs/5171298664?check_suite_focus=true

Good catch!

...
>   runtest TINFO: * test_children_cleanup.sh
>   TFAIL: Child process was left behind
>   cat /proc/69523/status
>   Name:	test_children_c
>   State:	Z (zombie)
>   Tgid:	69523
>   Ngid:	0
>   Pid:	69523
>   PPid:	1

...
> +++ b/lib/newlib_tests/test_children_cleanup.sh
> @@ -13,7 +13,13 @@ if [ "x$CHILD_PID" = "x" ]; then
>  elif ! kill -s 0 $CHILD_PID &>/dev/null; then
>  	echo "TPASS: Child process was cleaned up"
>  	exit 0
> +elif grep -q -E "Z|zombie" /proc/$CHILD_PID/status; then

task_state_array[] in fs/proc/array.c has "Z (zombie)"
Is there a reason to grep just "Z"? Because -E "Z|zombie" matches just "Z" and
when we don't grep for "State:" we can likely search for different result.
"Z (zombie)" is here form kernel git beginning (2.6.12-rc2), we should match it.

Kind regards,
Petr

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [LTP] [PATCH] test_children_cleanup: allow child zombied for a while
  2022-02-13  4:28 [LTP] [PATCH] test_children_cleanup: allow child zombied for a while Li Wang
  2022-02-13 16:57 ` Petr Vorel
@ 2022-02-14 10:40 ` Martin Doucha
  2022-02-14 11:18   ` Li Wang
  1 sibling, 1 reply; 5+ messages in thread
From: Martin Doucha @ 2022-02-14 10:40 UTC (permalink / raw)
  To: Li Wang, ltp

On 13. 02. 22 5:28, Li Wang wrote:
> diff --git a/lib/newlib_tests/test_children_cleanup.sh b/lib/newlib_tests/test_children_cleanup.sh
> index 4b4e8b2f0..ec1a0d4fe 100755
> --- a/lib/newlib_tests/test_children_cleanup.sh
> +++ b/lib/newlib_tests/test_children_cleanup.sh
> @@ -13,7 +13,13 @@ if [ "x$CHILD_PID" = "x" ]; then
>  elif ! kill -s 0 $CHILD_PID &>/dev/null; then
>  	echo "TPASS: Child process was cleaned up"
>  	exit 0
> +elif grep -q -E "Z|zombie" /proc/$CHILD_PID/status; then
> +       echo "TPASS: Child process was in zombie state"
> +       exit 0

We're in a race condition here either way so reading the status procfile
after checking whether the process still exists can result in failure
even when the child was properly killed. I wrongly believed that
`kill -s 0` would fail when the target process is a zombie because the
manpage vaguely suggests that (see the description of ESRCH errno in the
kill(2) manpage) but it turns out I was wrong again.

I'll send a fix myself later today.

-- 
Martin Doucha   mdoucha@suse.cz
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [LTP] [PATCH] test_children_cleanup: allow child zombied for a while
  2022-02-14 10:40 ` Martin Doucha
@ 2022-02-14 11:18   ` Li Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Li Wang @ 2022-02-14 11:18 UTC (permalink / raw)
  To: Martin Doucha; +Cc: LTP List


[-- Attachment #1.1: Type: text/plain, Size: 1298 bytes --]

On Mon, Feb 14, 2022 at 6:40 PM Martin Doucha <mdoucha@suse.cz> wrote:

> On 13. 02. 22 5:28, Li Wang wrote:
> > diff --git a/lib/newlib_tests/test_children_cleanup.sh
> b/lib/newlib_tests/test_children_cleanup.sh
> > index 4b4e8b2f0..ec1a0d4fe 100755
> > --- a/lib/newlib_tests/test_children_cleanup.sh
> > +++ b/lib/newlib_tests/test_children_cleanup.sh
> > @@ -13,7 +13,13 @@ if [ "x$CHILD_PID" = "x" ]; then
> >  elif ! kill -s 0 $CHILD_PID &>/dev/null; then
> >       echo "TPASS: Child process was cleaned up"
> >       exit 0
> > +elif grep -q -E "Z|zombie" /proc/$CHILD_PID/status; then
> > +       echo "TPASS: Child process was in zombie state"
> > +       exit 0
>
> We're in a race condition here either way so reading the status procfile
> after checking whether the process still exists can result in failure
> even when the child was properly killed. I wrongly believed that
>

Ah yes. It is still possible to complete the zombie reclaim just before
doing the grep.



> `kill -s 0` would fail when the target process is a zombie because the
> manpage vaguely suggests that (see the description of ESRCH errno in the
> kill(2) manpage) but it turns out I was wrong again.
>
> I'll send a fix myself later today.
>

Thanks, I look forward to a better solution.


-- 
Regards,
Li Wang

[-- Attachment #1.2: Type: text/html, Size: 2258 bytes --]

[-- Attachment #2: Type: text/plain, Size: 60 bytes --]


-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [LTP] [PATCH] test_children_cleanup: allow child zombied for a while
  2022-02-13 16:57 ` Petr Vorel
@ 2022-02-14 11:21   ` Li Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Li Wang @ 2022-02-14 11:21 UTC (permalink / raw)
  To: Petr Vorel; +Cc: LTP List


[-- Attachment #1.1: Type: text/plain, Size: 467 bytes --]

> > +elif grep -q -E "Z|zombie" /proc/$CHILD_PID/status; then
>
> task_state_array[] in fs/proc/array.c has "Z (zombie)"
> Is there a reason to grep just "Z"? Because -E "Z|zombie" matches just "Z"
> and
> when we don't grep for "State:" we can likely search for different result.
> "Z (zombie)" is here form kernel git beginning (2.6.12-rc2), we should
> match it.
>

Thanks for pointing out this, but it seems this fix-way will be superseded.

-- 
Regards,
Li Wang

[-- Attachment #1.2: Type: text/html, Size: 950 bytes --]

[-- Attachment #2: Type: text/plain, Size: 60 bytes --]


-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-02-14 11:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-13  4:28 [LTP] [PATCH] test_children_cleanup: allow child zombied for a while Li Wang
2022-02-13 16:57 ` Petr Vorel
2022-02-14 11:21   ` Li Wang
2022-02-14 10:40 ` Martin Doucha
2022-02-14 11:18   ` Li Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.