* [LTP] [PATCH 1/1] controllers/cgroup_fj: fix longtime wait cgroup_fj_proc.
@ 2016-10-13  3:05 shuwang
  2016-10-13 14:26 ` Cyril Hrubis
  0 siblings, 1 reply; 4+ messages in thread
From: shuwang @ 2016-10-13  3:05 UTC (permalink / raw)
  To: ltp

From: shuwang <shuwang@redhat.com>

On some machines, when many cgroup_fj_proc processes are created in the
background, killall may fail to find and kill all of them because some
have only just been created and are not ready yet. The surviving
processes make the LTP test run wait forever. Use kill -9 on the
recorded PIDs instead.

Signed-off-by: shuwang <shuwang@redhat.com>
---
 testcases/kernel/controllers/cgroup_fj/cgroup_fj_stress.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/testcases/kernel/controllers/cgroup_fj/cgroup_fj_stress.sh b/testcases/kernel/controllers/cgroup_fj/cgroup_fj_stress.sh
index 698aa49..8c21d59 100755
--- a/testcases/kernel/controllers/cgroup_fj/cgroup_fj_stress.sh
+++ b/testcases/kernel/controllers/cgroup_fj/cgroup_fj_stress.sh
@@ -107,6 +107,7 @@ attach_task()
     if [ -z "$ppid" ]; then
         cgroup_fj_proc&
         pid=$!
+        pids="$pids $pid"
     else
         pid="$ppid"
     fi
@@ -148,7 +149,7 @@ case $attach_operation in
 "each" )
     tst_resm TINFO "Attaching task to each subgroup"
     attach_task "$start_path" 0
-    ROD killall -9 "cgroup_fj_proc"
+    ROD kill -9 $pids
     # Wait for attached tasks to terminate
     wait
     ;;
-- 
2.5.0



* [LTP] [PATCH 1/1] controllers/cgroup_fj: fix longtime wait cgroup_fj_proc.
  2016-10-13  3:05 [LTP] [PATCH 1/1] controllers/cgroup_fj: fix longtime wait cgroup_fj_proc shuwang
@ 2016-10-13 14:26 ` Cyril Hrubis
  2016-10-14  8:07   ` Shu Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Cyril Hrubis @ 2016-10-13 14:26 UTC (permalink / raw)
  To: ltp

Hi!
> On some machines, when many cgroup_fj_proc processes are created in the
> background, killall may fail to find and kill all of them because some
> have only just been created and are not ready yet. The surviving
> processes make the LTP test run wait forever. Use kill -9 on the
> recorded PIDs instead.

What is the exact race here? What exactly does "just created and not
ready" mean?

-- 
Cyril Hrubis
chrubis@suse.cz


* [LTP] [PATCH 1/1] controllers/cgroup_fj: fix longtime wait cgroup_fj_proc.
  2016-10-13 14:26 ` Cyril Hrubis
@ 2016-10-14  8:07   ` Shu Wang
  2016-10-17 14:04     ` Cyril Hrubis
  0 siblings, 1 reply; 4+ messages in thread
From: Shu Wang @ 2016-10-14  8:07 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> From: "Cyril Hrubis" <chrubis@suse.cz>
> To: shuwang@redhat.com
> Cc: ltp@lists.linux.it
> Sent: Thursday, October 13, 2016 10:26:04 PM
> Subject: Re: [LTP] [PATCH 1/1] controllers/cgroup_fj: fix longtime wait cgroup_fj_proc.
> 
> Hi!
> > On some machines, when many cgroup_fj_proc processes are created in the
> > background, killall may fail to find and kill all of them because some
> > have only just been created and are not ready yet. The surviving
> > processes make the LTP test run wait forever. Use kill -9 on the
> > recorded PIDs instead.
> 
> What is the exact race here? What exactly does "just created and not
> ready" mean?

The cgroup_fj_stress.sh test case creates many cgroup subgroups according
to the $1 (subgroup_num) and $2 (subgroup_depth) parameters, and if $3
(attach_operation) is 'each', it starts cgroup_fj_proc processes in the
background, one attached to each subgroup.

The race is that 'killall -9 cgroup_fj_proc' runs right after the
background cgroup_fj_proc processes are created. A few of them may not be
killed; they keep running in the background and stall the wait command.

reproducer:
for i in `seq 10`
do
 sleep 10000 &
done;
killall -9 sleep;
wait;                  #stall here
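
For comparison, here is a sketch of the same reproducer with each PID
recorded from $! and signalled directly, which is the approach the patch
takes for cgroup_fj_proc. One plausible explanation for the race (an
assumption, it is not confirmed in this thread) is that a child that has
forked but not yet exec'ed still carries the shell's name, so a lookup by
name misses it; signalling by PID does not depend on the name.

```shell
#!/bin/sh
# Sketch: remember each background PID instead of matching by name.
pids=""
for i in $(seq 10); do
    sleep 10000 &
    pids="$pids $!"
done

# $pids is deliberately left unquoted so that each PID becomes its own
# argument; quoting it would hand kill a single invalid argument.
kill -9 $pids

# Every child has been signalled, so wait returns once all are reaped.
wait
```

The unquoted expansion relies on the default IFS; the PIDs contain only
digits, so word splitting is safe here.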


> 
> --
> Cyril Hrubis
> chrubis@suse.cz
> 


* [LTP] [PATCH 1/1] controllers/cgroup_fj: fix longtime wait cgroup_fj_proc.
  2016-10-14  8:07   ` Shu Wang
@ 2016-10-17 14:04     ` Cyril Hrubis
  0 siblings, 0 replies; 4+ messages in thread
From: Cyril Hrubis @ 2016-10-17 14:04 UTC (permalink / raw)
  To: ltp

Hi!
> The cgroup_fj_stress.sh test case creates many cgroup subgroups according
> to the $1 (subgroup_num) and $2 (subgroup_depth) parameters, and if $3
> (attach_operation) is 'each', it starts cgroup_fj_proc processes in the
> background, one attached to each subgroup.
> 
> The race is that 'killall -9 cgroup_fj_proc' runs right after the
> background cgroup_fj_proc processes are created. A few of them may not be
> killed; they keep running in the background and stall the wait command.
> 
> reproducer:
> for i in `seq 10`
> do
>  sleep 10000 &
> done;
> killall -9 sleep;
> wait;                  #stall here

This reproducer should have been in the commit message. I managed to
hit the problem with it once I redirected the script's output into a
file. Possibly printing the output to stdout slowed it down enough that
the issue did not show up.

I was wondering whether it is safe to store the pids in a variable, since
in the 'each' case we fork a fair number of processes (it tops out at
~1000) and there is a limit on the command-line argument length.

For our case it should suffice: even counting 10 characters per stored
pid, the string is ~10000 chars long, which is still ~100 times less
than the usual limit on the command-line length.
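
The estimate can be checked on a given machine with a short sketch like
the one below. The numbers 1000 and 10 are the rough figures from this
mail, not values read from the test, and getconf ARG_MAX is used as the
reference limit (it covers the combined size of arguments and
environment passed to exec).

```shell
#!/bin/sh
# Rough estimate from the mail: ~1000 PIDs at ~10 characters each.
npids=1000
chars_per_pid=10
needed=$((npids * chars_per_pid))

# System limit on the total size of exec arguments plus environment.
limit=$(getconf ARG_MAX)

echo "needed=$needed limit=$limit ratio=$((limit / needed))"
```

Note that where kill is a shell builtin no exec takes place, so this
limit may not apply to that call at all.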

It may still break if someone really wants to stress a machine with a
large amount of memory, though. If you pass large enough parameters to
the script, it will probably run for a day or two and then may fail to
kill the processes because the kill command line is too long. So maybe
it would be better to store the pids in a file, but that may slow down
the test significantly, which should be avoided as well.
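
A minimal sketch of the file-based variant, assuming a temporary file
from mktemp and xargs to feed the pids to kill (neither is part of the
patch). xargs splits its input into as many kill invocations as needed,
so the list never has to fit on a single command line. It uses sleep as
a stand-in for cgroup_fj_proc, as in the reproducer.

```shell
#!/bin/sh
# Sketch: keep the PIDs in a file rather than in a shell variable.
pidfile=$(mktemp)

for i in $(seq 10); do
    sleep 10000 &
    echo "$!" >> "$pidfile"
done

# xargs batches the PIDs to stay under the argument-size limit and
# runs the external kill once per batch.
xargs kill -9 < "$pidfile"

# The killed processes are still children of this shell, so wait
# reaps them and returns.
wait
rm -f "$pidfile"
```

Whether the extra file writes slow the test down noticeably would have
to be measured.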


-- 
Cyril Hrubis
chrubis@suse.cz
