* [LTP] [PATCH] cgroups/cgroup_regression_test: fix sporadic failures
From: Jan Stancek @ 2011-04-19 13:27 UTC
  To: ltp-list

[-- Attachment #1: Type: text/plain, Size: 1383 bytes --]


There were failures caused by incomplete cleanup, which
left groups behind after some stress tests.
Some stress tests also failed to complete upon receiving SIGUSR1.

1. the dmesg ring buffer can rotate, so the number of bugs found can
actually go down; clear the buffer before the test to avoid this

2. test_5: the test should mount 2 subsystems, but the mount command
says "$subsys" instead of "$subsys2"

3. test_6: the test may leave groups behind; fix the rmdir pattern
to match test_6_1.sh

4. test_7_2: the test mounts all cgroup subsystems, not just $subsys;
fix the failure message accordingly

5. test_10: the test can leave cgroup unmounted before cleanup;
make sure cgroup is mounted before doing cleanup

6. test_*.sh scripts set the trap inside the loop, which may cause bash
to miss the signal, see https://bugzilla.redhat.com/show_bug.cgi?id=695656
move the trap outside the loop to avoid it

Signed-off-by: Jan Stancek <jstancek@redhat.com>
---
 .../controllers/cgroup/cgroup_regression_test.sh   |   15 ++++++++++-----
 testcases/kernel/controllers/cgroup/test_10_1.sh   |    3 +--
 testcases/kernel/controllers/cgroup/test_10_2.sh   |    3 +--
 testcases/kernel/controllers/cgroup/test_3_1.sh    |    3 +--
 testcases/kernel/controllers/cgroup/test_3_2.sh    |    3 +--
 testcases/kernel/controllers/cgroup/test_6_1.sh    |    3 +--
 testcases/kernel/controllers/cgroup/test_9_1.sh    |    3 +--
 testcases/kernel/controllers/cgroup/test_9_2.sh    |    3 +--
 8 files changed, 17 insertions(+), 19 deletions(-)
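
Item 1 can be illustrated without touching the real kernel log. The sketch below is hypothetical: "count_bugs" and "report" are names invented here, the printf snapshots stand in for dmesg output before and after the ring buffer rotates, and it assumes the bug check only fires when a count grows. Under that assumption, a rotation that drops an old report while a new one arrives leaves the count unchanged, so the new bug goes unnoticed; clearing the buffer first (dmesg -c) makes the baseline zero.

```shell
#!/bin/sh
# Hypothetical helper: count "kernel BUG" reports in a log read from stdin,
# using the same grep -c pattern the regression test applies to dmesg.
# grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps status 0.
count_bugs() {
	grep -c "kernel BUG" || true
}

# Assumed check: only a growing count is treated as a new bug.
report() {
	if [ "$2" -gt "$1" ]; then
		echo "new kernel bug detected ($1 -> $2)"
	else
		echo "new kernel bug missed ($1 -> $2)"
	fi
}

# Without clearing: two old reports are in the (fake) ring buffer...
before=$(printf 'kernel BUG old-1\nkernel BUG old-2\n' | count_bugs)
# ...then the buffer rotates: old-1 falls out as new-1 arrives.
after=$(printf 'kernel BUG old-2\nkernel BUG new-1\n' | count_bugs)
report "$before" "$after"    # prints: new kernel bug missed (2 -> 2)

# With the buffer cleared first (dmesg -c), the baseline is empty.
before_cleared=$(printf '' | count_bugs)
after_cleared=$(printf 'kernel BUG new-1\n' | count_bugs)
report "$before_cleared" "$after_cleared"    # prints: new kernel bug detected (0 -> 1)
```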


[-- Attachment #2: 0001-cgroups-cgroup_regression_test-fix-sporadic-failures.patch --]
[-- Type: text/x-patch; name=0001-cgroups-cgroup_regression_test-fix-sporadic-failures.patch, Size: 6223 bytes --]

diff --git a/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh b/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh
index 5527afc..6c74b92 100755
--- a/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh
+++ b/testcases/kernel/controllers/cgroup/cgroup_regression_test.sh
@@ -40,6 +40,7 @@ elif [ "x$(id -ru)" != x0 ]; then
 	exit 0
 fi
 
+dmesg -c > /dev/null
 nr_bug=`dmesg | grep -c "kernel BUG"`
 nr_null=`dmesg | grep -c "kernel NULL pointer dereference"`
 nr_warning=`dmesg | grep -c "^WARNING"`
@@ -78,6 +79,8 @@ check_kernel_bug()
 	nr_warning=$new_warning
 	nr_lockdep=$new_lockdep
 
+	echo "check_kernel_bug found something!"
+	dmesg
 	failed=1
 	return 0
 }
@@ -254,7 +257,7 @@ test_5()
 	subsys1=`tail -n 1 /proc/cgroups | awk '{ print $1 }'`
 	subsys2=`tail -n 2 /proc/cgroups | head -1 | awk '{ print $1 }'`
 
-	mount -t cgroup -o $subsys1,$subsys xxx cgroup/
+	mount -t cgroup -o $subsys1,$subsys2 xxx cgroup/
 	if [ $? -ne 0 ]; then
 		tst_resm TFAIL "mount $subsys1 and $subsys2 failed"
 		failed=1
@@ -325,7 +328,7 @@ test_6()
 
 	# clean up
 	mount -t cgroup -o ns xxx cgroup/ > /dev/null 2>&1
-	rmdir cgroup/[1-9] > /dev/null 2>&1
+	rmdir cgroup/[1-9]* > /dev/null 2>&1
 	umount cgroup/
 }
 
@@ -362,7 +365,7 @@ test_7_2()
 {
 	mount -t cgroup xxx cgroup/
 	if [ $? -ne 0 ]; then
-		tst_resm TFAIL "failed to mount $subsys"
+		tst_resm TFAIL "failed to mount cgroup"
 		failed=1
 		return
 	fi
@@ -499,8 +502,9 @@ test_10()
 	wait $pid1
 	wait $pid2
 
-	rmdir cgroup/0 2> /dev/null
-	umount cgroup/ 2> /dev/null
+	mount -t cgroup none cgroup 2> /dev/null
+	rmdir cgroup/0
+	umount cgroup/
 
 	check_kernel_bug
 	if [ $? -eq 1 ]; then
@@ -510,6 +514,7 @@ test_10()
 
 # main
 
+failed=0
 mkdir cgroup/
 
 for ((cur = 1; cur <= $TST_TOTAL; cur++))
diff --git a/testcases/kernel/controllers/cgroup/test_10_1.sh b/testcases/kernel/controllers/cgroup/test_10_1.sh
index 6284722..ffa0d5f 100755
--- a/testcases/kernel/controllers/cgroup/test_10_1.sh
+++ b/testcases/kernel/controllers/cgroup/test_10_1.sh
@@ -22,13 +22,12 @@
 ##                                                                            ##
 ################################################################################
 
+trap exit SIGUSR1
 for ((; ;))
 {
 	mount -t cgroup xxx cgroup/ > /dev/null 2>&1
 	mkdir cgroup/0 > /dev/null 2>&1
 	rmdir cgroup/0 > /dev/null 2>&1
 	umount cgroup/ > /dev/null 2>&1
-
-	trap exit SIGUSR1
 }
 
diff --git a/testcases/kernel/controllers/cgroup/test_10_2.sh b/testcases/kernel/controllers/cgroup/test_10_2.sh
index 82b91e3..f811bbd 100755
--- a/testcases/kernel/controllers/cgroup/test_10_2.sh
+++ b/testcases/kernel/controllers/cgroup/test_10_2.sh
@@ -22,11 +22,10 @@
 ##                                                                            ##
 ################################################################################
 
+trap exit SIGUSR1
 for ((; ;))
 {
 	mount -t cgroup xxx cgroup/ > /dev/null 2>&1
 	umount cgroup/ > /dev/null 2>&1
-
-	trap exit SIGUSR1
 }
 
diff --git a/testcases/kernel/controllers/cgroup/test_3_1.sh b/testcases/kernel/controllers/cgroup/test_3_1.sh
index 86627d4..507a2c4 100755
--- a/testcases/kernel/controllers/cgroup/test_3_1.sh
+++ b/testcases/kernel/controllers/cgroup/test_3_1.sh
@@ -22,11 +22,10 @@
 ##                                                                            ##
 ################################################################################
 
+trap exit SIGUSR1
 for ((; ;))
 {
 	mkdir cgroup/0
 	rmdir cgroup/0
-
-	trap exit SIGUSR1
 }
 
diff --git a/testcases/kernel/controllers/cgroup/test_3_2.sh b/testcases/kernel/controllers/cgroup/test_3_2.sh
index c942969..9f83d9d 100755
--- a/testcases/kernel/controllers/cgroup/test_3_2.sh
+++ b/testcases/kernel/controllers/cgroup/test_3_2.sh
@@ -22,10 +22,9 @@
 ##                                                                            ##
 ################################################################################
 
+trap exit SIGUSR1
 for ((; ;))
 {
 	cat /proc/sched_debug > /dev/null
-
-	trap exit SIGUSR1
 }
 
diff --git a/testcases/kernel/controllers/cgroup/test_6_1.sh b/testcases/kernel/controllers/cgroup/test_6_1.sh
index ff70677..e91e794 100755
--- a/testcases/kernel/controllers/cgroup/test_6_1.sh
+++ b/testcases/kernel/controllers/cgroup/test_6_1.sh
@@ -22,12 +22,11 @@
 ##                                                                            ##
 ################################################################################
 
+trap exit SIGUSR1
 for ((; ;))
 {
 	mount -t cgroup -o ns xxx cgroup/ > /dev/null 2>&1
 	rmdir cgroup/[1-9]* > /dev/null 2>&1
 	umount cgroup/ > /dev/null 2>&1
-
-	trap exit SIGUSR1
 }
 
diff --git a/testcases/kernel/controllers/cgroup/test_9_1.sh b/testcases/kernel/controllers/cgroup/test_9_1.sh
index 24a1524..c6e8f6f 100755
--- a/testcases/kernel/controllers/cgroup/test_9_1.sh
+++ b/testcases/kernel/controllers/cgroup/test_9_1.sh
@@ -22,13 +22,12 @@
 ##                                                                            ##
 ################################################################################
 
+trap exit SIGUSR1
 for ((; ;))
 {
 #	mount -t cgroup -o debug xxx cgroup/ > /dev/null 2>&1
 	mount -t cgroup xxx cgroup/ > /dev/null 2>&1
 	cat cgroup/release_agent > /dev/null 2>&1
 	umount cgroup/ > /dev/null 2>&1
-
-	trap exit SIGUSR1
 }
 
diff --git a/testcases/kernel/controllers/cgroup/test_9_2.sh b/testcases/kernel/controllers/cgroup/test_9_2.sh
index 654df4d..f8e1c61 100755
--- a/testcases/kernel/controllers/cgroup/test_9_2.sh
+++ b/testcases/kernel/controllers/cgroup/test_9_2.sh
@@ -22,12 +22,11 @@
 ##                                                                            ##
 ################################################################################
 
+trap exit SIGUSR1
 for ((; ;))
 {
 #	mount -t cgroup -o debug xxx cgroup/ > /dev/null 2>&1
 	mount -t cgroup xxx cgroup/ > /dev/null 2>&1
 	umount cgroup/ > /dev/null 2>&1
-
-	trap exit SIGUSR1
 }
 

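The test_6 cleanup fix comes down to shell globbing: [1-9] matches exactly one character, while [1-9]* matches any name that starts with 1-9. The sketch below shows the difference on a scratch directory created with mktemp -d, standing in for the mounted cgroup/ tree. It assumes the leftover groups can have multi-digit names, which is what the [1-9]* pattern already used in test_6_1.sh implies.

```shell
#!/bin/sh
# Scratch directory standing in for the mounted cgroup/ hierarchy.
root=$(mktemp -d)
# Leftover groups: one single-digit name and two multi-digit names.
mkdir "$root/1" "$root/12" "$root/345"

# Old pattern: [1-9] matches exactly one character, so only "1" is removed.
rmdir "$root"/[1-9] 2>/dev/null || :
after_old=$(ls "$root")
echo "left after [1-9]:" $after_old

# New pattern: [1-9]* matches any name starting with 1-9, so the rest go too.
rmdir "$root"/[1-9]* 2>/dev/null || :
after_new=$(ls "$root")
echo "left after [1-9]*:" $after_new

rmdir "$root"
```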
[-- Attachment #3: Type: text/plain, Size: 438 bytes --]

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload 
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve 
application availability and disaster protection. Learn more about boosting 
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev

[-- Attachment #4: Type: text/plain, Size: 155 bytes --]

_______________________________________________
Ltp-list mailing list
Ltp-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ltp-list


* Re: [LTP] [PATCH] cgroups/cgroup_regression_test: fix sporadic failures
From: Garrett Cooper @ 2011-04-19 16:13 UTC
  To: Jan Stancek; +Cc: ltp-list

On Tue, Apr 19, 2011 at 6:27 AM, Jan Stancek <jstancek@redhat.com> wrote:
>
> There were failures caused by incomplete cleanup, which
> left groups behind after some stress tests.
> Some stress tests also failed to complete upon receiving SIGUSR1.
>
> 1. the dmesg ring buffer can rotate, so the number of bugs found can
> actually go down; clear the buffer before the test to avoid this
>
> 2. test_5: the test should mount 2 subsystems, but the mount command
> says "$subsys" instead of "$subsys2"
>
> 3. test_6: the test may leave groups behind; fix the rmdir pattern
> to match test_6_1.sh
>
> 4. test_7_2: the test mounts all cgroup subsystems, not just $subsys;
> fix the failure message accordingly
>
> 5. test_10: the test can leave cgroup unmounted before cleanup;
> make sure cgroup is mounted before doing cleanup
>
> 6. test_*.sh scripts set the trap inside the loop, which may cause bash
> to miss the signal, see https://bugzilla.redhat.com/show_bug.cgi?id=695656
> move the trap outside the loop to avoid it

    I personally don't have a lot of context into cgroups, but when is
it acceptable for Linux to send SIGUSR1 when mounting, unmounting, or
removing cgroup directories?
Thanks,
-Garrett


* Re: [LTP] [PATCH] cgroups/cgroup_regression_test: fix sporadic failures
From: Jan Stancek @ 2011-04-19 16:31 UTC
  To: Garrett Cooper; +Cc: ltp-list



----- Original Message -----
> From: "Garrett Cooper" <yanegomi@gmail.com>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: ltp-list@lists.sourceforge.net
> Sent: Tuesday, April 19, 2011 6:13:48 PM
> Subject: Re: [LTP] [PATCH] cgroups/cgroup_regression_test: fix sporadic failures
> I personally don't have a lot of context into cgroups, but when is
> it acceptable for Linux to send SIGUSR1 when mounting, unmounting, or
> removing cgroup directories?

The main test spawns a couple of workers, which run an infinite loop and
stress-test some area. SIGUSR1 was chosen by the author of the test to stop
these workers after a certain amount of time.

The signal only controls the workers; it is not directly related to any
cgroup functionality AFAIK.

Unfortunately, when "trap" is reset in bash, the signal is ignored for
a short period of time, which occasionally hangs the whole test.
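
The worker/controller arrangement described above can be sketched in a few lines. This is an illustration under stated assumptions, not the LTP code: the worker is a shell function run in the background rather than a separate test_*.sh script, "sleep 1" is a placeholder for the cgroup operations, and the portable signal name USR1 is used (bash also accepts SIGUSR1). As in the patched scripts, the trap is installed once, before the loop.

```shell
#!/bin/sh
# Worker: install the SIGUSR1 handler once, before the loop, as the patched
# test_*.sh scripts do. "sleep 1" is a placeholder for the cgroup operations.
worker() {
	trap exit USR1
	while :; do
		sleep 1
	done
}

worker &           # the function runs in a background subshell
pid=$!

sleep 1            # let the worker run for a while (duration is arbitrary)
kill -USR1 "$pid"  # ask the worker to stop
wait "$pid" || :   # the trap runs once the current command finishes

if kill -0 "$pid" 2>/dev/null; then
	echo "worker still running"
else
	echo "worker stopped"
fi
```

A trap stays in effect until it is changed, so installing it once is enough; re-running trap on every iteration only reopens the window described above.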

> Thanks,
> -Garrett


* Re: [LTP] [PATCH] cgroups/cgroup_regression_test: fix sporadic failures
From: Garrett Cooper @ 2011-04-19 17:40 UTC
  To: Jan Stancek; +Cc: ltp-list

On Tue, Apr 19, 2011 at 9:31 AM, Jan Stancek <jstancek@redhat.com> wrote:
> The main test spawns a couple of workers, which run an infinite loop and
> stress-test some area. SIGUSR1 was chosen by the author of the test to stop
> these workers after a certain amount of time.
>
> The signal only controls the workers; it is not directly related to any
> cgroup functionality AFAIK.
>
> Unfortunately, when "trap" is reset in bash, the signal is ignored for
> a short period of time, which occasionally hangs the whole test.

    That just sounds like a cop-out for fixing a bug in bash. Unless
the item is documented in bash and/or the POSIX spec prior to that
bug, I would just push back on the devs until they fix the shell.
    Setting signal handlers in a synchronous fashion isn't rocket science.
Thanks,
-Garrett


* Re: [LTP] [PATCH] cgroups/cgroup_regression_test: fix sporadic failures
From: Jan Stancek @ 2011-04-19 18:00 UTC
  To: Garrett Cooper; +Cc: ltp-list



----- Original Message -----
> From: "Garrett Cooper" <yanegomi@gmail.com>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: ltp-list@lists.sourceforge.net
> Sent: Tuesday, April 19, 2011 7:40:46 PM
> Subject: Re: [LTP] [PATCH] cgroups/cgroup_regression_test: fix sporadic failures
> That just sounds like a cop-out for fixing a bug in bash. Unless
> the item is documented in bash and/or the POSIX spec prior to that
> bug, I would just push back on the devs until they fix the shell.
> Setting signal handlers in a synchronous fashion isn't rocket science.
> Thanks,
> -Garrett

I am trying to push them :-). If you look at the BZ, the maintainer is trying
to get things moving upstream:
http://www.mail-archive.com/bug-bash@gnu.org/msg09099.html

But at the same time, it seems pointless for the test to keep resetting the
signal handler in a busy loop, unless it is a bash stress test.

One way or another, the bash folks will deal with the issue: fix it or
document it. Avoiding the problem by moving trap out of the loop also lets
people run the test on older bash versions.

Alternatively, I can add an extra "kill -SIGTERM", so that even
when SIGUSR1 gets lost, the test will still be able to progress.
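
The extra SIGTERM fallback mentioned above could look roughly like the hypothetical stop_worker helper below: send SIGUSR1, poll for a grace period, and send SIGTERM only if the worker is still alive. The helper name, the one-second polling interval and the default grace period are assumptions, not part of the posted patch.

```shell
#!/bin/sh
# Hypothetical helper: stop a worker with SIGUSR1, falling back to SIGTERM.
# $1 = worker PID, $2 = grace period in seconds (default of 5 is assumed).
# Plain sh has no "local", so pid and grace are globals here.
stop_worker() {
	pid=$1
	grace=${2:-5}

	kill -USR1 "$pid" 2>/dev/null || :

	# Poll once per second until the worker exits or the grace period ends.
	while [ "$grace" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
		sleep 1
		grace=$((grace - 1))
	done

	# SIGUSR1 was lost or ignored: terminate the worker instead.
	if kill -0 "$pid" 2>/dev/null; then
		kill -TERM "$pid" 2>/dev/null || :
	fi

	wait "$pid" 2>/dev/null || :   # reap the worker; its status is not needed
}

# Usage: a worker that ignores SIGUSR1 stands in for one whose signal got lost.
( trap '' USR1; while :; do :; done ) &
w=$!
sleep 1              # give the worker time to install its trap
stop_worker "$w" 2
```

The cooperative path is unchanged: a worker that honours SIGUSR1 exits during the grace period and never receives SIGTERM.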


* Re: [LTP] [PATCH] cgroups/cgroup_regression_test: fix sporadic failures
From: Garrett Cooper @ 2011-04-19 22:04 UTC
  To: Jan Stancek; +Cc: ltp-list

On Tue, Apr 19, 2011 at 11:00 AM, Jan Stancek <jstancek@redhat.com> wrote:
> I am trying to push them :-). If you look at the BZ, the maintainer is trying
> to get things moving upstream:
> http://www.mail-archive.com/bug-bash@gnu.org/msg09099.html
>
> But at the same time, it seems pointless for the test to keep resetting the
> signal handler in a busy loop, unless it is a bash stress test.
>
> One way or another, the bash folks will deal with the issue: fix it or
> document it. Avoiding the problem by moving trap out of the loop also lets
> people run the test on older bash versions.
>
> Alternatively, I can add an extra "kill -SIGTERM", so that even
> when SIGUSR1 gets lost, the test will still be able to progress.

    Sure. My concern is that there could be other unintended behavior
that crops up because the signal handler is now set up once instead
of on every loop iteration. But I also understand your plight...
    FWIW it would be nice to move away from SIGUSR1/SIGUSR2 because I
know people who have hacked the Linux kernel and init in the past to
pass 'special messages' / trigger asynchronous system-wide events with
these signals. Granted, I think they're morons for doing so, as
SIGUSR1/SIGUSR2 are general-purpose user-defined signals with certain
semantic meaning (in particular dealing with legacy shell and init
behavior), but I was QA at the time and had no real say in what
'design'/hacks they employed to get software out the door.
Thanks,
-Garrett


* Re: [LTP] [PATCH] cgroups/cgroup_regression_test: fix sporadic failures
From: Garrett Cooper @ 2011-04-20 06:11 UTC
  To: Jan Stancek; +Cc: ltp-list

On Tue, Apr 19, 2011 at 3:04 PM, Garrett Cooper <yanegomi@gmail.com> wrote:
>    Sure. My concern is that there could be other unintended behavior
> that crops up because the signal handler is now set up once instead
> of on every loop iteration. But I also understand your plight...
>    FWIW it would be nice to move away from SIGUSR1/SIGUSR2 because I
> know people who have hacked the Linux kernel and init in the past to
> pass 'special messages' / trigger asynchronous system-wide events with
> these signals. Granted, I think they're morons for doing so, as
> SIGUSR1/SIGUSR2 are general-purpose user-defined signals with certain
> semantic meaning (in particular dealing with legacy shell and init
> behavior), but I was QA at the time and had no real say in what
> 'design'/hacks they employed to get software out the door.

Patches don't apply.
Thanks,
-Garrett

