linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] update 068 to reproduce an unfreeze hanging up problem
@ 2011-12-13  0:42 Masayoshi MIZUMA
  2011-12-13  6:32 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Masayoshi MIZUMA @ 2011-12-13  0:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4, Dave Chinner

Update 068 to reproduce an unfreeze hang: the unfreeze function,
thaw_super(), sometimes hangs if the flush kernel thread does
writeback to the same filesystem concurrently.
The problem occurs on ext4 and ext3. The reports are at:
ext4:
http://marc.info/?l=linux-ext4&m=132339590004560&w=2
ext3:
http://marc.info/?l=linux-ext4&m=131536612113658&w=2

This test runs freeze/unfreeze under heavy load. If the problem is
reproduced, the test hangs because "xfs_freeze -u" hangs.

Signed-off-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
---
 068     |   17 +++----
 068.out |  160 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 167 insertions(+), 10 deletions(-)

diff --git a/068 b/068
index 5d0053d..3b3597d 100755
--- a/068
+++ b/068
@@ -32,7 +32,7 @@ here=`pwd`
 tmp=/tmp/$$
 status=0	# success is the default!
 
-ITERATIONS=10
+ITERATIONS=50
 
 _cleanup()
 {
@@ -51,7 +51,7 @@ trap "_cleanup" 0 1 2 3 15
 . ./common.filter
 
 # real QA test starts here
-_supported_fs xfs
+_supported_fs ext3 ext4 xfs
 _supported_os Linux IRIX
 
 _require_scratch
@@ -62,7 +62,7 @@ rm -f $seq.full
 umount $SCRATCH_DEV >/dev/null 2>&1
 echo "*** MKFS ***"                         >>$seq.full
 echo ""                                     >>$seq.full
-_scratch_mkfs_xfs                           >>$seq.full 2>&1 \
+_scratch_mkfs                               >>$seq.full 2>&1 \
     || _fail "mkfs failed"
 _scratch_mount                              >>$seq.full 2>&1 \
     || _fail "mount failed"
@@ -75,12 +75,11 @@ touch $tmp.running
     STRESS_DIR="$SCRATCH_MNT/fsstress_test_dir"
     mkdir "$STRESS_DIR"
 
-    procs=2
-    nops=200
+    procs=100
+    nops=1000
     while [ -f "$tmp.running" ]
-      do
-      #	-w ensures that the only ops are ones which cause write I/O
-      $FSSTRESS_PROG -d $STRESS_DIR -w -p $procs -n $nops $FSSTRESS_AVOID \
+    do
+      $FSSTRESS_PROG -d $STRESS_DIR -p $procs -n $nops $FSSTRESS_AVOID \
 	  > /dev/null 2>&1
     done
 
@@ -99,13 +98,11 @@ do
 	xfs_freeze -f "$SCRATCH_MNT" | tee -a $seq.full
 	[ $? != 0 ] && echo xfs_freeze -f "$SCRATCH_MNT" failed | \
 	    tee -a $seq.full
-	sleep 2
 
 	echo "*** thawing  \$SCRATCH_MNT" | tee -a $seq.full
 	xfs_freeze -u "$SCRATCH_MNT" | tee -a $seq.full
 	[ $? != 0 ] && echo xfs_freeze -u "$SCRATCH_MNT" failed | \
 	    tee -a $seq.full
-	sleep 2
 
 	echo  | tee -a $seq.full
 	let i=$i+1
diff --git a/068.out b/068.out
index 363d0e9..11eb58d 100644
--- a/068.out
+++ b/068.out
@@ -41,3 +41,163 @@ QA output created by 068
 *** freezing $SCRATCH_MNT
 *** thawing  $SCRATCH_MNT
 
+*** iteration: 10
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 11
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 12
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 13
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 14
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 15
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 16
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 17
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 18
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 19
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 20
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 21
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 22
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 23
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 24
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 25
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 26
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 27
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 28
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 29
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 30
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 31
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 32
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 33
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 34
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 35
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 36
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 37
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 38
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 39
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 40
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 41
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 42
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 43
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 44
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 45
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 46
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 47
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 48
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
+*** iteration: 49
+*** freezing $SCRATCH_MNT
+*** thawing  $SCRATCH_MNT
+
-- 
1.7.1

Thanks,
Masayoshi Mizuma
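
The commit message above says a reproduced hang shows up as a stuck
"xfs_freeze -u", which stalls the whole test. As a rough sketch only (not
part of the patch), a hung thaw could be reported instead of blocking
forever. The helper name run_with_watchdog, its exit code 124 and the
one-second polling interval are assumptions made for illustration:

```shell
#!/bin/sh
# run_with_watchdog SECS CMD [ARGS...]   (hypothetical helper, not in xfstests)
# Start CMD in the background and poll once per second for completion.
# Returns CMD's exit status, or 124 if CMD is still running after SECS seconds.
# A task blocked inside the kernel may not react to signals, so the stuck
# process is left alone and only reported.
run_with_watchdog() {
    limit=$1
    shift
    "$@" &
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "watchdog: '$*' still running after ${limit}s" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command has finished; collect and return its exit status.
    wait "$pid"
}
```

A call such as `run_with_watchdog 60 xfs_freeze -u "$SCRATCH_MNT"` would
then return 124 if the thaw has not finished in time; the 60 second limit
is an arbitrary choice here.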




* Re: [PATCH] update 068 to reproduce an unfreeze hanging up problem
  2011-12-13  0:42 [PATCH] update 068 to reproduce an unfreeze hanging up problem Masayoshi MIZUMA
@ 2011-12-13  6:32 ` Dave Chinner
  2011-12-14  2:22   ` Masayoshi MIZUMA
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2011-12-13  6:32 UTC (permalink / raw)
  To: Masayoshi MIZUMA; +Cc: Christoph Hellwig, linux-fsdevel, xfs, linux-ext4

On Tue, Dec 13, 2011 at 09:42:46AM +0900, Masayoshi MIZUMA wrote:
> Update 068 to reproduce an unfreeze hang: the unfreeze function,
> thaw_super(), sometimes hangs if the flush kernel thread does
> writeback to the same filesystem concurrently.

That's not exactly what I asked to be done when I reviewed the
original patch - I asked you to "make 068 generic" because it
already exercises freeze/thaw under a stressful workload. What I
expected was a change to "_supported_fs" and the scratch mkfs
command so it will run on all filesystems.

Test 068 will catch problems like the one your specific test
catches, but maybe not every time. Test 068 will catch problems your
test case won't, though - it's a trade-off between having lots of
tests that are similar but slightly different (difficult to
maintain, increases runtime, etc), and having one test that
exercises the functionality in a simple manner likely to detect
problems.

Test 068 is likely to detect problems because it:

	a) allows fsstress to try to do stuff while the filesystem
	is frozen by waiting a short time before thawing, hence load
	processes can get stuck either during the freeze or once the
	freeze is complete. Without that window, we are much less
	likely to test operations on a frozen filesystem.

	b) allows more dirty data/metadata to build up between
	thaw/freeze commands, rather than running them as quickly as
	possible. This means freeze has more work to do, extending
	the different phases of the freeze, making it more likely we
	have processes hit in different phases and hence test
	different parts of the freeze process.

IOWs, test 068 gives good coverage across most aspects of
freezing/thawing filesystems under load - and a lot of that would be
lost by changing the test to mimic the ext4 specific test case you
have. It will still be able to trigger the problem you are trying to
expose, but it also has a much better chance of triggering problems
at different points of the freeze/thaw lifecycle than your specific
test....
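
Points (a) and (b) above can be pictured with a stripped-down loop. This is
only a sketch of the pattern, not the code of test 068: the function name
freeze_thaw_loop and the FREEZE_PROG override (there so the loop can be
dry-run with a stub) are assumptions, while "xfs_freeze -f" and
"xfs_freeze -u" are the commands the test uses.

```shell
#!/bin/sh
# freeze_thaw_loop MNT ITERATIONS HOLD_SECS   (hypothetical helper)
# FREEZE_PROG defaults to xfs_freeze; overriding it is only a convenience
# for trying the loop without a real filesystem.
freeze_thaw_loop() {
    mnt=$1
    iterations=$2
    hold=$3
    i=0
    while [ "$i" -lt "$iterations" ]; do
        echo "*** iteration: $i"
        echo "*** freezing $mnt"
        "${FREEZE_PROG:-xfs_freeze}" -f "$mnt" || echo "freeze of $mnt failed"
        # Window (a): the load keeps issuing operations against the
        # frozen filesystem and can block there.
        sleep "$hold"
        echo "*** thawing  $mnt"
        "${FREEZE_PROG:-xfs_freeze}" -u "$mnt" || echo "thaw of $mnt failed"
        # Window (b): dirty data and metadata build up before the
        # next freeze, so the freeze has more work to do.
        sleep "$hold"
        i=$((i + 1))
    done
}
```

With a hold time of 2 this mirrors the "sleep 2" lines of the existing
test; a hold time of 0 corresponds to the back-to-back variant in the patch.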

> The problem occurs on ext4 and ext3. The reports are at:
> ext4:
> http://marc.info/?l=linux-ext4&m=132339590004560&w=2
> ext3:
> http://marc.info/?l=linux-ext4&m=131536612113658&w=2
> 
> This test runs freeze/unfreeze under heavy load. If the problem is
> reproduced, the test hangs because "xfs_freeze -u" hangs.

> -ITERATIONS=10
> +ITERATIONS=50

....

> -    procs=2
> -    nops=200
> +    procs=100
> +    nops=1000

>      while [ -f "$tmp.running" ]
> -      do
> -      #	-w ensures that the only ops are ones which cause write I/O
> -      $FSSTRESS_PROG -d $STRESS_DIR -w -p $procs -n $nops $FSSTRESS_AVOID \
> +    do
> +      $FSSTRESS_PROG -d $STRESS_DIR -p $procs -n $nops $FSSTRESS_AVOID \
>  	  > /dev/null 2>&1

And this is one of those cases - it is the write operations
that are the ones that cause trouble for freeze/thaw, so changing
the test to use read operations simply reduces the stress that is
being put on the filesystem freeze...

Also, you don't need lots of processes and ops to keep the filesystems
busy while freeze/thaw cycles are going on - if fsstress completes,
it simply gets started again. Hence it doesn't need to be configured
to run for a long time by ramping up processes and opcount. Yes, the
proc count could probably be increased a bit to increase the freeze
load, but I don't think that will improve the test all that much...
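
The restart behaviour described above comes from the sentinel-file loop
already in the test ("while [ -f "$tmp.running" ]"). A minimal sketch of
that pattern in isolation, with a hypothetical function name and a run
counter added purely for illustration:

```shell
#!/bin/sh
# run_load_until_stopped SENTINEL CMD [ARGS...]   (hypothetical helper)
# Re-run CMD for as long as the SENTINEL file exists, then print how many
# times it was run. Short runs are fine because the load restarts at once.
run_load_until_stopped() {
    sentinel=$1
    shift
    runs=0
    while [ -f "$sentinel" ]; do
        "$@" > /dev/null 2>&1
        runs=$((runs + 1))
    done
    echo "$runs"
}
```

In the test's terms this would be started in the background as
`run_load_until_stopped "$tmp.running" $FSSTRESS_PROG -d "$STRESS_DIR" -w -p 2 -n 200 &`
and stopped with `rm -f "$tmp.running"` once the freeze/thaw iterations
are done.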

> @@ -99,13 +98,11 @@ do
>  	xfs_freeze -f "$SCRATCH_MNT" | tee -a $seq.full
>  	[ $? != 0 ] && echo xfs_freeze -f "$SCRATCH_MNT" failed | \
>  	    tee -a $seq.full
> -	sleep 2

And this simulates typical freeze/do something/thaw cycles. It also
allows fsstress to execute operations while the filesystem is frozen
and potentially try to grab things like the superblock lock because
fsstress issued a sync() system call. Dropping the sleep makes the
test less likely to find problems....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] update 068 to reproduce an unfreeze hanging up problem
  2011-12-13  6:32 ` Dave Chinner
@ 2011-12-14  2:22   ` Masayoshi MIZUMA
  2012-01-04 14:59     ` Christoph Hellwig
  2012-01-04 18:42     ` 068: run on more filesystems Christoph Hellwig
  0 siblings, 2 replies; 5+ messages in thread
From: Masayoshi MIZUMA @ 2011-12-14  2:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, xfs, linux-ext4, Christoph Hellwig


(2011/12/13 15:32), Dave Chinner wrote:

> On Tue, Dec 13, 2011 at 09:42:46AM +0900, Masayoshi MIZUMA wrote:
> > Update 068 to reproduce an unfreeze hang: the unfreeze function,
> > thaw_super(), sometimes hangs if the flush kernel thread does
> > writeback to the same filesystem concurrently.
> 
> That's not exactly what I asked to be done when I reviewed the
> original patch - I asked you to "make 068 generic" because it
> already exercises freeze/thaw under a stressful workload. What I
> expected was a change to "_supported_fs" and the scratch mkfs
> command so it will run on all filesystems.
> 
> Test 068 will catch problems like the one your specific test
> catches, but maybe not every time. Test 068 will catch problems your
> test case won't, though - it's a trade-off between having lots of
> tests that are similar but slightly different (difficult to
> maintain, increases runtime, etc), and having one test that
> exercises the functionality in a simple manner likely to detect
> problems.

Thank you for explaining the policy; I understand it.

(snip)
 
> > @@ -99,13 +98,11 @@ do
> >  	xfs_freeze -f "$SCRATCH_MNT" | tee -a $seq.full
> >  	[ $? != 0 ] && echo xfs_freeze -f "$SCRATCH_MNT" failed | \
> >  	    tee -a $seq.full
> > -	sleep 2
> 
> And this simulates typical freeze/do something/thaw cycles. It also
> allows fsstress to execute operations while the filesystem is frozen
> and potentially try to grab things like the superblock lock because
> fsstress issued a sync() system call. Dropping the sleep makes the
> test less likely to find problems....

I tried to reproduce the problem without dropping the sleep, but it did
not reproduce. Once I dropped the sleep, the problem reproduced.

However, as you mentioned, the problem is a timing problem, so my
reproduction might have been just by chance. Dropping the sleep may
increase the chance of reproduction, but it does not reproduce every
time, so the change is not good. The same applies to the fsstress
arguments which I changed.

OK, I have updated 068 just to run on other filesystems: ext3, ext4 and
btrfs, on which I confirmed that xfs_freeze works.
(xfs_freeze may work on other filesystems which have freeze_fs/unfreeze_fs
 super_operations, but I have not confirmed that...)
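
To make concrete what widening the filesystem list means, here is a
hypothetical stand-in for a "is this filesystem type allowed" check. It is
not the xfstests implementation of _supported_fs; the function name
fs_in_list is invented for this sketch.

```shell
#!/bin/sh
# fs_in_list FSTYP ALLOWED...   (hypothetical helper, not _supported_fs)
# Succeeds if FSTYP is one of the ALLOWED filesystem types.
fs_in_list() {
    fstyp=$1
    shift
    for f in "$@"; do
        [ "$f" = "$fstyp" ] && return 0
    done
    return 1
}
```

With the list "btrfs ext3 ext4 xfs", a check such as
`fs_in_list ext4 btrfs ext3 ext4 xfs` succeeds, while a type that has not
been confirmed to work with xfs_freeze stays excluded.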

The patch is below.
-----------------------------------------------------------
Update 068 to run on other filesystems (ext3, ext4 and btrfs), because
xfs_freeze works on those filesystems.

---
 068 |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/068 b/068
index 5d0053d..6f08f18 100755
--- a/068
+++ b/068
@@ -51,7 +51,7 @@ trap "_cleanup" 0 1 2 3 15
 . ./common.filter
 
 # real QA test starts here
-_supported_fs xfs
+_supported_fs btrfs ext3 ext4 xfs
 _supported_os Linux IRIX
 
 _require_scratch
@@ -62,7 +62,7 @@ rm -f $seq.full
 umount $SCRATCH_DEV >/dev/null 2>&1
 echo "*** MKFS ***"                         >>$seq.full
 echo ""                                     >>$seq.full
-_scratch_mkfs_xfs                           >>$seq.full 2>&1 \
+_scratch_mkfs                               >>$seq.full 2>&1 \
     || _fail "mkfs failed"
 _scratch_mount                              >>$seq.full 2>&1 \
     || _fail "mount failed"
-- 
1.7.1

Thanks,
Masayoshi Mizuma


> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com





* Re: [PATCH] update 068 to reproduce an unfreeze hanging up problem
  2011-12-14  2:22   ` Masayoshi MIZUMA
@ 2012-01-04 14:59     ` Christoph Hellwig
  2012-01-04 18:42     ` 068: run on more filesystems Christoph Hellwig
  1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2012-01-04 14:59 UTC (permalink / raw)
  To: Masayoshi MIZUMA
  Cc: Dave Chinner, linux-fsdevel, Christoph Hellwig, linux-ext4, xfs

On Wed, Dec 14, 2011 at 11:22:10AM +0900, Masayoshi MIZUMA wrote:
> > Test 068 will catch problems like the one your specific test
> > catches, but maybe not every time. Test 068 will catch problems your
> > test case won't, though - it's a trade-off between having lots of
> > tests that are similar but slightly different (difficult to
> > maintain, increases runtime, etc), and having one test that
> > exercises the functionality in a simple manner likely to detect
> > problems.
> 
> Thank you for explaining the policy; I understand it.


> I tried to reproduce the problem without dropping the sleep, but it did
> not reproduce. Once I dropped the sleep, the problem reproduced.
> 
> However, as you mentioned, the problem is a timing problem, so my
> reproduction might have been just by chance. Dropping the sleep may
> increase the chance of reproduction, but it does not reproduce every
> time, so the change is not good. The same applies to the fsstress
> arguments which I changed.
> 
> OK, I have updated 068 just to run on other filesystems: ext3, ext4 and
> btrfs, on which I confirmed that xfs_freeze works.
> (xfs_freeze may work on other filesystems which have freeze_fs/unfreeze_fs
>  super_operations, but I have not confirmed that...)
> 
> The patch is below.

Given that MIZUMA's patch reproduces a real-life issue, I think adding
his original patch in addition to this change would be a good idea.

Dave, do you have a strong opinion against that?



* Re: 068: run on more filesystems
  2011-12-14  2:22   ` Masayoshi MIZUMA
  2012-01-04 14:59     ` Christoph Hellwig
@ 2012-01-04 18:42     ` Christoph Hellwig
  1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2012-01-04 18:42 UTC (permalink / raw)
  To: Masayoshi MIZUMA
  Cc: Dave Chinner, linux-fsdevel, Christoph Hellwig, linux-ext4, xfs

Thanks, applied.


