All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fstests: fix a major test setup performance regression
@ 2022-05-20  1:23 Dave Chinner
  2022-05-20  1:23 ` [PATCH 1/2] fstests: tests should not source common/rc directly Dave Chinner
  2022-05-20  1:23 ` [PATCH 2/2] common/rc: fix test init performance regression on XFS Dave Chinner
  0 siblings, 2 replies; 7+ messages in thread
From: Dave Chinner @ 2022-05-20  1:23 UTC (permalink / raw)
  To: fstests

Hi folks,

I've been seeing symptoms of this issue described in patch 2 for
most of this merge cycle, and it's slowly been getting worse, to
the point where I had to fix it to be able to get any amount of
testing done. Turns out we're doing something that is not smart
with XFS test setup, and recent changes to mkfs.xfs caused that
to be exposed. These patches fix it.

The first patch is a cleanup - no test should be including common/rc
directly anymore - _being_fstest() does that now. The second patch
fixes the XFS issues in init_rc() which is called from _begin_fstest
when is sources common/rc....

Cheers,

Dave.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] fstests: tests should not source common/rc directly
  2022-05-20  1:23 [PATCH 0/2] fstests: fix a major test setup performance regression Dave Chinner
@ 2022-05-20  1:23 ` Dave Chinner
  2022-05-20  1:59   ` Darrick J. Wong
  2022-05-20 11:40   ` Christian Brauner
  2022-05-20  1:23 ` [PATCH 2/2] common/rc: fix test init performance regression on XFS Dave Chinner
  1 sibling, 2 replies; 7+ messages in thread
From: Dave Chinner @ 2022-05-20  1:23 UTC (permalink / raw)
  To: fstests

From: Dave Chinner <dchinner@redhat.com>

_begin_fstest() does this as every test needs it, so remove it from
all the tests that source it a second time.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 tests/btrfs/243   | 1 -
 tests/btrfs/245   | 1 -
 tests/ext4/053    | 1 -
 tests/generic/644 | 1 -
 tests/generic/645 | 1 -
 tests/generic/647 | 1 -
 tests/generic/677 | 1 -
 tests/generic/678 | 1 -
 tests/generic/679 | 1 -
 tests/generic/690 | 1 -
 10 files changed, 10 deletions(-)

diff --git a/tests/btrfs/243 b/tests/btrfs/243
index 750b4323..09a63bcd 100755
--- a/tests/btrfs/243
+++ b/tests/btrfs/243
@@ -20,7 +20,6 @@ _cleanup()
 	rm -r -f $tmp.*
 }
 
-. ./common/rc
 . ./common/filter
 . ./common/dmflakey
 
diff --git a/tests/btrfs/245 b/tests/btrfs/245
index dadc8492..426aab1c 100755
--- a/tests/btrfs/245
+++ b/tests/btrfs/245
@@ -11,7 +11,6 @@
 _begin_fstest auto quick idmapped subvol
 
 # get standard environment, filters and checks
-. ./common/rc
 . ./common/filter
 
 # real QA test starts here
diff --git a/tests/ext4/053 b/tests/ext4/053
index bf6e3f6b..187a2515 100755
--- a/tests/ext4/053
+++ b/tests/ext4/053
@@ -28,7 +28,6 @@ _cleanup()
 }
 
 # get standard environment, filters and checks
-. ./common/rc
 . ./common/filter
 . ./common/quota
 
diff --git a/tests/generic/644 b/tests/generic/644
index edf9b03e..c48338d8 100755
--- a/tests/generic/644
+++ b/tests/generic/644
@@ -10,7 +10,6 @@
 _begin_fstest auto quick cap idmapped mount
 
 # get standard environment, filters and checks
-. ./common/rc
 . ./common/filter
 
 # real QA test starts here
diff --git a/tests/generic/645 b/tests/generic/645
index 74e0f589..556d2450 100755
--- a/tests/generic/645
+++ b/tests/generic/645
@@ -10,7 +10,6 @@
 _begin_fstest auto quick idmapped mount
 
 # get standard environment, filters and checks
-. ./common/rc
 . ./common/filter
 
 # real QA test starts here
diff --git a/tests/generic/647 b/tests/generic/647
index fa6edadc..8484fa8d 100755
--- a/tests/generic/647
+++ b/tests/generic/647
@@ -18,7 +18,6 @@ _cleanup()
 }
 
 # get standard environment, filters and checks
-. ./common/rc
 . ./common/filter
 
 # real QA test starts here
diff --git a/tests/generic/677 b/tests/generic/677
index 1d4eaa53..39af90a9 100755
--- a/tests/generic/677
+++ b/tests/generic/677
@@ -18,7 +18,6 @@ _cleanup()
 	rm -r -f $tmp.*
 }
 
-. ./common/rc
 . ./common/filter
 . ./common/dmflakey
 . ./common/punch
diff --git a/tests/generic/678 b/tests/generic/678
index 1c3cf6e3..3c3c05b7 100755
--- a/tests/generic/678
+++ b/tests/generic/678
@@ -22,7 +22,6 @@ _cleanup()
 }
 
 # get standard environment, filters and checks
-. ./common/rc
 . ./common/filter
 
 # real QA test starts here
diff --git a/tests/generic/679 b/tests/generic/679
index c32d42b9..a0094e48 100755
--- a/tests/generic/679
+++ b/tests/generic/679
@@ -11,7 +11,6 @@
 . ./common/preamble
 _begin_fstest auto quick prealloc
 
-. ./common/rc
 . ./common/filter
 . ./common/punch
 
diff --git a/tests/generic/690 b/tests/generic/690
index f03295a5..cef8d6e8 100755
--- a/tests/generic/690
+++ b/tests/generic/690
@@ -24,7 +24,6 @@ _cleanup()
 	rm -r -f $tmp.*
 }
 
-. ./common/rc
 . ./common/filter
 . ./common/dmflakey
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] common/rc: fix test init performance regression on XFS
  2022-05-20  1:23 [PATCH 0/2] fstests: fix a major test setup performance regression Dave Chinner
  2022-05-20  1:23 ` [PATCH 1/2] fstests: tests should not source common/rc directly Dave Chinner
@ 2022-05-20  1:23 ` Dave Chinner
  2022-05-20  2:00   ` Darrick J. Wong
  2022-05-20  4:42   ` Zorro Lang
  1 sibling, 2 replies; 7+ messages in thread
From: Dave Chinner @ 2022-05-20  1:23 UTC (permalink / raw)
  To: fstests

From: Dave Chinner <dchinner@redhat.com>

I've been having problems over the past month with tests randomly
running much slower than they should. The overall effect has been
that running an auto group iterate has blown out from about 3
hours to almost 5 hours.

The problem has been that the root disks have been thrashing.
Thousands of iops and the backing store getting IO bound so all test
vms with root drive images hosted on that backing store then also
go slow.

I finally got to the bottom of it - it looked for all the world like
a qemu image format write amplification problem, because I couldn't
capture anything unusual from inside the guest. I converted all
the root images to raw format on fast SSD storage, and the problem
didn't go away - while some tests were faster, other tests were
still randomly slow.

Finally I tracked it down with blktrace/blkparse. All the IO was
coming from processes with the command line like:

 /sbin/mkfs.xfs -d file name=/tmp/791409.img size=512m

A quick grep then pointed me at common/rc::init_rc() where there is
a test for XFS specific on-disk format behaviour that uses that
exact command line.

init_rc() is run from _begin_fstest() so it is executed on every
test start. We run mkfs to create a 512MB filesystem in $tmp on
every test start. That's ..... not smart. And we do it so we can
change the setup of $XFS_COPY_PROG, which is only used in a handful
of tests. That's also .... not smart.

And then the penny dropped: we recently increased the default log
size of small filesystems, so this is now zeroing a 64MB log instead
of a 5-10MB log. And, of course it's on the root drive because of
the $tmp image prefix, which in this case is ext3, and it's
fragmenting the crap out of the writes and so it increases runtime
of the mkfs.xfs execution from nothing to 5-10s.

Then when one of these mkfs invocations co-incides with journal
flushing driven writeback, everything just stops until the writeback
stops thrashing. And so all tests are 5-10s slows, and random tests
take anywhere between 20-100s longer to start up....

Fix it by moving the setup of XFS_COPY_PROG to _require_xfs_copy()
so only the tests that use xfs_copy run this code.

Numbers don't lie:

generic/001 11s ...  6s
generic/002 9s ...  1s
generic/003 17s ...  10s
generic/004 12s ...  1s
generic/005 9s ...  1s
generic/006 9s ...  2s
generic/007 11s ...  3s
generic/008 5s ...  2s
generic/009 10s ...  2s
generic/010 5s ...  0s
generic/011 9s ...  1s
generic/012 7s ...  2s
generic/013 10s ...  5s
generic/014 11s ...  1s
generic/015 7s ...  1s
generic/016 7s ...  2s
.....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 common/rc  | 13 -------------
 common/xfs | 12 ++++++++++++
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/common/rc b/common/rc
index f5ead044..67b4cc77 100644
--- a/common/rc
+++ b/common/rc
@@ -4522,19 +4522,6 @@ init_rc()
 	# it is supported.
 	$XFS_IO_PROG -i -c quit 2>/dev/null && \
 		export XFS_IO_PROG="$XFS_IO_PROG -i"
-
-	# xfs_copy on v5 filesystems do not require the "-d" option if xfs_db
-	# can change the UUID on v5 filesystems
-	if [ "$FSTYP" == "xfs" ]; then
-		touch /tmp/$$.img
-		$MKFS_XFS_PROG -d file,name=/tmp/$$.img,size=512m >/dev/null 2>&1
-		# xfs_db will return 0 even if it can't generate a new uuid, so
-		# check the output to make sure if it can change UUID of V5 xfs
-		$XFS_DB_PROG -x -c "uuid generate" /tmp/$$.img \
-			| grep -q "invalid UUID\|supported on V5 fs" \
-			&& export XFS_COPY_PROG="$XFS_COPY_PROG -d"
-		rm -f /tmp/$$.img
-	fi
 }
 
 # get real device path name by following link
diff --git a/common/xfs b/common/xfs
index ac1d021e..2123a4ab 100644
--- a/common/xfs
+++ b/common/xfs
@@ -1160,6 +1160,18 @@ _require_xfs_copy()
 	[ -n "$XFS_COPY_PROG" ] || _notrun "xfs_copy binary not yet installed"
 	[ "$USE_EXTERNAL" = yes ] && \
 		_notrun "Cannot xfs_copy with external devices"
+
+	# xfs_copy on v5 filesystems do not require the "-d" option if xfs_db
+	# can change the UUID on v5 filesystems
+	touch /tmp/$$.img
+	$MKFS_XFS_PROG -d file,name=/tmp/$$.img,size=64m >/dev/null 2>&1
+
+	# xfs_db will return 0 even if it can't generate a new uuid, so
+	# check the output to make sure if it can change UUID of V5 xfs
+	$XFS_DB_PROG -x -c "uuid generate" /tmp/$$.img \
+		| grep -q "invalid UUID\|supported on V5 fs" \
+		&& export XFS_COPY_PROG="$XFS_COPY_PROG -d"
+	rm -f /tmp/$$.img
 }
 
 __xfs_cowgc_interval_knob1="/proc/sys/fs/xfs/speculative_cow_prealloc_lifetime"
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] fstests: tests should not source common/rc directly
  2022-05-20  1:23 ` [PATCH 1/2] fstests: tests should not source common/rc directly Dave Chinner
@ 2022-05-20  1:59   ` Darrick J. Wong
  2022-05-20 11:40   ` Christian Brauner
  1 sibling, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2022-05-20  1:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests

On Fri, May 20, 2022 at 11:23:35AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> _begin_fstest() does this as every test needs it, so remove it from
> all the tests that source it a second time.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  tests/btrfs/243   | 1 -
>  tests/btrfs/245   | 1 -
>  tests/ext4/053    | 1 -
>  tests/generic/644 | 1 -
>  tests/generic/645 | 1 -
>  tests/generic/647 | 1 -
>  tests/generic/677 | 1 -
>  tests/generic/678 | 1 -
>  tests/generic/679 | 1 -
>  tests/generic/690 | 1 -
>  10 files changed, 10 deletions(-)
> 
> diff --git a/tests/btrfs/243 b/tests/btrfs/243
> index 750b4323..09a63bcd 100755
> --- a/tests/btrfs/243
> +++ b/tests/btrfs/243
> @@ -20,7 +20,6 @@ _cleanup()
>  	rm -r -f $tmp.*
>  }
>  
> -. ./common/rc
>  . ./common/filter
>  . ./common/dmflakey
>  
> diff --git a/tests/btrfs/245 b/tests/btrfs/245
> index dadc8492..426aab1c 100755
> --- a/tests/btrfs/245
> +++ b/tests/btrfs/245
> @@ -11,7 +11,6 @@
>  _begin_fstest auto quick idmapped subvol
>  
>  # get standard environment, filters and checks
> -. ./common/rc
>  . ./common/filter
>  
>  # real QA test starts here
> diff --git a/tests/ext4/053 b/tests/ext4/053
> index bf6e3f6b..187a2515 100755
> --- a/tests/ext4/053
> +++ b/tests/ext4/053
> @@ -28,7 +28,6 @@ _cleanup()
>  }
>  
>  # get standard environment, filters and checks
> -. ./common/rc
>  . ./common/filter
>  . ./common/quota
>  
> diff --git a/tests/generic/644 b/tests/generic/644
> index edf9b03e..c48338d8 100755
> --- a/tests/generic/644
> +++ b/tests/generic/644
> @@ -10,7 +10,6 @@
>  _begin_fstest auto quick cap idmapped mount
>  
>  # get standard environment, filters and checks
> -. ./common/rc
>  . ./common/filter
>  
>  # real QA test starts here
> diff --git a/tests/generic/645 b/tests/generic/645
> index 74e0f589..556d2450 100755
> --- a/tests/generic/645
> +++ b/tests/generic/645
> @@ -10,7 +10,6 @@
>  _begin_fstest auto quick idmapped mount
>  
>  # get standard environment, filters and checks
> -. ./common/rc
>  . ./common/filter
>  
>  # real QA test starts here
> diff --git a/tests/generic/647 b/tests/generic/647
> index fa6edadc..8484fa8d 100755
> --- a/tests/generic/647
> +++ b/tests/generic/647
> @@ -18,7 +18,6 @@ _cleanup()
>  }
>  
>  # get standard environment, filters and checks
> -. ./common/rc
>  . ./common/filter
>  
>  # real QA test starts here
> diff --git a/tests/generic/677 b/tests/generic/677
> index 1d4eaa53..39af90a9 100755
> --- a/tests/generic/677
> +++ b/tests/generic/677
> @@ -18,7 +18,6 @@ _cleanup()
>  	rm -r -f $tmp.*
>  }
>  
> -. ./common/rc
>  . ./common/filter
>  . ./common/dmflakey
>  . ./common/punch
> diff --git a/tests/generic/678 b/tests/generic/678
> index 1c3cf6e3..3c3c05b7 100755
> --- a/tests/generic/678
> +++ b/tests/generic/678
> @@ -22,7 +22,6 @@ _cleanup()
>  }
>  
>  # get standard environment, filters and checks
> -. ./common/rc
>  . ./common/filter
>  
>  # real QA test starts here
> diff --git a/tests/generic/679 b/tests/generic/679
> index c32d42b9..a0094e48 100755
> --- a/tests/generic/679
> +++ b/tests/generic/679
> @@ -11,7 +11,6 @@
>  . ./common/preamble
>  _begin_fstest auto quick prealloc
>  
> -. ./common/rc
>  . ./common/filter
>  . ./common/punch
>  
> diff --git a/tests/generic/690 b/tests/generic/690
> index f03295a5..cef8d6e8 100755
> --- a/tests/generic/690
> +++ b/tests/generic/690
> @@ -24,7 +24,6 @@ _cleanup()
>  	rm -r -f $tmp.*
>  }
>  
> -. ./common/rc
>  . ./common/filter
>  . ./common/dmflakey
>  
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] common/rc: fix test init performance regression on XFS
  2022-05-20  1:23 ` [PATCH 2/2] common/rc: fix test init performance regression on XFS Dave Chinner
@ 2022-05-20  2:00   ` Darrick J. Wong
  2022-05-20  4:42   ` Zorro Lang
  1 sibling, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2022-05-20  2:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests

On Fri, May 20, 2022 at 11:23:36AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> I've been having problems over the past month with tests randomly
> running much slower than they should. The overall effect has been
> that running an auto group iterate has blown out from about 3
> hours to almost 5 hours.
> 
> The problem has been that the root disks have been thrashing.
> Thousands of iops and the backing store getting IO bound so all test
> vms with root drive images hosted on that backing store then also
> go slow.
> 
> I finally got to the bottom of it - it looked for all the world like
> a qemu image format write amplification problem, because I couldn't
> capture anything unusual from inside the guest. I converted all
> the root images to raw format on fast SSD storage, and the problem
> didn't go away - while some tests were faster, other tests were
> still randomly slow.
> 
> Finally I tracked it down with blktrace/blkparse. All the IO was
> coming from processes with the command line like:
> 
>  /sbin/mkfs.xfs -d file name=/tmp/791409.img size=512m
> 
> A quick grep then pointed me at common/rc::init_rc() where there is
> a test for XFS specific on-disk format behaviour that uses that
> exact command line.
> 
> init_rc() is run from _begin_fstest() so it is executed on every
> test start. We run mkfs to create a 512MB filesystem in $tmp on
> every test start. That's ..... not smart. And we do it so we can
> change the setup of $XFS_COPY_PROG, which is only used in a handful
> of tests. That's also .... not smart.
> 
> And then the penny dropped: we recently increased the default log
> size of small filesystems, so this is now zeroing a 64MB log instead
> of a 5-10MB log. And, of course it's on the root drive because of
> the $tmp image prefix, which in this case is ext3, and it's
> fragmenting the crap out of the writes and so it increases runtime
> of the mkfs.xfs execution from nothing to 5-10s.
> 
> Then when one of these mkfs invocations co-incides with journal
> flushing driven writeback, everything just stops until the writeback
> stops thrashing. And so all tests are 5-10s slows, and random tests
> take anywhere between 20-100s longer to start up....
> 
> Fix it by moving the setup of XFS_COPY_PROG to _require_xfs_copy()
> so only the tests that use xfs_copy run this code.
> 
> Numbers don't lie:
> 
> generic/001 11s ...  6s
> generic/002 9s ...  1s
> generic/003 17s ...  10s
> generic/004 12s ...  1s
> generic/005 9s ...  1s
> generic/006 9s ...  2s
> generic/007 11s ...  3s
> generic/008 5s ...  2s
> generic/009 10s ...  2s
> generic/010 5s ...  0s
> generic/011 9s ...  1s
> generic/012 7s ...  2s
> generic/013 10s ...  5s
> generic/014 11s ...  1s
> generic/015 7s ...  1s
> generic/016 7s ...  2s
> .....
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Ah, nice.  I hadn't even realized that was there!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  common/rc  | 13 -------------
>  common/xfs | 12 ++++++++++++
>  2 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/common/rc b/common/rc
> index f5ead044..67b4cc77 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4522,19 +4522,6 @@ init_rc()
>  	# it is supported.
>  	$XFS_IO_PROG -i -c quit 2>/dev/null && \
>  		export XFS_IO_PROG="$XFS_IO_PROG -i"
> -
> -	# xfs_copy on v5 filesystems do not require the "-d" option if xfs_db
> -	# can change the UUID on v5 filesystems
> -	if [ "$FSTYP" == "xfs" ]; then
> -		touch /tmp/$$.img
> -		$MKFS_XFS_PROG -d file,name=/tmp/$$.img,size=512m >/dev/null 2>&1
> -		# xfs_db will return 0 even if it can't generate a new uuid, so
> -		# check the output to make sure if it can change UUID of V5 xfs
> -		$XFS_DB_PROG -x -c "uuid generate" /tmp/$$.img \
> -			| grep -q "invalid UUID\|supported on V5 fs" \
> -			&& export XFS_COPY_PROG="$XFS_COPY_PROG -d"
> -		rm -f /tmp/$$.img
> -	fi
>  }
>  
>  # get real device path name by following link
> diff --git a/common/xfs b/common/xfs
> index ac1d021e..2123a4ab 100644
> --- a/common/xfs
> +++ b/common/xfs
> @@ -1160,6 +1160,18 @@ _require_xfs_copy()
>  	[ -n "$XFS_COPY_PROG" ] || _notrun "xfs_copy binary not yet installed"
>  	[ "$USE_EXTERNAL" = yes ] && \
>  		_notrun "Cannot xfs_copy with external devices"
> +
> +	# xfs_copy on v5 filesystems do not require the "-d" option if xfs_db
> +	# can change the UUID on v5 filesystems
> +	touch /tmp/$$.img
> +	$MKFS_XFS_PROG -d file,name=/tmp/$$.img,size=64m >/dev/null 2>&1
> +
> +	# xfs_db will return 0 even if it can't generate a new uuid, so
> +	# check the output to make sure if it can change UUID of V5 xfs
> +	$XFS_DB_PROG -x -c "uuid generate" /tmp/$$.img \
> +		| grep -q "invalid UUID\|supported on V5 fs" \
> +		&& export XFS_COPY_PROG="$XFS_COPY_PROG -d"
> +	rm -f /tmp/$$.img
>  }
>  
>  __xfs_cowgc_interval_knob1="/proc/sys/fs/xfs/speculative_cow_prealloc_lifetime"
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] common/rc: fix test init performance regression on XFS
  2022-05-20  1:23 ` [PATCH 2/2] common/rc: fix test init performance regression on XFS Dave Chinner
  2022-05-20  2:00   ` Darrick J. Wong
@ 2022-05-20  4:42   ` Zorro Lang
  1 sibling, 0 replies; 7+ messages in thread
From: Zorro Lang @ 2022-05-20  4:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests

On Fri, May 20, 2022 at 11:23:36AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> I've been having problems over the past month with tests randomly
> running much slower than they should. The overall effect has been
> that running an auto group iterate has blown out from about 3
> hours to almost 5 hours.
> 
> The problem has been that the root disks have been thrashing.
> Thousands of iops and the backing store getting IO bound so all test
> vms with root drive images hosted on that backing store then also
> go slow.
> 
> I finally got to the bottom of it - it looked for all the world like
> a qemu image format write amplification problem, because I couldn't
> capture anything unusual from inside the guest. I converted all
> the root images to raw format on fast SSD storage, and the problem
> didn't go away - while some tests were faster, other tests were
> still randomly slow.
> 
> Finally I tracked it down with blktrace/blkparse. All the IO was
> coming from processes with the command line like:
> 
>  /sbin/mkfs.xfs -d file name=/tmp/791409.img size=512m
> 
> A quick grep then pointed me at common/rc::init_rc() where there is
> a test for XFS specific on-disk format behaviour that uses that
> exact command line.
> 
> init_rc() is run from _begin_fstest() so it is executed on every
> test start. We run mkfs to create a 512MB filesystem in $tmp on
> every test start. That's ..... not smart. And we do it so we can
> change the setup of $XFS_COPY_PROG, which is only used in a handful
> of tests. That's also .... not smart.
> 
> And then the penny dropped: we recently increased the default log
> size of small filesystems, so this is now zeroing a 64MB log instead
> of a 5-10MB log. And, of course it's on the root drive because of
> the $tmp image prefix, which in this case is ext3, and it's
> fragmenting the crap out of the writes and so it increases runtime
> of the mkfs.xfs execution from nothing to 5-10s.
> 
> Then when one of these mkfs invocations co-incides with journal
> flushing driven writeback, everything just stops until the writeback
> stops thrashing. And so all tests are 5-10s slows, and random tests
> take anywhere between 20-100s longer to start up....
> 
> Fix it by moving the setup of XFS_COPY_PROG to _require_xfs_copy()
> so only the tests that use xfs_copy run this code.
> 
> Numbers don't lie:
> 
> generic/001 11s ...  6s
> generic/002 9s ...  1s
> generic/003 17s ...  10s
> generic/004 12s ...  1s
> generic/005 9s ...  1s
> generic/006 9s ...  2s
> generic/007 11s ...  3s
> generic/008 5s ...  2s
> generic/009 10s ...  2s
> generic/010 5s ...  0s
> generic/011 9s ...  1s
> generic/012 7s ...  2s
> generic/013 10s ...  5s
> generic/014 11s ...  1s
> generic/015 7s ...  1s
> generic/016 7s ...  2s
> .....
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Nice catch! I just checked all cases use XFS_COPY_PROG, they all contains
_require_xfs_copy at beginning, so I think this change is good. I'll give
it more test before pushing.

That remind me better to check the common code which runs before each case
real starting. And carefully review patches might add more operations at
each case beginning. Anyway, thanks for this performance improvement.

Reviewed-by: Zorro Lang <zlang@redhat.com>

Thanks,
Zorro

>  common/rc  | 13 -------------
>  common/xfs | 12 ++++++++++++
>  2 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/common/rc b/common/rc
> index f5ead044..67b4cc77 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4522,19 +4522,6 @@ init_rc()
>  	# it is supported.
>  	$XFS_IO_PROG -i -c quit 2>/dev/null && \
>  		export XFS_IO_PROG="$XFS_IO_PROG -i"
> -
> -	# xfs_copy on v5 filesystems do not require the "-d" option if xfs_db
> -	# can change the UUID on v5 filesystems
> -	if [ "$FSTYP" == "xfs" ]; then
> -		touch /tmp/$$.img
> -		$MKFS_XFS_PROG -d file,name=/tmp/$$.img,size=512m >/dev/null 2>&1
> -		# xfs_db will return 0 even if it can't generate a new uuid, so
> -		# check the output to make sure if it can change UUID of V5 xfs
> -		$XFS_DB_PROG -x -c "uuid generate" /tmp/$$.img \
> -			| grep -q "invalid UUID\|supported on V5 fs" \
> -			&& export XFS_COPY_PROG="$XFS_COPY_PROG -d"
> -		rm -f /tmp/$$.img
> -	fi
>  }
>  
>  # get real device path name by following link
> diff --git a/common/xfs b/common/xfs
> index ac1d021e..2123a4ab 100644
> --- a/common/xfs
> +++ b/common/xfs
> @@ -1160,6 +1160,18 @@ _require_xfs_copy()
>  	[ -n "$XFS_COPY_PROG" ] || _notrun "xfs_copy binary not yet installed"
>  	[ "$USE_EXTERNAL" = yes ] && \
>  		_notrun "Cannot xfs_copy with external devices"
> +
> +	# xfs_copy on v5 filesystems do not require the "-d" option if xfs_db
> +	# can change the UUID on v5 filesystems
> +	touch /tmp/$$.img
> +	$MKFS_XFS_PROG -d file,name=/tmp/$$.img,size=64m >/dev/null 2>&1
> +
> +	# xfs_db will return 0 even if it can't generate a new uuid, so
> +	# check the output to make sure if it can change UUID of V5 xfs
> +	$XFS_DB_PROG -x -c "uuid generate" /tmp/$$.img \
> +		| grep -q "invalid UUID\|supported on V5 fs" \
> +		&& export XFS_COPY_PROG="$XFS_COPY_PROG -d"
> +	rm -f /tmp/$$.img
>  }
>  
>  __xfs_cowgc_interval_knob1="/proc/sys/fs/xfs/speculative_cow_prealloc_lifetime"
> -- 
> 2.35.1
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] fstests: tests should not source common/rc directly
  2022-05-20  1:23 ` [PATCH 1/2] fstests: tests should not source common/rc directly Dave Chinner
  2022-05-20  1:59   ` Darrick J. Wong
@ 2022-05-20 11:40   ` Christian Brauner
  1 sibling, 0 replies; 7+ messages in thread
From: Christian Brauner @ 2022-05-20 11:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests

On Fri, May 20, 2022 at 11:23:35AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> _begin_fstest() does this as every test needs it, so remove it from
> all the tests that source it a second time.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Looks good to me,
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-05-20 11:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-20  1:23 [PATCH 0/2] fstests: fix a major test setup performance regression Dave Chinner
2022-05-20  1:23 ` [PATCH 1/2] fstests: tests should not source common/rc directly Dave Chinner
2022-05-20  1:59   ` Darrick J. Wong
2022-05-20 11:40   ` Christian Brauner
2022-05-20  1:23 ` [PATCH 2/2] common/rc: fix test init performance regression on XFS Dave Chinner
2022-05-20  2:00   ` Darrick J. Wong
2022-05-20  4:42   ` Zorro Lang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.