* [PATCH] xfs: fix I_DONTCACHE
@ 2021-08-24  2:32 Dave Chinner
  2021-08-24  2:58 ` Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dave Chinner @ 2021-08-24  2:32 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Yup, the VFS hoist broke it, and nobody noticed. Bulkstat workloads
make it clear that it doesn't work as it should.

Fixes: dae2f8ed7992 ("fs: Lift XFS_IDONTCACHE to the VFS layer")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c | 3 ++-
 fs/xfs/xfs_iops.c   | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index a3fe4c5307d3..f2210d927481 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -84,8 +84,9 @@ xfs_inode_alloc(
 		return NULL;
 	}
 
-	/* VFS doesn't initialise i_mode! */
+	/* VFS doesn't initialise i_mode or i_state! */
 	VFS_I(ip)->i_mode = 0;
+	VFS_I(ip)->i_state = 0;
 
 	XFS_STATS_INC(mp, vn_active);
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 0ff0cca94092..a607d6aca5c4 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1344,7 +1344,7 @@ xfs_setup_inode(
 	gfp_t			gfp_mask;
 
 	inode->i_ino = ip->i_ino;
-	inode->i_state = I_NEW;
+	inode->i_state |= I_NEW;
 
 	inode_sb_list_add(inode);
 	/* make the inode look hashed for the writeback code */
-- 
2.31.1



* Re: [PATCH] xfs: fix I_DONTCACHE
  2021-08-24  2:32 [PATCH] xfs: fix I_DONTCACHE Dave Chinner
@ 2021-08-24  2:58 ` Darrick J. Wong
  2021-08-24  5:04   ` Dave Chinner
  2021-08-25 23:07 ` [RFC PATCH] xfs: test DONTCACHE behavior with the inode cache Darrick J. Wong
  2021-08-25 23:09 ` [PATCH] xfs: fix I_DONTCACHE Darrick J. Wong
  2 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2021-08-24  2:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Aug 24, 2021 at 12:32:08PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Yup, the VFS hoist broke it, and nobody noticed. Bulkstat workloads
> make it clear that it doesn't work as it should.

Is there an easy way to test the dontcache behavior so that we don't
screw this up again?

/me's brain is fried, will study this in more detail in the morning.

--D

> Fixes: dae2f8ed7992 ("fs: Lift XFS_IDONTCACHE to the VFS layer")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_icache.c | 3 ++-
>  fs/xfs/xfs_iops.c   | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index a3fe4c5307d3..f2210d927481 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -84,8 +84,9 @@ xfs_inode_alloc(
>  		return NULL;
>  	}
>  
> -	/* VFS doesn't initialise i_mode! */
> +	/* VFS doesn't initialise i_mode or i_state! */
>  	VFS_I(ip)->i_mode = 0;
> +	VFS_I(ip)->i_state = 0;
>  
>  	XFS_STATS_INC(mp, vn_active);
>  	ASSERT(atomic_read(&ip->i_pincount) == 0);
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 0ff0cca94092..a607d6aca5c4 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1344,7 +1344,7 @@ xfs_setup_inode(
>  	gfp_t			gfp_mask;
>  
>  	inode->i_ino = ip->i_ino;
> -	inode->i_state = I_NEW;
> +	inode->i_state |= I_NEW;
>  
>  	inode_sb_list_add(inode);
>  	/* make the inode look hashed for the writeback code */
> -- 
> 2.31.1
> 


* Re: [PATCH] xfs: fix I_DONTCACHE
  2021-08-24  2:58 ` Darrick J. Wong
@ 2021-08-24  5:04   ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2021-08-24  5:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Aug 23, 2021 at 07:58:41PM -0700, Darrick J. Wong wrote:
> On Tue, Aug 24, 2021 at 12:32:08PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Yup, the VFS hoist broke it, and nobody noticed. Bulkstat workloads
> > make it clear that it doesn't work as it should.
> 
> Is there an easy way to test the dontcache behavior so that we don't
> screw this up again?
> 
> /me's brain is fried, will study this in more detail in the morning.

Perhaps. We can measure how many xfs inodes are cached via the
filesystem stats e.g.

$ pminfo -t xfs.vnodes.active
xfs.vnodes.active [number of vnodes not on free lists]
$ sudo grep xfs_inode /proc/slabinfo | awk '{ print $2 }'
243440
$ pminfo -f xfs.vnodes.active

xfs.vnodes.active
    value 243166
$

And so we should be able to run a bulkstat from fstests on a
filesystem with a known number of files in it and measure the number
of cached inodes before/after...
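
Something like this completely untested sketch, say (nr_cached is
just an ad hoc helper here; assumes the fstests $TEST_DIR and the
xfs_io bulkstat command, counting the same slab as above):

	# needs root to read /proc/slabinfo
	nr_cached() { awk '$1 == "xfs_inode" { print $2 }' /proc/slabinfo; }

	before=$(nr_cached)
	xfs_io -r -c bulkstat $TEST_DIR > /dev/null
	after=$(nr_cached)
	sleep 6				# longer than one reclaim period
	end=$(nr_cached)

If DONTCACHE is working, $end falls back towards $before; if it's
broken, the bulkstatted inodes stay around and $end stays up near
$after.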

I noticed this because I recently re-added the threaded per-ag
bulkstat scan to my scalability workload (via the xfs_io bulkstat
command) after I dropped it ages ago because per-ag threading of
fstests::src/bulkstat.c was really messy. It appears nobody has
been paying attention to bulkstat memory usage (and therefore
I_DONTCACHE behaviour) for some time....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* [RFC PATCH] xfs: test DONTCACHE behavior with the inode cache
  2021-08-24  2:32 [PATCH] xfs: fix I_DONTCACHE Dave Chinner
  2021-08-24  2:58 ` Darrick J. Wong
@ 2021-08-25 23:07 ` Darrick J. Wong
  2021-08-26  0:47   ` Dave Chinner
  2021-08-25 23:09 ` [PATCH] xfs: fix I_DONTCACHE Darrick J. Wong
  2 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2021-08-25 23:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, fstests

From: Darrick J. Wong <djwong@kernel.org>

Basic testing that DONTCACHE affects the XFS inode cache in the manner
that we expect.  The only way we can do that (for XFS, anyway) is to
play around with the BULKSTAT ioctl.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/780     |  293 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/780.out |    7 +
 2 files changed, 300 insertions(+)
 create mode 100755 tests/xfs/780
 create mode 100644 tests/xfs/780.out

diff --git a/tests/xfs/780 b/tests/xfs/780
new file mode 100755
index 00000000..9bf1f482
--- /dev/null
+++ b/tests/xfs/780
@@ -0,0 +1,293 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2021 Oracle.  All Rights Reserved.
+#
+# FS QA Test 780
+#
+# Functional testing for the I_DONTCACHE inode flag, as set by the BULKSTAT
+# ioctl.  This flag neuters the inode cache's tendency to try to hang on to
+# incore inodes for a while after the last program closes the file, which
+# is helpful for filesystem scanners to avoid trashing the inode cache.
+#
+# However, the inode cache doesn't always honor the DONTCACHE behavior -- the
+# only time it really applies is to cache misses from a bulkstat scan.  If
+# any other threads accessed the inode before or immediately after the scan,
+# the DONTCACHE flag is ignored.  This includes other scans.
+#
+# Regrettably, there is no way to poke /only/ XFS inode reclamation directly,
+# so we're stuck with setting xfssyncd_centisecs to a low value and watching
+# the slab counters.
+#
+. ./common/preamble
+_begin_fstest auto ioctl
+
+_cleanup()
+{
+	cd /
+	rm -r -f $tmp.*
+	test -n "$junkdir" && rm -r -f "$junkdir"
+	test -n "$old_centisecs" && echo "$old_centisecs" > "$xfs_centisecs_file"
+}
+
+# Import common functions.
+# . ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_require_test
+
+# Either of these need to be available to monitor slab usage
+xfs_ino_objcount_file=/sys/kernel/slab/xfs_inode/objects
+slabinfo_file=/proc/slabinfo
+if [ ! -r "$xfs_ino_objcount_file" ] && [ ! -r "$slabinfo_file" ]; then
+	_notrun "Cannot find xfs_inode slab count?"
+fi
+
+# Background reclamation of disused xfs inodes is scheduled for $xfssyncd
+# centiseconds after the first inode is tagged for reclamation.  It's not great
+# to encode this implementation detail in a test like this, but there isn't
+# any means to trigger *only* inode cache reclaim -- actual memory pressure
+# can trigger the VFS to drop non-DONTCACHE inodes, which is not what we want.
+xfs_centisecs_file=/proc/sys/fs/xfs/xfssyncd_centisecs
+test -w "$xfs_centisecs_file" || _notrun "Cannot find xfssyncd_centisecs?"
+
+# Set the syncd knob to the minimum value 100cs (aka 1s) 
+old_centisecs="$(cat "$xfs_centisecs_file")"
+echo 100 > "$xfs_centisecs_file" || _notrun "Cannot adjust xfssyncd_centisecs?"
+delay_centisecs="$(cat "$xfs_centisecs_file")"
+
+# Sleep one second more than the xfssyncd delay to give background inode
+# reclaim enough time to run.
+sleep_seconds=$(( ( (99 + delay_centisecs) / 100) + 1))
+
+count_xfs_inode_objs() {
+	if [ -r "$xfs_ino_objcount_file" ]; then
+		cat "$xfs_ino_objcount_file" | cut -d ' ' -f 1
+	elif [ -r "$slabinfo_file" ]; then
+		grep -w '^xfs_inode' "$slabinfo_file" | awk '{print $2}'
+	else
+		echo "ERROR"
+	fi
+}
+
+junkdir=$TEST_DIR/$seq.junk
+nr_cpus=$(getconf _NPROCESSORS_ONLN)
+
+# Sample the baseline count of cached inodes after a fresh remount.
+_test_cycle_mount
+baseline_count=$(count_xfs_inode_objs)
+
+# Create a junk directory with about a thousand files.
+nr_files=1024
+mkdir -p $junkdir
+for ((i = 0; i < nr_files; i++)); do
+	touch "$junkdir/$i"
+done
+new_files=$(find $junkdir | wc -l)
+echo "created $new_files files" >> $seqres.full
+test "$new_files" -gt "$nr_files" || \
+	echo "created $new_files files, expected $nr_files"
+
+# Sanity check: Make sure that all those new inodes are still in the cache.
+# We assume that memory limits are not so low that reclaim started for a bunch
+# of empty files.
+work_count=$(count_xfs_inode_objs)
+test "$work_count" -ge "$new_files" || \
+	echo "found $work_count cached inodes after creating $new_files files?"
+
+# Round 1: Check the DONTCACHE behavior when it is invoked once per inode.
+# The inodes should be reclaimed if we wait long enough.
+echo "Round 1"
+
+# Sample again to see if we're still within the baseline.
+_test_cycle_mount
+fresh_count=$(count_xfs_inode_objs)
+
+# Run bulkstat to exercise DONTCACHE behavior, and sample again.
+$here/src/bstat -q $junkdir
+post_count=$(count_xfs_inode_objs)
+
+# Let background reclaim run
+sleep $sleep_seconds
+end_count=$(count_xfs_inode_objs)
+
+# Even with our greatly reduced syncd value, the inodes should still be in
+# memory immediately after the bulkstat concludes.
+test "$post_count" -ge "$new_files" || \
+	echo "found $post_count cached inodes after bulkstat $new_files files?"
+
+# After we've let memory reclaim run, the inodes should no longer be cached
+# in memory.
+test "$end_count" -le "$new_files" || \
+	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
+
+# Dump full results for debugging
+cat >> $seqres.full << ENDL
+round1 baseline: $baseline_count
+work: $work_count
+fresh: $fresh_count
+post: $post_count
+end: $end_count
+ENDL
+
+# Round 2: Check the DONTCACHE behavior when it is invoked multiple times per
+# inode in rapid succession.  The inodes should remain in memory even after
+# reclaim because the cache gets wise to repeated scans.
+echo "Round 2"
+
+# Sample again to see if we're still within the baseline.
+_test_cycle_mount
+fresh_count=$(count_xfs_inode_objs)
+
+# Run bulkstat twice in rapid succession to exercise DONTCACHE behavior.
+# The first bulkstat run will bring the inodes into memory (marked DONTCACHE).
+# The second bulkstat causes cache hits before the inodes can be reclaimed, which
+# means that they should stay in memory.  Sample again afterwards.
+$here/src/bstat -q $junkdir
+$here/src/bstat -q $junkdir
+post_count=$(count_xfs_inode_objs)
+
+# Let background reclaim run
+sleep $sleep_seconds
+end_count=$(count_xfs_inode_objs)
+
+# Even with our greatly reduced syncd value, the inodes should still be in
+# memory immediately after the second bulkstat concludes.
+test "$post_count" -ge "$new_files" || \
+	echo "found $post_count cached inodes after bulkstat $new_files files?"
+
+# After we've let memory reclaim run and cache hits happen, the inodes should
+# still be cached in memory.
+test "$end_count" -ge "$new_files" || \
+	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
+
+# Dump full results for debugging
+cat >> $seqres.full << ENDL
+round2 baseline: $baseline_count
+work: $work_count
+fresh: $fresh_count
+post: $post_count
+end: $end_count
+ENDL
+
+# Round 3: Check that DONTCACHE results in inode evictions when it is invoked
+# multiple times per inode but there's enough time between each bulkstat for
+# reclaim to kill the inodes.
+echo "Round 3"
+
+# Sample again to see if we're still within the baseline.
+_test_cycle_mount
+fresh_count=$(count_xfs_inode_objs)
+
+# Run bulkstat twice in slow succession to exercise DONTCACHE behavior.
+# The first bulkstat run will bring the inodes into memory (marked DONTCACHE).
+# Pause long enough for reclaim to tear down the inodes so that the second
+# bulkstat brings them back into memory (again marked DONTCACHE).
+# Sample again afterwards.
+$here/src/bstat -q $junkdir
+sleep $sleep_seconds
+post_count=$(count_xfs_inode_objs)
+
+$here/src/bstat -q $junkdir
+sleep $sleep_seconds
+end_count=$(count_xfs_inode_objs)
+
+# Because we slept for a full reclaim period after the first bulkstat, its
+# DONTCACHE inodes should already have been reclaimed by the time we sample.
+test "$post_count" -le "$new_files" || \
+	echo "found $post_count cached inodes after bulkstat $new_files files?"
+
+# After we've let memory reclaim run, the inodes should no longer be cached
+# in memory.
+test "$end_count" -le "$new_files" || \
+	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
+
+# Dump full results for debugging
+cat >> $seqres.full << ENDL
+round3 baseline: $baseline_count
+work: $work_count
+fresh: $fresh_count
+post: $post_count
+end: $end_count
+ENDL
+
+# Round 4: Check that DONTCACHE doesn't do much when all the files are accessed
+# immediately after a bulkstat.
+echo "Round 4"
+
+# Sample again to see if we're still within the baseline.
+_test_cycle_mount
+fresh_count=$(count_xfs_inode_objs)
+
+# Run bulkstat and then cat every file in the junkdir so that the new inode
+# grabs will clear DONTCACHE off the inodes.  Sample again afterwards.
+$here/src/bstat -q $junkdir
+find $junkdir -type f -print0 | xargs -0 cat
+post_count=$(count_xfs_inode_objs)
+
+# Let background reclaim run
+sleep $sleep_seconds
+end_count=$(count_xfs_inode_objs)
+
+# Even with our greatly reduced syncd value, the inodes should still be in
+# memory immediately after the bulkstat and the file reads conclude.
+test "$post_count" -ge "$new_files" || \
+	echo "found $post_count cached inodes after bulkstat $new_files files?"
+
+# After we've let memory reclaim run, the inodes should still be cached in
+# memory because we opened everything.
+test "$end_count" -ge "$new_files" || \
+	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
+
+# Dump full results for debugging
+cat >> $seqres.full << ENDL
+round4 baseline: $baseline_count
+work: $work_count
+fresh: $fresh_count
+post: $post_count
+end: $end_count
+ENDL
+
+# Round 5: Check that DONTCACHE has no effect if the inodes were already in
+# cache due to reader programs.
+echo "Round 5"
+
+# Sample again to see if we're still within the baseline.
+_test_cycle_mount
+fresh_count=$(count_xfs_inode_objs)
+
+# cat every file in the junkdir and then run BULKSTAT so that the DONTCACHE
+# flags passed to iget by bulkstat will be ignored for already-cached inodes.
+# Sample again afterwards.
+find $junkdir -type f -print0 | xargs -0 cat
+$here/src/bstat -q $junkdir
+post_count=$(count_xfs_inode_objs)
+
+# Let background reclaim run
+sleep $sleep_seconds
+end_count=$(count_xfs_inode_objs)
+
+# Even with our greatly reduced syncd value, the inodes should still be in
+# memory immediately after the reads and the bulkstat conclude.
+test "$post_count" -ge "$new_files" || \
+	echo "found $post_count cached inodes after bulkstat $new_files files?"
+
+# After we've let memory reclaim run, the inodes should still be cached in
+# memory because we opened everything.
+test "$end_count" -ge "$new_files" || \
+	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
+
+# Dump full results for debugging
+cat >> $seqres.full << ENDL
+round5 baseline: $baseline_count
+work: $work_count
+fresh: $fresh_count
+post: $post_count
+end: $end_count
+ENDL
+
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/780.out b/tests/xfs/780.out
new file mode 100644
index 00000000..7366678c
--- /dev/null
+++ b/tests/xfs/780.out
@@ -0,0 +1,7 @@
+QA output created by 780
+Round 1
+Round 2
+Round 3
+Round 4
+Round 5
+Silence is golden


* Re: [PATCH] xfs: fix I_DONTCACHE
  2021-08-24  2:32 [PATCH] xfs: fix I_DONTCACHE Dave Chinner
  2021-08-24  2:58 ` Darrick J. Wong
  2021-08-25 23:07 ` [RFC PATCH] xfs: test DONTCACHE behavior with the inode cache Darrick J. Wong
@ 2021-08-25 23:09 ` Darrick J. Wong
  2 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2021-08-25 23:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Aug 24, 2021 at 12:32:08PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Yup, the VFS hoist broke it, and nobody noticed. Bulkstat workloads
> make it clear that it doesn't work as it should.
> 
> Fixes: dae2f8ed7992 ("fs: Lift XFS_IDONTCACHE to the VFS layer")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Assuming the RFC test that I just sent out looks reasonable to you, I
think I have a sufficient understanding of what DONTCACHE is supposed to
do to say:

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_icache.c | 3 ++-
>  fs/xfs/xfs_iops.c   | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index a3fe4c5307d3..f2210d927481 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -84,8 +84,9 @@ xfs_inode_alloc(
>  		return NULL;
>  	}
>  
> -	/* VFS doesn't initialise i_mode! */
> +	/* VFS doesn't initialise i_mode or i_state! */
>  	VFS_I(ip)->i_mode = 0;
> +	VFS_I(ip)->i_state = 0;
>  
>  	XFS_STATS_INC(mp, vn_active);
>  	ASSERT(atomic_read(&ip->i_pincount) == 0);
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 0ff0cca94092..a607d6aca5c4 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1344,7 +1344,7 @@ xfs_setup_inode(
>  	gfp_t			gfp_mask;
>  
>  	inode->i_ino = ip->i_ino;
> -	inode->i_state = I_NEW;
> +	inode->i_state |= I_NEW;
>  
>  	inode_sb_list_add(inode);
>  	/* make the inode look hashed for the writeback code */
> -- 
> 2.31.1
> 


* Re: [RFC PATCH] xfs: test DONTCACHE behavior with the inode cache
  2021-08-25 23:07 ` [RFC PATCH] xfs: test DONTCACHE behavior with the inode cache Darrick J. Wong
@ 2021-08-26  0:47   ` Dave Chinner
  2021-08-27  0:07     ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2021-08-26  0:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, fstests

On Wed, Aug 25, 2021 at 04:07:03PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Basic testing that DONTCACHE affects the XFS inode cache in the manner
> that we expect.  The only way we can do that (for XFS, anyway) is to
> play around with the BULKSTAT ioctl.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/xfs/780     |  293 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/780.out |    7 +
>  2 files changed, 300 insertions(+)
>  create mode 100755 tests/xfs/780
>  create mode 100644 tests/xfs/780.out
> 
> diff --git a/tests/xfs/780 b/tests/xfs/780
> new file mode 100755
> index 00000000..9bf1f482
> --- /dev/null
> +++ b/tests/xfs/780
> @@ -0,0 +1,293 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2021 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test 780
> +#
> +# Functional testing for the I_DONTCACHE inode flag, as set by the BULKSTAT
> +# ioctl.  This flag neuters the inode cache's tendency to try to hang on to
> +# incore inodes for a while after the last program closes the file, which
> +# is helpful for filesystem scanners to avoid trashing the inode cache.
> +#
> +# However, the inode cache doesn't always honor the DONTCACHE behavior -- the
> +# only time it really applies is to cache misses from a bulkstat scan.  If
> +# any other threads accessed the inode before or immediately after the scan,
> +# the DONTCACHE flag is ignored.  This includes other scans.
> +#
> +# Regrettably, there is no way to poke /only/ XFS inode reclamation directly,
> +# so we're stuck with setting xfssyncd_centisecs to a low value and watching
> +# the slab counters.
> +#
> +. ./common/preamble
> +_begin_fstest auto ioctl
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -r -f $tmp.*
> +	test -n "$junkdir" && rm -r -f "$junkdir"
> +	test -n "$old_centisecs" && echo "$old_centisecs" > "$xfs_centisecs_file"
> +}
> +
> +# Import common functions.
> +# . ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +_require_test
> +
> +# Either of these need to be available to monitor slab usage
> +xfs_ino_objcount_file=/sys/kernel/slab/xfs_inode/objects
> +slabinfo_file=/proc/slabinfo
> +if [ ! -r "$xfs_ino_objcount_file" ] && [ ! -r "$slabinfo_file" ]; then
> +	_notrun "Cannot find xfs_inode slab count?"
> +fi

We should use either /proc/fs/xfs/stat or
/sys/fs/xfs/<dev>/stats to get this information.

$ awk '/vnodes/ { print $2}' /proc/fs/xfs/stat
245626
$ awk '/vnodes/ { print $2}' /sys/fs/xfs/dm-0/stats/stats
245648
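
i.e. the test's counting helper could shrink to something like this
(untested; assumes fstests' _short_dev helper and running against
the scratch device):

	count_xfs_inode_objs() {
		awk '/vnodes/ { print $2 }' \
			/sys/fs/xfs/$(_short_dev $SCRATCH_DEV)/stats/stats
	}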

> +# Background reclamation of disused xfs inodes is scheduled for $xfssyncd
> +# centiseconds after the first inode is tagged for reclamation.  It's not great

Actually, the background inode reclaim period is $((xfssyncd / 6)),
which means it is 5 seconds by default.
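
So with the knob clamped to its 100cs minimum the reclaim period is
only ~170ms, and deriving the sleep from the period (rough, untested
sketch) would already be generous:

	sleep_seconds=$(( (delay_centisecs / 6 / 100) + 1 ))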

> +# to encode this implementation detail in a test like this, but there isn't
> +# any means to trigger *only* inode cache reclaim -- actual memory pressure
> +# can trigger the VFS to drop non-DONTCACHE inodes, which is not what we want.
> +xfs_centisecs_file=/proc/sys/fs/xfs/xfssyncd_centisecs
> +test -w "$xfs_centisecs_file" || _notrun "Cannot find xfssyncd_centisecs?"
> +
> +# Set the syncd knob to the minimum value 100cs (aka 1s) 
> +old_centisecs="$(cat "$xfs_centisecs_file")"
> +echo 100 > "$xfs_centisecs_file" || _notrun "Cannot adjust xfssyncd_centisecs?"
> +delay_centisecs="$(cat "$xfs_centisecs_file")"
> +
> +# Sleep one second more than the xfssyncd delay to give background inode
> +# reclaim enough time to run.
> +sleep_seconds=$(( ( (99 + delay_centisecs) / 100) + 1))

I hate centiseconds.

> +
> +count_xfs_inode_objs() {
> +	if [ -r "$xfs_ino_objcount_file" ]; then
> +		cat "$xfs_ino_objcount_file" | cut -d ' ' -f 1
> +	elif [ -r "$slabinfo_file" ]; then
> +		grep -w '^xfs_inode' "$slabinfo_file" | awk '{print $2}'
> +	else
> +		echo "ERROR"
> +	fi
> +}
> +
> +junkdir=$TEST_DIR/$seq.junk
> +nr_cpus=$(getconf _NPROCESSORS_ONLN)

This would probably be much easier using the scratch device and
/sys/fs/xfs/<SCRATCH_DEV>/stats/stats, because then the baseline is
effectively zero...

> +# Sample the baseline count of cached inodes after a fresh remount.
> +_test_cycle_mount
> +baseline_count=$(count_xfs_inode_objs)
> +
> +# Create a junk directory with about a thousand files.
> +nr_files=1024
> +mkdir -p $junkdir
> +for ((i = 0; i < nr_files; i++)); do
> +	touch "$junkdir/$i"
> +done
> +new_files=$(find $junkdir | wc -l)
> +echo "created $new_files files" >> $seqres.full
> +test "$new_files" -gt "$nr_files" || \
> +	echo "created $new_files files, expected $nr_files"
> +
> +# Sanity check: Make sure that all those new inodes are still in the cache.
> +# We assume that memory limits are not so low that reclaim started for a bunch
> +# of empty files.
> +work_count=$(count_xfs_inode_objs)
> +test "$work_count" -ge "$new_files" || \
> +	echo "found $work_count cached inodes after creating $new_files files?"

Might be better as:

_within_tolerance "Cached inode after creating new files" $work_count $new_files 5

> +
> +# Round 1: Check the DONTCACHE behavior when it is invoked once per inode.
> +# The inodes should be reclaimed if we wait long enough.
> +echo "Round 1"
> +
> +# Sample again to see if we're still within the baseline.
> +_test_cycle_mount
> +fresh_count=$(count_xfs_inode_objs)
> +
> +# Run bulkstat to exercise DONTCACHE behavior, and sample again.
> +$here/src/bstat -q $junkdir
> +post_count=$(count_xfs_inode_objs)

Can we use xfs_io -c "bulkstat" here?

I'd like to get rid of $here/src/bstat now that we have bulkstat
functionality in xfs_io....
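
e.g. something like (untested; assumes xfs_io is happy to open the
directory read-only and that bulkstat walks the whole fs backing it):

	$XFS_IO_PROG -r -c 'bulkstat' $junkdir > /dev/null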

> +
> +# Let background reclaim run
> +sleep $sleep_seconds
> +end_count=$(count_xfs_inode_objs)
> +
> +# Even with our greatly reduced syncd value, the inodes should still be in
> +# memory immediately after the bulkstat concludes.
> +test "$post_count" -ge "$new_files" || \
> +	echo "found $post_count cached inodes after bulkstat $new_files files?"
> +
> +# After we've let memory reclaim run, the inodes should no longer be cached
> +# in memory.
> +test "$end_count" -le "$new_files" || \
> +	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
> +
> +# Dump full results for debugging
> +cat >> $seqres.full << ENDL
> +round1 baseline: $baseline_count
> +work: $work_count
> +fresh: $fresh_count
> +post: $post_count
> +end: $end_count
> +ENDL

Wrap this in a function to reduce verbosity?

debug_output()
{
	cat >> $seqres.full << ENDL
round $1 baseline: $2
work: $3
fresh: $4
post: $5
end: $6
ENDL
}

debug_output 1 $baseline_count $work_count $fresh_count $post_count $end_count

> +
> +# Round 2: Check the DONTCACHE behavior when it is invoked multiple times per
> +# inode in rapid succession.  The inodes should remain in memory even after
> +# reclaim because the cache gets wise to repeated scans.
> +echo "Round 2"
> +
> +# Sample again to see if we're still within the baseline.
> +_test_cycle_mount
> +fresh_count=$(count_xfs_inode_objs)
> +
> +# Run bulkstat twice in rapid succession to exercise DONTCACHE behavior.
> +# The first bulkstat run will bring the inodes into memory (marked DONTCACHE).
> +# The second bulkstat causes cache hits before the inodes can be reclaimed, which
> +# means that they should stay in memory.  Sample again afterwards.
> +$here/src/bstat -q $junkdir
> +$here/src/bstat -q $junkdir
> +post_count=$(count_xfs_inode_objs)
> +
> +# Let background reclaim run
> +sleep $sleep_seconds
> +end_count=$(count_xfs_inode_objs)
> +
> +# Even with our greatly reduced syncd value, the inodes should still be in
> +# memory immediately after the second bulkstat concludes.
> +test "$post_count" -ge "$new_files" || \
> +	echo "found $post_count cached inodes after bulkstat $new_files files?"
> +
> +# After we've let memory reclaim run and cache hits happen, the inodes should
> +# still be cached in memory.
> +test "$end_count" -ge "$new_files" || \
> +	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
> +
> +# Dump full results for debugging
> +cat >> $seqres.full << ENDL
> +round2 baseline: $baseline_count
> +work: $work_count
> +fresh: $fresh_count
> +post: $post_count
> +end: $end_count
> +ENDL

I'm struggling to see what is being tested amongst all the comments.
Can you chop down the comments to a single "round X" comment per
test?

[...]

> +# Even with our greatly reduced syncd value, the inodes should still be in
> +# memory immediately after the reads and the bulkstat conclude.
> +test "$post_count" -ge "$new_files" || \
> +	echo "found $post_count cached inodes after bulkstat $new_files files?"
> +
> +# After we've let memory reclaim run, the inodes should still be cached in
> +# memory because we opened everything.
> +test "$end_count" -ge "$new_files" || \
> +	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
> +
> +# Dump full results for debugging
> +cat >> $seqres.full << ENDL
> +round5 baseline: $baseline_count
> +work: $work_count
> +fresh: $fresh_count
> +post: $post_count
> +end: $end_count
> +ENDL
> +
> +echo Silence is golden

There is output on success, so no need for this.

But overall it looks like you've captured the behaviour that should
be occurring with bulkstat and DONTCACHE.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH] xfs: test DONTCACHE behavior with the inode cache
  2021-08-26  0:47   ` Dave Chinner
@ 2021-08-27  0:07     ` Darrick J. Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2021-08-27  0:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, fstests

On Thu, Aug 26, 2021 at 10:47:47AM +1000, Dave Chinner wrote:
> On Wed, Aug 25, 2021 at 04:07:03PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Basic testing that DONTCACHE affects the XFS inode cache in the manner
> > that we expect.  The only way we can do that (for XFS, anyway) is to
> > play around with the BULKSTAT ioctl.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/xfs/780     |  293 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/780.out |    7 +
> >  2 files changed, 300 insertions(+)
> >  create mode 100755 tests/xfs/780
> >  create mode 100644 tests/xfs/780.out
> > 
> > diff --git a/tests/xfs/780 b/tests/xfs/780
> > new file mode 100755
> > index 00000000..9bf1f482
> > --- /dev/null
> > +++ b/tests/xfs/780
> > @@ -0,0 +1,293 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2021 Oracle.  All Rights Reserved.
> > +#
> > +# FS QA Test 780
> > +#
> > +# Functional testing for the I_DONTCACHE inode flag, as set by the BULKSTAT
> > +# ioctl.  This flag neuters the inode cache's tendency to try to hang on to
> > +# incore inodes for a while after the last program closes the file, which
> > +# is helpful for filesystem scanners to avoid trashing the inode cache.
> > +#
> > +# However, the inode cache doesn't always honor the DONTCACHE behavior -- the
> > +# only time it really applies is to cache misses from a bulkstat scan.  If
> > +# any other threads accessed the inode before or immediately after the scan,
> > +# the DONTCACHE flag is ignored.  This includes other scans.
> > +#
> > +# Regrettably, there is no way to poke /only/ XFS inode reclamation directly,
> > +# so we're stuck with setting xfssyncd_centisecs to a low value and watching
> > +# the slab counters.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto ioctl
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	rm -r -f $tmp.*
> > +	test -n "$junkdir" && rm -r -f "$junkdir"
> > +	test -n "$old_centisecs" && echo "$old_centisecs" > "$xfs_centisecs_file"
> > +}
> > +
> > +# Import common functions.
> > +# . ./common/filter
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs generic
> > +_require_test
> > +
> > +# Either of these need to be available to monitor slab usage
> > +xfs_ino_objcount_file=/sys/kernel/slab/xfs_inode/objects
> > +slabinfo_file=/proc/slabinfo
> > +if [ ! -r "$xfs_ino_objcount_file" ] && [ ! -r "$slabinfo_file" ]; then
> > +	_notrun "Cannot find xfs_inode slab count?"
> > +fi
> 
> We should use either /proc/fs/xfs/stat or
> /sys/fs/xfs/<dev>/stats to get this information.
> 
> $ awk '/vnodes/ { print $2}' /proc/fs/xfs/stat
> 245626
> $ awk '/vnodes/ { print $2}' /sys/fs/xfs/dm-0/stats/stats
> 245648

Ok.  I'll grab from /sys/fs/xfs/$dev/stats/stats since it's more
precise.

> > +# Background reclamation of disused xfs inodes is scheduled for $xfssyncd
> > +# centiseconds after the first inode is tagged for reclamation.  It's not great
> 
> Actually, the background inode reclaim period is $((xfssyncd / 6)),
> which means it is 5 seconds by default.

Will update the code, though the interval is so small it makes little
difference. :)

> > +# to encode this implementation detail in a test like this, but there isn't
> > +# any means to trigger *only* inode cache reclaim -- actual memory pressure
> > +# can trigger the VFS to drop non-DONTCACHE inodes, which is not what we want.
> > +xfs_centisecs_file=/proc/sys/fs/xfs/xfssyncd_centisecs
> > +test -w "$xfs_centisecs_file" || _notrun "Cannot find xfssyncd_centisecs?"
> > +
> > +# Set the syncd knob to the minimum value 100cs (aka 1s) 
> > +old_centisecs="$(cat "$xfs_centisecs_file")"
> > +echo 100 > "$xfs_centisecs_file" || _notrun "Cannot adjust xfssyncd_centisecs?"
> > +delay_centisecs="$(cat "$xfs_centisecs_file")"
> > +
> > +# Sleep one second more than the xfssyncd delay to give background inode
> > +# reclaim enough time to run.
> > +sleep_seconds=$(( ( (99 + delay_centisecs) / 100) + 1))
> 
> I hate centiseconds.

Me too.

> > +
> > +count_xfs_inode_objs() {
> > +	if [ -r "$xfs_ino_objcount_file" ]; then
> > +		cat "$xfs_ino_objcount_file" | cut -d ' ' -f 1
> > +	elif [ -r "$slabinfo_file" ]; then
> > +		grep -w '^xfs_inode' "$slabinfo_file" | awk '{print $2}'
> > +	else
> > +		echo "ERROR"
> > +	fi
> > +}
> > +
> > +junkdir=$TEST_DIR/$seq.junk
> > +nr_cpus=$(getconf _NPROCESSORS_ONLN)
> 
> This would probably be much easier using the scratch device and
> /sys/fs/xfs/<SCRATCH_DEV>/stats/stats, because then the baseline is
> effectively zero...

Yes.

> > +# Sample the baseline count of cached inodes after a fresh remount.
> > +_test_cycle_mount
> > +baseline_count=$(count_xfs_inode_objs)
> > +
> > +# Create a junk directory with about a thousand files.
> > +nr_files=1024
> > +mkdir -p $junkdir
> > +for ((i = 0; i < nr_files; i++)); do
> > +	touch "$junkdir/$i"
> > +done
> > +new_files=$(find $junkdir | wc -l)
> > +echo "created $new_files files" >> $seqres.full
> > +test "$new_files" -gt "$nr_files" || \
> > +	echo "created $new_files files, expected $nr_files"
> > +
> > +# Sanity check: Make sure that all those new inodes are still in the cache.
> > +# We assume that memory limits are not so low that reclaim started for a bunch
> > +# of empty files.
> > +work_count=$(count_xfs_inode_objs)
> > +test "$work_count" -ge "$new_files" || \
> > +	echo "found $work_count cached inodes after creating $new_files files?"
> 
> Might be better as:
> 
> _within_tolerance "Cached inode after creating new files" $work_count $new_files 5

Ooh that's better.  Will change.

> > +
> > +# Round 1: Check the DONTCACHE behavior when it is invoked once per inode.
> > +# The inodes should be reclaimed if we wait long enough.
> > +echo "Round 1"
> > +
> > +# Sample again to see if we're still within the baseline.
> > +_test_cycle_mount
> > +fresh_count=$(count_xfs_inode_objs)
> > +
> > +# Run bulkstat to exercise DONTCACHE behavior, and sample again.
> > +$here/src/bstat -q $junkdir
> > +post_count=$(count_xfs_inode_objs)
> 
> Can we use xfs_io -c "bulkstat" here?
> 
> I'd like to get rid of $here/src/bstat now that we have bulkstat
> functionality in xfs_io....

Changed.

> > +
> > +# Let background reclaim run
> > +sleep $sleep_seconds
> > +end_count=$(count_xfs_inode_objs)
> > +
> > +# Even with our greatly reduced syncd value, the inodes should still be in
> > +# memory immediately after the bulkstat concludes.
> > +test "$post_count" -ge "$new_files" || \
> > +	echo "found $post_count cached inodes after bulkstat $new_files files?"
> > +
> > +# After we've let memory reclaim run, the inodes should no longer be cached
> > +# in memory.
> > +test "$end_count" -le "$new_files" || \
> > +	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
> > +
> > +# Dump full results for debugging
> > +cat >> $seqres.full << ENDL
> > +round1 baseline: $baseline_count
> > +work: $work_count
> > +fresh: $fresh_count
> > +post: $post_count
> > +end: $end_count
> > +ENDL
> 
> Wrap this in a function to reduce verbosity?
> 
> debug_output()
> {
> 	cat >> $seqres.full << ENDL
> round $1 baseline: $2
> work: $3
> fresh: $4
> post: $5
> end: $6
> ENDL
> }
> 
> debug_output 1 $baseline_count $work_count $fresh_count $post_count $end_count

Changed.

> > +
> > +# Round 2: Check the DONTCACHE behavior when it is invoked multiple times per
> > +# inode in rapid succession.  The inodes should remain in memory even after
> > +# reclaim because the cache gets wise to repeated scans.
> > +echo "Round 2"
> > +
> > +# Sample again to see if we're still within the baseline.
> > +_test_cycle_mount
> > +fresh_count=$(count_xfs_inode_objs)
> > +
> > +# Run bulkstat twice in rapid succession to exercise DONTCACHE behavior.
> > +# The first bulkstat run will bring the inodes into memory (marked DONTCACHE).
> > +# The second bulkstat causes cache hits before the inodes can be reclaimed, which
> > +# means that they should stay in memory.  Sample again afterwards.
> > +$here/src/bstat -q $junkdir
> > +$here/src/bstat -q $junkdir
> > +post_count=$(count_xfs_inode_objs)
> > +
> > +# Let background reclaim run
> > +sleep $sleep_seconds
> > +end_count=$(count_xfs_inode_objs)
> > +
> > +# Even with our greatly reduced syncd value, the inodes should still be in
> > +# memory immediately after the second bulkstat concludes.
> > +test "$post_count" -ge "$new_files" || \
> > +	echo "found $post_count cached inodes after bulkstat $new_files files?"
> > +
> > +# After we've let memory reclaim run and cache hits happen, the inodes should
> > +# still be cached in memory.
> > +test "$end_count" -ge "$new_files" || \
> > +	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
> > +
> > +# Dump full results for debugging
> > +cat >> $seqres.full << ENDL
> > +round2 baseline: $baseline_count
> > +work: $work_count
> > +fresh: $fresh_count
> > +post: $post_count
> > +end: $end_count
> > +ENDL
> 
> I'm struggling to see what is being tested amongst all the comments.
> Can you chop down the comments to a single "round X" comment per
> test?

Yes.

> [...]
> 
> > +# Even with our greatly reduced syncd value, the inodes should still be in
> > +# memory immediately after the reads and the bulkstat conclude.
> > +test "$post_count" -ge "$new_files" || \
> > +	echo "found $post_count cached inodes after bulkstat $new_files files?"
> > +
> > +# After we've let memory reclaim run, the inodes should still be cached in
> > +# memory because we opened everything.
> > +test "$end_count" -ge "$new_files" || \
> > +	echo "found $end_count cached inodes after letting $new_files DONTCACHE files expire?"
> > +
> > +# Dump full results for debugging
> > +cat >> $seqres.full << ENDL
> > +round5 baseline: $baseline_count
> > +work: $work_count
> > +fresh: $fresh_count
> > +post: $post_count
> > +end: $end_count
> > +ENDL
> > +
> > +echo Silence is golden
> 
> There is output on success, so no need for this.
> 
> But overall it looks like you've captured the behaviour that should
> be occurring with bulkstat and DONTCACHE.

Yay!

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

