linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] xfs: 64kb directory block verification hurts
@ 2021-02-23  5:47 Dave Chinner
  2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-23  5:47 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

Quite simple: the debug code we have for internal directory block
verification does not scale to large directory blocks. What is a
minor nuisance at 4kB block size turns into a major problem at 64kB
block size.

These three patches work to reduce the worst of this overhead
without completely gutting the debug checks that are being done.
The type verification code that is generated is horrible - the
compiler does not inline all the nested 3 line functions and so
the function call overhead is significant. Adding a few inline
keywords so that the internal nesting is inlined cuts the overhead
by 30%.

The overhead is still huge. Some of the verification testing is
unnecessary - testing for error injection for a bad inode number
millions of times a second is hugely expensive, and getting rid of
that cuts overhead of directory inode number validation in half.

But the overhead is still huge. The biggest offender is the
directory leaf hash ordering checks. On every debug check call, of
which there is several for every directory modification, we walk the
entire hash entry table in the buffer (8k entries!) to check for
order. This is largely unnecessary, so only do this full order check
when the check function is called from the IO verifiers. If a kernel
dev needs more expensive checks to be re-instated, they only need to
change a single parameter from false to true to do so.

These changes make scalability testing with 64kB directory blocks on
debug kernels possible.

Cheers,

Dave.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] xfs: type verification is expensive
  2021-02-23  5:47 [PATCH 0/3] xfs: 64kb directory block verification hurts Dave Chinner
@ 2021-02-23  5:47 ` Dave Chinner
  2021-02-24 21:46   ` Darrick J. Wong
  2021-02-25  9:03   ` Christoph Hellwig
  2021-02-23  5:47 ` [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
  2021-02-23  5:47 ` [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-23  5:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

From a concurrent rm -rf workload:

  41.04%  [kernel]  [k] xfs_dir3_leaf_check_int
   9.85%  [kernel]  [k] __xfs_dir3_data_check
   5.60%  [kernel]  [k] xfs_verify_ino
   5.32%  [kernel]  [k] xfs_agino_range
   4.21%  [kernel]  [k] memcpy
   3.06%  [kernel]  [k] xfs_errortag_test
   2.57%  [kernel]  [k] xfs_dir_ino_validate
   1.66%  [kernel]  [k] xfs_dir2_data_get_ftype
   1.17%  [kernel]  [k] do_raw_spin_lock
   1.11%  [kernel]  [k] xfs_verify_dir_ino
   0.84%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
   0.83%  [kernel]  [k] xfs_buf_find
   0.64%  [kernel]  [k] xfs_log_commit_cil

THere's an awful lot of overhead in just range checking inode
numbers in that, but each inode number check is not a lot of code.
The total is a bit over 14.5% of the CPU time is spent validating
inode numbers.

The problem is that they deeply nested global scope functions so the
overhead here is all in function call marshalling.

   text	   data	    bss	    dec	    hex	filename
   2077	      0	      0	   2077	    81d fs/xfs/libxfs/xfs_types.o.orig
   2197	      0	      0	   2197	    895	fs/xfs/libxfs/xfs_types.o

There's a small increase in binary size by inlining all the local
nested calls in the verifier functions, but the same workload now
profiles as:

  40.69%  [kernel]  [k] xfs_dir3_leaf_check_int
  10.52%  [kernel]  [k] __xfs_dir3_data_check
   6.68%  [kernel]  [k] xfs_verify_dir_ino
   4.22%  [kernel]  [k] xfs_errortag_test
   4.15%  [kernel]  [k] memcpy
   3.53%  [kernel]  [k] xfs_dir_ino_validate
   1.87%  [kernel]  [k] xfs_dir2_data_get_ftype
   1.37%  [kernel]  [k] do_raw_spin_lock
   0.98%  [kernel]  [k] xfs_buf_find
   0.94%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
   0.73%  [kernel]  [k] xfs_log_commit_cil

Now we only spend just over 10% of the time validing inode numbers
for the same workload. Hence a few "inline" keyworks is good enough
to reduce the validation overhead by 30%...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_types.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c
index b254fbeaaa50..04801362e1a7 100644
--- a/fs/xfs/libxfs/xfs_types.c
+++ b/fs/xfs/libxfs/xfs_types.c
@@ -13,7 +13,7 @@
 #include "xfs_mount.h"
 
 /* Find the size of the AG, in blocks. */
-xfs_agblock_t
+inline xfs_agblock_t
 xfs_ag_block_count(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno)
@@ -29,7 +29,7 @@ xfs_ag_block_count(
  * Verify that an AG block number pointer neither points outside the AG
  * nor points at static metadata.
  */
-bool
+inline bool
 xfs_verify_agbno(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
@@ -49,7 +49,7 @@ xfs_verify_agbno(
  * Verify that an FS block number pointer neither points outside the
  * filesystem nor points at static AG metadata.
  */
-bool
+inline bool
 xfs_verify_fsbno(
 	struct xfs_mount	*mp,
 	xfs_fsblock_t		fsbno)
@@ -85,7 +85,7 @@ xfs_verify_fsbext(
 }
 
 /* Calculate the first and last possible inode number in an AG. */
-void
+inline void
 xfs_agino_range(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
@@ -116,7 +116,7 @@ xfs_agino_range(
  * Verify that an AG inode number pointer neither points outside the AG
  * nor points at static metadata.
  */
-bool
+inline bool
 xfs_verify_agino(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
@@ -146,7 +146,7 @@ xfs_verify_agino_or_null(
  * Verify that an FS inode number pointer neither points outside the
  * filesystem nor points at static AG metadata.
  */
-bool
+inline bool
 xfs_verify_ino(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino)
@@ -162,7 +162,7 @@ xfs_verify_ino(
 }
 
 /* Is this an internal inode number? */
-bool
+inline bool
 xfs_internal_inum(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino)
@@ -190,7 +190,7 @@ xfs_verify_dir_ino(
  * Verify that an realtime block number pointer doesn't point off the
  * end of the realtime device.
  */
-bool
+inline bool
 xfs_verify_rtbno(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
@@ -215,7 +215,7 @@ xfs_verify_rtext(
 }
 
 /* Calculate the range of valid icount values. */
-void
+inline void
 xfs_icount_range(
 	struct xfs_mount	*mp,
 	unsigned long long	*min,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check
  2021-02-23  5:47 [PATCH 0/3] xfs: 64kb directory block verification hurts Dave Chinner
  2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
@ 2021-02-23  5:47 ` Dave Chinner
  2021-02-24 21:47   ` Darrick J. Wong
  2021-02-25  9:06   ` Christoph Hellwig
  2021-02-23  5:47 ` [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-23  5:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We call xfs_dir_ino_validate() for every dir entry in a directory
when doing validity checking of the directory. It calls
xfs_verify_dir_ino() then emits a corruption report if bad or does
error injection if good. It is extremely costly:

  43.27%  [kernel]  [k] xfs_dir3_leaf_check_int
  10.28%  [kernel]  [k] __xfs_dir3_data_check
   6.61%  [kernel]  [k] xfs_verify_dir_ino
   4.16%  [kernel]  [k] xfs_errortag_test
   4.00%  [kernel]  [k] memcpy
   3.48%  [kernel]  [k] xfs_dir_ino_validate

7% of the cpu usage in this directory traversal workload is
xfs_dir_ino_validate() doing absolutely nothing.

We don't need error injection to simulate a bad inode numbers in the
directory structure because we can do that by fuzzing the structure
on disk.

And we don't need a corruption report, because the
__xfs_dir3_data_check() will emit one if the inode number is bad.

So just call xfs_verify_dir_ino() directly here, and get rid of all
this unnecessary overhead:

  40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
  10.98%  [kernel]  [k] __xfs_dir3_data_check
   8.10%  [kernel]  [k] xfs_verify_dir_ino
   4.42%  [kernel]  [k] memcpy
   2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
   1.52%  [kernel]  [k] do_raw_spin_lock

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_dir2_data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c
index 375b3edb2ad2..e67fa086f2c1 100644
--- a/fs/xfs/libxfs/xfs_dir2_data.c
+++ b/fs/xfs/libxfs/xfs_dir2_data.c
@@ -218,7 +218,7 @@ __xfs_dir3_data_check(
 		 */
 		if (dep->namelen == 0)
 			return __this_address;
-		if (xfs_dir_ino_validate(mp, be64_to_cpu(dep->inumber)))
+		if (!xfs_verify_dir_ino(mp, be64_to_cpu(dep->inumber)))
 			return __this_address;
 		if (offset + xfs_dir2_data_entsize(mp, dep->namelen) > end)
 			return __this_address;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks
  2021-02-23  5:47 [PATCH 0/3] xfs: 64kb directory block verification hurts Dave Chinner
  2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
  2021-02-23  5:47 ` [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
@ 2021-02-23  5:47 ` Dave Chinner
  2021-02-24 21:50   ` Darrick J. Wong
  2021-02-25  9:09   ` Christoph Hellwig
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-23  5:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

On debug kernels, we call xfs_dir3_leaf_check_int() multiple times
on every directory modification. The robust hash ordering checks it
does on every entry in the leaf on every call results in a massive
CPU overhead which slows down debug kernels by a large amount.

We use xfs_dir3_leaf_check_int() for the verifiers as well, so we
can't just gut the function to reduce overhead. What we can do,
however, is reduce the work it does when it is called from the
debug interfaces, just leaving the high level checks in place and
leaving the robust validation to the verifiers. This means the debug
checks will catch gross errors, but subtle bugs might not be caught
until a verifier is run.

It is easy enough to restore the existing debug behaviour if the
developer needs it (just change a call parameter in the debug code),
but overwise the overhead makes testing large directory block sizes
on debug kernels very slow.

Profile at an unlink rate of ~80k file/s on a 64k block size
filesystem before the patch:

  40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
  10.98%  [kernel]  [k] __xfs_dir3_data_check
   8.10%  [kernel]  [k] xfs_verify_dir_ino
   4.42%  [kernel]  [k] memcpy
   2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
   1.52%  [kernel]  [k] do_raw_spin_lock

Profile after, at an unlink rate of ~125k files/s (+50% improvement)
has largely dropped the leaf verification debug overhead out of the
profile.

  16.53%  [kernel]  [k] __xfs_dir3_data_check
  12.53%  [kernel]  [k] xfs_verify_dir_ino
   7.97%  [kernel]  [k] memcpy
   3.36%  [kernel]  [k] xfs_dir2_data_get_ftype
   2.86%  [kernel]  [k] __pv_queued_spin_lock_slowpath

Create shows a similar change in profile and a +25% improvement in
performance.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_dir2_leaf.c | 10 +++++++---
 fs/xfs/libxfs/xfs_dir2_node.c |  2 +-
 fs/xfs/libxfs/xfs_dir2_priv.h |  3 ++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index 95d2a3f92d75..ccd8d0aa62b8 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -113,7 +113,7 @@ xfs_dir3_leaf1_check(
 	} else if (leafhdr.magic != XFS_DIR2_LEAF1_MAGIC)
 		return __this_address;
 
-	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf);
+	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf, false);
 }
 
 static inline void
@@ -139,7 +139,8 @@ xfs_failaddr_t
 xfs_dir3_leaf_check_int(
 	struct xfs_mount		*mp,
 	struct xfs_dir3_icleaf_hdr	*hdr,
-	struct xfs_dir2_leaf		*leaf)
+	struct xfs_dir2_leaf		*leaf,
+	bool				expensive_checking)
 {
 	struct xfs_da_geometry		*geo = mp->m_dir_geo;
 	xfs_dir2_leaf_tail_t		*ltp;
@@ -162,6 +163,9 @@ xfs_dir3_leaf_check_int(
 	    (char *)&hdr->ents[hdr->count] > (char *)xfs_dir2_leaf_bests_p(ltp))
 		return __this_address;
 
+	if (!expensive_checking)
+		return NULL;
+
 	/* Check hash value order, count stale entries.  */
 	for (i = stale = 0; i < hdr->count; i++) {
 		if (i + 1 < hdr->count) {
@@ -195,7 +199,7 @@ xfs_dir3_leaf_verify(
 		return fa;
 
 	xfs_dir2_leaf_hdr_from_disk(mp, &leafhdr, bp->b_addr);
-	return xfs_dir3_leaf_check_int(mp, &leafhdr, bp->b_addr);
+	return xfs_dir3_leaf_check_int(mp, &leafhdr, bp->b_addr, true);
 }
 
 static void
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 5d51265d29d6..80a64117b460 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -73,7 +73,7 @@ xfs_dir3_leafn_check(
 	} else if (leafhdr.magic != XFS_DIR2_LEAFN_MAGIC)
 		return __this_address;
 
-	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf);
+	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf, false);
 }
 
 static inline void
diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
index 44c6a77cba05..94943ce49cab 100644
--- a/fs/xfs/libxfs/xfs_dir2_priv.h
+++ b/fs/xfs/libxfs/xfs_dir2_priv.h
@@ -127,7 +127,8 @@ xfs_dir3_leaf_find_entry(struct xfs_dir3_icleaf_hdr *leafhdr,
 extern int xfs_dir2_node_to_leaf(struct xfs_da_state *state);
 
 extern xfs_failaddr_t xfs_dir3_leaf_check_int(struct xfs_mount *mp,
-		struct xfs_dir3_icleaf_hdr *hdr, struct xfs_dir2_leaf *leaf);
+		struct xfs_dir3_icleaf_hdr *hdr, struct xfs_dir2_leaf *leaf,
+		bool expensive_checks);
 
 /* xfs_dir2_node.c */
 void xfs_dir2_free_hdr_from_disk(struct xfs_mount *mp,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] xfs: type verification is expensive
  2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
@ 2021-02-24 21:46   ` Darrick J. Wong
  2021-02-25  9:03   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-02-24 21:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Feb 23, 2021 at 04:47:46PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> From a concurrent rm -rf workload:
> 
>   41.04%  [kernel]  [k] xfs_dir3_leaf_check_int
>    9.85%  [kernel]  [k] __xfs_dir3_data_check
>    5.60%  [kernel]  [k] xfs_verify_ino
>    5.32%  [kernel]  [k] xfs_agino_range
>    4.21%  [kernel]  [k] memcpy
>    3.06%  [kernel]  [k] xfs_errortag_test
>    2.57%  [kernel]  [k] xfs_dir_ino_validate
>    1.66%  [kernel]  [k] xfs_dir2_data_get_ftype
>    1.17%  [kernel]  [k] do_raw_spin_lock
>    1.11%  [kernel]  [k] xfs_verify_dir_ino
>    0.84%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
>    0.83%  [kernel]  [k] xfs_buf_find
>    0.64%  [kernel]  [k] xfs_log_commit_cil
> 
> THere's an awful lot of overhead in just range checking inode
> numbers in that, but each inode number check is not a lot of code.
> The total is a bit over 14.5% of the CPU time is spent validating
> inode numbers.
> 
> The problem is that they deeply nested global scope functions so the
> overhead here is all in function call marshalling.
> 
>    text	   data	    bss	    dec	    hex	filename
>    2077	      0	      0	   2077	    81d fs/xfs/libxfs/xfs_types.o.orig
>    2197	      0	      0	   2197	    895	fs/xfs/libxfs/xfs_types.o
> 
> There's a small increase in binary size by inlining all the local
> nested calls in the verifier functions, but the same workload now
> profiles as:
> 
>   40.69%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.52%  [kernel]  [k] __xfs_dir3_data_check
>    6.68%  [kernel]  [k] xfs_verify_dir_ino
>    4.22%  [kernel]  [k] xfs_errortag_test
>    4.15%  [kernel]  [k] memcpy
>    3.53%  [kernel]  [k] xfs_dir_ino_validate
>    1.87%  [kernel]  [k] xfs_dir2_data_get_ftype
>    1.37%  [kernel]  [k] do_raw_spin_lock
>    0.98%  [kernel]  [k] xfs_buf_find
>    0.94%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
>    0.73%  [kernel]  [k] xfs_log_commit_cil
> 
> Now we only spend just over 10% of the time validing inode numbers
> for the same workload. Hence a few "inline" keyworks is good enough
> to reduce the validation overhead by 30%...
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks fine I guess,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_types.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c
> index b254fbeaaa50..04801362e1a7 100644
> --- a/fs/xfs/libxfs/xfs_types.c
> +++ b/fs/xfs/libxfs/xfs_types.c
> @@ -13,7 +13,7 @@
>  #include "xfs_mount.h"
>  
>  /* Find the size of the AG, in blocks. */
> -xfs_agblock_t
> +inline xfs_agblock_t
>  xfs_ag_block_count(
>  	struct xfs_mount	*mp,
>  	xfs_agnumber_t		agno)
> @@ -29,7 +29,7 @@ xfs_ag_block_count(
>   * Verify that an AG block number pointer neither points outside the AG
>   * nor points at static metadata.
>   */
> -bool
> +inline bool
>  xfs_verify_agbno(
>  	struct xfs_mount	*mp,
>  	xfs_agnumber_t		agno,
> @@ -49,7 +49,7 @@ xfs_verify_agbno(
>   * Verify that an FS block number pointer neither points outside the
>   * filesystem nor points at static AG metadata.
>   */
> -bool
> +inline bool
>  xfs_verify_fsbno(
>  	struct xfs_mount	*mp,
>  	xfs_fsblock_t		fsbno)
> @@ -85,7 +85,7 @@ xfs_verify_fsbext(
>  }
>  
>  /* Calculate the first and last possible inode number in an AG. */
> -void
> +inline void
>  xfs_agino_range(
>  	struct xfs_mount	*mp,
>  	xfs_agnumber_t		agno,
> @@ -116,7 +116,7 @@ xfs_agino_range(
>   * Verify that an AG inode number pointer neither points outside the AG
>   * nor points at static metadata.
>   */
> -bool
> +inline bool
>  xfs_verify_agino(
>  	struct xfs_mount	*mp,
>  	xfs_agnumber_t		agno,
> @@ -146,7 +146,7 @@ xfs_verify_agino_or_null(
>   * Verify that an FS inode number pointer neither points outside the
>   * filesystem nor points at static AG metadata.
>   */
> -bool
> +inline bool
>  xfs_verify_ino(
>  	struct xfs_mount	*mp,
>  	xfs_ino_t		ino)
> @@ -162,7 +162,7 @@ xfs_verify_ino(
>  }
>  
>  /* Is this an internal inode number? */
> -bool
> +inline bool
>  xfs_internal_inum(
>  	struct xfs_mount	*mp,
>  	xfs_ino_t		ino)
> @@ -190,7 +190,7 @@ xfs_verify_dir_ino(
>   * Verify that an realtime block number pointer doesn't point off the
>   * end of the realtime device.
>   */
> -bool
> +inline bool
>  xfs_verify_rtbno(
>  	struct xfs_mount	*mp,
>  	xfs_rtblock_t		rtbno)
> @@ -215,7 +215,7 @@ xfs_verify_rtext(
>  }
>  
>  /* Calculate the range of valid icount values. */
> -void
> +inline void
>  xfs_icount_range(
>  	struct xfs_mount	*mp,
>  	unsigned long long	*min,
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check
  2021-02-23  5:47 ` [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
@ 2021-02-24 21:47   ` Darrick J. Wong
  2021-02-25  9:06   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-02-24 21:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Feb 23, 2021 at 04:47:47PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We call xfs_dir_ino_validate() for every dir entry in a directory
> when doing validity checking of the directory. It calls
> xfs_verify_dir_ino() then emits a corruption report if bad or does
> error injection if good. It is extremely costly:
> 
>   43.27%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.28%  [kernel]  [k] __xfs_dir3_data_check
>    6.61%  [kernel]  [k] xfs_verify_dir_ino
>    4.16%  [kernel]  [k] xfs_errortag_test
>    4.00%  [kernel]  [k] memcpy
>    3.48%  [kernel]  [k] xfs_dir_ino_validate
> 
> 7% of the cpu usage in this directory traversal workload is
> xfs_dir_ino_validate() doing absolutely nothing.
> 
> We don't need error injection to simulate a bad inode numbers in the
> directory structure because we can do that by fuzzing the structure
> on disk.
> 
> And we don't need a corruption report, because the
> __xfs_dir3_data_check() will emit one if the inode number is bad.
> 
> So just call xfs_verify_dir_ino() directly here, and get rid of all
> this unnecessary overhead:
> 
>   40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.98%  [kernel]  [k] __xfs_dir3_data_check
>    8.10%  [kernel]  [k] xfs_verify_dir_ino
>    4.42%  [kernel]  [k] memcpy
>    2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
>    1.52%  [kernel]  [k] do_raw_spin_lock
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Heh, yeah, this one has filled up the fuzzer test results for a while...

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_dir2_data.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c
> index 375b3edb2ad2..e67fa086f2c1 100644
> --- a/fs/xfs/libxfs/xfs_dir2_data.c
> +++ b/fs/xfs/libxfs/xfs_dir2_data.c
> @@ -218,7 +218,7 @@ __xfs_dir3_data_check(
>  		 */
>  		if (dep->namelen == 0)
>  			return __this_address;
> -		if (xfs_dir_ino_validate(mp, be64_to_cpu(dep->inumber)))
> +		if (!xfs_verify_dir_ino(mp, be64_to_cpu(dep->inumber)))
>  			return __this_address;
>  		if (offset + xfs_dir2_data_entsize(mp, dep->namelen) > end)
>  			return __this_address;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks
  2021-02-23  5:47 ` [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
@ 2021-02-24 21:50   ` Darrick J. Wong
  2021-02-25  9:09   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-02-24 21:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Feb 23, 2021 at 04:47:48PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> On debug kernels, we call xfs_dir3_leaf_check_int() multiple times
> on every directory modification. The robust hash ordering checks it
> does on every entry in the leaf on every call results in a massive
> CPU overhead which slows down debug kernels by a large amount.
> 
> We use xfs_dir3_leaf_check_int() for the verifiers as well, so we
> can't just gut the function to reduce overhead. What we can do,
> however, is reduce the work it does when it is called from the
> debug interfaces, just leaving the high level checks in place and
> leaving the robust validation to the verifiers. This means the debug
> checks will catch gross errors, but subtle bugs might not be caught
> until a verifier is run.
> 
> It is easy enough to restore the existing debug behaviour if the
> developer needs it (just change a call parameter in the debug code),
> but overwise the overhead makes testing large directory block sizes
> on debug kernels very slow.
> 
> Profile at an unlink rate of ~80k file/s on a 64k block size
> filesystem before the patch:
> 
>   40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.98%  [kernel]  [k] __xfs_dir3_data_check
>    8.10%  [kernel]  [k] xfs_verify_dir_ino
>    4.42%  [kernel]  [k] memcpy
>    2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
>    1.52%  [kernel]  [k] do_raw_spin_lock
> 
> Profile after, at an unlink rate of ~125k files/s (+50% improvement)
> has largely dropped the leaf verification debug overhead out of the
> profile.
> 
>   16.53%  [kernel]  [k] __xfs_dir3_data_check
>   12.53%  [kernel]  [k] xfs_verify_dir_ino
>    7.97%  [kernel]  [k] memcpy
>    3.36%  [kernel]  [k] xfs_dir2_data_get_ftype
>    2.86%  [kernel]  [k] __pv_queued_spin_lock_slowpath
> 
> Create shows a similar change in profile and a +25% improvement in
> performance.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_dir2_leaf.c | 10 +++++++---
>  fs/xfs/libxfs/xfs_dir2_node.c |  2 +-
>  fs/xfs/libxfs/xfs_dir2_priv.h |  3 ++-
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
> index 95d2a3f92d75..ccd8d0aa62b8 100644
> --- a/fs/xfs/libxfs/xfs_dir2_leaf.c
> +++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
> @@ -113,7 +113,7 @@ xfs_dir3_leaf1_check(
>  	} else if (leafhdr.magic != XFS_DIR2_LEAF1_MAGIC)
>  		return __this_address;
>  
> -	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf);
> +	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf, false);
>  }
>  
>  static inline void
> @@ -139,7 +139,8 @@ xfs_failaddr_t
>  xfs_dir3_leaf_check_int(
>  	struct xfs_mount		*mp,
>  	struct xfs_dir3_icleaf_hdr	*hdr,
> -	struct xfs_dir2_leaf		*leaf)
> +	struct xfs_dir2_leaf		*leaf,
> +	bool				expensive_checking)
>  {
>  	struct xfs_da_geometry		*geo = mp->m_dir_geo;
>  	xfs_dir2_leaf_tail_t		*ltp;
> @@ -162,6 +163,9 @@ xfs_dir3_leaf_check_int(
>  	    (char *)&hdr->ents[hdr->count] > (char *)xfs_dir2_leaf_bests_p(ltp))
>  		return __this_address;
>  
> +	if (!expensive_checking)
> +		return NULL;
> +
>  	/* Check hash value order, count stale entries.  */
>  	for (i = stale = 0; i < hdr->count; i++) {
>  		if (i + 1 < hdr->count) {
> @@ -195,7 +199,7 @@ xfs_dir3_leaf_verify(
>  		return fa;
>  
>  	xfs_dir2_leaf_hdr_from_disk(mp, &leafhdr, bp->b_addr);
> -	return xfs_dir3_leaf_check_int(mp, &leafhdr, bp->b_addr);
> +	return xfs_dir3_leaf_check_int(mp, &leafhdr, bp->b_addr, true);
>  }
>  
>  static void
> diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
> index 5d51265d29d6..80a64117b460 100644
> --- a/fs/xfs/libxfs/xfs_dir2_node.c
> +++ b/fs/xfs/libxfs/xfs_dir2_node.c
> @@ -73,7 +73,7 @@ xfs_dir3_leafn_check(
>  	} else if (leafhdr.magic != XFS_DIR2_LEAFN_MAGIC)
>  		return __this_address;
>  
> -	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf);
> +	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf, false);
>  }
>  
>  static inline void
> diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
> index 44c6a77cba05..94943ce49cab 100644
> --- a/fs/xfs/libxfs/xfs_dir2_priv.h
> +++ b/fs/xfs/libxfs/xfs_dir2_priv.h
> @@ -127,7 +127,8 @@ xfs_dir3_leaf_find_entry(struct xfs_dir3_icleaf_hdr *leafhdr,
>  extern int xfs_dir2_node_to_leaf(struct xfs_da_state *state);
>  
>  extern xfs_failaddr_t xfs_dir3_leaf_check_int(struct xfs_mount *mp,
> -		struct xfs_dir3_icleaf_hdr *hdr, struct xfs_dir2_leaf *leaf);
> +		struct xfs_dir3_icleaf_hdr *hdr, struct xfs_dir2_leaf *leaf,
> +		bool expensive_checks);
>  
>  /* xfs_dir2_node.c */
>  void xfs_dir2_free_hdr_from_disk(struct xfs_mount *mp,
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] xfs: type verification is expensive
  2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
  2021-02-24 21:46   ` Darrick J. Wong
@ 2021-02-25  9:03   ` Christoph Hellwig
  2021-02-25 22:04     ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2021-02-25  9:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Any reason to not just mark them static inline and move them to
xfs_types.h?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check
  2021-02-23  5:47 ` [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
  2021-02-24 21:47   ` Darrick J. Wong
@ 2021-02-25  9:06   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2021-02-25  9:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

On Tue, Feb 23, 2021 at 04:47:47PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We call xfs_dir_ino_validate() for every dir entry in a directory
> when doing validity checking of the directory. It calls
> xfs_verify_dir_ino() then emits a corruption report if bad or does
> error injection if good. It is extremely costly:
> 
>   43.27%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.28%  [kernel]  [k] __xfs_dir3_data_check
>    6.61%  [kernel]  [k] xfs_verify_dir_ino
>    4.16%  [kernel]  [k] xfs_errortag_test
>    4.00%  [kernel]  [k] memcpy
>    3.48%  [kernel]  [k] xfs_dir_ino_validate
> 
> 7% of the cpu usage in this directory traversal workload is
> xfs_dir_ino_validate() doing absolutely nothing.
> 
> We don't need error injection to simulate a bad inode numbers in the
> directory structure because we can do that by fuzzing the structure
> on disk.
> 
> And we don't need a corruption report, because the
> __xfs_dir3_data_check() will emit one if the inode number is bad.
> 
> So just call xfs_verify_dir_ino() directly here, and get rid of all
> this unnecessary overhead:
> 
>   40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.98%  [kernel]  [k] __xfs_dir3_data_check
>    8.10%  [kernel]  [k] xfs_verify_dir_ino
>    4.42%  [kernel]  [k] memcpy
>    2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
>    1.52%  [kernel]  [k] do_raw_spin_lock
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_dir2_data.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c
> index 375b3edb2ad2..e67fa086f2c1 100644
> --- a/fs/xfs/libxfs/xfs_dir2_data.c
> +++ b/fs/xfs/libxfs/xfs_dir2_data.c
> @@ -218,7 +218,7 @@ __xfs_dir3_data_check(
>  		 */
>  		if (dep->namelen == 0)
>  			return __this_address;
> -		if (xfs_dir_ino_validate(mp, be64_to_cpu(dep->inumber)))
> +		if (!xfs_verify_dir_ino(mp, be64_to_cpu(dep->inumber)))
>  			return __this_address;
>  		if (offset + xfs_dir2_data_entsize(mp, dep->namelen) > end)
>  			return __this_address;
> -- 
> 2.28.0
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks
  2021-02-23  5:47 ` [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
  2021-02-24 21:50   ` Darrick J. Wong
@ 2021-02-25  9:09   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2021-02-25  9:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] xfs: type verification is expensive
  2021-02-25  9:03   ` Christoph Hellwig
@ 2021-02-25 22:04     ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-25 22:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Thu, Feb 25, 2021 at 10:03:37AM +0100, Christoph Hellwig wrote:
> Any reason to not just mark them static inline and move them to
> xfs_types.h?

Circular header file dependencies. xfs_mount.h needs xfs_types.h and
moving these to xfs_types.h means xfs_types.h now depends on
xfs_mount.h and a bunch of other header files...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-02-25 22:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-23  5:47 [PATCH 0/3] xfs: 64kb directory block verification hurts Dave Chinner
2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
2021-02-24 21:46   ` Darrick J. Wong
2021-02-25  9:03   ` Christoph Hellwig
2021-02-25 22:04     ` Dave Chinner
2021-02-23  5:47 ` [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
2021-02-24 21:47   ` Darrick J. Wong
2021-02-25  9:06   ` Christoph Hellwig
2021-02-23  5:47 ` [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
2021-02-24 21:50   ` Darrick J. Wong
2021-02-25  9:09   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).