All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] xfs: scrub-related fixes
@ 2019-06-26 20:46 Darrick J. Wong
  2019-06-26 20:46 ` [PATCH 1/6] xfs: remove more ondisk directory corruption asserts Darrick J. Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Darrick J. Wong @ 2019-06-26 20:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

I discovered by sampling xfs_scrub stack trace swith a flame graph that
the attr scrub code has a sizeable oversight -- the xattr scrub code
always allocates a zeroed 65K temporary buffer before locking the inode,
even if it then turns out that the inode does not have extended
attributes.

In addition to the pointless memory allocation, the scrub code itself is
careful to initialize whatever part of the memory buffer it's going to
use before reading the contents, which means that the memory clearing is
not only painful (it's 5% of the sample traces!) but totally pointless.

The first patch does more whack-a-mole cleanup of places where corrupt
ondisk directory metadata causes ASSERTs instead of -EFSCORRUPTED
returns.

The rest of the series first cleans up the open-coded pointer
calculations where the buffer is concerned, and then restructures the
code so to allocate the smallest size buffer needed and only just before
it's actually needed.  The final patch disables buffer zeroing for
better performance.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=attr-scrub-fixes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/6] xfs: remove more ondisk directory corruption asserts
  2019-06-26 20:46 [PATCH v2 0/6] xfs: scrub-related fixes Darrick J. Wong
@ 2019-06-26 20:46 ` Darrick J. Wong
  2019-07-05 14:49   ` Brian Foster
  2019-06-26 20:46 ` [PATCH 2/6] xfs: attribute scrub should use seen_enough to pass error values Darrick J. Wong
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2019-06-26 20:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Continue our game of replacing ASSERTs for corrupt ondisk metadata with
EFSCORRUPTED returns.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_da_btree.c  |   19 ++++++++++++-------
 fs/xfs/libxfs/xfs_dir2_node.c |    3 ++-
 2 files changed, 14 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index e2737e2ac2ae..c914ae763113 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -493,10 +493,8 @@ xfs_da3_split(
 	ASSERT(state->path.active == 0);
 	oldblk = &state->path.blk[0];
 	error = xfs_da3_root_split(state, oldblk, addblk);
-	if (error) {
-		addblk->bp = NULL;
-		return error;	/* GROT: dir is inconsistent */
-	}
+	if (error)
+		goto out;
 
 	/*
 	 * Update pointers to the node which used to be block 0 and just got
@@ -511,7 +509,10 @@ xfs_da3_split(
 	 */
 	node = oldblk->bp->b_addr;
 	if (node->hdr.info.forw) {
-		ASSERT(be32_to_cpu(node->hdr.info.forw) == addblk->blkno);
+		if (be32_to_cpu(node->hdr.info.forw) != addblk->blkno) {
+			error = -EFSCORRUPTED;
+			goto out;
+		}
 		node = addblk->bp->b_addr;
 		node->hdr.info.back = cpu_to_be32(oldblk->blkno);
 		xfs_trans_log_buf(state->args->trans, addblk->bp,
@@ -520,15 +521,19 @@ xfs_da3_split(
 	}
 	node = oldblk->bp->b_addr;
 	if (node->hdr.info.back) {
-		ASSERT(be32_to_cpu(node->hdr.info.back) == addblk->blkno);
+		if (be32_to_cpu(node->hdr.info.back) != addblk->blkno) {
+			error = -EFSCORRUPTED;
+			goto out;
+		}
 		node = addblk->bp->b_addr;
 		node->hdr.info.forw = cpu_to_be32(oldblk->blkno);
 		xfs_trans_log_buf(state->args->trans, addblk->bp,
 				  XFS_DA_LOGRANGE(node, &node->hdr.info,
 				  sizeof(node->hdr.info)));
 	}
+out:
 	addblk->bp = NULL;
-	return 0;
+	return error;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 16731d2d684b..f7f3fb458019 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -743,7 +743,8 @@ xfs_dir2_leafn_lookup_for_entry(
 	ents = dp->d_ops->leaf_ents_p(leaf);
 
 	xfs_dir3_leaf_check(dp, bp);
-	ASSERT(leafhdr.count > 0);
+	if (leafhdr.count <= 0)
+		return -EFSCORRUPTED;
 
 	/*
 	 * Look up the hash value in the leaf entries.

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/6] xfs: attribute scrub should use seen_enough to pass error values
  2019-06-26 20:46 [PATCH v2 0/6] xfs: scrub-related fixes Darrick J. Wong
  2019-06-26 20:46 ` [PATCH 1/6] xfs: remove more ondisk directory corruption asserts Darrick J. Wong
@ 2019-06-26 20:46 ` Darrick J. Wong
  2019-07-05 14:49   ` Brian Foster
  2019-06-26 20:46 ` [PATCH 3/6] xfs: refactor extended attribute buffer pointer functions Darrick J. Wong
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2019-06-26 20:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

When we're iterating all the attributes using the built-in xattr
iterator, we can use the seen_enough variable to pass error codes back
to the main scrub function instead of flattening them into 0/1.  This
will be used in a more exciting fashion in upcoming patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/attr.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index dce74ec57038..f0fd26abd39d 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -83,7 +83,7 @@ xchk_xattr_listent(
 	sx = container_of(context, struct xchk_xattr, context);
 
 	if (xchk_should_terminate(sx->sc, &error)) {
-		context->seen_enough = 1;
+		context->seen_enough = error;
 		return;
 	}
 
@@ -125,7 +125,7 @@ xchk_xattr_listent(
 					     args.blkno);
 fail_xref:
 	if (sx->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
-		context->seen_enough = 1;
+		context->seen_enough = XFS_ITER_ABORT;
 	return;
 }
 
@@ -464,6 +464,10 @@ xchk_xattr(
 	error = xfs_attr_list_int_ilocked(&sx.context);
 	if (!xchk_fblock_process_error(sc, XFS_ATTR_FORK, 0, &error))
 		goto out;
+
+	/* Did our listent function try to return any errors? */
+	if (sx.context.seen_enough < 0)
+		error = sx.context.seen_enough;
 out:
 	return error;
 }

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/6] xfs: refactor extended attribute buffer pointer functions
  2019-06-26 20:46 [PATCH v2 0/6] xfs: scrub-related fixes Darrick J. Wong
  2019-06-26 20:46 ` [PATCH 1/6] xfs: remove more ondisk directory corruption asserts Darrick J. Wong
  2019-06-26 20:46 ` [PATCH 2/6] xfs: attribute scrub should use seen_enough to pass error values Darrick J. Wong
@ 2019-06-26 20:46 ` Darrick J. Wong
  2019-07-05 14:52   ` Brian Foster
  2019-06-26 20:46 ` [PATCH 4/6] xfs: refactor attr scrub memory allocation function Darrick J. Wong
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2019-06-26 20:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Replace the open-coded attribute buffer pointer calculations with helper
functions to make it more obvious what we're doing with our freeform
memory allocation w.r.t. either storing xattr values or computing btree
block free space.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/attr.c |   15 +++++-------
 fs/xfs/scrub/attr.h |   65 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 8 deletions(-)
 create mode 100644 fs/xfs/scrub/attr.h


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index f0fd26abd39d..fd16eb3fa003 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -26,6 +26,7 @@
 #include "scrub/common.h"
 #include "scrub/dabtree.h"
 #include "scrub/trace.h"
+#include "scrub/attr.h"
 
 #include <linux/posix_acl_xattr.h>
 #include <linux/xattr.h>
@@ -111,7 +112,7 @@ xchk_xattr_listent(
 	args.namelen = namelen;
 	args.hashval = xfs_da_hashname(args.name, args.namelen);
 	args.trans = context->tp;
-	args.value = sx->sc->buf;
+	args.value = xchk_xattr_valuebuf(sx->sc);
 	args.valuelen = XATTR_SIZE_MAX;
 
 	error = xfs_attr_get_ilocked(context->dp, &args);
@@ -170,13 +171,12 @@ xchk_xattr_check_freemap(
 	unsigned long			*map,
 	struct xfs_attr3_icleaf_hdr	*leafhdr)
 {
-	unsigned long			*freemap;
-	unsigned long			*dstmap;
+	unsigned long			*freemap = xchk_xattr_freemap(sc);
+	unsigned long			*dstmap = xchk_xattr_dstmap(sc);
 	unsigned int			mapsize = sc->mp->m_attr_geo->blksize;
 	int				i;
 
 	/* Construct bitmap of freemap contents. */
-	freemap = (unsigned long *)sc->buf + BITS_TO_LONGS(mapsize);
 	bitmap_zero(freemap, mapsize);
 	for (i = 0; i < XFS_ATTR_LEAF_MAPSIZE; i++) {
 		if (!xchk_xattr_set_map(sc, freemap,
@@ -186,7 +186,6 @@ xchk_xattr_check_freemap(
 	}
 
 	/* Look for bits that are set in freemap and are marked in use. */
-	dstmap = freemap + BITS_TO_LONGS(mapsize);
 	return bitmap_and(dstmap, freemap, map, mapsize) == 0;
 }
 
@@ -201,13 +200,13 @@ xchk_xattr_entry(
 	char				*buf_end,
 	struct xfs_attr_leafblock	*leaf,
 	struct xfs_attr3_icleaf_hdr	*leafhdr,
-	unsigned long			*usedmap,
 	struct xfs_attr_leaf_entry	*ent,
 	int				idx,
 	unsigned int			*usedbytes,
 	__u32				*last_hashval)
 {
 	struct xfs_mount		*mp = ds->state->mp;
+	unsigned long			*usedmap = xchk_xattr_usedmap(ds->sc);
 	char				*name_end;
 	struct xfs_attr_leaf_name_local	*lentry;
 	struct xfs_attr_leaf_name_remote *rentry;
@@ -267,7 +266,7 @@ xchk_xattr_block(
 	struct xfs_attr_leafblock	*leaf = bp->b_addr;
 	struct xfs_attr_leaf_entry	*ent;
 	struct xfs_attr_leaf_entry	*entries;
-	unsigned long			*usedmap = ds->sc->buf;
+	unsigned long			*usedmap = xchk_xattr_usedmap(ds->sc);
 	char				*buf_end;
 	size_t				off;
 	__u32				last_hashval = 0;
@@ -324,7 +323,7 @@ xchk_xattr_block(
 
 		/* Check the entry and nameval. */
 		xchk_xattr_entry(ds, level, buf_end, leaf, &leafhdr,
-				usedmap, ent, i, &usedbytes, &last_hashval);
+				ent, i, &usedbytes, &last_hashval);
 
 		if (ds->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
 			goto out;
diff --git a/fs/xfs/scrub/attr.h b/fs/xfs/scrub/attr.h
new file mode 100644
index 000000000000..88bb5e29c60c
--- /dev/null
+++ b/fs/xfs/scrub/attr.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#ifndef __XFS_SCRUB_ATTR_H__
+#define __XFS_SCRUB_ATTR_H__
+
+/*
+ * Temporary storage for online scrub and repair of extended attributes.
+ */
+struct xchk_xattr_buf {
+	/*
+	 * Memory buffer -- either used for extracting attr values while
+	 * walking the attributes; or for computing attr block bitmaps when
+	 * checking the attribute tree.
+	 *
+	 * Each bitmap contains enough bits to track every byte in an attr
+	 * block (rounded up to the size of an unsigned long).  The attr block
+	 * used space bitmap starts at the beginning of the buffer; the free
+	 * space bitmap follows immediately after; and we have a third buffer
+	 * for storing intermediate bitmap results.
+	 */
+	uint8_t			buf[0];
+};
+
+/* A place to store attribute values. */
+static inline uint8_t *
+xchk_xattr_valuebuf(
+	struct xfs_scrub	*sc)
+{
+	struct xchk_xattr_buf	*ab = sc->buf;
+
+	return ab->buf;
+}
+
+/* A bitmap of space usage computed by walking an attr leaf block. */
+static inline unsigned long *
+xchk_xattr_usedmap(
+	struct xfs_scrub	*sc)
+{
+	struct xchk_xattr_buf	*ab = sc->buf;
+
+	return (unsigned long *)ab->buf;
+}
+
+/* A bitmap of free space computed by walking attr leaf block free info. */
+static inline unsigned long *
+xchk_xattr_freemap(
+	struct xfs_scrub	*sc)
+{
+	return xchk_xattr_usedmap(sc) +
+			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
+}
+
+/* A bitmap used to hold temporary results. */
+static inline unsigned long *
+xchk_xattr_dstmap(
+	struct xfs_scrub	*sc)
+{
+	return xchk_xattr_freemap(sc) +
+			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
+}
+
+#endif	/* __XFS_SCRUB_ATTR_H__ */

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/6] xfs: refactor attr scrub memory allocation function
  2019-06-26 20:46 [PATCH v2 0/6] xfs: scrub-related fixes Darrick J. Wong
                   ` (2 preceding siblings ...)
  2019-06-26 20:46 ` [PATCH 3/6] xfs: refactor extended attribute buffer pointer functions Darrick J. Wong
@ 2019-06-26 20:46 ` Darrick J. Wong
  2019-07-05 14:52   ` Brian Foster
  2019-06-26 20:47 ` [PATCH 5/6] xfs: only allocate memory for scrubbing attributes when we need it Darrick J. Wong
  2019-06-26 20:47 ` [PATCH 6/6] xfs: online scrub needn't bother zeroing its temporary buffer Darrick J. Wong
  5 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2019-06-26 20:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move the code that allocates memory buffers for the extended attribute
scrub code into a separate function so we can reduce memory allocations
in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/attr.c |   33 ++++++++++++++++++++++++---------
 fs/xfs/scrub/attr.h |    2 ++
 2 files changed, 26 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index fd16eb3fa003..c20b6da1db84 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -31,26 +31,41 @@
 #include <linux/posix_acl_xattr.h>
 #include <linux/xattr.h>
 
-/* Set us up to scrub an inode's extended attributes. */
+/* Allocate enough memory to hold an attr value and attr block bitmaps. */
 int
-xchk_setup_xattr(
+xchk_setup_xattr_buf(
 	struct xfs_scrub	*sc,
-	struct xfs_inode	*ip)
+	size_t			value_size)
 {
 	size_t			sz;
 
 	/*
-	 * Allocate the buffer without the inode lock held.  We need enough
-	 * space to read every xattr value in the file or enough space to
-	 * hold three copies of the xattr free space bitmap.  (Not both at
-	 * the same time.)
+	 * We need enough space to read an xattr value from the file or enough
+	 * space to hold three copies of the xattr free space bitmap.  We don't
+	 * need the buffer space for both purposes at the same time.
 	 */
-	sz = max_t(size_t, XATTR_SIZE_MAX, 3 * sizeof(long) *
-			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize));
+	sz = 3 * sizeof(long) * BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
+	sz = max_t(size_t, sz, value_size);
+
 	sc->buf = kmem_zalloc_large(sz, KM_SLEEP);
 	if (!sc->buf)
 		return -ENOMEM;
 
+	return 0;
+}
+
+/* Set us up to scrub an inode's extended attributes. */
+int
+xchk_setup_xattr(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip)
+{
+	int			error;
+
+	error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX);
+	if (error)
+		return error;
+
 	return xchk_setup_inode_contents(sc, ip, 0);
 }
 
diff --git a/fs/xfs/scrub/attr.h b/fs/xfs/scrub/attr.h
index 88bb5e29c60c..27e879aeaafc 100644
--- a/fs/xfs/scrub/attr.h
+++ b/fs/xfs/scrub/attr.h
@@ -62,4 +62,6 @@ xchk_xattr_dstmap(
 			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
 }
 
+int xchk_setup_xattr_buf(struct xfs_scrub *sc, size_t value_size);
+
 #endif	/* __XFS_SCRUB_ATTR_H__ */

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] xfs: only allocate memory for scrubbing attributes when we need it
  2019-06-26 20:46 [PATCH v2 0/6] xfs: scrub-related fixes Darrick J. Wong
                   ` (3 preceding siblings ...)
  2019-06-26 20:46 ` [PATCH 4/6] xfs: refactor attr scrub memory allocation function Darrick J. Wong
@ 2019-06-26 20:47 ` Darrick J. Wong
  2019-07-05 14:52   ` Brian Foster
  2019-06-26 20:47 ` [PATCH 6/6] xfs: online scrub needn't bother zeroing its temporary buffer Darrick J. Wong
  5 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2019-06-26 20:47 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

In examining a flame graph of time spent running xfs_scrub on various
filesystems, I noticed that we spent nearly 7% of the total runtime on
allocating a zeroed 65k buffer for every SCRUB_TYPE_XATTR invocation.
We do this even if none of the attribute values were anywhere near 64k
in size, even if there were no attribute blocks to check space on, and
even if it just turns out there are no attributes at all.

Therefore, rearrange the xattr buffer setup code to support reallocating
with a bigger buffer and redistribute the callers of that function so
that we only allocate memory just prior to needing it, and only allocate
as much as we need.  If we can't get memory with the ILOCK held we'll
bail out with EDEADLOCK which will allocate the maximum memory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/attr.c |   67 ++++++++++++++++++++++++++++++++++++++++++++-------
 fs/xfs/scrub/attr.h |    6 ++++-
 2 files changed, 63 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index c20b6da1db84..09081d8ab34b 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -31,13 +31,19 @@
 #include <linux/posix_acl_xattr.h>
 #include <linux/xattr.h>
 
-/* Allocate enough memory to hold an attr value and attr block bitmaps. */
+/*
+ * Allocate enough memory to hold an attr value and attr block bitmaps,
+ * reallocating the buffer if necessary.  Buffer contents are not preserved
+ * across a reallocation.
+ */
 int
 xchk_setup_xattr_buf(
 	struct xfs_scrub	*sc,
-	size_t			value_size)
+	size_t			value_size,
+	xfs_km_flags_t		flags)
 {
 	size_t			sz;
+	struct xchk_xattr_buf	*ab = sc->buf;
 
 	/*
 	 * We need enough space to read an xattr value from the file or enough
@@ -47,10 +53,23 @@ xchk_setup_xattr_buf(
 	sz = 3 * sizeof(long) * BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
 	sz = max_t(size_t, sz, value_size);
 
-	sc->buf = kmem_zalloc_large(sz, KM_SLEEP);
-	if (!sc->buf)
+	/*
+	 * If there's already a buffer, figure out if we need to reallocate it
+	 * to accomdate a larger size.
+	 */
+	if (ab) {
+		if (sz <= ab->sz)
+			return 0;
+		kmem_free(ab);
+		sc->buf = NULL;
+	}
+
+	ab = kmem_zalloc_large(sizeof(*ab) + sz, flags);
+	if (!ab)
 		return -ENOMEM;
 
+	ab->sz = sz;
+	sc->buf = ab;
 	return 0;
 }
 
@@ -62,9 +81,16 @@ xchk_setup_xattr(
 {
 	int			error;
 
-	error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX);
-	if (error)
-		return error;
+	/*
+	 * We failed to get memory while checking attrs, so this time try to
+	 * get all the memory we're ever going to need.  Allocate the buffer
+	 * without the inode lock held, which means we can sleep.
+	 */
+	if (sc->flags & XCHK_TRY_HARDER) {
+		error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX, KM_SLEEP);
+		if (error)
+			return error;
+	}
 
 	return xchk_setup_inode_contents(sc, ip, 0);
 }
@@ -115,6 +141,19 @@ xchk_xattr_listent(
 		return;
 	}
 
+	/*
+	 * Try to allocate enough memory to extrat the attr value.  If that
+	 * doesn't work, we overload the seen_enough variable to convey
+	 * the error message back to the main scrub function.
+	 */
+	error = xchk_setup_xattr_buf(sx->sc, valuelen, KM_MAYFAIL);
+	if (error == -ENOMEM)
+		error = -EDEADLOCK;
+	if (error) {
+		context->seen_enough = error;
+		return;
+	}
+
 	args.flags = ATTR_KERNOTIME;
 	if (flags & XFS_ATTR_ROOT)
 		args.flags |= ATTR_ROOT;
@@ -128,7 +167,7 @@ xchk_xattr_listent(
 	args.hashval = xfs_da_hashname(args.name, args.namelen);
 	args.trans = context->tp;
 	args.value = xchk_xattr_valuebuf(sx->sc);
-	args.valuelen = XATTR_SIZE_MAX;
+	args.valuelen = valuelen;
 
 	error = xfs_attr_get_ilocked(context->dp, &args);
 	if (error == -EEXIST)
@@ -281,16 +320,26 @@ xchk_xattr_block(
 	struct xfs_attr_leafblock	*leaf = bp->b_addr;
 	struct xfs_attr_leaf_entry	*ent;
 	struct xfs_attr_leaf_entry	*entries;
-	unsigned long			*usedmap = xchk_xattr_usedmap(ds->sc);
+	unsigned long			*usedmap;
 	char				*buf_end;
 	size_t				off;
 	__u32				last_hashval = 0;
 	unsigned int			usedbytes = 0;
 	unsigned int			hdrsize;
 	int				i;
+	int				error;
 
 	if (*last_checked == blk->blkno)
 		return 0;
+
+	/* Allocate memory for block usage checking. */
+	error = xchk_setup_xattr_buf(ds->sc, 0, KM_MAYFAIL);
+	if (error == -ENOMEM)
+		return -EDEADLOCK;
+	if (error)
+		return error;
+	usedmap = xchk_xattr_usedmap(ds->sc);
+
 	*last_checked = blk->blkno;
 	bitmap_zero(usedmap, mp->m_attr_geo->blksize);
 
diff --git a/fs/xfs/scrub/attr.h b/fs/xfs/scrub/attr.h
index 27e879aeaafc..13a1d2e8424d 100644
--- a/fs/xfs/scrub/attr.h
+++ b/fs/xfs/scrub/attr.h
@@ -10,6 +10,9 @@
  * Temporary storage for online scrub and repair of extended attributes.
  */
 struct xchk_xattr_buf {
+	/* Size of @buf, in bytes. */
+	size_t			sz;
+
 	/*
 	 * Memory buffer -- either used for extracting attr values while
 	 * walking the attributes; or for computing attr block bitmaps when
@@ -62,6 +65,7 @@ xchk_xattr_dstmap(
 			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
 }
 
-int xchk_setup_xattr_buf(struct xfs_scrub *sc, size_t value_size);
+int xchk_setup_xattr_buf(struct xfs_scrub *sc, size_t value_size,
+		xfs_km_flags_t flags);
 
 #endif	/* __XFS_SCRUB_ATTR_H__ */

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/6] xfs: online scrub needn't bother zeroing its temporary buffer
  2019-06-26 20:46 [PATCH v2 0/6] xfs: scrub-related fixes Darrick J. Wong
                   ` (4 preceding siblings ...)
  2019-06-26 20:47 ` [PATCH 5/6] xfs: only allocate memory for scrubbing attributes when we need it Darrick J. Wong
@ 2019-06-26 20:47 ` Darrick J. Wong
  2019-07-05 14:52   ` Brian Foster
  5 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2019-06-26 20:47 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The xattr scrubber functions use the temporary memory buffer either for
storing bitmaps or for testing if attribute value extraction works.  The
bitmap code always zeroes what it needs and the value extraction merely
sets the buffer contents (we never read the contents, we just look for
return codes), so it's not necessary to waste CPU time zeroing on
allocation.

A flame graph analysis showed that we were spending 7% of a xfs_scrub
run (the whole program, not just the attr scrubber itself) allocating
and zeroing 64k segments needlessly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/attr.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 09081d8ab34b..d3a6f3dacf0d 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -64,7 +64,12 @@ xchk_setup_xattr_buf(
 		sc->buf = NULL;
 	}
 
-	ab = kmem_zalloc_large(sizeof(*ab) + sz, flags);
+	/*
+	 * Allocate the big buffer.  We skip zeroing it because that added 7%
+	 * to the scrub runtime and all the users were careful never to read
+	 * uninitialized contents.
+	 */
+	ab = kmem_alloc_large(sizeof(*ab) + sz, flags);
 	if (!ab)
 		return -ENOMEM;
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] xfs: remove more ondisk directory corruption asserts
  2019-06-26 20:46 ` [PATCH 1/6] xfs: remove more ondisk directory corruption asserts Darrick J. Wong
@ 2019-07-05 14:49   ` Brian Foster
  2019-07-05 17:03     ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-07-05 14:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Jun 26, 2019 at 01:46:40PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Continue our game of replacing ASSERTs for corrupt ondisk metadata with
> EFSCORRUPTED returns.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_da_btree.c  |   19 ++++++++++++-------
>  fs/xfs/libxfs/xfs_dir2_node.c |    3 ++-
>  2 files changed, 14 insertions(+), 8 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
> index 16731d2d684b..f7f3fb458019 100644
> --- a/fs/xfs/libxfs/xfs_dir2_node.c
> +++ b/fs/xfs/libxfs/xfs_dir2_node.c
> @@ -743,7 +743,8 @@ xfs_dir2_leafn_lookup_for_entry(
>  	ents = dp->d_ops->leaf_ents_p(leaf);
>  
>  	xfs_dir3_leaf_check(dp, bp);
> -	ASSERT(leafhdr.count > 0);
> +	if (leafhdr.count <= 0)
> +		return -EFSCORRUPTED;

This error return bubbles up to xfs_dir2_leafn_lookup_int() and
xfs_da3_node_lookup_int(). The latter has a direct return value as well
as a *result return parameter, which unconditionally carries the return
value from xfs_dir2_leafn_lookup_int(). xfs_da3_node_lookup_int() has
multiple callers, but a quick look at one (xfs_attr_node_addname())
suggests we might not handle corruption errors properly via the *result
parameter. Perhaps we also need to fix up xfs_da3_node_lookup_int() to
return particular errors directly?

Brian

>  
>  	/*
>  	 * Look up the hash value in the leaf entries.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/6] xfs: attribute scrub should use seen_enough to pass error values
  2019-06-26 20:46 ` [PATCH 2/6] xfs: attribute scrub should use seen_enough to pass error values Darrick J. Wong
@ 2019-07-05 14:49   ` Brian Foster
  2019-07-05 16:46     ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-07-05 14:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Jun 26, 2019 at 01:46:45PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we're iterating all the attributes using the built-in xattr
> iterator, we can use the seen_enough variable to pass error codes back
> to the main scrub function instead of flattening them into 0/1.  This
> will be used in a more exciting fashion in upcoming patches.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/attr.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> index dce74ec57038..f0fd26abd39d 100644
> --- a/fs/xfs/scrub/attr.c
> +++ b/fs/xfs/scrub/attr.c
> @@ -83,7 +83,7 @@ xchk_xattr_listent(
>  	sx = container_of(context, struct xchk_xattr, context);
>  
>  	if (xchk_should_terminate(sx->sc, &error)) {
> -		context->seen_enough = 1;
> +		context->seen_enough = error;

It might be appropriate to update the xfs_attr_list_context structure
definition comment since 'seen_enough' is not self explanatory as an
error code..? Otherwise looks fine:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  		return;
>  	}
>  
> @@ -125,7 +125,7 @@ xchk_xattr_listent(
>  					     args.blkno);
>  fail_xref:
>  	if (sx->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> -		context->seen_enough = 1;
> +		context->seen_enough = XFS_ITER_ABORT;
>  	return;
>  }
>  
> @@ -464,6 +464,10 @@ xchk_xattr(
>  	error = xfs_attr_list_int_ilocked(&sx.context);
>  	if (!xchk_fblock_process_error(sc, XFS_ATTR_FORK, 0, &error))
>  		goto out;
> +
> +	/* Did our listent function try to return any errors? */
> +	if (sx.context.seen_enough < 0)
> +		error = sx.context.seen_enough;
>  out:
>  	return error;
>  }
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] xfs: refactor extended attribute buffer pointer functions
  2019-06-26 20:46 ` [PATCH 3/6] xfs: refactor extended attribute buffer pointer functions Darrick J. Wong
@ 2019-07-05 14:52   ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2019-07-05 14:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Jun 26, 2019 at 01:46:52PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Replace the open-coded attribute buffer pointer calculations with helper
> functions to make it more obvious what we're doing with our freeform
> memory allocation w.r.t. either storing xattr values or computing btree
> block free space.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/attr.c |   15 +++++-------
>  fs/xfs/scrub/attr.h |   65 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 72 insertions(+), 8 deletions(-)
>  create mode 100644 fs/xfs/scrub/attr.h
> 
> 
...
> diff --git a/fs/xfs/scrub/attr.h b/fs/xfs/scrub/attr.h
> new file mode 100644
> index 000000000000..88bb5e29c60c
> --- /dev/null
> +++ b/fs/xfs/scrub/attr.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#ifndef __XFS_SCRUB_ATTR_H__
> +#define __XFS_SCRUB_ATTR_H__
> +
> +/*
> + * Temporary storage for online scrub and repair of extended attributes.
> + */
> +struct xchk_xattr_buf {
> +	/*
> +	 * Memory buffer -- either used for extracting attr values while
> +	 * walking the attributes; or for computing attr block bitmaps when
> +	 * checking the attribute tree.
> +	 *
> +	 * Each bitmap contains enough bits to track every byte in an attr
> +	 * block (rounded up to the size of an unsigned long).  The attr block
> +	 * used space bitmap starts at the beginning of the buffer; the free
> +	 * space bitmap follows immediately after; and we have a third buffer
> +	 * for storing intermediate bitmap results.
> +	 */
> +	uint8_t			buf[0];
> +};
> +
> +/* A place to store attribute values. */
> +static inline uint8_t *
> +xchk_xattr_valuebuf(
> +	struct xfs_scrub	*sc)
> +{
> +	struct xchk_xattr_buf	*ab = sc->buf;
> +

I was a little confused by this at first because it seemed unnecessary
to inject the structure in this type conversion from a void pointer. I
see that the next patch adds another field, however, so this looks fine:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	return ab->buf;
> +}
> +
> +/* A bitmap of space usage computed by walking an attr leaf block. */
> +static inline unsigned long *
> +xchk_xattr_usedmap(
> +	struct xfs_scrub	*sc)
> +{
> +	struct xchk_xattr_buf	*ab = sc->buf;
> +
> +	return (unsigned long *)ab->buf;
> +}
> +
> +/* A bitmap of free space computed by walking attr leaf block free info. */
> +static inline unsigned long *
> +xchk_xattr_freemap(
> +	struct xfs_scrub	*sc)
> +{
> +	return xchk_xattr_usedmap(sc) +
> +			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
> +}
> +
> +/* A bitmap used to hold temporary results. */
> +static inline unsigned long *
> +xchk_xattr_dstmap(
> +	struct xfs_scrub	*sc)
> +{
> +	return xchk_xattr_freemap(sc) +
> +			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
> +}
> +
> +#endif	/* __XFS_SCRUB_ATTR_H__ */
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] xfs: refactor attr scrub memory allocation function
  2019-06-26 20:46 ` [PATCH 4/6] xfs: refactor attr scrub memory allocation function Darrick J. Wong
@ 2019-07-05 14:52   ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2019-07-05 14:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Jun 26, 2019 at 01:46:58PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Move the code that allocates memory buffers for the extended attribute
> scrub code into a separate function so we can reduce memory allocations
> in the next patch.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/scrub/attr.c |   33 ++++++++++++++++++++++++---------
>  fs/xfs/scrub/attr.h |    2 ++
>  2 files changed, 26 insertions(+), 9 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> index fd16eb3fa003..c20b6da1db84 100644
> --- a/fs/xfs/scrub/attr.c
> +++ b/fs/xfs/scrub/attr.c
> @@ -31,26 +31,41 @@
>  #include <linux/posix_acl_xattr.h>
>  #include <linux/xattr.h>
>  
> -/* Set us up to scrub an inode's extended attributes. */
> +/* Allocate enough memory to hold an attr value and attr block bitmaps. */
>  int
> -xchk_setup_xattr(
> +xchk_setup_xattr_buf(
>  	struct xfs_scrub	*sc,
> -	struct xfs_inode	*ip)
> +	size_t			value_size)
>  {
>  	size_t			sz;
>  
>  	/*
> -	 * Allocate the buffer without the inode lock held.  We need enough
> -	 * space to read every xattr value in the file or enough space to
> -	 * hold three copies of the xattr free space bitmap.  (Not both at
> -	 * the same time.)
> +	 * We need enough space to read an xattr value from the file or enough
> +	 * space to hold three copies of the xattr free space bitmap.  We don't
> +	 * need the buffer space for both purposes at the same time.
>  	 */
> -	sz = max_t(size_t, XATTR_SIZE_MAX, 3 * sizeof(long) *
> -			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize));
> +	sz = 3 * sizeof(long) * BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
> +	sz = max_t(size_t, sz, value_size);
> +
>  	sc->buf = kmem_zalloc_large(sz, KM_SLEEP);
>  	if (!sc->buf)
>  		return -ENOMEM;
>  
> +	return 0;
> +}
> +
> +/* Set us up to scrub an inode's extended attributes. */
> +int
> +xchk_setup_xattr(
> +	struct xfs_scrub	*sc,
> +	struct xfs_inode	*ip)
> +{
> +	int			error;
> +
> +	error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX);
> +	if (error)
> +		return error;
> +
>  	return xchk_setup_inode_contents(sc, ip, 0);
>  }
>  
> diff --git a/fs/xfs/scrub/attr.h b/fs/xfs/scrub/attr.h
> index 88bb5e29c60c..27e879aeaafc 100644
> --- a/fs/xfs/scrub/attr.h
> +++ b/fs/xfs/scrub/attr.h
> @@ -62,4 +62,6 @@ xchk_xattr_dstmap(
>  			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
>  }
>  
> +int xchk_setup_xattr_buf(struct xfs_scrub *sc, size_t value_size);
> +
>  #endif	/* __XFS_SCRUB_ATTR_H__ */
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/6] xfs: only allocate memory for scrubbing attributes when we need it
  2019-06-26 20:47 ` [PATCH 5/6] xfs: only allocate memory for scrubbing attributes when we need it Darrick J. Wong
@ 2019-07-05 14:52   ` Brian Foster
  2019-07-05 16:49     ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-07-05 14:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Jun 26, 2019 at 01:47:04PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In examining a flame graph of time spent running xfs_scrub on various
> filesystems, I noticed that we spent nearly 7% of the total runtime on
> allocating a zeroed 65k buffer for every SCRUB_TYPE_XATTR invocation.
> We do this even if none of the attribute values were anywhere near 64k
> in size, even if there were no attribute blocks to check space on, and
> even if it just turns out there are no attributes at all.
> 
> Therefore, rearrange the xattr buffer setup code to support reallocating
> with a bigger buffer and redistribute the callers of that function so
> that we only allocate memory just prior to needing it, and only allocate
> as much as we need.  If we can't get memory with the ILOCK held we'll
> bail out with EDEADLOCK which will allocate the maximum memory.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/attr.c |   67 ++++++++++++++++++++++++++++++++++++++++++++-------
>  fs/xfs/scrub/attr.h |    6 ++++-
>  2 files changed, 63 insertions(+), 10 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> index c20b6da1db84..09081d8ab34b 100644
> --- a/fs/xfs/scrub/attr.c
> +++ b/fs/xfs/scrub/attr.c
...
> @@ -47,10 +53,23 @@ xchk_setup_xattr_buf(
>  	sz = 3 * sizeof(long) * BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
>  	sz = max_t(size_t, sz, value_size);
>  
> -	sc->buf = kmem_zalloc_large(sz, KM_SLEEP);
> -	if (!sc->buf)
> +	/*
> +	 * If there's already a buffer, figure out if we need to reallocate it
> +	 * to accomdate a larger size.

accommodate

Otherwise looks good:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	 */
> +	if (ab) {
> +		if (sz <= ab->sz)
> +			return 0;
> +		kmem_free(ab);
> +		sc->buf = NULL;
> +	}
> +
> +	ab = kmem_zalloc_large(sizeof(*ab) + sz, flags);
> +	if (!ab)
>  		return -ENOMEM;
>  
> +	ab->sz = sz;
> +	sc->buf = ab;
>  	return 0;
>  }
>  
> @@ -62,9 +81,16 @@ xchk_setup_xattr(
>  {
>  	int			error;
>  
> -	error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX);
> -	if (error)
> -		return error;
> +	/*
> +	 * We failed to get memory while checking attrs, so this time try to
> +	 * get all the memory we're ever going to need.  Allocate the buffer
> +	 * without the inode lock held, which means we can sleep.
> +	 */
> +	if (sc->flags & XCHK_TRY_HARDER) {
> +		error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX, KM_SLEEP);
> +		if (error)
> +			return error;
> +	}
>  
>  	return xchk_setup_inode_contents(sc, ip, 0);
>  }
> @@ -115,6 +141,19 @@ xchk_xattr_listent(
>  		return;
>  	}
>  
> +	/*
> +	 * Try to allocate enough memory to extrat the attr value.  If that
> +	 * doesn't work, we overload the seen_enough variable to convey
> +	 * the error message back to the main scrub function.
> +	 */
> +	error = xchk_setup_xattr_buf(sx->sc, valuelen, KM_MAYFAIL);
> +	if (error == -ENOMEM)
> +		error = -EDEADLOCK;
> +	if (error) {
> +		context->seen_enough = error;
> +		return;
> +	}
> +
>  	args.flags = ATTR_KERNOTIME;
>  	if (flags & XFS_ATTR_ROOT)
>  		args.flags |= ATTR_ROOT;
> @@ -128,7 +167,7 @@ xchk_xattr_listent(
>  	args.hashval = xfs_da_hashname(args.name, args.namelen);
>  	args.trans = context->tp;
>  	args.value = xchk_xattr_valuebuf(sx->sc);
> -	args.valuelen = XATTR_SIZE_MAX;
> +	args.valuelen = valuelen;
>  
>  	error = xfs_attr_get_ilocked(context->dp, &args);
>  	if (error == -EEXIST)
> @@ -281,16 +320,26 @@ xchk_xattr_block(
>  	struct xfs_attr_leafblock	*leaf = bp->b_addr;
>  	struct xfs_attr_leaf_entry	*ent;
>  	struct xfs_attr_leaf_entry	*entries;
> -	unsigned long			*usedmap = xchk_xattr_usedmap(ds->sc);
> +	unsigned long			*usedmap;
>  	char				*buf_end;
>  	size_t				off;
>  	__u32				last_hashval = 0;
>  	unsigned int			usedbytes = 0;
>  	unsigned int			hdrsize;
>  	int				i;
> +	int				error;
>  
>  	if (*last_checked == blk->blkno)
>  		return 0;
> +
> +	/* Allocate memory for block usage checking. */
> +	error = xchk_setup_xattr_buf(ds->sc, 0, KM_MAYFAIL);
> +	if (error == -ENOMEM)
> +		return -EDEADLOCK;
> +	if (error)
> +		return error;
> +	usedmap = xchk_xattr_usedmap(ds->sc);
> +
>  	*last_checked = blk->blkno;
>  	bitmap_zero(usedmap, mp->m_attr_geo->blksize);
>  
> diff --git a/fs/xfs/scrub/attr.h b/fs/xfs/scrub/attr.h
> index 27e879aeaafc..13a1d2e8424d 100644
> --- a/fs/xfs/scrub/attr.h
> +++ b/fs/xfs/scrub/attr.h
> @@ -10,6 +10,9 @@
>   * Temporary storage for online scrub and repair of extended attributes.
>   */
>  struct xchk_xattr_buf {
> +	/* Size of @buf, in bytes. */
> +	size_t			sz;
> +
>  	/*
>  	 * Memory buffer -- either used for extracting attr values while
>  	 * walking the attributes; or for computing attr block bitmaps when
> @@ -62,6 +65,7 @@ xchk_xattr_dstmap(
>  			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
>  }
>  
> -int xchk_setup_xattr_buf(struct xfs_scrub *sc, size_t value_size);
> +int xchk_setup_xattr_buf(struct xfs_scrub *sc, size_t value_size,
> +		xfs_km_flags_t flags);
>  
>  #endif	/* __XFS_SCRUB_ATTR_H__ */
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] xfs: online scrub needn't bother zeroing its temporary buffer
  2019-06-26 20:47 ` [PATCH 6/6] xfs: online scrub needn't bother zeroing its temporary buffer Darrick J. Wong
@ 2019-07-05 14:52   ` Brian Foster
  2019-07-05 16:35     ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-07-05 14:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Jun 26, 2019 at 01:47:10PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> The xattr scrubber functions use the temporary memory buffer either for
> storing bitmaps or for testing if attribute value extraction works.  The
> bitmap code always zeroes what it needs and the value extraction merely
> sets the buffer contents (we never read the contents, we just look for
> return codes), so it's not necessary to waste CPU time zeroing on
> allocation.
> 

If we don't need to zero the buffer because we never look at the result,
that suggests we don't need to populate it in the first place right?

> A flame graph analysis showed that we were spending 7% of a xfs_scrub
> run (the whole program, not just the attr scrubber itself) allocating
> and zeroing 64k segments needlessly.
> 

How much does this patch help?

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/scrub/attr.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> index 09081d8ab34b..d3a6f3dacf0d 100644
> --- a/fs/xfs/scrub/attr.c
> +++ b/fs/xfs/scrub/attr.c
> @@ -64,7 +64,12 @@ xchk_setup_xattr_buf(
>  		sc->buf = NULL;
>  	}
>  
> -	ab = kmem_zalloc_large(sizeof(*ab) + sz, flags);
> +	/*
> +	 * Allocate the big buffer.  We skip zeroing it because that added 7%
> +	 * to the scrub runtime and all the users were careful never to read
> +	 * uninitialized contents.
> +	 */

Ok, that suggests the 7% hit was due to zeroing (where the commit log
says "allocating and zeroing"). Either way, we probably don't need such
details in the code. Can we tweak the comment to something like:

/*
 * Don't zero the buffer on allocation to avoid runtime overhead. All
 * users must be careful never to read uninitialized contents.
 */ 

With that:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	ab = kmem_alloc_large(sizeof(*ab) + sz, flags);
>  	if (!ab)
>  		return -ENOMEM;
>  
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] xfs: online scrub needn't bother zeroing its temporary buffer
  2019-07-05 14:52   ` Brian Foster
@ 2019-07-05 16:35     ` Darrick J. Wong
  2019-07-05 17:26       ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2019-07-05 16:35 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jul 05, 2019 at 10:52:46AM -0400, Brian Foster wrote:
> On Wed, Jun 26, 2019 at 01:47:10PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > The xattr scrubber functions use the temporary memory buffer either for
> > storing bitmaps or for testing if attribute value extraction works.  The
> > bitmap code always zeroes what it needs and the value extraction merely
> > sets the buffer contents (we never read the contents, we just look for
> > return codes), so it's not necessary to waste CPU time zeroing on
> > allocation.
> > 
> 
> If we don't need to zero the buffer because we never look at the result,
> that suggests we don't need to populate it in the first place right?

We still need to read the attr value into the buffer (at least for
remote attr values) because scrub doesn't otherwise check the remote
attribute block header.

We never read the contents (because the contents are just arbitrary
bytes) but we do need to be able to catch an EFSCORRUPTED if, say, the
attribute dabtree points at a corrupt block.

> > A flame graph analysis showed that we were spending 7% of a xfs_scrub
> > run (the whole program, not just the attr scrubber itself) allocating
> > and zeroing 64k segments needlessly.
> > 
> 
> How much does this patch help?

About 1-2% I think.  FWIW the "7%" figure represents the smallest
improvement I saw in runtimes, where allocation ate 1-2% of the runtime
and zeroing accounts for the rest (~5-6%).

Practically speaking, when I retested with NVME flash instead of
spinning rust then the improvement jumped to 15-20% overall.

> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/attr.c |    7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> > index 09081d8ab34b..d3a6f3dacf0d 100644
> > --- a/fs/xfs/scrub/attr.c
> > +++ b/fs/xfs/scrub/attr.c
> > @@ -64,7 +64,12 @@ xchk_setup_xattr_buf(
> >  		sc->buf = NULL;
> >  	}
> >  
> > -	ab = kmem_zalloc_large(sizeof(*ab) + sz, flags);
> > +	/*
> > +	 * Allocate the big buffer.  We skip zeroing it because that added 7%
> > +	 * to the scrub runtime and all the users were careful never to read
> > +	 * uninitialized contents.
> > +	 */
> 
> Ok, that suggests the 7% hit was due to zeroing (where the commit log
> says "allocating and zeroing"). Either way, we probably don't need such
> details in the code. Can we tweak the comment to something like:
> 
> /*
>  * Don't zero the buffer on allocation to avoid runtime overhead. All
>  * users must be careful never to read uninitialized contents.
>  */ 

Ok, I'll do that.

Thanks for all the review! :)

--D

> 
> With that:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > +	ab = kmem_alloc_large(sizeof(*ab) + sz, flags);
> >  	if (!ab)
> >  		return -ENOMEM;
> >  
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/6] xfs: attribute scrub should use seen_enough to pass error values
  2019-07-05 14:49   ` Brian Foster
@ 2019-07-05 16:46     ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2019-07-05 16:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jul 05, 2019 at 10:49:17AM -0400, Brian Foster wrote:
> On Wed, Jun 26, 2019 at 01:46:45PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When we're iterating all the attributes using the built-in xattr
> > iterator, we can use the seen_enough variable to pass error codes back
> > to the main scrub function instead of flattening them into 0/1.  This
> > will be used in a more exciting fashion in upcoming patches.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/attr.c |    8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> > index dce74ec57038..f0fd26abd39d 100644
> > --- a/fs/xfs/scrub/attr.c
> > +++ b/fs/xfs/scrub/attr.c
> > @@ -83,7 +83,7 @@ xchk_xattr_listent(
> >  	sx = container_of(context, struct xchk_xattr, context);
> >  
> >  	if (xchk_should_terminate(sx->sc, &error)) {
> > -		context->seen_enough = 1;
> > +		context->seen_enough = error;
> 
> It might be appropriate to update the xfs_attr_list_context structure
> definition comment since 'seen_enough' is not self explanatory as an
> error code..? Otherwise looks fine:

Ok.  I'll change it to:

	/*
	 * Abort attribute list iteration if non-zero.  Can be used to
	 * pass error values to the xfs_attr_list caller.
	 */

--D

> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  		return;
> >  	}
> >  
> > @@ -125,7 +125,7 @@ xchk_xattr_listent(
> >  					     args.blkno);
> >  fail_xref:
> >  	if (sx->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> > -		context->seen_enough = 1;
> > +		context->seen_enough = XFS_ITER_ABORT;
> >  	return;
> >  }
> >  
> > @@ -464,6 +464,10 @@ xchk_xattr(
> >  	error = xfs_attr_list_int_ilocked(&sx.context);
> >  	if (!xchk_fblock_process_error(sc, XFS_ATTR_FORK, 0, &error))
> >  		goto out;
> > +
> > +	/* Did our listent function try to return any errors? */
> > +	if (sx.context.seen_enough < 0)
> > +		error = sx.context.seen_enough;
> >  out:
> >  	return error;
> >  }
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/6] xfs: only allocate memory for scrubbing attributes when we need it
  2019-07-05 14:52   ` Brian Foster
@ 2019-07-05 16:49     ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2019-07-05 16:49 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jul 05, 2019 at 10:52:33AM -0400, Brian Foster wrote:
> On Wed, Jun 26, 2019 at 01:47:04PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In examining a flame graph of time spent running xfs_scrub on various
> > filesystems, I noticed that we spent nearly 7% of the total runtime on
> > allocating a zeroed 65k buffer for every SCRUB_TYPE_XATTR invocation.
> > We do this even if none of the attribute values were anywhere near 64k
> > in size, even if there were no attribute blocks to check space on, and
> > even if it just turns out there are no attributes at all.
> > 
> > Therefore, rearrange the xattr buffer setup code to support reallocating
> > with a bigger buffer and redistribute the callers of that function so
> > that we only allocate memory just prior to needing it, and only allocate
> > as much as we need.  If we can't get memory with the ILOCK held we'll
> > bail out with EDEADLOCK which will allocate the maximum memory.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/scrub/attr.c |   67 ++++++++++++++++++++++++++++++++++++++++++++-------
> >  fs/xfs/scrub/attr.h |    6 ++++-
> >  2 files changed, 63 insertions(+), 10 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> > index c20b6da1db84..09081d8ab34b 100644
> > --- a/fs/xfs/scrub/attr.c
> > +++ b/fs/xfs/scrub/attr.c
> ...
> > @@ -47,10 +53,23 @@ xchk_setup_xattr_buf(
> >  	sz = 3 * sizeof(long) * BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
> >  	sz = max_t(size_t, sz, value_size);
> >  
> > -	sc->buf = kmem_zalloc_large(sz, KM_SLEEP);
> > -	if (!sc->buf)
> > +	/*
> > +	 * If there's already a buffer, figure out if we need to reallocate it
> > +	 * to accomdate a larger size.
> 
> accommodate

Fixed.  Thanks for catching that.

--D

> Otherwise looks good:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > +	 */
> > +	if (ab) {
> > +		if (sz <= ab->sz)
> > +			return 0;
> > +		kmem_free(ab);
> > +		sc->buf = NULL;
> > +	}
> > +
> > +	ab = kmem_zalloc_large(sizeof(*ab) + sz, flags);
> > +	if (!ab)
> >  		return -ENOMEM;
> >  
> > +	ab->sz = sz;
> > +	sc->buf = ab;
> >  	return 0;
> >  }
> >  
> > @@ -62,9 +81,16 @@ xchk_setup_xattr(
> >  {
> >  	int			error;
> >  
> > -	error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX);
> > -	if (error)
> > -		return error;
> > +	/*
> > +	 * We failed to get memory while checking attrs, so this time try to
> > +	 * get all the memory we're ever going to need.  Allocate the buffer
> > +	 * without the inode lock held, which means we can sleep.
> > +	 */
> > +	if (sc->flags & XCHK_TRY_HARDER) {
> > +		error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX, KM_SLEEP);
> > +		if (error)
> > +			return error;
> > +	}
> >  
> >  	return xchk_setup_inode_contents(sc, ip, 0);
> >  }
> > @@ -115,6 +141,19 @@ xchk_xattr_listent(
> >  		return;
> >  	}
> >  
> > +	/*
> > +	 * Try to allocate enough memory to extrat the attr value.  If that
> > +	 * doesn't work, we overload the seen_enough variable to convey
> > +	 * the error message back to the main scrub function.
> > +	 */
> > +	error = xchk_setup_xattr_buf(sx->sc, valuelen, KM_MAYFAIL);
> > +	if (error == -ENOMEM)
> > +		error = -EDEADLOCK;
> > +	if (error) {
> > +		context->seen_enough = error;
> > +		return;
> > +	}
> > +
> >  	args.flags = ATTR_KERNOTIME;
> >  	if (flags & XFS_ATTR_ROOT)
> >  		args.flags |= ATTR_ROOT;
> > @@ -128,7 +167,7 @@ xchk_xattr_listent(
> >  	args.hashval = xfs_da_hashname(args.name, args.namelen);
> >  	args.trans = context->tp;
> >  	args.value = xchk_xattr_valuebuf(sx->sc);
> > -	args.valuelen = XATTR_SIZE_MAX;
> > +	args.valuelen = valuelen;
> >  
> >  	error = xfs_attr_get_ilocked(context->dp, &args);
> >  	if (error == -EEXIST)
> > @@ -281,16 +320,26 @@ xchk_xattr_block(
> >  	struct xfs_attr_leafblock	*leaf = bp->b_addr;
> >  	struct xfs_attr_leaf_entry	*ent;
> >  	struct xfs_attr_leaf_entry	*entries;
> > -	unsigned long			*usedmap = xchk_xattr_usedmap(ds->sc);
> > +	unsigned long			*usedmap;
> >  	char				*buf_end;
> >  	size_t				off;
> >  	__u32				last_hashval = 0;
> >  	unsigned int			usedbytes = 0;
> >  	unsigned int			hdrsize;
> >  	int				i;
> > +	int				error;
> >  
> >  	if (*last_checked == blk->blkno)
> >  		return 0;
> > +
> > +	/* Allocate memory for block usage checking. */
> > +	error = xchk_setup_xattr_buf(ds->sc, 0, KM_MAYFAIL);
> > +	if (error == -ENOMEM)
> > +		return -EDEADLOCK;
> > +	if (error)
> > +		return error;
> > +	usedmap = xchk_xattr_usedmap(ds->sc);
> > +
> >  	*last_checked = blk->blkno;
> >  	bitmap_zero(usedmap, mp->m_attr_geo->blksize);
> >  
> > diff --git a/fs/xfs/scrub/attr.h b/fs/xfs/scrub/attr.h
> > index 27e879aeaafc..13a1d2e8424d 100644
> > --- a/fs/xfs/scrub/attr.h
> > +++ b/fs/xfs/scrub/attr.h
> > @@ -10,6 +10,9 @@
> >   * Temporary storage for online scrub and repair of extended attributes.
> >   */
> >  struct xchk_xattr_buf {
> > +	/* Size of @buf, in bytes. */
> > +	size_t			sz;
> > +
> >  	/*
> >  	 * Memory buffer -- either used for extracting attr values while
> >  	 * walking the attributes; or for computing attr block bitmaps when
> > @@ -62,6 +65,7 @@ xchk_xattr_dstmap(
> >  			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
> >  }
> >  
> > -int xchk_setup_xattr_buf(struct xfs_scrub *sc, size_t value_size);
> > +int xchk_setup_xattr_buf(struct xfs_scrub *sc, size_t value_size,
> > +		xfs_km_flags_t flags);
> >  
> >  #endif	/* __XFS_SCRUB_ATTR_H__ */
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] xfs: remove more ondisk directory corruption asserts
  2019-07-05 14:49   ` Brian Foster
@ 2019-07-05 17:03     ` Darrick J. Wong
  2019-07-05 17:27       ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2019-07-05 17:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jul 05, 2019 at 10:49:05AM -0400, Brian Foster wrote:
> On Wed, Jun 26, 2019 at 01:46:40PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Continue our game of replacing ASSERTs for corrupt ondisk metadata with
> > EFSCORRUPTED returns.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_da_btree.c  |   19 ++++++++++++-------
> >  fs/xfs/libxfs/xfs_dir2_node.c |    3 ++-
> >  2 files changed, 14 insertions(+), 8 deletions(-)
> > 
> > 
> ...
> > diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
> > index 16731d2d684b..f7f3fb458019 100644
> > --- a/fs/xfs/libxfs/xfs_dir2_node.c
> > +++ b/fs/xfs/libxfs/xfs_dir2_node.c
> > @@ -743,7 +743,8 @@ xfs_dir2_leafn_lookup_for_entry(
> >  	ents = dp->d_ops->leaf_ents_p(leaf);
> >  
> >  	xfs_dir3_leaf_check(dp, bp);
> > -	ASSERT(leafhdr.count > 0);
> > +	if (leafhdr.count <= 0)
> > +		return -EFSCORRUPTED;
> 
> This error return bubbles up to xfs_dir2_leafn_lookup_int() and
> xfs_da3_node_lookup_int(). The latter has a direct return value as well
> as a *result return parameter, which unconditionally carries the return
> value from xfs_dir2_leafn_lookup_int(). xfs_da3_node_lookup_int() has
> multiple callers, but a quick look at one (xfs_attr_node_addname())
> suggests we might not handle corruption errors properly via the *result
> parameter. Perhaps we also need to fix up xfs_da3_node_lookup_int() to
> return particular errors directly?

It would be a good idea to clean up the whole return value/*retval mess
in that function (and xfs_da3_path_shift where *retval came from) but
that quickly turned into a bigger cleanup of magic values and dual
returns, particularly since the dabtree shrinking code turns the
_path_shift *retval into yet another series of magic int numbers...

...so in the meantime this at least fixes the asserts I see when running
fuzz testing.  I'll look at the broader cleanup for 5.4.

--D

> 
> Brian
> 
> >  
> >  	/*
> >  	 * Look up the hash value in the leaf entries.
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] xfs: online scrub needn't bother zeroing its temporary buffer
  2019-07-05 16:35     ` Darrick J. Wong
@ 2019-07-05 17:26       ` Brian Foster
  2019-07-05 17:57         ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-07-05 17:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Jul 05, 2019 at 09:35:04AM -0700, Darrick J. Wong wrote:
> On Fri, Jul 05, 2019 at 10:52:46AM -0400, Brian Foster wrote:
> > On Wed, Jun 26, 2019 at 01:47:10PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > The xattr scrubber functions use the temporary memory buffer either for
> > > storing bitmaps or for testing if attribute value extraction works.  The
> > > bitmap code always zeroes what it needs and the value extraction merely
> > > sets the buffer contents (we never read the contents, we just look for
> > > return codes), so it's not necessary to waste CPU time zeroing on
> > > allocation.
> > > 
> > 
> > If we don't need to zero the buffer because we never look at the result,
> > that suggests we don't need to populate it in the first place right?
> 
> We still need to read the attr value into the buffer (at least for
> remote attr values) because scrub doesn't otherwise check the remote
> attribute block header.
> 
> We never read the contents (because the contents are just arbitrary
> bytes) but we do need to be able to catch an EFSCORRUPTED if, say, the
> attribute dabtree points at a corrupt block.
> 

Ok.. what I'm getting at here is basically wondering if since the buffer
zeroing was noticeable in performance traces, whether the xattr value
memory copy might be similarly noticeable for certain datasets (many
large xattrs?). I suppose that may be less prominent if the buffer
alloc/zero was unconditional as opposed to tied to the existence of an
actual xattr, but that doesn't necessarily mean the performance impact
is zero.

If non-zero, it might be interesting to explore whether some sort of
lookup interface makes sense for xattrs that essentially do everything
we currently do via xfs_attr_get() except read the attr. Presumably we
could avoid the memory copy along with the buffer allocation in that
case. But that's just a random thought for future consideration,
certainly not low handing fruit as is this patch. If you have a good
scrub performance test, an easy experiment might be to run it with a
hack to skip the buffer allocation, pass a NULL buffer and
conditionalize the ->value accesses/copies in the xattr code to avoid
explosions and see whether there's any benefit.

> > > A flame graph analysis showed that we were spending 7% of a xfs_scrub
> > > run (the whole program, not just the attr scrubber itself) allocating
> > > and zeroing 64k segments needlessly.
> > > 
> > 
> > How much does this patch help?
> 
> About 1-2% I think.  FWIW the "7%" figure represents the smallest
> improvement I saw in runtimes, where allocation ate 1-2% of the runtime
> and zeroing accounts for the rest (~5-6%).
> 
> Practically speaking, when I retested with NVME flash instead of
> spinning rust then the improvement jumped to 15-20% overall.
> 

Nice!

Brian

> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/scrub/attr.c |    7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> > > index 09081d8ab34b..d3a6f3dacf0d 100644
> > > --- a/fs/xfs/scrub/attr.c
> > > +++ b/fs/xfs/scrub/attr.c
> > > @@ -64,7 +64,12 @@ xchk_setup_xattr_buf(
> > >  		sc->buf = NULL;
> > >  	}
> > >  
> > > -	ab = kmem_zalloc_large(sizeof(*ab) + sz, flags);
> > > +	/*
> > > +	 * Allocate the big buffer.  We skip zeroing it because that added 7%
> > > +	 * to the scrub runtime and all the users were careful never to read
> > > +	 * uninitialized contents.
> > > +	 */
> > 
> > Ok, that suggests the 7% hit was due to zeroing (where the commit log
> > says "allocating and zeroing"). Either way, we probably don't need such
> > details in the code. Can we tweak the comment to something like:
> > 
> > /*
> >  * Don't zero the buffer on allocation to avoid runtime overhead. All
> >  * users must be careful never to read uninitialized contents.
> >  */ 
> 
> Ok, I'll do that.
> 
> Thanks for all the review! :)
> 
> --D
> 
> > 
> > With that:
> > 
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > 
> > > +	ab = kmem_alloc_large(sizeof(*ab) + sz, flags);
> > >  	if (!ab)
> > >  		return -ENOMEM;
> > >  
> > > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] xfs: remove more ondisk directory corruption asserts
  2019-07-05 17:03     ` Darrick J. Wong
@ 2019-07-05 17:27       ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2019-07-05 17:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Jul 05, 2019 at 10:03:45AM -0700, Darrick J. Wong wrote:
> On Fri, Jul 05, 2019 at 10:49:05AM -0400, Brian Foster wrote:
> > On Wed, Jun 26, 2019 at 01:46:40PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Continue our game of replacing ASSERTs for corrupt ondisk metadata with
> > > EFSCORRUPTED returns.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_da_btree.c  |   19 ++++++++++++-------
> > >  fs/xfs/libxfs/xfs_dir2_node.c |    3 ++-
> > >  2 files changed, 14 insertions(+), 8 deletions(-)
> > > 
> > > 
> > ...
> > > diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
> > > index 16731d2d684b..f7f3fb458019 100644
> > > --- a/fs/xfs/libxfs/xfs_dir2_node.c
> > > +++ b/fs/xfs/libxfs/xfs_dir2_node.c
> > > @@ -743,7 +743,8 @@ xfs_dir2_leafn_lookup_for_entry(
> > >  	ents = dp->d_ops->leaf_ents_p(leaf);
> > >  
> > >  	xfs_dir3_leaf_check(dp, bp);
> > > -	ASSERT(leafhdr.count > 0);
> > > +	if (leafhdr.count <= 0)
> > > +		return -EFSCORRUPTED;
> > 
> > This error return bubbles up to xfs_dir2_leafn_lookup_int() and
> > xfs_da3_node_lookup_int(). The latter has a direct return value as well
> > as a *result return parameter, which unconditionally carries the return
> > value from xfs_dir2_leafn_lookup_int(). xfs_da3_node_lookup_int() has
> > multiple callers, but a quick look at one (xfs_attr_node_addname())
> > suggests we might not handle corruption errors properly via the *result
> > parameter. Perhaps we also need to fix up xfs_da3_node_lookup_int() to
> > return particular errors directly?
> 
> It would be a good idea to clean up the whole return value/*retval mess
> in that function (and xfs_da3_path_shift where *retval came from) but
> that quickly turned into a bigger cleanup of magic values and dual
> returns, particularly since the dabtree shrinking code turns the
> _path_shift *retval into yet another series of magic int numbers...
> 

Hm.. sure, but in the meantime couldn't this patch just insert a check
for the obvious -EFSCORRUPTED error in xfs_da3_node_lookup_int() and
return that directly as opposed to via *result? That function already
returns -EFSCORRUPTED directly in other places.

As it is, this hunk is kind of strange because it inserts an error
return in a place where it seems more likely than not to be either
ignored or misinterpreted if it ever occurs. I'm fine with putting this
off as a broader cleanup, but in that case I'd prefer to just drop this
hunk for now and change the assert over when we know the return will be
handled correctly.

Either option seems reasonable (and FWIW the other bits in this patch
look fine to me)...

Brian

> ...so in the meantime this at least fixes the asserts I see when running
> fuzz testing.  I'll look at the broader cleanup for 5.4.
> 
> --D
> 
> > 
> > Brian
> > 
> > >  
> > >  	/*
> > >  	 * Look up the hash value in the leaf entries.
> > > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] xfs: online scrub needn't bother zeroing its temporary buffer
  2019-07-05 17:26       ` Brian Foster
@ 2019-07-05 17:57         ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2019-07-05 17:57 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jul 05, 2019 at 01:26:39PM -0400, Brian Foster wrote:
> On Fri, Jul 05, 2019 at 09:35:04AM -0700, Darrick J. Wong wrote:
> > On Fri, Jul 05, 2019 at 10:52:46AM -0400, Brian Foster wrote:
> > > On Wed, Jun 26, 2019 at 01:47:10PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > The xattr scrubber functions use the temporary memory buffer either for
> > > > storing bitmaps or for testing if attribute value extraction works.  The
> > > > bitmap code always zeroes what it needs and the value extraction merely
> > > > sets the buffer contents (we never read the contents, we just look for
> > > > return codes), so it's not necessary to waste CPU time zeroing on
> > > > allocation.
> > > > 
> > > 
> > > If we don't need to zero the buffer because we never look at the result,
> > > that suggests we don't need to populate it in the first place right?
> > 
> > We still need to read the attr value into the buffer (at least for
> > remote attr values) because scrub doesn't otherwise check the remote
> > attribute block header.
> > 
> > We never read the contents (because the contents are just arbitrary
> > bytes) but we do need to be able to catch an EFSCORRUPTED if, say, the
> > attribute dabtree points at a corrupt block.
> > 
> 
> Ok.. what I'm getting at here is basically wondering if since the buffer
> zeroing was noticeable in performance traces, whether the xattr value
> memory copy might be similarly noticeable for certain datasets (many
> large xattrs?). I suppose that may be less prominent if the buffer
> alloc/zero was unconditional as opposed to tied to the existence of an
> actual xattr, but that doesn't necessarily mean the performance impact
> is zero.
> 
> If non-zero, it might be interesting to explore whether some sort of
> lookup interface makes sense for xattrs that essentially do everything
> we currently do via xfs_attr_get() except read the attr. Presumably we
> could avoid the memory copy along with the buffer allocation in that
> case. But that's just a random thought for future consideration,
> certainly not low handing fruit as is this patch. If you have a good
> scrub performance test, an easy experiment might be to run it with a
> hack to skip the buffer allocation, pass a NULL buffer and
> conditionalize the ->value accesses/copies in the xattr code to avoid
> explosions and see whether there's any benefit.

Ahhh, yes.  Currently for flame graph analysis I just use perf record +
Brendan Gregg's flamegraph tools to spit out a svg and then go digging
into any call stack is wide and not especially conical.  I hadn't really
noticed the actual attr value copyout but that's only because it tends
to get lost in the noise of parsing through attr leaves and whatnot.

However, it does sound like a nice shortcut to be able to set
xfs_da_args.value = NULL and have the attr value code go through the
motions of extracting the value but skipping the memcpy part.

Will put this on my list of things to study for 5.4. :)

--D

> > > > A flame graph analysis showed that we were spending 7% of a xfs_scrub
> > > > run (the whole program, not just the attr scrubber itself) allocating
> > > > and zeroing 64k segments needlessly.
> > > > 
> > > 
> > > How much does this patch help?
> > 
> > About 1-2% I think.  FWIW the "7%" figure represents the smallest
> > improvement I saw in runtimes, where allocation ate 1-2% of the runtime
> > and zeroing accounts for the rest (~5-6%).
> > 
> > Practically speaking, when I retested with NVME flash instead of
> > spinning rust then the improvement jumped to 15-20% overall.
> > 
> 
> Nice!
> 
> Brian
> 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/scrub/attr.c |    7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> > > > index 09081d8ab34b..d3a6f3dacf0d 100644
> > > > --- a/fs/xfs/scrub/attr.c
> > > > +++ b/fs/xfs/scrub/attr.c
> > > > @@ -64,7 +64,12 @@ xchk_setup_xattr_buf(
> > > >  		sc->buf = NULL;
> > > >  	}
> > > >  
> > > > -	ab = kmem_zalloc_large(sizeof(*ab) + sz, flags);
> > > > +	/*
> > > > +	 * Allocate the big buffer.  We skip zeroing it because that added 7%
> > > > +	 * to the scrub runtime and all the users were careful never to read
> > > > +	 * uninitialized contents.
> > > > +	 */
> > > 
> > > Ok, that suggests the 7% hit was due to zeroing (where the commit log
> > > says "allocating and zeroing"). Either way, we probably don't need such
> > > details in the code. Can we tweak the comment to something like:
> > > 
> > > /*
> > >  * Don't zero the buffer on allocation to avoid runtime overhead. All
> > >  * users must be careful never to read uninitialized contents.
> > >  */ 
> > 
> > Ok, I'll do that.
> > 
> > Thanks for all the review! :)
> > 
> > --D
> > 
> > > 
> > > With that:
> > > 
> > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > 
> > > > +	ab = kmem_alloc_large(sizeof(*ab) + sz, flags);
> > > >  	if (!ab)
> > > >  		return -ENOMEM;
> > > >  
> > > > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-07-05 17:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-26 20:46 [PATCH v2 0/6] xfs: scrub-related fixes Darrick J. Wong
2019-06-26 20:46 ` [PATCH 1/6] xfs: remove more ondisk directory corruption asserts Darrick J. Wong
2019-07-05 14:49   ` Brian Foster
2019-07-05 17:03     ` Darrick J. Wong
2019-07-05 17:27       ` Brian Foster
2019-06-26 20:46 ` [PATCH 2/6] xfs: attribute scrub should use seen_enough to pass error values Darrick J. Wong
2019-07-05 14:49   ` Brian Foster
2019-07-05 16:46     ` Darrick J. Wong
2019-06-26 20:46 ` [PATCH 3/6] xfs: refactor extended attribute buffer pointer functions Darrick J. Wong
2019-07-05 14:52   ` Brian Foster
2019-06-26 20:46 ` [PATCH 4/6] xfs: refactor attr scrub memory allocation function Darrick J. Wong
2019-07-05 14:52   ` Brian Foster
2019-06-26 20:47 ` [PATCH 5/6] xfs: only allocate memory for scrubbing attributes when we need it Darrick J. Wong
2019-07-05 14:52   ` Brian Foster
2019-07-05 16:49     ` Darrick J. Wong
2019-06-26 20:47 ` [PATCH 6/6] xfs: online scrub needn't bother zeroing its temporary buffer Darrick J. Wong
2019-07-05 14:52   ` Brian Foster
2019-07-05 16:35     ` Darrick J. Wong
2019-07-05 17:26       ` Brian Foster
2019-07-05 17:57         ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.