linux-xfs.vger.kernel.org archive mirror
* [PATCH v5 00/14] xfs: refactor and improve inode iteration
@ 2019-06-12  6:47 Darrick J. Wong
  2019-06-12  6:47 ` [PATCH 01/14] xfs: create iterator error codes Darrick J. Wong
                   ` (13 more replies)
  0 siblings, 14 replies; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:47 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

Hi all,

This next series refactors all the inode walking code in XFS into a
single set of helper functions.  The goal is to separate the mechanics
of iterating a subset of inodes in the filesystem from bulkstat.

First we clean up a few weird things in XFS, then build a generic inode
iteration function.  Next, we convert the bulkstat ioctl to use it, then
fix a few things from some of the code we saved from the old bulkstat
inode iteration code.  After that, we restructure the code slightly to
support the inumbers functionality, and then port the inumbers ioctl to
it too.

Finally, we introduce a parallel inode walk feature to speed up
quotacheck on large filesystems.  The justification for this part is a
little tentative, since we still need to find out which hardware and
software configurations benefit most.  Whether bulkstat itself could be
optimized further is also an open question.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=parallel-iwalk

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=parallel-iwalk

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/14] xfs: create iterator error codes
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
@ 2019-06-12  6:47 ` Darrick J. Wong
  2019-06-13 16:24   ` Brian Foster
  2019-06-12  6:47 ` [PATCH 02/14] xfs: create simplified inode walk function Darrick J. Wong
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:47 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Currently, xfs doesn't have generic error codes defined for "stop
iterating"; we just reuse the XFS_BTREE_QUERY_* return values.  This
looks a little weird if we're not actually iterating a btree index.
Before we start adding more iterators, we should create general
XFS_ITER_{CONTINUE,ABORT} return values and define the XFS_BTREE_QUERY_*
ones in terms of them.
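The pattern this patch introduces — one pair of generic return codes shared by every iterator, with each walk function's result passed straight back to the caller — can be sketched in userspace C.  This is a minimal model, not the kernel code; the names below (other than the ITER_* idea itself) are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Generic iterator return values, mirroring XFS_ITER_{CONTINUE,ABORT}. */
#define ITER_CONTINUE	0
#define ITER_ABORT	1

typedef int (*walk_fn)(int item, void *priv);

/*
 * Walk an array, calling @fn on each element.  A negative return would be
 * an error; ITER_ABORT stops the walk early.  Either value is handed back
 * to the caller, just as xfs_agfl_walk passes back its walk_fn's result.
 */
static int walk_items(const int *items, size_t n, walk_fn fn, void *priv)
{
	size_t	i;
	int	ret;

	for (i = 0; i < n; i++) {
		ret = fn(items[i], priv);
		if (ret != ITER_CONTINUE)
			return ret;
	}
	return 0;
}

/* Example callback: abort as soon as we see the value stashed in @priv. */
static int find_value(int item, void *priv)
{
	return (item == *(int *)priv) ? ITER_ABORT : ITER_CONTINUE;
}
```

Because the constants are shared, a caller can distinguish "walk finished" (0), "callback asked to stop" (ITER_ABORT), and a real error without knowing which data structure was being iterated.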

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |    2 +-
 fs/xfs/libxfs/xfs_btree.h  |    4 ++--
 fs/xfs/libxfs/xfs_shared.h |    6 ++++++
 fs/xfs/scrub/agheader.c    |    4 ++--
 fs/xfs/scrub/repair.c      |    4 ++--
 fs/xfs/xfs_dquot.c         |    2 +-
 6 files changed, 14 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index a9ff3cf82cce..b9eb3a8aeaf9 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3146,7 +3146,7 @@ xfs_alloc_has_record(
 
 /*
  * Walk all the blocks in the AGFL.  The @walk_fn can return any negative
- * error code or XFS_BTREE_QUERY_RANGE_ABORT.
+ * error code or XFS_ITER_*.
  */
 int
 xfs_agfl_walk(
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index e3b3e9dce5da..94530766dd30 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -469,8 +469,8 @@ uint xfs_btree_compute_maxlevels(uint *limits, unsigned long len);
 unsigned long long xfs_btree_calc_size(uint *limits, unsigned long long len);
 
 /* return codes */
-#define XFS_BTREE_QUERY_RANGE_CONTINUE	0	/* keep iterating */
-#define XFS_BTREE_QUERY_RANGE_ABORT	1	/* stop iterating */
+#define XFS_BTREE_QUERY_RANGE_CONTINUE	(XFS_ITER_CONTINUE) /* keep iterating */
+#define XFS_BTREE_QUERY_RANGE_ABORT	(XFS_ITER_ABORT)    /* stop iterating */
 typedef int (*xfs_btree_query_range_fn)(struct xfs_btree_cur *cur,
 		union xfs_btree_rec *rec, void *priv);
 
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 4e909791aeac..fa788139dfe3 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -136,4 +136,10 @@ void xfs_symlink_local_to_remote(struct xfs_trans *tp, struct xfs_buf *bp,
 				 struct xfs_inode *ip, struct xfs_ifork *ifp);
 xfs_failaddr_t xfs_symlink_shortform_verify(struct xfs_inode *ip);
 
+/* Keep iterating the data structure. */
+#define XFS_ITER_CONTINUE	(0)
+
+/* Stop iterating the data structure. */
+#define XFS_ITER_ABORT		(1)
+
 #endif /* __XFS_SHARED_H__ */
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index adaeabdefdd3..1d5361f9ebfc 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -646,7 +646,7 @@ xchk_agfl_block(
 	xchk_agfl_block_xref(sc, agbno);
 
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
-		return XFS_BTREE_QUERY_RANGE_ABORT;
+		return XFS_ITER_ABORT;
 
 	return 0;
 }
@@ -737,7 +737,7 @@ xchk_agfl(
 	/* Check the blocks in the AGFL. */
 	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(sc->sa.agf_bp),
 			sc->sa.agfl_bp, xchk_agfl_block, &sai);
-	if (error == XFS_BTREE_QUERY_RANGE_ABORT) {
+	if (error == XFS_ITER_ABORT) {
 		error = 0;
 		goto out_free;
 	}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index eb358f0f5e0a..e2a352c1bad7 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -672,7 +672,7 @@ xrep_findroot_agfl_walk(
 {
 	xfs_agblock_t		*agbno = priv;
 
-	return (*agbno == bno) ? XFS_BTREE_QUERY_RANGE_ABORT : 0;
+	return (*agbno == bno) ? XFS_ITER_ABORT : 0;
 }
 
 /* Does this block match the btree information passed in? */
@@ -702,7 +702,7 @@ xrep_findroot_block(
 	if (owner == XFS_RMAP_OWN_AG) {
 		error = xfs_agfl_walk(mp, ri->agf, ri->agfl_bp,
 				xrep_findroot_agfl_walk, &agbno);
-		if (error == XFS_BTREE_QUERY_RANGE_ABORT)
+		if (error == XFS_ITER_ABORT)
 			return 0;
 		if (error)
 			return error;
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index a1af984e4913..8674551c5e98 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1243,7 +1243,7 @@ xfs_qm_exit(void)
 /*
  * Iterate every dquot of a particular type.  The caller must ensure that the
  * particular quota type is active.  iter_fn can return negative error codes,
- * or XFS_BTREE_QUERY_RANGE_ABORT to indicate that it wants to stop iterating.
+ * or XFS_ITER_ABORT to indicate that it wants to stop iterating.
  */
 int
 xfs_qm_dqiterate(


* [PATCH 02/14] xfs: create simplified inode walk function
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
  2019-06-12  6:47 ` [PATCH 01/14] xfs: create iterator error codes Darrick J. Wong
@ 2019-06-12  6:47 ` Darrick J. Wong
  2019-06-13 16:27   ` Brian Foster
  2019-06-12  6:47 ` [PATCH 03/14] xfs: convert quotacheck to use the new iwalk functions Darrick J. Wong
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:47 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a new iterator function to simplify walking inodes in an XFS
filesystem.  This new iterator will replace the existing open-coded
walking that goes on in various places.
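The core design of the iterator below — stage inobt records in a fixed-size cache, and only run the caller's walk function once the cache fills, because the real code must drop its btree cursor and AGI lock before calling back — reduces to this userspace sketch (illustrative names, not the kernel API):

```c
#include <assert.h>
#include <stddef.h>

#define ITER_ABORT	1
#define CACHE_SZ	4	/* analogous to iwag->sz_recs */

typedef int (*walk_fn)(int rec, void *priv);

/* Run the walk function on every cached record; stop on nonzero. */
static int run_callbacks(const int *cache, size_t nr, walk_fn fn, void *priv)
{
	size_t	i;
	int	ret;

	for (i = 0; i < nr; i++) {
		ret = fn(cache[i], priv);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Walk all records, batching them in the cache and invoking the callback
 * only when the cache fills (and once more at the end for any leftovers).
 * This is the shape of xfs_iwalk_ag's main loop.
 */
static int walk_records(const int *recs, size_t n, walk_fn fn, void *priv)
{
	int	cache[CACHE_SZ];
	size_t	nr = 0, i;
	int	ret;

	for (i = 0; i < n; i++) {
		cache[nr++] = recs[i];
		if (nr < CACHE_SZ)
			continue;
		ret = run_callbacks(cache, nr, fn, priv);
		if (ret)
			return ret;
		nr = 0;	/* empty the cache and keep going */
	}
	return nr ? run_callbacks(cache, nr, fn, priv) : 0;
}

static int count_recs(int rec, void *priv)
{
	(void)rec;
	(*(int *)priv)++;
	return 0;
}

static int stop_at(int rec, void *priv)
{
	return rec == *(int *)priv ? ITER_ABORT : 0;
}
```

The consequence of batching, visible in the sketch too, is that an abort only takes effect when a batch of callbacks runs, and records may be stale by the time the callback sees them — hence the comment in the patch that walk functions must verify the inodes they touch are still allocated.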

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile                  |    1 
 fs/xfs/libxfs/xfs_ialloc_btree.c |   36 +++
 fs/xfs/libxfs/xfs_ialloc_btree.h |    3 
 fs/xfs/xfs_itable.c              |    5 
 fs/xfs/xfs_itable.h              |    8 +
 fs/xfs/xfs_iwalk.c               |  418 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_iwalk.h               |   19 ++
 fs/xfs/xfs_trace.h               |   40 ++++
 8 files changed, 524 insertions(+), 6 deletions(-)
 create mode 100644 fs/xfs/xfs_iwalk.c
 create mode 100644 fs/xfs/xfs_iwalk.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 91831975363b..74d30ef0dbce 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -80,6 +80,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_iops.o \
 				   xfs_inode.o \
 				   xfs_itable.o \
+				   xfs_iwalk.o \
 				   xfs_message.o \
 				   xfs_mount.o \
 				   xfs_mru_cache.o \
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index ac4b65da4c2b..430bc26f1d8f 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -564,6 +564,35 @@ xfs_inobt_max_size(
 					XFS_INODES_PER_CHUNK);
 }
 
+/* Read AGI and create inobt cursor. */
+int
+xfs_inobt_cur(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	struct xfs_btree_cur	**curpp,
+	struct xfs_buf		**agi_bpp)
+{
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	ASSERT(*agi_bpp == NULL);
+	ASSERT(*curpp == NULL);
+
+	error = xfs_ialloc_read_agi(mp, tp, agno, agi_bpp);
+	if (error)
+		return error;
+
+	cur = xfs_inobt_init_cursor(mp, tp, *agi_bpp, agno, XFS_BTNUM_INO);
+	if (!cur) {
+		xfs_trans_brelse(tp, *agi_bpp);
+		*agi_bpp = NULL;
+		return -ENOMEM;
+	}
+	*curpp = cur;
+	return 0;
+}
+
 static int
 xfs_inobt_count_blocks(
 	struct xfs_mount	*mp,
@@ -572,15 +601,14 @@ xfs_inobt_count_blocks(
 	xfs_btnum_t		btnum,
 	xfs_extlen_t		*tree_blocks)
 {
-	struct xfs_buf		*agbp;
-	struct xfs_btree_cur	*cur;
+	struct xfs_buf		*agbp = NULL;
+	struct xfs_btree_cur	*cur = NULL;
 	int			error;
 
-	error = xfs_ialloc_read_agi(mp, tp, agno, &agbp);
+	error = xfs_inobt_cur(mp, tp, agno, &cur, &agbp);
 	if (error)
 		return error;
 
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, btnum);
 	error = xfs_btree_count_blocks(cur, tree_blocks);
 	xfs_btree_del_cursor(cur, error);
 	xfs_trans_brelse(tp, agbp);
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index ebdd0c6b8766..1bc44b4a2b6c 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -64,5 +64,8 @@ int xfs_finobt_calc_reserves(struct xfs_mount *mp, struct xfs_trans *tp,
 		xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
 extern xfs_extlen_t xfs_iallocbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
+int xfs_inobt_cur(struct xfs_mount *mp, struct xfs_trans *tp,
+		xfs_agnumber_t agno, struct xfs_btree_cur **curpp,
+		struct xfs_buf **agi_bpp);
 
 #endif	/* __XFS_IALLOC_BTREE_H__ */
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index eef307cf90a7..3ca1c454afe6 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -19,6 +19,7 @@
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_health.h"
+#include "xfs_iwalk.h"
 
 /*
  * Return stat information for one inode.
@@ -161,7 +162,7 @@ xfs_bulkstat_one(
  * Loop over all clusters in a chunk for a given incore inode allocation btree
  * record.  Do a readahead if there are any allocated inodes in that cluster.
  */
-STATIC void
+void
 xfs_bulkstat_ichunk_ra(
 	struct xfs_mount		*mp,
 	xfs_agnumber_t			agno,
@@ -195,7 +196,7 @@ xfs_bulkstat_ichunk_ra(
  * are some left allocated, update the data for the pointed-to record as well as
  * return the count of grabbed inodes.
  */
-STATIC int
+int
 xfs_bulkstat_grab_ichunk(
 	struct xfs_btree_cur		*cur,	/* btree cursor */
 	xfs_agino_t			agino,	/* starting inode of chunk */
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 8a822285b671..369e3f159d4e 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -84,4 +84,12 @@ xfs_inumbers(
 	void			__user *buffer, /* buffer with inode info */
 	inumbers_fmt_pf		formatter);
 
+/* Temporarily needed while we refactor functions. */
+struct xfs_btree_cur;
+struct xfs_inobt_rec_incore;
+void xfs_bulkstat_ichunk_ra(struct xfs_mount *mp, xfs_agnumber_t agno,
+		struct xfs_inobt_rec_incore *irec);
+int xfs_bulkstat_grab_ichunk(struct xfs_btree_cur *cur, xfs_agino_t agino,
+		int *icount, struct xfs_inobt_rec_incore *irec);
+
 #endif	/* __XFS_ITABLE_H__ */
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
new file mode 100644
index 000000000000..49289588413f
--- /dev/null
+++ b/fs/xfs/xfs_iwalk.c
@@ -0,0 +1,418 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_iwalk.h"
+#include "xfs_itable.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_icache.h"
+#include "xfs_health.h"
+#include "xfs_trans.h"
+
+/*
+ * Walking Inodes in the Filesystem
+ * ================================
+ *
+ * This iterator function walks a subset of filesystem inodes in increasing
+ * order from @startino until there are no more inodes.  For each allocated
+ * inode it finds, it calls a walk function with the relevant inode number and
+ * a pointer to caller-provided data.  The walk function can return the usual
+ * negative error code to stop the iteration; 0 to continue the iteration; or
+ * XFS_IWALK_ABORT to stop the iteration.  This return value is returned to the
+ * caller.
+ *
+ * Internally, we allow the walk function to do anything, which means that we
+ * cannot maintain the inobt cursor or our lock on the AGI buffer.  We
+ * therefore cache the inobt records in kernel memory and only call the walk
+ * function when our memory buffer is full.  @nr_recs is the number of records
+ * that we've cached, and @sz_recs is the size of our cache.
+ *
+ * It is the responsibility of the walk function to ensure it accesses
+ * allocated inodes, as the inobt records may be stale by the time they are
+ * acted upon.
+ */
+
+struct xfs_iwalk_ag {
+	struct xfs_mount		*mp;
+	struct xfs_trans		*tp;
+
+	/* Where do we start the traversal? */
+	xfs_ino_t			startino;
+
+	/* Array of inobt records we cache. */
+	struct xfs_inobt_rec_incore	*recs;
+
+	/* Number of entries allocated for the @recs array. */
+	unsigned int			sz_recs;
+
+	/* Number of entries in the @recs array that are in use. */
+	unsigned int			nr_recs;
+
+	/* Inode walk function and data pointer. */
+	xfs_iwalk_fn			iwalk_fn;
+	void				*data;
+};
+
+/* Allocate memory for a walk. */
+STATIC int
+xfs_iwalk_alloc(
+	struct xfs_iwalk_ag	*iwag)
+{
+	size_t			size;
+
+	ASSERT(iwag->recs == NULL);
+	iwag->nr_recs = 0;
+
+	/* Allocate a prefetch buffer for inobt records. */
+	size = iwag->sz_recs * sizeof(struct xfs_inobt_rec_incore);
+	iwag->recs = kmem_alloc(size, KM_MAYFAIL);
+	if (iwag->recs == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+/* Free memory we allocated for a walk. */
+STATIC void
+xfs_iwalk_free(
+	struct xfs_iwalk_ag	*iwag)
+{
+	kmem_free(iwag->recs);
+}
+
+/* For each inuse inode in each cached inobt record, call our function. */
+STATIC int
+xfs_iwalk_ag_recs(
+	struct xfs_iwalk_ag		*iwag)
+{
+	struct xfs_mount		*mp = iwag->mp;
+	struct xfs_trans		*tp = iwag->tp;
+	xfs_ino_t			ino;
+	unsigned int			i, j;
+	xfs_agnumber_t			agno;
+	int				error;
+
+	agno = XFS_INO_TO_AGNO(mp, iwag->startino);
+	for (i = 0; i < iwag->nr_recs; i++) {
+		struct xfs_inobt_rec_incore	*irec = &iwag->recs[i];
+
+		trace_xfs_iwalk_ag_rec(mp, agno, irec);
+
+		for (j = 0; j < XFS_INODES_PER_CHUNK; j++) {
+			/* Skip if this inode is free */
+			if (XFS_INOBT_MASK(j) & irec->ir_free)
+				continue;
+
+			/* Otherwise call our function. */
+			ino = XFS_AGINO_TO_INO(mp, agno, irec->ir_startino + j);
+			error = iwag->iwalk_fn(mp, tp, ino, iwag->data);
+			if (error)
+				return error;
+		}
+	}
+
+	return 0;
+}
+
+/* Delete cursor and let go of AGI. */
+static inline void
+xfs_iwalk_del_inobt(
+	struct xfs_trans	*tp,
+	struct xfs_btree_cur	**curpp,
+	struct xfs_buf		**agi_bpp,
+	int			error)
+{
+	if (*curpp) {
+		xfs_btree_del_cursor(*curpp, error);
+		*curpp = NULL;
+	}
+	if (*agi_bpp) {
+		xfs_trans_brelse(tp, *agi_bpp);
+		*agi_bpp = NULL;
+	}
+}
+
+/*
+ * Set ourselves up for walking inobt records starting from a given point in
+ * the filesystem.
+ *
+ * If caller passed in a nonzero start inode number, load the record from the
+ * inobt and make the record look like all the inodes before agino are free so
+ * that we skip them, and then move the cursor to the next inobt record.  This
+ * is how we support starting an iwalk in the middle of an inode chunk.
+ *
+ * If the caller passed in a start number of zero, move the cursor to the first
+ * inobt record.
+ *
+ * The caller is responsible for cleaning up the cursor and buffer pointer
+ * regardless of the error status.
+ */
+STATIC int
+xfs_iwalk_ag_start(
+	struct xfs_iwalk_ag	*iwag,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino,
+	struct xfs_btree_cur	**curpp,
+	struct xfs_buf		**agi_bpp,
+	int			*has_more)
+{
+	struct xfs_mount	*mp = iwag->mp;
+	struct xfs_trans	*tp = iwag->tp;
+	int			icount;
+	int			error;
+
+	/* Set up a fresh cursor and empty the inobt cache. */
+	iwag->nr_recs = 0;
+	error = xfs_inobt_cur(mp, tp, agno, curpp, agi_bpp);
+	if (error)
+		return error;
+
+	/* Starting at the beginning of the AG?  That's easy! */
+	if (agino == 0)
+		return xfs_inobt_lookup(*curpp, 0, XFS_LOOKUP_GE, has_more);
+
+	/*
+	 * Otherwise, we have to grab the inobt record where we left off, stuff
+	 * the record into our cache, and then see if there are more records.
+	 * We require a lookup cache of at least two elements so that we don't
+	 * have to deal with tearing down the cursor to walk the records.
+	 */
+	error = xfs_bulkstat_grab_ichunk(*curpp, agino - 1, &icount,
+			&iwag->recs[iwag->nr_recs]);
+	if (error)
+		return error;
+	if (icount)
+		iwag->nr_recs++;
+
+	/*
+	 * set_prefetch is supposed to give us a large enough inobt record
+	 * cache that grab_ichunk can stage a partial first record and the loop
+	 * body can cache a record without having to check for cache space
+	 * until after it reads an inobt record.
+	 */
+	ASSERT(iwag->nr_recs < iwag->sz_recs);
+
+	return xfs_btree_increment(*curpp, 0, has_more);
+}
+
+/*
+ * The inobt record cache is full, so preserve the inobt cursor state and
+ * run callbacks on the cached inobt records.  When we're done, restore the
+ * cursor state to wherever the cursor would have been had the cache not been
+ * full (and therefore we could've just incremented the cursor) if *@has_more
+ * is true.  On exit, *@has_more will indicate whether or not the caller should
+ * try for more inode records.
+ */
+STATIC int
+xfs_iwalk_run_callbacks(
+	struct xfs_iwalk_ag		*iwag,
+	xfs_agnumber_t			agno,
+	struct xfs_btree_cur		**curpp,
+	struct xfs_buf			**agi_bpp,
+	int				*has_more)
+{
+	struct xfs_mount		*mp = iwag->mp;
+	struct xfs_trans		*tp = iwag->tp;
+	struct xfs_inobt_rec_incore	*irec;
+	xfs_agino_t			restart;
+	int				error;
+
+	ASSERT(iwag->nr_recs > 0);
+
+	/* Delete cursor but remember the last record we cached... */
+	xfs_iwalk_del_inobt(tp, curpp, agi_bpp, 0);
+	irec = &iwag->recs[iwag->nr_recs - 1];
+	restart = irec->ir_startino + XFS_INODES_PER_CHUNK - 1;
+
+	error = xfs_iwalk_ag_recs(iwag);
+	if (error)
+		return error;
+
+	/* ...empty the cache... */
+	iwag->nr_recs = 0;
+
+	if (!has_more)
+		return 0;
+
+	/* ...and recreate the cursor just past where we left off. */
+	error = xfs_inobt_cur(mp, tp, agno, curpp, agi_bpp);
+	if (error)
+		return error;
+
+	return xfs_inobt_lookup(*curpp, restart, XFS_LOOKUP_GE, has_more);
+}
+
+/* Walk all inodes in a single AG, from @iwag->startino to the end of the AG. */
+STATIC int
+xfs_iwalk_ag(
+	struct xfs_iwalk_ag		*iwag)
+{
+	struct xfs_mount		*mp = iwag->mp;
+	struct xfs_trans		*tp = iwag->tp;
+	struct xfs_buf			*agi_bp = NULL;
+	struct xfs_btree_cur		*cur = NULL;
+	xfs_agnumber_t			agno;
+	xfs_agino_t			agino;
+	int				has_more;
+	int				error = 0;
+
+	/* Set up our cursor at the right place in the inode btree. */
+	agno = XFS_INO_TO_AGNO(mp, iwag->startino);
+	agino = XFS_INO_TO_AGINO(mp, iwag->startino);
+	error = xfs_iwalk_ag_start(iwag, agno, agino, &cur, &agi_bp, &has_more);
+
+	while (!error && has_more) {
+		struct xfs_inobt_rec_incore	*irec;
+
+		cond_resched();
+
+		/* Fetch the inobt record. */
+		irec = &iwag->recs[iwag->nr_recs];
+		error = xfs_inobt_get_rec(cur, irec, &has_more);
+		if (error || !has_more)
+			break;
+
+		/* No allocated inodes in this chunk; skip it. */
+		if (irec->ir_freecount == irec->ir_count) {
+			error = xfs_btree_increment(cur, 0, &has_more);
+			if (error)
+				break;
+			continue;
+		}
+
+		/*
+		 * Start readahead for this inode chunk in anticipation of
+		 * walking the inodes.
+		 */
+		xfs_bulkstat_ichunk_ra(mp, agno, irec);
+
+		/*
+		 * If there's space in the buffer for more records, increment
+		 * the btree cursor and grab more.
+		 */
+		if (++iwag->nr_recs < iwag->sz_recs) {
+			error = xfs_btree_increment(cur, 0, &has_more);
+			if (error || !has_more)
+				break;
+			continue;
+		}
+
+		/*
+		 * Otherwise, we need to save cursor state and run the callback
+		 * function on the cached records.  The run_callbacks function
+		 * is supposed to return a cursor pointing to the record where
+		 * we would be if we had been able to increment like above.
+		 */
+		has_more = true;
+		error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp,
+				&has_more);
+	}
+
+	if (iwag->nr_recs == 0 || error)
+		goto out;
+
+	/* Walk the unprocessed records in the cache. */
+	error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp, &has_more);
+
+out:
+	xfs_iwalk_del_inobt(tp, &cur, &agi_bp, error);
+	return error;
+}
+
+/*
+ * Given the number of inodes to prefetch, set the number of inobt records that
+ * we cache in memory, which controls the number of inodes we try to read
+ * ahead.
+ */
+static inline void
+xfs_iwalk_set_prefetch(
+	struct xfs_iwalk_ag	*iwag,
+	unsigned int		max_prefetch)
+{
+	/*
+	 * Default to 4096 bytes' worth of inobt records; this should be plenty
+	 * of inodes to read ahead.  This number was chosen so that the cache
+	 * is never more than a single memory page and the amount of inode
+	 * readahead is limited to 16k inodes regardless of CPU:
+	 *
+	 * 4096 bytes / 16 bytes per inobt record = 256 inobt records
+	 * 256 inobt records * 64 inodes per record = 16384 inodes
+	 * 16384 inodes * 512 bytes per inode(?) = 8MB of inode readahead
+	 */
+	iwag->sz_recs = 4096 / sizeof(struct xfs_inobt_rec_incore);
+
+	/*
+	 * If the caller gives us a desired prefetch amount, round it up to
+	 * an even inode chunk and cap it as defined previously.
+	 */
+	if (max_prefetch) {
+		unsigned int	nr;
+
+		nr = round_up(max_prefetch, XFS_INODES_PER_CHUNK) /
+				XFS_INODES_PER_CHUNK;
+		iwag->sz_recs = min_t(unsigned int, iwag->sz_recs, nr);
+	}
+
+	/*
+	 * Allocate enough space to prefetch at least two records so that we
+	 * can cache both the inobt record where the iwalk started and the next
+	 * record.  This simplifies the AG inode walk loop setup code.
+	 */
+	iwag->sz_recs = max_t(unsigned int, iwag->sz_recs, 2);
+}
+
+/*
+ * Walk all inodes in the filesystem starting from @startino.  The @iwalk_fn
+ * will be called for each allocated inode, being passed the inode's number and
+ * @data.  @max_prefetch controls how many inobt records' worth of inodes we
+ * try to readahead.
+ */
+int
+xfs_iwalk(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	xfs_ino_t		startino,
+	xfs_iwalk_fn		iwalk_fn,
+	unsigned int		max_prefetch,
+	void			*data)
+{
+	struct xfs_iwalk_ag	iwag = {
+		.mp		= mp,
+		.tp		= tp,
+		.iwalk_fn	= iwalk_fn,
+		.data		= data,
+		.startino	= startino,
+	};
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
+	int			error;
+
+	ASSERT(agno < mp->m_sb.sb_agcount);
+
+	xfs_iwalk_set_prefetch(&iwag, max_prefetch);
+	error = xfs_iwalk_alloc(&iwag);
+	if (error)
+		return error;
+
+	for (; agno < mp->m_sb.sb_agcount; agno++) {
+		error = xfs_iwalk_ag(&iwag);
+		if (error)
+			break;
+		iwag.startino = XFS_AGINO_TO_INO(mp, agno + 1, 0);
+	}
+
+	xfs_iwalk_free(&iwag);
+	return error;
+}
diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
new file mode 100644
index 000000000000..9e762e31dadc
--- /dev/null
+++ b/fs/xfs/xfs_iwalk.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#ifndef __XFS_IWALK_H__
+#define __XFS_IWALK_H__
+
+/* Walk all inodes in the filesystem starting from @startino. */
+typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
+			    xfs_ino_t ino, void *data);
+/* Return values for xfs_iwalk_fn. */
+#define XFS_IWALK_CONTINUE	(XFS_ITER_CONTINUE)
+#define XFS_IWALK_ABORT		(XFS_ITER_ABORT)
+
+int xfs_iwalk(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t startino,
+		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
+
+#endif /* __XFS_IWALK_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 2464ea351f83..f9bb1d50bc0e 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3516,6 +3516,46 @@ DEFINE_EVENT(xfs_inode_corrupt_class, name,	\
 DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_sick);
 DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_healthy);
 
+TRACE_EVENT(xfs_iwalk_ag,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_agino_t startino),
+	TP_ARGS(mp, agno, startino),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agino_t, startino)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->startino = startino;
+	),
+	TP_printk("dev %d:%d agno %d startino %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
+		  __entry->startino)
+)
+
+TRACE_EVENT(xfs_iwalk_ag_rec,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 struct xfs_inobt_rec_incore *irec),
+	TP_ARGS(mp, agno, irec),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agino_t, startino)
+		__field(uint64_t, freemask)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->startino = irec->ir_startino;
+		__entry->freemask = irec->ir_free;
+	),
+	TP_printk("dev %d:%d agno %d startino %u freemask 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
+		  __entry->startino, __entry->freemask)
+)
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH
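The sizing policy in xfs_iwalk_set_prefetch above (default to one page of records, shrink to the caller's rounded-up request, never go below two records) can be checked with a small userspace model.  The 16-bytes-per-record figure is taken from the patch's own comment, not measured here:

```c
#include <assert.h>

#define INODES_PER_CHUNK	64
#define INOBT_REC_SIZE		16	/* per the comment in the patch */

/* Userspace mirror of xfs_iwalk_set_prefetch's arithmetic. */
static unsigned int set_prefetch(unsigned int max_prefetch)
{
	unsigned int	sz = 4096 / INOBT_REC_SIZE;	/* 256 records */

	if (max_prefetch) {
		/* round_up() to whole chunks, then min_t(): only shrink */
		unsigned int nr = (max_prefetch + INODES_PER_CHUNK - 1) /
				INODES_PER_CHUNK;
		if (nr < sz)
			sz = nr;
	}
	/* max_t(): always room for a partial first record plus the next */
	return sz < 2 ? 2 : sz;
}
```

So a zero (default) request yields 256 records (16384 inodes of readahead), a tiny request is floored at two records to satisfy xfs_iwalk_ag_start, and a huge request is capped at the one-page default.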


* [PATCH 03/14] xfs: convert quotacheck to use the new iwalk functions
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
  2019-06-12  6:47 ` [PATCH 01/14] xfs: create iterator error codes Darrick J. Wong
  2019-06-12  6:47 ` [PATCH 02/14] xfs: create simplified inode walk function Darrick J. Wong
@ 2019-06-12  6:47 ` Darrick J. Wong
  2019-06-12  6:47 ` [PATCH 04/14] xfs: bulkstat should copy lastip whenever userspace supplies one Darrick J. Wong
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:47 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster, Dave Chinner

From: Darrick J. Wong <darrick.wong@oracle.com>

Convert quotacheck to use the new iwalk iterator to dig through the
inodes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_qm.c |   63 +++++++++++++++++--------------------------------------
 1 file changed, 20 insertions(+), 43 deletions(-)


diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index aa6b6db3db0e..52e8ec0aa064 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -14,7 +14,7 @@
 #include "xfs_mount.h"
 #include "xfs_inode.h"
 #include "xfs_ialloc.h"
-#include "xfs_itable.h"
+#include "xfs_iwalk.h"
 #include "xfs_quota.h"
 #include "xfs_error.h"
 #include "xfs_bmap.h"
@@ -1118,17 +1118,15 @@ xfs_qm_quotacheck_dqadjust(
 /* ARGSUSED */
 STATIC int
 xfs_qm_dqusage_adjust(
-	xfs_mount_t	*mp,		/* mount point for filesystem */
-	xfs_ino_t	ino,		/* inode number to get data for */
-	void		__user *buffer,	/* not used */
-	int		ubsize,		/* not used */
-	int		*ubused,	/* not used */
-	int		*res)		/* result code value */
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	xfs_ino_t		ino,
+	void			*data)
 {
-	xfs_inode_t	*ip;
-	xfs_qcnt_t	nblks;
-	xfs_filblks_t	rtblks = 0;	/* total rt blks */
-	int		error;
+	struct xfs_inode	*ip;
+	xfs_qcnt_t		nblks;
+	xfs_filblks_t		rtblks = 0;	/* total rt blks */
+	int			error;
 
 	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
 
@@ -1136,20 +1134,18 @@ xfs_qm_dqusage_adjust(
 	 * rootino must have its resources accounted for, not so with the quota
 	 * inodes.
 	 */
-	if (xfs_is_quota_inode(&mp->m_sb, ino)) {
-		*res = BULKSTAT_RV_NOTHING;
-		return -EINVAL;
-	}
+	if (xfs_is_quota_inode(&mp->m_sb, ino))
+		return 0;
 
 	/*
 	 * We don't _need_ to take the ilock EXCL here because quotacheck runs
 	 * at mount time and therefore nobody will be racing chown/chproj.
 	 */
-	error = xfs_iget(mp, NULL, ino, XFS_IGET_DONTCACHE, 0, &ip);
-	if (error) {
-		*res = BULKSTAT_RV_NOTHING;
+	error = xfs_iget(mp, tp, ino, XFS_IGET_DONTCACHE, 0, &ip);
+	if (error == -EINVAL || error == -ENOENT)
+		return 0;
+	if (error)
 		return error;
-	}
 
 	ASSERT(ip->i_delayed_blks == 0);
 
@@ -1157,7 +1153,7 @@ xfs_qm_dqusage_adjust(
 		struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
 
 		if (!(ifp->if_flags & XFS_IFEXTENTS)) {
-			error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+			error = xfs_iread_extents(tp, ip, XFS_DATA_FORK);
 			if (error)
 				goto error0;
 		}
@@ -1200,13 +1196,8 @@ xfs_qm_dqusage_adjust(
 			goto error0;
 	}
 
-	xfs_irele(ip);
-	*res = BULKSTAT_RV_DIDONE;
-	return 0;
-
 error0:
 	xfs_irele(ip);
-	*res = BULKSTAT_RV_GIVEUP;
 	return error;
 }
 
@@ -1270,18 +1261,13 @@ STATIC int
 xfs_qm_quotacheck(
 	xfs_mount_t	*mp)
 {
-	int			done, count, error, error2;
-	xfs_ino_t		lastino;
-	size_t			structsz;
+	int			error, error2;
 	uint			flags;
 	LIST_HEAD		(buffer_list);
 	struct xfs_inode	*uip = mp->m_quotainfo->qi_uquotaip;
 	struct xfs_inode	*gip = mp->m_quotainfo->qi_gquotaip;
 	struct xfs_inode	*pip = mp->m_quotainfo->qi_pquotaip;
 
-	count = INT_MAX;
-	structsz = 1;
-	lastino = 0;
 	flags = 0;
 
 	ASSERT(uip || gip || pip);
@@ -1318,18 +1304,9 @@ xfs_qm_quotacheck(
 		flags |= XFS_PQUOTA_CHKD;
 	}
 
-	do {
-		/*
-		 * Iterate thru all the inodes in the file system,
-		 * adjusting the corresponding dquot counters in core.
-		 */
-		error = xfs_bulkstat(mp, &lastino, &count,
-				     xfs_qm_dqusage_adjust,
-				     structsz, NULL, &done);
-		if (error)
-			break;
-
-	} while (!done);
+	error = xfs_iwalk(mp, NULL, 0, xfs_qm_dqusage_adjust, 0, NULL);
+	if (error)
+		goto error_return;
 
 	/*
 	 * We've made all the changes that we need to make incore.  Flush them


* [PATCH 04/14] xfs: bulkstat should copy lastip whenever userspace supplies one
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (2 preceding siblings ...)
  2019-06-12  6:47 ` [PATCH 03/14] xfs: convert quotacheck to use the new iwalk functions Darrick J. Wong
@ 2019-06-12  6:47 ` Darrick J. Wong
  2019-06-12  6:48 ` [PATCH 05/14] xfs: remove unnecessary includes of xfs_itable.h Darrick J. Wong
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:47 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When userspace passes in a @lastip pointer we should copy the results
back, even if the @ocount pointer is NULL.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_ioctl.c   |   13 ++++++-------
 fs/xfs/xfs_ioctl32.c |   13 ++++++-------
 2 files changed, 12 insertions(+), 14 deletions(-)


diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d7dfc13f30f5..5ffbdcff3dba 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -768,14 +768,13 @@ xfs_ioc_bulkstat(
 	if (error)
 		return error;
 
-	if (bulkreq.ocount != NULL) {
-		if (copy_to_user(bulkreq.lastip, &inlast,
-						sizeof(xfs_ino_t)))
-			return -EFAULT;
+	if (bulkreq.lastip != NULL &&
+	    copy_to_user(bulkreq.lastip, &inlast, sizeof(xfs_ino_t)))
+		return -EFAULT;
 
-		if (copy_to_user(bulkreq.ocount, &count, sizeof(count)))
-			return -EFAULT;
-	}
+	if (bulkreq.ocount != NULL &&
+	    copy_to_user(bulkreq.ocount, &count, sizeof(count)))
+		return -EFAULT;
 
 	return 0;
 }
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 614fc6886d24..814ffe6fbab7 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -310,14 +310,13 @@ xfs_compat_ioc_bulkstat(
 	if (error)
 		return error;
 
-	if (bulkreq.ocount != NULL) {
-		if (copy_to_user(bulkreq.lastip, &inlast,
-						sizeof(xfs_ino_t)))
-			return -EFAULT;
+	if (bulkreq.lastip != NULL &&
+	    copy_to_user(bulkreq.lastip, &inlast, sizeof(xfs_ino_t)))
+		return -EFAULT;
 
-		if (copy_to_user(bulkreq.ocount, &count, sizeof(count)))
-			return -EFAULT;
-	}
+	if (bulkreq.ocount != NULL &&
+	    copy_to_user(bulkreq.ocount, &count, sizeof(count)))
+		return -EFAULT;
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 33+ messages in thread
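[Editorial note] The fix above makes the two copy-outs independent of each other. A minimal userspace model of the corrected control flow (names are illustrative, not the kernel's, and plain pointer writes stand in for copy_to_user()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for struct xfs_fsop_bulkreq's output pointers. */
struct bulkreq_model {
	uint64_t	*lastip;	/* may be NULL */
	int32_t		*ocount;	/* may be NULL */
};

/*
 * Corrected copy-out: each user pointer is checked on its own, so a
 * NULL ocount no longer suppresses the lastip update.  (The old code
 * nested the lastip copy inside the ocount != NULL check.)
 */
static int
copy_out_model(struct bulkreq_model *req, uint64_t inlast, int32_t count)
{
	if (req->lastip != NULL)
		*req->lastip = inlast;	/* models copy_to_user() */
	if (req->ocount != NULL)
		*req->ocount = count;
	return 0;
}
```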

* [PATCH 05/14] xfs: remove unnecessary includes of xfs_itable.h
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (3 preceding siblings ...)
  2019-06-12  6:47 ` [PATCH 04/14] xfs: bulkstat should copy lastip whenever userspace supplies one Darrick J. Wong
@ 2019-06-12  6:48 ` Darrick J. Wong
  2019-06-13 16:27   ` Brian Foster
  2019-06-12  6:48 ` [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure Darrick J. Wong
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Don't include xfs_itable.h in files that don't need it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/common.c |    1 -
 fs/xfs/scrub/dir.c    |    1 -
 fs/xfs/scrub/scrub.c  |    1 -
 fs/xfs/xfs_trace.c    |    1 -
 4 files changed, 4 deletions(-)


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 973aa59975e3..561d7e818e8b 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -17,7 +17,6 @@
 #include "xfs_sb.h"
 #include "xfs_inode.h"
 #include "xfs_icache.h"
-#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_bmap.h"
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index a38a22785a1a..9018ca4aba64 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -17,7 +17,6 @@
 #include "xfs_sb.h"
 #include "xfs_inode.h"
 #include "xfs_icache.h"
-#include "xfs_itable.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_dir2.h"
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index f630389ee176..5689a33e999c 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -17,7 +17,6 @@
 #include "xfs_sb.h"
 #include "xfs_inode.h"
 #include "xfs_icache.h"
-#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_bmap.h"
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index cb6489c22cad..f555a3c560b9 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -16,7 +16,6 @@
 #include "xfs_btree.h"
 #include "xfs_da_btree.h"
 #include "xfs_ialloc.h"
-#include "xfs_itable.h"
 #include "xfs_alloc.h"
 #include "xfs_bmap.h"
 #include "xfs_attr.h"

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (4 preceding siblings ...)
  2019-06-12  6:48 ` [PATCH 05/14] xfs: remove unnecessary includes of xfs_itable.h Darrick J. Wong
@ 2019-06-12  6:48 ` Darrick J. Wong
  2019-06-13 16:31   ` Brian Foster
  2019-06-12  6:48 ` [PATCH 07/14] xfs: move bulkstat ichunk helpers to iwalk code Darrick J. Wong
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a new incore ibulk structure to track bulk inode stat state, then
convert the bulkstat code to use the new iwalk iterator.  This
disentangles inode walking from bulk stat control, which simplifies the
code and lets us isolate the formatter functions in the ioctl handling
code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_ioctl.c   |   70 ++++++--
 fs/xfs/xfs_ioctl.h   |    5 +
 fs/xfs/xfs_ioctl32.c |   93 ++++++-----
 fs/xfs/xfs_itable.c  |  431 ++++++++++++++++----------------------------------
 fs/xfs/xfs_itable.h  |   79 ++++-----
 5 files changed, 272 insertions(+), 406 deletions(-)


diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 5ffbdcff3dba..60595e61f2a6 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -14,6 +14,7 @@
 #include "xfs_ioctl.h"
 #include "xfs_alloc.h"
 #include "xfs_rtalloc.h"
+#include "xfs_iwalk.h"
 #include "xfs_itable.h"
 #include "xfs_error.h"
 #include "xfs_attr.h"
@@ -721,16 +722,29 @@ xfs_ioc_space(
 	return error;
 }
 
+/* Return 0 on success or positive error */
+int
+xfs_bulkstat_one_fmt(
+	struct xfs_ibulk	*breq,
+	const struct xfs_bstat	*bstat)
+{
+	if (copy_to_user(breq->ubuffer, bstat, sizeof(*bstat)))
+		return -EFAULT;
+	return xfs_ibulk_advance(breq, sizeof(struct xfs_bstat));
+}
+
 STATIC int
 xfs_ioc_bulkstat(
 	xfs_mount_t		*mp,
 	unsigned int		cmd,
 	void			__user *arg)
 {
-	xfs_fsop_bulkreq_t	bulkreq;
-	int			count;	/* # of records returned */
-	xfs_ino_t		inlast;	/* last inode number */
-	int			done;
+	struct xfs_fsop_bulkreq	bulkreq;
+	struct xfs_ibulk	breq = {
+		.mp		= mp,
+		.ocount		= 0,
+	};
+	xfs_ino_t		lastino;
 	int			error;
 
 	/* done = 1 if there are more stats to get and if bulkstat */
@@ -745,35 +759,57 @@ xfs_ioc_bulkstat(
 	if (copy_from_user(&bulkreq, arg, sizeof(xfs_fsop_bulkreq_t)))
 		return -EFAULT;
 
-	if (copy_from_user(&inlast, bulkreq.lastip, sizeof(__s64)))
+	if (copy_from_user(&lastino, bulkreq.lastip, sizeof(__s64)))
 		return -EFAULT;
 
-	if ((count = bulkreq.icount) <= 0)
+	if (bulkreq.icount <= 0)
 		return -EINVAL;
 
 	if (bulkreq.ubuffer == NULL)
 		return -EINVAL;
 
-	if (cmd == XFS_IOC_FSINUMBERS)
-		error = xfs_inumbers(mp, &inlast, &count,
+	breq.ubuffer = bulkreq.ubuffer;
+	breq.icount = bulkreq.icount;
+
+	/*
+	 * FSBULKSTAT_SINGLE expects that *lastip contains the inode number
+	 * that we want to stat.  However, FSINUMBERS and FSBULKSTAT expect
+	 * that *lastip contains either zero or the number of the last inode to
+	 * be examined by the previous call and return results starting with
+	 * the next inode after that.  The new bulk request back end functions
+	 * take the inode to start with, so we have to compute the startino
+	 * parameter from lastino to maintain correct function.  lastino == 0
+	 * is a special case because it has traditionally meant "first inode
+	 * in filesystem".
+	 */
+	if (cmd == XFS_IOC_FSINUMBERS) {
+		int	count = breq.icount;
+
+		breq.startino = lastino;
+		error = xfs_inumbers(mp, &breq.startino, &count,
 					bulkreq.ubuffer, xfs_inumbers_fmt);
-	else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE)
-		error = xfs_bulkstat_one(mp, inlast, bulkreq.ubuffer,
-					sizeof(xfs_bstat_t), NULL, &done);
-	else	/* XFS_IOC_FSBULKSTAT */
-		error = xfs_bulkstat(mp, &inlast, &count, xfs_bulkstat_one,
-				     sizeof(xfs_bstat_t), bulkreq.ubuffer,
-				     &done);
+		breq.ocount = count;
+		lastino = breq.startino;
+	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE) {
+		breq.startino = lastino;
+		breq.icount = 1;
+		error = xfs_bulkstat_one(&breq, xfs_bulkstat_one_fmt);
+		lastino = breq.startino;
+	} else {	/* XFS_IOC_FSBULKSTAT */
+		breq.startino = lastino ? lastino + 1 : 0;
+		error = xfs_bulkstat(&breq, xfs_bulkstat_one_fmt);
+		lastino = breq.startino - 1;
+	}
 
 	if (error)
 		return error;
 
 	if (bulkreq.lastip != NULL &&
-	    copy_to_user(bulkreq.lastip, &inlast, sizeof(xfs_ino_t)))
+	    copy_to_user(bulkreq.lastip, &lastino, sizeof(xfs_ino_t)))
 		return -EFAULT;
 
 	if (bulkreq.ocount != NULL &&
-	    copy_to_user(bulkreq.ocount, &count, sizeof(count)))
+	    copy_to_user(bulkreq.ocount, &breq.ocount, sizeof(__s32)))
 		return -EFAULT;
 
 	return 0;
diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h
index 4b17f67c888a..f32c8aadfeba 100644
--- a/fs/xfs/xfs_ioctl.h
+++ b/fs/xfs/xfs_ioctl.h
@@ -77,4 +77,9 @@ xfs_set_dmattrs(
 	uint			evmask,
 	uint16_t		state);
 
+struct xfs_ibulk;
+struct xfs_bstat;
+
+int xfs_bulkstat_one_fmt(struct xfs_ibulk *breq, const struct xfs_bstat *bstat);
+
 #endif
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 814ffe6fbab7..5d1c143bac18 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -11,11 +11,13 @@
 #include <linux/fsmap.h>
 #include "xfs.h"
 #include "xfs_fs.h"
+#include "xfs_shared.h"
 #include "xfs_format.h"
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_inode.h"
+#include "xfs_iwalk.h"
 #include "xfs_itable.h"
 #include "xfs_error.h"
 #include "xfs_fsops.h"
@@ -172,15 +174,10 @@ xfs_bstime_store_compat(
 /* Return 0 on success or positive error (to xfs_bulkstat()) */
 STATIC int
 xfs_bulkstat_one_fmt_compat(
-	void			__user *ubuffer,
-	int			ubsize,
-	int			*ubused,
-	const xfs_bstat_t	*buffer)
+	struct xfs_ibulk	*breq,
+	const struct xfs_bstat	*buffer)
 {
-	compat_xfs_bstat_t	__user *p32 = ubuffer;
-
-	if (ubsize < sizeof(*p32))
-		return -ENOMEM;
+	struct compat_xfs_bstat	__user *p32 = breq->ubuffer;
 
 	if (put_user(buffer->bs_ino,	  &p32->bs_ino)		||
 	    put_user(buffer->bs_mode,	  &p32->bs_mode)	||
@@ -205,23 +202,8 @@ xfs_bulkstat_one_fmt_compat(
 	    put_user(buffer->bs_dmstate,  &p32->bs_dmstate)	||
 	    put_user(buffer->bs_aextents, &p32->bs_aextents))
 		return -EFAULT;
-	if (ubused)
-		*ubused = sizeof(*p32);
-	return 0;
-}
 
-STATIC int
-xfs_bulkstat_one_compat(
-	xfs_mount_t	*mp,		/* mount point for filesystem */
-	xfs_ino_t	ino,		/* inode number to get data for */
-	void		__user *buffer,	/* buffer to place output in */
-	int		ubsize,		/* size of buffer */
-	int		*ubused,	/* bytes used by me */
-	int		*stat)		/* BULKSTAT_RV_... */
-{
-	return xfs_bulkstat_one_int(mp, ino, buffer, ubsize,
-				    xfs_bulkstat_one_fmt_compat,
-				    ubused, stat);
+	return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_bstat));
 }
 
 /* copied from xfs_ioctl.c */
@@ -232,10 +214,12 @@ xfs_compat_ioc_bulkstat(
 	compat_xfs_fsop_bulkreq_t __user *p32)
 {
 	u32			addr;
-	xfs_fsop_bulkreq_t	bulkreq;
-	int			count;	/* # of records returned */
-	xfs_ino_t		inlast;	/* last inode number */
-	int			done;
+	struct xfs_fsop_bulkreq	bulkreq;
+	struct xfs_ibulk	breq = {
+		.mp		= mp,
+		.ocount		= 0,
+	};
+	xfs_ino_t		lastino;
 	int			error;
 
 	/*
@@ -245,8 +229,7 @@ xfs_compat_ioc_bulkstat(
 	 * functions and structure size are the correct ones to use ...
 	 */
 	inumbers_fmt_pf inumbers_func = xfs_inumbers_fmt_compat;
-	bulkstat_one_pf	bs_one_func = xfs_bulkstat_one_compat;
-	size_t bs_one_size = sizeof(struct compat_xfs_bstat);
+	bulkstat_one_fmt_pf	bs_one_func = xfs_bulkstat_one_fmt_compat;
 
 #ifdef CONFIG_X86_X32
 	if (in_x32_syscall()) {
@@ -259,8 +242,7 @@ xfs_compat_ioc_bulkstat(
 		 * x32 userspace expects.
 		 */
 		inumbers_func = xfs_inumbers_fmt;
-		bs_one_func = xfs_bulkstat_one;
-		bs_one_size = sizeof(struct xfs_bstat);
+		bs_one_func = xfs_bulkstat_one_fmt;
 	}
 #endif
 
@@ -284,38 +266,59 @@ xfs_compat_ioc_bulkstat(
 		return -EFAULT;
 	bulkreq.ocount = compat_ptr(addr);
 
-	if (copy_from_user(&inlast, bulkreq.lastip, sizeof(__s64)))
+	if (copy_from_user(&lastino, bulkreq.lastip, sizeof(__s64)))
 		return -EFAULT;
+	breq.startino = lastino + 1;
 
-	if ((count = bulkreq.icount) <= 0)
+	if (bulkreq.icount <= 0)
 		return -EINVAL;
 
 	if (bulkreq.ubuffer == NULL)
 		return -EINVAL;
 
+	breq.ubuffer = bulkreq.ubuffer;
+	breq.icount = bulkreq.icount;
+
+	/*
+	 * FSBULKSTAT_SINGLE expects that *lastip contains the inode number
+	 * that we want to stat.  However, FSINUMBERS and FSBULKSTAT expect
+	 * that *lastip contains either zero or the number of the last inode to
+	 * be examined by the previous call and return results starting with
+	 * the next inode after that.  The new bulk request back end functions
+	 * take the inode to start with, so we have to compute the startino
+	 * parameter from lastino to maintain correct function.  lastino == 0
+	 * is a special case because it has traditionally meant "first inode
+	 * in filesystem".
+	 */
 	if (cmd == XFS_IOC_FSINUMBERS_32) {
-		error = xfs_inumbers(mp, &inlast, &count,
+		int	count = breq.icount;
+
+		breq.startino = lastino;
+		error = xfs_inumbers(mp, &breq.startino, &count,
 				bulkreq.ubuffer, inumbers_func);
+		breq.ocount = count;
+		lastino = breq.startino;
 	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE_32) {
-		int res;
-
-		error = bs_one_func(mp, inlast, bulkreq.ubuffer,
-				bs_one_size, NULL, &res);
+		breq.startino = lastino;
+		breq.icount = 1;
+		error = xfs_bulkstat_one(&breq, bs_one_func);
+		lastino = breq.startino;
 	} else if (cmd == XFS_IOC_FSBULKSTAT_32) {
-		error = xfs_bulkstat(mp, &inlast, &count,
-			bs_one_func, bs_one_size,
-			bulkreq.ubuffer, &done);
-	} else
+		breq.startino = lastino ? lastino + 1 : 0;
+		error = xfs_bulkstat(&breq, bs_one_func);
+		lastino = breq.startino - 1;
+	} else {
 		error = -EINVAL;
+	}
 	if (error)
 		return error;
 
 	if (bulkreq.lastip != NULL &&
-	    copy_to_user(bulkreq.lastip, &inlast, sizeof(xfs_ino_t)))
+	    copy_to_user(bulkreq.lastip, &lastino, sizeof(xfs_ino_t)))
 		return -EFAULT;
 
 	if (bulkreq.ocount != NULL &&
-	    copy_to_user(bulkreq.ocount, &count, sizeof(count)))
+	    copy_to_user(bulkreq.ocount, &breq.ocount, sizeof(__s32)))
 		return -EFAULT;
 
 	return 0;
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 3ca1c454afe6..58e411e11d6c 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -14,47 +14,68 @@
 #include "xfs_btree.h"
 #include "xfs_ialloc.h"
 #include "xfs_ialloc_btree.h"
+#include "xfs_iwalk.h"
 #include "xfs_itable.h"
 #include "xfs_error.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_health.h"
-#include "xfs_iwalk.h"
 
 /*
- * Return stat information for one inode.
- * Return 0 if ok, else errno.
+ * Bulk Stat
+ * =========
+ *
+ * Use the inode walking functions to fill out struct xfs_bstat for every
+ * allocated inode, then pass the stat information to some externally provided
+ * iteration function.
  */
-int
+
+struct xfs_bstat_chunk {
+	bulkstat_one_fmt_pf	formatter;
+	struct xfs_ibulk	*breq;
+	struct xfs_bstat	*buf;
+};
+
+/*
+ * Fill out the bulkstat info for a single inode and report it somewhere.
+ *
+ * bc->breq->lastino is effectively the inode cursor as we walk through the
+ * filesystem.  Therefore, we update it any time we need to move the cursor
+ * forward, regardless of whether or not we're sending any bstat information
+ * back to userspace.  If the inode is internal metadata or has been freed
+ * out from under us, we simply keep going.
+ *
+ * However, if any other type of error happens we want to stop right where we
+ * are so that userspace will call back with the exact number of the bad inode
+ * and we can send back an error code.
+ *
+ * Note that if the formatter tells us there's no space left in the buffer we
+ * move the cursor forward and abort the walk.
+ */
+STATIC int
 xfs_bulkstat_one_int(
-	struct xfs_mount	*mp,		/* mount point for filesystem */
-	xfs_ino_t		ino,		/* inode to get data for */
-	void __user		*buffer,	/* buffer to place output in */
-	int			ubsize,		/* size of buffer */
-	bulkstat_one_fmt_pf	formatter,	/* formatter, copy to user */
-	int			*ubused,	/* bytes used by me */
-	int			*stat)		/* BULKSTAT_RV_... */
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	xfs_ino_t		ino,
+	void			*data)
 {
+	struct xfs_bstat_chunk	*bc = data;
 	struct xfs_icdinode	*dic;		/* dinode core info pointer */
 	struct xfs_inode	*ip;		/* incore inode pointer */
 	struct inode		*inode;
-	struct xfs_bstat	*buf;		/* return buffer */
-	int			error = 0;	/* error value */
+	struct xfs_bstat	*buf = bc->buf;
+	int			error = -EINVAL;
 
-	*stat = BULKSTAT_RV_NOTHING;
+	if (xfs_internal_inum(mp, ino))
+		goto out_advance;
 
-	if (!buffer || xfs_internal_inum(mp, ino))
-		return -EINVAL;
-
-	buf = kmem_zalloc(sizeof(*buf), KM_SLEEP | KM_MAYFAIL);
-	if (!buf)
-		return -ENOMEM;
-
-	error = xfs_iget(mp, NULL, ino,
+	error = xfs_iget(mp, tp, ino,
 			 (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED),
 			 XFS_ILOCK_SHARED, &ip);
+	if (error == -ENOENT || error == -EINVAL)
+		goto out_advance;
 	if (error)
-		goto out_free;
+		goto out;
 
 	ASSERT(ip != NULL);
 	ASSERT(ip->i_imap.im_blkno != 0);
@@ -119,43 +140,56 @@ xfs_bulkstat_one_int(
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 	xfs_irele(ip);
 
-	error = formatter(buffer, ubsize, ubused, buf);
-	if (!error)
-		*stat = BULKSTAT_RV_DIDONE;
+	error = bc->formatter(bc->breq, buf);
+	if (error == XFS_IBULK_BUFFER_FULL) {
+		error = XFS_IWALK_ABORT;
+		goto out_advance;
+	}
+	if (error)
+		goto out;
 
- out_free:
-	kmem_free(buf);
+out_advance:
+	/*
+	 * Advance the cursor to the inode that comes after the one we just
+	 * looked at.  We want the caller to move along if the bulkstat
+	 * information was copied successfully; if we tried to grab the inode
+	 * but it's no longer allocated; or if it's internal metadata.
+	 */
+	bc->breq->startino = ino + 1;
+out:
 	return error;
 }
 
-/* Return 0 on success or positive error */
-STATIC int
-xfs_bulkstat_one_fmt(
-	void			__user *ubuffer,
-	int			ubsize,
-	int			*ubused,
-	const xfs_bstat_t	*buffer)
-{
-	if (ubsize < sizeof(*buffer))
-		return -ENOMEM;
-	if (copy_to_user(ubuffer, buffer, sizeof(*buffer)))
-		return -EFAULT;
-	if (ubused)
-		*ubused = sizeof(*buffer);
-	return 0;
-}
-
+/* Bulkstat a single inode. */
 int
 xfs_bulkstat_one(
-	xfs_mount_t	*mp,		/* mount point for filesystem */
-	xfs_ino_t	ino,		/* inode number to get data for */
-	void		__user *buffer,	/* buffer to place output in */
-	int		ubsize,		/* size of buffer */
-	int		*ubused,	/* bytes used by me */
-	int		*stat)		/* BULKSTAT_RV_... */
+	struct xfs_ibulk	*breq,
+	bulkstat_one_fmt_pf	formatter)
 {
-	return xfs_bulkstat_one_int(mp, ino, buffer, ubsize,
-				    xfs_bulkstat_one_fmt, ubused, stat);
+	struct xfs_bstat_chunk	bc = {
+		.formatter	= formatter,
+		.breq		= breq,
+	};
+	int			error;
+
+	ASSERT(breq->icount == 1);
+
+	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
+	if (!bc.buf)
+		return -ENOMEM;
+
+	error = xfs_bulkstat_one_int(breq->mp, NULL, breq->startino, &bc);
+
+	kmem_free(bc.buf);
+
+	/*
+	 * If we reported one inode to userspace then we abort because we hit
+	 * the end of the buffer.  Don't leak that back to userspace.
+	 */
+	if (error == XFS_IWALK_ABORT)
+		error = 0;
+
+	return error;
 }
 
 /*
@@ -251,256 +285,69 @@ xfs_bulkstat_grab_ichunk(
 
 #define XFS_BULKSTAT_UBLEFT(ubleft)	((ubleft) >= statstruct_size)
 
-struct xfs_bulkstat_agichunk {
-	char		__user **ac_ubuffer;/* pointer into user's buffer */
-	int		ac_ubleft;	/* bytes left in user's buffer */
-	int		ac_ubelem;	/* spaces used in user's buffer */
-};
-
-/*
- * Process inodes in chunk with a pointer to a formatter function
- * that will iget the inode and fill in the appropriate structure.
- */
 static int
-xfs_bulkstat_ag_ichunk(
-	struct xfs_mount		*mp,
-	xfs_agnumber_t			agno,
-	struct xfs_inobt_rec_incore	*irbp,
-	bulkstat_one_pf			formatter,
-	size_t				statstruct_size,
-	struct xfs_bulkstat_agichunk	*acp,
-	xfs_agino_t			*last_agino)
+xfs_bulkstat_iwalk(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	xfs_ino_t		ino,
+	void			*data)
 {
-	char				__user **ubufp = acp->ac_ubuffer;
-	int				chunkidx;
-	int				error = 0;
-	xfs_agino_t			agino = irbp->ir_startino;
-
-	for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK;
-	     chunkidx++, agino++) {
-		int		fmterror;
-		int		ubused;
-
-		/* inode won't fit in buffer, we are done */
-		if (acp->ac_ubleft < statstruct_size)
-			break;
-
-		/* Skip if this inode is free */
-		if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free)
-			continue;
-
-		/* Get the inode and fill in a single buffer */
-		ubused = statstruct_size;
-		error = formatter(mp, XFS_AGINO_TO_INO(mp, agno, agino),
-				  *ubufp, acp->ac_ubleft, &ubused, &fmterror);
-
-		if (fmterror == BULKSTAT_RV_GIVEUP ||
-		    (error && error != -ENOENT && error != -EINVAL)) {
-			acp->ac_ubleft = 0;
-			ASSERT(error);
-			break;
-		}
-
-		/* be careful not to leak error if at end of chunk */
-		if (fmterror == BULKSTAT_RV_NOTHING || error) {
-			error = 0;
-			continue;
-		}
-
-		*ubufp += ubused;
-		acp->ac_ubleft -= ubused;
-		acp->ac_ubelem++;
-	}
-
-	/*
-	 * Post-update *last_agino. At this point, agino will always point one
-	 * inode past the last inode we processed successfully. Hence we
-	 * substract that inode when setting the *last_agino cursor so that we
-	 * return the correct cookie to userspace. On the next bulkstat call,
-	 * the inode under the lastino cookie will be skipped as we have already
-	 * processed it here.
-	 */
-	*last_agino = agino - 1;
+	int			error;
 
+	error = xfs_bulkstat_one_int(mp, tp, ino, data);
+	/* bulkstat just skips over missing inodes */
+	if (error == -ENOENT || error == -EINVAL)
+		return 0;
 	return error;
 }
 
 /*
- * Return stat information in bulk (by-inode) for the filesystem.
+ * Check the incoming lastino parameter.
+ *
+ * We allow any inode value that could map to physical space inside the
+ * filesystem because if there are no inodes there, bulkstat moves on to the
+ * next chunk.  In other words, the magic agino value of zero takes us to the
+ * first chunk in the AG, and an agino value past the end of the AG takes us to
+ * the first chunk in the next AG.
+ *
+ * Therefore we can end early if the requested inode is beyond the end of the
+ * filesystem or doesn't map properly.
  */
-int					/* error status */
-xfs_bulkstat(
-	xfs_mount_t		*mp,	/* mount point for filesystem */
-	xfs_ino_t		*lastinop, /* last inode returned */
-	int			*ubcountp, /* size of buffer/count returned */
-	bulkstat_one_pf		formatter, /* func that'd fill a single buf */
-	size_t			statstruct_size, /* sizeof struct filling */
-	char			__user *ubuffer, /* buffer with inode stats */
-	int			*done)	/* 1 if there are more stats to get */
+static inline bool
+xfs_bulkstat_already_done(
+	struct xfs_mount	*mp,
+	xfs_ino_t		startino)
 {
-	xfs_buf_t		*agbp;	/* agi header buffer */
-	xfs_agino_t		agino;	/* inode # in allocation group */
-	xfs_agnumber_t		agno;	/* allocation group number */
-	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
-	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
-	int			nirbuf;	/* size of irbuf */
-	int			ubcount; /* size of user's buffer */
-	struct xfs_bulkstat_agichunk ac;
-	int			error = 0;
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
+	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, startino);
 
-	/*
-	 * Get the last inode value, see if there's nothing to do.
-	 */
-	agno = XFS_INO_TO_AGNO(mp, *lastinop);
-	agino = XFS_INO_TO_AGINO(mp, *lastinop);
-	if (agno >= mp->m_sb.sb_agcount ||
-	    *lastinop != XFS_AGINO_TO_INO(mp, agno, agino)) {
-		*done = 1;
-		*ubcountp = 0;
-		return 0;
-	}
+	return agno >= mp->m_sb.sb_agcount ||
+	       startino != XFS_AGINO_TO_INO(mp, agno, agino);
+}
 
-	ubcount = *ubcountp; /* statstruct's */
-	ac.ac_ubuffer = &ubuffer;
-	ac.ac_ubleft = ubcount * statstruct_size; /* bytes */;
-	ac.ac_ubelem = 0;
+/* Return stat information in bulk (by-inode) for the filesystem. */
+int
+xfs_bulkstat(
+	struct xfs_ibulk	*breq,
+	bulkstat_one_fmt_pf	formatter)
+{
+	struct xfs_bstat_chunk	bc = {
+		.formatter	= formatter,
+		.breq		= breq,
+	};
+	int			error;
 
-	*ubcountp = 0;
-	*done = 0;
+	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
+		return 0;
 
-	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
-	if (!irbuf)
+	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
+	if (!bc.buf)
 		return -ENOMEM;
-	nirbuf = (PAGE_SIZE * 4) / sizeof(*irbuf);
 
-	/*
-	 * Loop over the allocation groups, starting from the last
-	 * inode returned; 0 means start of the allocation group.
-	 */
-	while (agno < mp->m_sb.sb_agcount) {
-		struct xfs_inobt_rec_incore	*irbp = irbuf;
-		struct xfs_inobt_rec_incore	*irbufend = irbuf + nirbuf;
-		bool				end_of_ag = false;
-		int				icount = 0;
-		int				stat;
+	error = xfs_iwalk(breq->mp, NULL, breq->startino, xfs_bulkstat_iwalk,
+			breq->icount, &bc);
 
-		error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
-		if (error)
-			break;
-		/*
-		 * Allocate and initialize a btree cursor for ialloc btree.
-		 */
-		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
-					    XFS_BTNUM_INO);
-		if (agino > 0) {
-			/*
-			 * In the middle of an allocation group, we need to get
-			 * the remainder of the chunk we're in.
-			 */
-			struct xfs_inobt_rec_incore	r;
-
-			error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
-			if (error)
-				goto del_cursor;
-			if (icount) {
-				irbp->ir_startino = r.ir_startino;
-				irbp->ir_holemask = r.ir_holemask;
-				irbp->ir_count = r.ir_count;
-				irbp->ir_freecount = r.ir_freecount;
-				irbp->ir_free = r.ir_free;
-				irbp++;
-			}
-			/* Increment to the next record */
-			error = xfs_btree_increment(cur, 0, &stat);
-		} else {
-			/* Start of ag.  Lookup the first inode chunk */
-			error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
-		}
-		if (error || stat == 0) {
-			end_of_ag = true;
-			goto del_cursor;
-		}
-
-		/*
-		 * Loop through inode btree records in this ag,
-		 * until we run out of inodes or space in the buffer.
-		 */
-		while (irbp < irbufend && icount < ubcount) {
-			struct xfs_inobt_rec_incore	r;
-
-			error = xfs_inobt_get_rec(cur, &r, &stat);
-			if (error || stat == 0) {
-				end_of_ag = true;
-				goto del_cursor;
-			}
-
-			/*
-			 * If this chunk has any allocated inodes, save it.
-			 * Also start read-ahead now for this chunk.
-			 */
-			if (r.ir_freecount < r.ir_count) {
-				xfs_bulkstat_ichunk_ra(mp, agno, &r);
-				irbp->ir_startino = r.ir_startino;
-				irbp->ir_holemask = r.ir_holemask;
-				irbp->ir_count = r.ir_count;
-				irbp->ir_freecount = r.ir_freecount;
-				irbp->ir_free = r.ir_free;
-				irbp++;
-				icount += r.ir_count - r.ir_freecount;
-			}
-			error = xfs_btree_increment(cur, 0, &stat);
-			if (error || stat == 0) {
-				end_of_ag = true;
-				goto del_cursor;
-			}
-			cond_resched();
-		}
-
-		/*
-		 * Drop the btree buffers and the agi buffer as we can't hold any
-		 * of the locks these represent when calling iget. If there is a
-		 * pending error, then we are done.
-		 */
-del_cursor:
-		xfs_btree_del_cursor(cur, error);
-		xfs_buf_relse(agbp);
-		if (error)
-			break;
-		/*
-		 * Now format all the good inodes into the user's buffer. The
-		 * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
-		 * for the next loop iteration.
-		 */
-		irbufend = irbp;
-		for (irbp = irbuf;
-		     irbp < irbufend && ac.ac_ubleft >= statstruct_size;
-		     irbp++) {
-			error = xfs_bulkstat_ag_ichunk(mp, agno, irbp,
-					formatter, statstruct_size, &ac,
-					&agino);
-			if (error)
-				break;
-
-			cond_resched();
-		}
-
-		/*
-		 * If we've run out of space or had a formatting error, we
-		 * are now done
-		 */
-		if (ac.ac_ubleft < statstruct_size || error)
-			break;
-
-		if (end_of_ag) {
-			agno++;
-			agino = 0;
-		}
-	}
-	/*
-	 * Done, we're either out of filesystem or space to put the data.
-	 */
-	kmem_free(irbuf);
-	*ubcountp = ac.ac_ubelem;
+	kmem_free(bc.buf);
 
 	/*
 	 * We found some inodes, so clear the error status and return them.
@@ -509,17 +356,9 @@ xfs_bulkstat(
 	 * triggered again and propagated to userspace as there will be no
 	 * formatted inodes in the buffer.
 	 */
-	if (ac.ac_ubelem)
+	if (breq->ocount > 0)
 		error = 0;
 
-	/*
-	 * If we ran out of filesystem, lastino will point off the end of
-	 * the filesystem so the next call will return immediately.
-	 */
-	*lastinop = XFS_AGINO_TO_INO(mp, agno, agino);
-	if (agno >= mp->m_sb.sb_agcount)
-		*done = 1;
-
 	return error;
 }
 
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 369e3f159d4e..7c5f1df360e6 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -5,63 +5,46 @@
 #ifndef __XFS_ITABLE_H__
 #define	__XFS_ITABLE_H__
 
-/*
- * xfs_bulkstat() is used to fill in xfs_bstat structures as well as dm_stat
- * structures (by the dmi library). This is a pointer to a formatter function
- * that will iget the inode and fill in the appropriate structure.
- * see xfs_bulkstat_one() and xfs_dm_bulkstat_one() in dmapi_xfs.c
- */
-typedef int (*bulkstat_one_pf)(struct xfs_mount	*mp,
-			       xfs_ino_t	ino,
-			       void		__user *buffer,
-			       int		ubsize,
-			       int		*ubused,
-			       int		*stat);
+/* In-memory representation of a userspace request for batch inode data. */
+struct xfs_ibulk {
+	struct xfs_mount	*mp;
+	void __user		*ubuffer; /* user output buffer */
+	xfs_ino_t		startino; /* start with this inode */
+	unsigned int		icount;   /* number of elements in ubuffer */
+	unsigned int		ocount;   /* number of records returned */
+};
+
+/* Return value that means we want to abort the walk. */
+#define XFS_IBULK_ABORT		(XFS_IWALK_ABORT)
+
+/* Return value that means the formatting buffer is now full. */
+#define XFS_IBULK_BUFFER_FULL	(XFS_IBULK_ABORT + 1)
 
 /*
- * Values for stat return value.
+ * Advance the user buffer pointer by one record of the given size.  If the
+ * buffer is now full, return the appropriate error code.
  */
-#define BULKSTAT_RV_NOTHING	0
-#define BULKSTAT_RV_DIDONE	1
-#define BULKSTAT_RV_GIVEUP	2
+static inline int
+xfs_ibulk_advance(
+	struct xfs_ibulk	*breq,
+	size_t			bytes)
+{
+	char __user		*b = breq->ubuffer;
+
+	breq->ubuffer = b + bytes;
+	breq->ocount++;
+	return breq->ocount == breq->icount ? XFS_IBULK_BUFFER_FULL : 0;
+}
 
 /*
  * Return stat information in bulk (by-inode) for the filesystem.
  */
-int					/* error status */
-xfs_bulkstat(
-	xfs_mount_t	*mp,		/* mount point for filesystem */
-	xfs_ino_t	*lastino,	/* last inode returned */
-	int		*count,		/* size of buffer/count returned */
-	bulkstat_one_pf formatter,	/* func that'd fill a single buf */
-	size_t		statstruct_size,/* sizeof struct that we're filling */
-	char		__user *ubuffer,/* buffer with inode stats */
-	int		*done);		/* 1 if there are more stats to get */
 
-typedef int (*bulkstat_one_fmt_pf)(  /* used size in bytes or negative error */
-	void			__user *ubuffer, /* buffer to write to */
-	int			ubsize,		 /* remaining user buffer sz */
-	int			*ubused,	 /* bytes used by formatter */
-	const xfs_bstat_t	*buffer);        /* buffer to read from */
+typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
+		const struct xfs_bstat *bstat);
 
-int
-xfs_bulkstat_one_int(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	void			__user *buffer,
-	int			ubsize,
-	bulkstat_one_fmt_pf	formatter,
-	int			*ubused,
-	int			*stat);
-
-int
-xfs_bulkstat_one(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	void			__user *buffer,
-	int			ubsize,
-	int			*ubused,
-	int			*stat);
+int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
+int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
 
 typedef int (*inumbers_fmt_pf)(
 	void			__user *ubuffer, /* buffer to write to */

^ permalink raw reply related	[flat|nested] 33+ messages in thread
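
The user-buffer cursor arithmetic in the new xfs_ibulk_advance helper above can be sketched in plain userspace C. This is a minimal illustration of the pattern, not the kernel code: the `__user` annotation is dropped and the names are stand-ins that merely mirror the kernel struct.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the xfs_ibulk_advance pattern: advance an output cursor by
 * one fixed-size record and report when the caller's buffer is full. */
#define IBULK_BUFFER_FULL	1

struct ibulk {
	char		*ubuffer;	/* next output slot */
	unsigned int	icount;		/* capacity in records */
	unsigned int	ocount;		/* records emitted so far */
};

static int ibulk_advance(struct ibulk *breq, size_t bytes)
{
	breq->ubuffer += bytes;
	breq->ocount++;
	return breq->ocount == breq->icount ? IBULK_BUFFER_FULL : 0;
}
```

Formatter callbacks call this once per record they copy out, so the walk stops cleanly as soon as the buffer fills.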

* [PATCH 07/14] xfs: move bulkstat ichunk helpers to iwalk code
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (5 preceding siblings ...)
  2019-06-12  6:48 ` [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure Darrick J. Wong
@ 2019-06-12  6:48 ` Darrick J. Wong
  2019-06-12  6:48 ` [PATCH 08/14] xfs: change xfs_iwalk_grab_ichunk to use startino, not lastino Darrick J. Wong
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Now that we've reworked the bulkstat code to use iwalk, we can move the
old bulkstat ichunk helpers to xfs_iwalk.c.  No functional changes here.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_itable.c |   93 -------------------------------------------------
 fs/xfs/xfs_itable.h |    8 ----
 fs/xfs/xfs_iwalk.c  |   96 +++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 93 insertions(+), 104 deletions(-)


diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 58e411e11d6c..1b3c9feb5f6f 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -192,99 +192,6 @@ xfs_bulkstat_one(
 	return error;
 }
 
-/*
- * Loop over all clusters in a chunk for a given incore inode allocation btree
- * record.  Do a readahead if there are any allocated inodes in that cluster.
- */
-void
-xfs_bulkstat_ichunk_ra(
-	struct xfs_mount		*mp,
-	xfs_agnumber_t			agno,
-	struct xfs_inobt_rec_incore	*irec)
-{
-	struct xfs_ino_geometry		*igeo = M_IGEO(mp);
-	xfs_agblock_t			agbno;
-	struct blk_plug			plug;
-	int				i;	/* inode chunk index */
-
-	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino);
-
-	blk_start_plug(&plug);
-	for (i = 0;
-	     i < XFS_INODES_PER_CHUNK;
-	     i += igeo->inodes_per_cluster,
-			agbno += igeo->blocks_per_cluster) {
-		if (xfs_inobt_maskn(i, igeo->inodes_per_cluster) &
-		    ~irec->ir_free) {
-			xfs_btree_reada_bufs(mp, agno, agbno,
-					igeo->blocks_per_cluster,
-					&xfs_inode_buf_ops);
-		}
-	}
-	blk_finish_plug(&plug);
-}
-
-/*
- * Lookup the inode chunk that the given inode lives in and then get the record
- * if we found the chunk.  If the inode was not the last in the chunk and there
- * are some left allocated, update the data for the pointed-to record as well as
- * return the count of grabbed inodes.
- */
-int
-xfs_bulkstat_grab_ichunk(
-	struct xfs_btree_cur		*cur,	/* btree cursor */
-	xfs_agino_t			agino,	/* starting inode of chunk */
-	int				*icount,/* return # of inodes grabbed */
-	struct xfs_inobt_rec_incore	*irec)	/* btree record */
-{
-	int				idx;	/* index into inode chunk */
-	int				stat;
-	int				error = 0;
-
-	/* Lookup the inode chunk that this inode lives in */
-	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &stat);
-	if (error)
-		return error;
-	if (!stat) {
-		*icount = 0;
-		return error;
-	}
-
-	/* Get the record, should always work */
-	error = xfs_inobt_get_rec(cur, irec, &stat);
-	if (error)
-		return error;
-	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 1);
-
-	/* Check if the record contains the inode in request */
-	if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino) {
-		*icount = 0;
-		return 0;
-	}
-
-	idx = agino - irec->ir_startino + 1;
-	if (idx < XFS_INODES_PER_CHUNK &&
-	    (xfs_inobt_maskn(idx, XFS_INODES_PER_CHUNK - idx) & ~irec->ir_free)) {
-		int	i;
-
-		/* We got a right chunk with some left inodes allocated at it.
-		 * Grab the chunk record.  Mark all the uninteresting inodes
-		 * free -- because they're before our start point.
-		 */
-		for (i = 0; i < idx; i++) {
-			if (XFS_INOBT_MASK(i) & ~irec->ir_free)
-				irec->ir_freecount++;
-		}
-
-		irec->ir_free |= xfs_inobt_maskn(0, idx);
-		*icount = irec->ir_count - irec->ir_freecount;
-	}
-
-	return 0;
-}
-
-#define XFS_BULKSTAT_UBLEFT(ubleft)	((ubleft) >= statstruct_size)
-
 static int
 xfs_bulkstat_iwalk(
 	struct xfs_mount	*mp,
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 7c5f1df360e6..328a161b8898 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -67,12 +67,4 @@ xfs_inumbers(
 	void			__user *buffer, /* buffer with inode info */
 	inumbers_fmt_pf		formatter);
 
-/* Temporarily needed while we refactor functions. */
-struct xfs_btree_cur;
-struct xfs_inobt_rec_incore;
-void xfs_bulkstat_ichunk_ra(struct xfs_mount *mp, xfs_agnumber_t agno,
-		struct xfs_inobt_rec_incore *irec);
-int xfs_bulkstat_grab_ichunk(struct xfs_btree_cur *cur, xfs_agino_t agino,
-		int *icount, struct xfs_inobt_rec_incore *irec);
-
 #endif	/* __XFS_ITABLE_H__ */
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index 49289588413f..46fa1ea603e2 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -15,7 +15,6 @@
 #include "xfs_ialloc.h"
 #include "xfs_ialloc_btree.h"
 #include "xfs_iwalk.h"
-#include "xfs_itable.h"
 #include "xfs_error.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
@@ -66,6 +65,97 @@ struct xfs_iwalk_ag {
 	void				*data;
 };
 
+/*
+ * Loop over all clusters in a chunk for a given incore inode allocation btree
+ * record.  Do a readahead if there are any allocated inodes in that cluster.
+ */
+STATIC void
+xfs_iwalk_ichunk_ra(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t			agno,
+	struct xfs_inobt_rec_incore	*irec)
+{
+	struct xfs_ino_geometry		*igeo = M_IGEO(mp);
+	xfs_agblock_t			agbno;
+	struct blk_plug			plug;
+	int				i;	/* inode chunk index */
+
+	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino);
+
+	blk_start_plug(&plug);
+	for (i = 0;
+	     i < XFS_INODES_PER_CHUNK;
+	     i += igeo->inodes_per_cluster,
+			agbno += igeo->blocks_per_cluster) {
+		if (xfs_inobt_maskn(i, igeo->inodes_per_cluster) &
+		    ~irec->ir_free) {
+			xfs_btree_reada_bufs(mp, agno, agbno,
+					igeo->blocks_per_cluster,
+					&xfs_inode_buf_ops);
+		}
+	}
+	blk_finish_plug(&plug);
+}
+
+/*
+ * Lookup the inode chunk that the given inode lives in and then get the record
+ * if we found the chunk.  If the inode was not the last in the chunk and there
+ * are some left allocated, update the data for the pointed-to record as well as
+ * return the count of grabbed inodes.
+ */
+STATIC int
+xfs_iwalk_grab_ichunk(
+	struct xfs_btree_cur		*cur,	/* btree cursor */
+	xfs_agino_t			agino,	/* starting inode of chunk */
+	int				*icount,/* return # of inodes grabbed */
+	struct xfs_inobt_rec_incore	*irec)	/* btree record */
+{
+	int				idx;	/* index into inode chunk */
+	int				stat;
+	int				error = 0;
+
+	/* Lookup the inode chunk that this inode lives in */
+	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &stat);
+	if (error)
+		return error;
+	if (!stat) {
+		*icount = 0;
+		return error;
+	}
+
+	/* Get the record, should always work */
+	error = xfs_inobt_get_rec(cur, irec, &stat);
+	if (error)
+		return error;
+	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 1);
+
+	/* Check if the record contains the inode in request */
+	if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino) {
+		*icount = 0;
+		return 0;
+	}
+
+	idx = agino - irec->ir_startino + 1;
+	if (idx < XFS_INODES_PER_CHUNK &&
+	    (xfs_inobt_maskn(idx, XFS_INODES_PER_CHUNK - idx) & ~irec->ir_free)) {
+		int	i;
+
+		/* We got a right chunk with some left inodes allocated at it.
+		 * Grab the chunk record.  Mark all the uninteresting inodes
+		 * free -- because they're before our start point.
+		 */
+		for (i = 0; i < idx; i++) {
+			if (XFS_INOBT_MASK(i) & ~irec->ir_free)
+				irec->ir_freecount++;
+		}
+
+		irec->ir_free |= xfs_inobt_maskn(0, idx);
+		*icount = irec->ir_count - irec->ir_freecount;
+	}
+
+	return 0;
+}
+
 /* Allocate memory for a walk. */
 STATIC int
 xfs_iwalk_alloc(
@@ -190,7 +280,7 @@ xfs_iwalk_ag_start(
 	 * We require a lookup cache of at least two elements so that we don't
 	 * have to deal with tearing down the cursor to walk the records.
 	 */
-	error = xfs_bulkstat_grab_ichunk(*curpp, agino - 1, &icount,
+	error = xfs_iwalk_grab_ichunk(*curpp, agino - 1, &icount,
 			&iwag->recs[iwag->nr_recs]);
 	if (error)
 		return error;
@@ -297,7 +387,7 @@ xfs_iwalk_ag(
 		 * Start readahead for this inode chunk in anticipation of
 		 * walking the inodes.
 		 */
-		xfs_bulkstat_ichunk_ra(mp, agno, irec);
+		xfs_iwalk_ichunk_ra(mp, agno, irec);
 
 		/*
 		 * If there's space in the buffer for more records, increment

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 08/14] xfs: change xfs_iwalk_grab_ichunk to use startino, not lastino
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (6 preceding siblings ...)
  2019-06-12  6:48 ` [PATCH 07/14] xfs: move bulkstat ichunk helpers to iwalk code Darrick J. Wong
@ 2019-06-12  6:48 ` Darrick J. Wong
  2019-06-12  6:48 ` [PATCH 09/14] xfs: clean up long conditionals in xfs_iwalk_ichunk_ra Darrick J. Wong
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Now that the inode chunk grabbing function is a static function in the
iwalk code, change its behavior so that @agino is the inode where we
want to /start/ the iteration.  This reduces cognitive friction with the
callers and simplifies the code.


Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_iwalk.c |   37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index 46fa1ea603e2..b30257a4bebb 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -98,10 +98,10 @@ xfs_iwalk_ichunk_ra(
 }
 
 /*
- * Lookup the inode chunk that the given inode lives in and then get the record
- * if we found the chunk.  If the inode was not the last in the chunk and there
- * are some left allocated, update the data for the pointed-to record as well as
- * return the count of grabbed inodes.
+ * Lookup the inode chunk that the given @agino lives in and then get the
+ * record if we found the chunk.  Set the bits in @irec's free mask that
+ * correspond to the inodes before @agino so that we skip them.  This is how we
+ * restart an inode walk that was interrupted in the middle of an inode record.
  */
 STATIC int
 xfs_iwalk_grab_ichunk(
@@ -112,6 +112,7 @@ xfs_iwalk_grab_ichunk(
 {
 	int				idx;	/* index into inode chunk */
 	int				stat;
+	int				i;
 	int				error = 0;
 
 	/* Lookup the inode chunk that this inode lives in */
@@ -135,24 +136,20 @@ xfs_iwalk_grab_ichunk(
 		return 0;
 	}
 
-	idx = agino - irec->ir_startino + 1;
-	if (idx < XFS_INODES_PER_CHUNK &&
-	    (xfs_inobt_maskn(idx, XFS_INODES_PER_CHUNK - idx) & ~irec->ir_free)) {
-		int	i;
+	idx = agino - irec->ir_startino;
 
-		/* We got a right chunk with some left inodes allocated at it.
-		 * Grab the chunk record.  Mark all the uninteresting inodes
-		 * free -- because they're before our start point.
-		 */
-		for (i = 0; i < idx; i++) {
-			if (XFS_INOBT_MASK(i) & ~irec->ir_free)
-				irec->ir_freecount++;
-		}
-
-		irec->ir_free |= xfs_inobt_maskn(0, idx);
-		*icount = irec->ir_count - irec->ir_freecount;
+	/*
+	 * We got a right chunk with some left inodes allocated at it.  Grab
+	 * the chunk record.  Mark all the uninteresting inodes free because
+	 * they're before our start point.
+	 */
+	for (i = 0; i < idx; i++) {
+		if (XFS_INOBT_MASK(i) & ~irec->ir_free)
+			irec->ir_freecount++;
 	}
 
+	irec->ir_free |= xfs_inobt_maskn(0, idx);
+	*icount = irec->ir_count - irec->ir_freecount;
 	return 0;
 }
 
@@ -280,7 +277,7 @@ xfs_iwalk_ag_start(
 	 * We require a lookup cache of at least two elements so that we don't
 	 * have to deal with tearing down the cursor to walk the records.
 	 */
-	error = xfs_iwalk_grab_ichunk(*curpp, agino - 1, &icount,
+	error = xfs_iwalk_grab_ichunk(*curpp, agino, &icount,
 			&iwag->recs[iwag->nr_recs]);
 	if (error)
 		return error;

^ permalink raw reply related	[flat|nested] 33+ messages in thread
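
The start-trimming logic this patch reworks can be sketched in userspace C: mark every inode before the restart index "free" in the record's mask so a resumed walk skips them. The maskn() helper mimics xfs_inobt_maskn for a 64-bit free mask; the struct and counts are illustrative stand-ins, not the kernel types.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask of n bits starting at bit i (cf. xfs_inobt_maskn). */
static uint64_t maskn(int i, int n)
{
	return (n >= 64 ? ~0ULL : (1ULL << n) - 1) << i;
}

struct irec {
	uint64_t	free;		/* bit set => inode is free */
	int		count;		/* inodes in the chunk */
	int		freecount;	/* free inodes in the chunk */
};

/* Make the first @idx inodes of the chunk look free so that a walk
 * restarted mid-record does not return them a second time. */
static void adjust_start(int idx, struct irec *irec)
{
	int i;

	/* Every allocated inode before idx is recounted as free. */
	for (i = 0; i < idx; i++)
		if ((1ULL << i) & ~irec->free)
			irec->freecount++;
	irec->free |= maskn(0, idx);
}
```

After this, `count - freecount` yields the number of inodes the walk will still visit in the chunk, which is exactly what the old *icount out-parameter reported.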

* [PATCH 09/14] xfs: clean up long conditionals in xfs_iwalk_ichunk_ra
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (7 preceding siblings ...)
  2019-06-12  6:48 ` [PATCH 08/14] xfs: change xfs_iwalk_grab_ichunk to use startino, not lastino Darrick J. Wong
@ 2019-06-12  6:48 ` Darrick J. Wong
  2019-06-12  6:48 ` [PATCH 10/14] xfs: refactor xfs_iwalk_grab_ichunk Darrick J. Wong
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor xfs_iwalk_ichunk_ra to avoid long conditionals.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_iwalk.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index b30257a4bebb..a2102fa94ff5 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -83,16 +83,16 @@ xfs_iwalk_ichunk_ra(
 	agbno = XFS_AGINO_TO_AGBNO(mp, irec->ir_startino);
 
 	blk_start_plug(&plug);
-	for (i = 0;
-	     i < XFS_INODES_PER_CHUNK;
-	     i += igeo->inodes_per_cluster,
-			agbno += igeo->blocks_per_cluster) {
-		if (xfs_inobt_maskn(i, igeo->inodes_per_cluster) &
-		    ~irec->ir_free) {
+	for (i = 0; i < XFS_INODES_PER_CHUNK; i += igeo->inodes_per_cluster) {
+		xfs_inofree_t	imask;
+
+		imask = xfs_inobt_maskn(i, igeo->inodes_per_cluster);
+		if (imask & ~irec->ir_free) {
 			xfs_btree_reada_bufs(mp, agno, agbno,
 					igeo->blocks_per_cluster,
 					&xfs_inode_buf_ops);
 		}
+		agbno += igeo->blocks_per_cluster;
 	}
 	blk_finish_plug(&plug);
 }

^ permalink raw reply related	[flat|nested] 33+ messages in thread
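
The restructured readahead loop above walks a 64-inode chunk in cluster-sized strides and only issues readahead for clusters that hold at least one allocated inode. A userspace sketch of that shape, counting clusters instead of issuing I/O (the cluster geometry is illustrative, not real XFS geometry):

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask of n bits starting at bit i (cf. xfs_inobt_maskn). */
static uint64_t maskn(int i, int n)
{
	return (n >= 64 ? ~0ULL : (1ULL << n) - 1) << i;
}

/* Count clusters in a 64-inode chunk that would get readahead: a set bit
 * in free_mask means "free", so imask & ~free_mask is nonzero whenever the
 * cluster contains an allocated inode (kernel: xfs_btree_reada_bufs). */
static int count_ra_clusters(uint64_t free_mask, int inodes_per_cluster)
{
	int i, ra = 0;

	for (i = 0; i < 64; i += inodes_per_cluster) {
		uint64_t imask = maskn(i, inodes_per_cluster);

		if (imask & ~free_mask)
			ra++;
	}
	return ra;
}
```

Hoisting the agbno increment out of the for-statement and naming the per-cluster mask is what turns the hard-to-read three-clause loop header into the simple stride above.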

* [PATCH 10/14] xfs: refactor xfs_iwalk_grab_ichunk
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (8 preceding siblings ...)
  2019-06-12  6:48 ` [PATCH 09/14] xfs: clean up long conditionals in xfs_iwalk_ichunk_ra Darrick J. Wong
@ 2019-06-12  6:48 ` Darrick J. Wong
  2019-06-14 14:04   ` Brian Foster
  2019-06-12  6:48 ` [PATCH 11/14] xfs: refactor iwalk code to handle walking inobt records Darrick J. Wong
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

In preparation for reusing the iwalk code for the inogrp walking code
(aka INUMBERS), move the initial inobt lookup and retrieval code out of
xfs_iwalk_grab_ichunk so that we call the masking code only when we need
to trim out the inodes that came before the cursor in the inobt record
(aka BULKSTAT).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iwalk.c |   79 ++++++++++++++++++++++++++--------------------------
 1 file changed, 39 insertions(+), 40 deletions(-)


diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index a2102fa94ff5..8c4d7e59f86a 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -98,43 +98,17 @@ xfs_iwalk_ichunk_ra(
 }
 
 /*
- * Lookup the inode chunk that the given @agino lives in and then get the
- * record if we found the chunk.  Set the bits in @irec's free mask that
- * correspond to the inodes before @agino so that we skip them.  This is how we
- * restart an inode walk that was interrupted in the middle of an inode record.
+ * Set the bits in @irec's free mask that correspond to the inodes before
+ * @agino so that we skip them.  This is how we restart an inode walk that was
+ * interrupted in the middle of an inode record.
  */
-STATIC int
-xfs_iwalk_grab_ichunk(
-	struct xfs_btree_cur		*cur,	/* btree cursor */
+STATIC void
+xfs_iwalk_adjust_start(
 	xfs_agino_t			agino,	/* starting inode of chunk */
-	int				*icount,/* return # of inodes grabbed */
 	struct xfs_inobt_rec_incore	*irec)	/* btree record */
 {
 	int				idx;	/* index into inode chunk */
-	int				stat;
 	int				i;
-	int				error = 0;
-
-	/* Lookup the inode chunk that this inode lives in */
-	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &stat);
-	if (error)
-		return error;
-	if (!stat) {
-		*icount = 0;
-		return error;
-	}
-
-	/* Get the record, should always work */
-	error = xfs_inobt_get_rec(cur, irec, &stat);
-	if (error)
-		return error;
-	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 1);
-
-	/* Check if the record contains the inode in request */
-	if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino) {
-		*icount = 0;
-		return 0;
-	}
 
 	idx = agino - irec->ir_startino;
 
@@ -149,8 +123,6 @@ xfs_iwalk_grab_ichunk(
 	}
 
 	irec->ir_free |= xfs_inobt_maskn(0, idx);
-	*icount = irec->ir_count - irec->ir_freecount;
-	return 0;
 }
 
 /* Allocate memory for a walk. */
@@ -258,7 +230,7 @@ xfs_iwalk_ag_start(
 {
 	struct xfs_mount	*mp = iwag->mp;
 	struct xfs_trans	*tp = iwag->tp;
-	int			icount;
+	struct xfs_inobt_rec_incore *irec;
 	int			error;
 
 	/* Set up a fresh cursor and empty the inobt cache. */
@@ -274,15 +246,40 @@ xfs_iwalk_ag_start(
 	/*
 	 * Otherwise, we have to grab the inobt record where we left off, stuff
 	 * the record into our cache, and then see if there are more records.
-	 * We require a lookup cache of at least two elements so that we don't
-	 * have to deal with tearing down the cursor to walk the records.
+	 * We require a lookup cache of at least two elements so that the
+	 * caller doesn't have to deal with tearing down the cursor to walk the
+	 * records.
 	 */
-	error = xfs_iwalk_grab_ichunk(*curpp, agino, &icount,
-			&iwag->recs[iwag->nr_recs]);
+	error = xfs_inobt_lookup(*curpp, agino, XFS_LOOKUP_LE, has_more);
+	if (error)
+		return error;
+
+	/*
+	 * If the LE lookup at @agino yields no records, jump ahead to the
+	 * inobt cursor increment to see if there are more records to process.
+	 */
+	if (!*has_more)
+		goto out_advance;
+
+	/* Get the record, should always work */
+	irec = &iwag->recs[iwag->nr_recs];
+	error = xfs_inobt_get_rec(*curpp, irec, has_more);
 	if (error)
 		return error;
-	if (icount)
-		iwag->nr_recs++;
+	XFS_WANT_CORRUPTED_RETURN(mp, *has_more == 1);
+
+	/*
+	 * If the LE lookup yielded an inobt record before the cursor position,
+	 * skip it and see if there's another one after it.
+	 */
+	if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino)
+		goto out_advance;
+
+	/*
+	 * If agino fell in the middle of the inode record, make it look like
+	 * the inodes up to agino are free so that we don't return them again.
+	 */
+	xfs_iwalk_adjust_start(agino, irec);
 
 	/*
 	 * set_prefetch is supposed to give us a large enough inobt record
@@ -290,8 +287,10 @@ xfs_iwalk_ag_start(
 	 * body can cache a record without having to check for cache space
 	 * until after it reads an inobt record.
 	 */
+	iwag->nr_recs++;
 	ASSERT(iwag->nr_recs < iwag->sz_recs);
 
+out_advance:
 	return xfs_btree_increment(*curpp, 0, has_more);
 }
 

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 11/14] xfs: refactor iwalk code to handle walking inobt records
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (9 preceding siblings ...)
  2019-06-12  6:48 ` [PATCH 10/14] xfs: refactor xfs_iwalk_grab_ichunk Darrick J. Wong
@ 2019-06-12  6:48 ` Darrick J. Wong
  2019-06-14 14:04   ` Brian Foster
  2019-06-12  6:48 ` [PATCH 12/14] xfs: refactor INUMBERS to use iwalk functions Darrick J. Wong
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor xfs_iwalk_ag_start and xfs_iwalk_ag so that the bits that are
particular to bulkstat (trimming the start irec, starting inode
readahead, and skipping empty groups) can be controlled via flags in the
iwag structure.

This enables us to add a new function to walk all inobt records which
will be used for the new INUMBERS implementation in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iwalk.c |   75 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_iwalk.h |   12 ++++++++
 2 files changed, 84 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index 8c4d7e59f86a..def37347a362 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -62,7 +62,18 @@ struct xfs_iwalk_ag {
 
 	/* Inode walk function and data pointer. */
 	xfs_iwalk_fn			iwalk_fn;
+	xfs_inobt_walk_fn		inobt_walk_fn;
 	void				*data;
+
+	/*
+	 * Make it look like the inodes up to startino are free so that
+	 * bulkstat can start its inode iteration at the correct place without
+	 * needing to special case everywhere.
+	 */
+	unsigned int			trim_start:1;
+
+	/* Skip empty inobt records? */
+	unsigned int			skip_empty:1;
 };
 
 /*
@@ -170,6 +181,16 @@ xfs_iwalk_ag_recs(
 
 		trace_xfs_iwalk_ag_rec(mp, agno, irec);
 
+		if (iwag->inobt_walk_fn) {
+			error = iwag->inobt_walk_fn(mp, tp, agno, irec,
+					iwag->data);
+			if (error)
+				return error;
+		}
+
+		if (!iwag->iwalk_fn)
+			continue;
+
 		for (j = 0; j < XFS_INODES_PER_CHUNK; j++) {
 			/* Skip if this inode is free */
 			if (XFS_INOBT_MASK(j) & irec->ir_free)
@@ -279,7 +300,8 @@ xfs_iwalk_ag_start(
 	 * If agino fell in the middle of the inode record, make it look like
 	 * the inodes up to agino are free so that we don't return them again.
 	 */
-	xfs_iwalk_adjust_start(agino, irec);
+	if (iwag->trim_start)
+		xfs_iwalk_adjust_start(agino, irec);
 
 	/*
 	 * set_prefetch is supposed to give us a large enough inobt record
@@ -372,7 +394,7 @@ xfs_iwalk_ag(
 			break;
 
 		/* No allocated inodes in this chunk; skip it. */
-		if (irec->ir_freecount == irec->ir_count) {
+		if (iwag->skip_empty && irec->ir_freecount == irec->ir_count) {
 			error = xfs_btree_increment(cur, 0, &has_more);
 			if (error)
 				break;
@@ -383,7 +405,8 @@ xfs_iwalk_ag(
 		 * Start readahead for this inode chunk in anticipation of
 		 * walking the inodes.
 		 */
-		xfs_iwalk_ichunk_ra(mp, agno, irec);
+		if (iwag->iwalk_fn)
+			xfs_iwalk_ichunk_ra(mp, agno, irec);
 
 		/*
 		 * If there's space in the buffer for more records, increment
@@ -481,6 +504,8 @@ xfs_iwalk(
 		.iwalk_fn	= iwalk_fn,
 		.data		= data,
 		.startino	= startino,
+		.trim_start	= 1,
+		.skip_empty	= 1,
 	};
 	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
 	int			error;
@@ -502,3 +527,47 @@ xfs_iwalk(
 	xfs_iwalk_free(&iwag);
 	return error;
 }
+
+/*
+ * Walk all inode btree records in the filesystem starting from @startino.  The
+ * @inobt_walk_fn will be called for each btree record, being passed the incore
+ * record and @data.  @max_prefetch controls how many inobt records we try to
+ * cache ahead of time.
+ */
+int
+xfs_inobt_walk(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	xfs_ino_t		startino,
+	xfs_inobt_walk_fn	inobt_walk_fn,
+	unsigned int		max_prefetch,
+	void			*data)
+{
+	struct xfs_iwalk_ag	iwag = {
+		.mp		= mp,
+		.tp		= tp,
+		.inobt_walk_fn	= inobt_walk_fn,
+		.data		= data,
+		.startino	= startino,
+	};
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
+	int			error;
+
+	ASSERT(agno < mp->m_sb.sb_agcount);
+
+	/* Translate inumbers record count to inode count. */
+	xfs_iwalk_set_prefetch(&iwag, max_prefetch * XFS_INODES_PER_CHUNK);
+	error = xfs_iwalk_alloc(&iwag);
+	if (error)
+		return error;
+
+	for (; agno < mp->m_sb.sb_agcount; agno++) {
+		error = xfs_iwalk_ag(&iwag);
+		if (error)
+			break;
+		iwag.startino = XFS_AGINO_TO_INO(mp, agno + 1, 0);
+	}
+
+	xfs_iwalk_free(&iwag);
+	return error;
+}
diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
index 9e762e31dadc..97c1120d4237 100644
--- a/fs/xfs/xfs_iwalk.h
+++ b/fs/xfs/xfs_iwalk.h
@@ -16,4 +16,16 @@ typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
 int xfs_iwalk(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t startino,
 		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
 
+/* Walk all inode btree records in the filesystem starting from @startino. */
+typedef int (*xfs_inobt_walk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
+				 xfs_agnumber_t agno,
+				 const struct xfs_inobt_rec_incore *irec,
+				 void *data);
+/* Return value (for xfs_inobt_walk_fn) that aborts the walk immediately. */
+#define XFS_INOBT_WALK_ABORT	(XFS_IWALK_ABORT)
+
+int xfs_inobt_walk(struct xfs_mount *mp, struct xfs_trans *tp,
+		xfs_ino_t startino, xfs_inobt_walk_fn inobt_walk_fn,
+		unsigned int max_prefetch, void *data);
+
 #endif /* __XFS_IWALK_H__ */

^ permalink raw reply related	[flat|nested] 33+ messages in thread
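
The dual-callback dispatch this patch introduces can be sketched as follows: a walk may carry a per-record callback (the INUMBERS style), a per-inode callback (the BULKSTAT style), or both, and the record loop invokes whichever is set. All names here are illustrative stand-ins for the kernel's iwag fields, with a bare 64-bit free mask in place of the incore inobt record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct walk {
	int	(*rec_fn)(uint64_t startino, uint64_t free, void *data);
	int	(*ino_fn)(uint64_t ino, void *data);
	void	*data;
};

/* Visit one 64-inode chunk: fire the per-record callback if set, then
 * the per-inode callback for each allocated (non-free) inode. */
static int walk_rec(struct walk *w, uint64_t startino, uint64_t free)
{
	int j, error;

	if (w->rec_fn) {
		error = w->rec_fn(startino, free, w->data);
		if (error)
			return error;
	}
	if (!w->ino_fn)
		return 0;
	for (j = 0; j < 64; j++) {
		if ((1ULL << j) & free)		/* skip free inodes */
			continue;
		error = w->ino_fn(startino + j, w->data);
		if (error)
			return error;
	}
	return 0;
}

/* Example per-inode callback: count allocated inodes. */
static int count_allocated(uint64_t ino, void *data)
{
	(void)ino;
	(*(int *)data)++;
	return 0;
}
```

This is why xfs_inobt_walk can reuse xfs_iwalk_ag unchanged: it simply sets inobt_walk_fn and leaves iwalk_fn NULL, while bulkstat does the opposite and additionally sets trim_start and skip_empty.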

* [PATCH 12/14] xfs: refactor INUMBERS to use iwalk functions
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (10 preceding siblings ...)
  2019-06-12  6:48 ` [PATCH 11/14] xfs: refactor iwalk code to handle walking inobt records Darrick J. Wong
@ 2019-06-12  6:48 ` Darrick J. Wong
  2019-06-14 14:05   ` Brian Foster
  2019-06-12  6:48 ` [PATCH 13/14] xfs: multithreaded iwalk implementation Darrick J. Wong
  2019-06-12  6:49 ` [PATCH 14/14] xfs: poll waiting for quotacheck Darrick J. Wong
  13 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Now that we have generic functions to walk inode records, refactor the
INUMBERS implementation to use them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_ioctl.c   |   20 ++++--
 fs/xfs/xfs_ioctl.h   |    2 +
 fs/xfs/xfs_ioctl32.c |   35 ++++-------
 fs/xfs/xfs_itable.c  |  166 +++++++++++++++++++-------------------------------
 fs/xfs/xfs_itable.h  |   22 +------
 5 files changed, 95 insertions(+), 150 deletions(-)


diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 60595e61f2a6..04b661ff0799 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -733,6 +733,16 @@ xfs_bulkstat_one_fmt(
 	return xfs_ibulk_advance(breq, sizeof(struct xfs_bstat));
 }
 
+int
+xfs_inumbers_fmt(
+	struct xfs_ibulk	*breq,
+	const struct xfs_inogrp	*igrp)
+{
+	if (copy_to_user(breq->ubuffer, igrp, sizeof(*igrp)))
+		return -EFAULT;
+	return xfs_ibulk_advance(breq, sizeof(struct xfs_inogrp));
+}
+
 STATIC int
 xfs_ioc_bulkstat(
 	xfs_mount_t		*mp,
@@ -783,13 +793,9 @@ xfs_ioc_bulkstat(
 	 * in filesystem".
 	 */
 	if (cmd == XFS_IOC_FSINUMBERS) {
-		int	count = breq.icount;
-
-		breq.startino = lastino;
-		error = xfs_inumbers(mp, &breq.startino, &count,
-					bulkreq.ubuffer, xfs_inumbers_fmt);
-		breq.ocount = count;
-		lastino = breq.startino;
+		breq.startino = lastino ? lastino + 1 : 0;
+		error = xfs_inumbers(&breq, xfs_inumbers_fmt);
+		lastino = breq.startino - 1;
 	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE) {
 		breq.startino = lastino;
 		breq.icount = 1;
diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h
index f32c8aadfeba..fb303eaa8863 100644
--- a/fs/xfs/xfs_ioctl.h
+++ b/fs/xfs/xfs_ioctl.h
@@ -79,7 +79,9 @@ xfs_set_dmattrs(
 
 struct xfs_ibulk;
 struct xfs_bstat;
+struct xfs_inogrp;
 
 int xfs_bulkstat_one_fmt(struct xfs_ibulk *breq, const struct xfs_bstat *bstat);
+int xfs_inumbers_fmt(struct xfs_ibulk *breq, const struct xfs_inogrp *igrp);
 
 #endif
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 5d1c143bac18..3ca8ff9d4ac7 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -87,22 +87,17 @@ xfs_compat_growfs_rt_copyin(
 
 STATIC int
 xfs_inumbers_fmt_compat(
-	void			__user *ubuffer,
-	const struct xfs_inogrp	*buffer,
-	long			count,
-	long			*written)
+	struct xfs_ibulk	*breq,
+	const struct xfs_inogrp	*igrp)
 {
-	compat_xfs_inogrp_t	__user *p32 = ubuffer;
-	long			i;
+	struct compat_xfs_inogrp __user *p32 = breq->ubuffer;
 
-	for (i = 0; i < count; i++) {
-		if (put_user(buffer[i].xi_startino,   &p32[i].xi_startino) ||
-		    put_user(buffer[i].xi_alloccount, &p32[i].xi_alloccount) ||
-		    put_user(buffer[i].xi_allocmask,  &p32[i].xi_allocmask))
-			return -EFAULT;
-	}
-	*written = count * sizeof(*p32);
-	return 0;
+	if (put_user(igrp->xi_startino,   &p32->xi_startino) ||
+	    put_user(igrp->xi_alloccount, &p32->xi_alloccount) ||
+	    put_user(igrp->xi_allocmask,  &p32->xi_allocmask))
+		return -EFAULT;
+
+	return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_inogrp));
 }
 
 #else
@@ -228,7 +223,7 @@ xfs_compat_ioc_bulkstat(
 	 * to userpace memory via bulkreq.ubuffer.  Normally the compat
 	 * functions and structure size are the correct ones to use ...
 	 */
-	inumbers_fmt_pf inumbers_func = xfs_inumbers_fmt_compat;
+	inumbers_fmt_pf		inumbers_func = xfs_inumbers_fmt_compat;
 	bulkstat_one_fmt_pf	bs_one_func = xfs_bulkstat_one_fmt_compat;
 
 #ifdef CONFIG_X86_X32
@@ -291,13 +286,9 @@ xfs_compat_ioc_bulkstat(
 	 * in filesystem".
 	 */
 	if (cmd == XFS_IOC_FSINUMBERS_32) {
-		int	count = breq.icount;
-
-		breq.startino = lastino;
-		error = xfs_inumbers(mp, &breq.startino, &count,
-				bulkreq.ubuffer, inumbers_func);
-		breq.ocount = count;
-		lastino = breq.startino;
+		breq.startino = lastino ? lastino + 1 : 0;
+		error = xfs_inumbers(&breq, inumbers_func);
+		lastino = breq.startino - 1;
 	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE_32) {
 		breq.startino = lastino;
 		breq.icount = 1;
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 1b3c9feb5f6f..b2f640ecb507 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -269,121 +269,83 @@ xfs_bulkstat(
 	return error;
 }
 
-int
-xfs_inumbers_fmt(
-	void			__user *ubuffer, /* buffer to write to */
-	const struct xfs_inogrp	*buffer,	/* buffer to read from */
-	long			count,		/* # of elements to read */
-	long			*written)	/* # of bytes written */
+struct xfs_inumbers_chunk {
+	inumbers_fmt_pf		formatter;
+	struct xfs_ibulk	*breq;
+};
+
+/*
+ * INUMBERS
+ * ========
+ * This is how we export inode btree records to userspace, so that XFS tools
+ * can figure out where inodes are allocated.
+ */
+
+/*
+ * Format the inode group structure and report it somewhere.
+ *
+ * Similar to xfs_bulkstat_one_int, lastino is the inode cursor as we walk
+ * through the filesystem so we move it forward unless there was a runtime
+ * error.  If the formatter tells us the buffer is now full we also move the
+ * cursor forward and abort the walk.
+ */
+STATIC int
+xfs_inumbers_walk(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	const struct xfs_inobt_rec_incore *irec,
+	void			*data)
 {
-	if (copy_to_user(ubuffer, buffer, count * sizeof(*buffer)))
-		return -EFAULT;
-	*written = count * sizeof(*buffer);
-	return 0;
+	struct xfs_inogrp	inogrp = {
+		.xi_startino	= XFS_AGINO_TO_INO(mp, agno, irec->ir_startino),
+		.xi_alloccount	= irec->ir_count - irec->ir_freecount,
+		.xi_allocmask	= ~irec->ir_free,
+	};
+	struct xfs_inumbers_chunk *ic = data;
+	xfs_agino_t		agino;
+	int			error;
+
+	error = ic->formatter(ic->breq, &inogrp);
+	if (error && error != XFS_IBULK_BUFFER_FULL)
+		return error;
+	if (error == XFS_IBULK_BUFFER_FULL)
+		error = XFS_INOBT_WALK_ABORT;
+
+	agino = irec->ir_startino + XFS_INODES_PER_CHUNK;
+	ic->breq->startino = XFS_AGINO_TO_INO(mp, agno, agino);
+	return error;
 }
 
 /*
  * Return inode number table for the filesystem.
  */
-int					/* error status */
+int
 xfs_inumbers(
-	struct xfs_mount	*mp,/* mount point for filesystem */
-	xfs_ino_t		*lastino,/* last inode returned */
-	int			*count,/* size of buffer/count returned */
-	void			__user *ubuffer,/* buffer with inode descriptions */
+	struct xfs_ibulk	*breq,
 	inumbers_fmt_pf		formatter)
 {
-	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, *lastino);
-	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, *lastino);
-	struct xfs_btree_cur	*cur = NULL;
-	struct xfs_buf		*agbp = NULL;
-	struct xfs_inogrp	*buffer;
-	int			bcount;
-	int			left = *count;
-	int			bufidx = 0;
+	struct xfs_inumbers_chunk ic = {
+		.formatter	= formatter,
+		.breq		= breq,
+	};
 	int			error = 0;
 
-	*count = 0;
-	if (agno >= mp->m_sb.sb_agcount ||
-	    *lastino != XFS_AGINO_TO_INO(mp, agno, agino))
-		return error;
+	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
+		return 0;
 
-	bcount = min(left, (int)(PAGE_SIZE / sizeof(*buffer)));
-	buffer = kmem_zalloc(bcount * sizeof(*buffer), KM_SLEEP);
-	do {
-		struct xfs_inobt_rec_incore	r;
-		int				stat;
-
-		if (!agbp) {
-			error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
-			if (error)
-				break;
-
-			cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
-						    XFS_BTNUM_INO);
-			error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_GE,
-						 &stat);
-			if (error)
-				break;
-			if (!stat)
-				goto next_ag;
-		}
-
-		error = xfs_inobt_get_rec(cur, &r, &stat);
-		if (error)
-			break;
-		if (!stat)
-			goto next_ag;
-
-		agino = r.ir_startino + XFS_INODES_PER_CHUNK - 1;
-		buffer[bufidx].xi_startino =
-			XFS_AGINO_TO_INO(mp, agno, r.ir_startino);
-		buffer[bufidx].xi_alloccount = r.ir_count - r.ir_freecount;
-		buffer[bufidx].xi_allocmask = ~r.ir_free;
-		if (++bufidx == bcount) {
-			long	written;
-
-			error = formatter(ubuffer, buffer, bufidx, &written);
-			if (error)
-				break;
-			ubuffer += written;
-			*count += bufidx;
-			bufidx = 0;
-		}
-		if (!--left)
-			break;
-
-		error = xfs_btree_increment(cur, 0, &stat);
-		if (error)
-			break;
-		if (stat)
-			continue;
-
-next_ag:
-		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
-		cur = NULL;
-		xfs_buf_relse(agbp);
-		agbp = NULL;
-		agino = 0;
-		agno++;
-	} while (agno < mp->m_sb.sb_agcount);
-
-	if (!error) {
-		if (bufidx) {
-			long	written;
-
-			error = formatter(ubuffer, buffer, bufidx, &written);
-			if (!error)
-				*count += bufidx;
-		}
-		*lastino = XFS_AGINO_TO_INO(mp, agno, agino);
-	}
+	error = xfs_inobt_walk(breq->mp, NULL, breq->startino,
+			xfs_inumbers_walk, breq->icount, &ic);
 
-	kmem_free(buffer);
-	if (cur)
-		xfs_btree_del_cursor(cur, error);
-	if (agbp)
-		xfs_buf_relse(agbp);
+	/*
+	 * We found some inode groups, so clear the error status and return
+	 * them.  The startino cursor will point directly at the inode that
+	 * triggered any error that occurred, so on the next call the error
+	 * will be triggered again and propagated to userspace as there will be
+	 * no formatted inode groups in the buffer.
+	 */
+	if (breq->ocount > 0)
+		error = 0;
 
 	return error;
 }
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 328a161b8898..1e1a5bb9fd9f 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -46,25 +46,9 @@ typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
 int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
 int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
 
-typedef int (*inumbers_fmt_pf)(
-	void			__user *ubuffer, /* buffer to write to */
-	const xfs_inogrp_t	*buffer,	/* buffer to read from */
-	long			count,		/* # of elements to read */
-	long			*written);	/* # of bytes written */
+typedef int (*inumbers_fmt_pf)(struct xfs_ibulk *breq,
+		const struct xfs_inogrp *igrp);
 
-int
-xfs_inumbers_fmt(
-	void			__user *ubuffer, /* buffer to write to */
-	const xfs_inogrp_t	*buffer,	/* buffer to read from */
-	long			count,		/* # of elements to read */
-	long			*written);	/* # of bytes written */
-
-int					/* error status */
-xfs_inumbers(
-	xfs_mount_t		*mp,	/* mount point for filesystem */
-	xfs_ino_t		*last,	/* last inode returned */
-	int			*count,	/* size of buffer/count returned */
-	void			__user *buffer, /* buffer with inode info */
-	inumbers_fmt_pf		formatter);
+int xfs_inumbers(struct xfs_ibulk *breq, inumbers_fmt_pf formatter);
 
 #endif	/* __XFS_ITABLE_H__ */

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 13/14] xfs: multithreaded iwalk implementation
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (11 preceding siblings ...)
  2019-06-12  6:48 ` [PATCH 12/14] xfs: refactor INUMBERS to use iwalk functions Darrick J. Wong
@ 2019-06-12  6:48 ` Darrick J. Wong
  2019-06-14 14:06   ` Brian Foster
  2019-06-12  6:49 ` [PATCH 14/14] xfs: poll waiting for quotacheck Darrick J. Wong
  13 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a parallel iwalk implementation and switch quotacheck to use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile      |    1 
 fs/xfs/xfs_globals.c |    3 +
 fs/xfs/xfs_iwalk.c   |   82 +++++++++++++++++++++++++++++++++
 fs/xfs/xfs_iwalk.h   |    2 +
 fs/xfs/xfs_pwork.c   |  126 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_pwork.h   |   58 +++++++++++++++++++++++
 fs/xfs/xfs_qm.c      |    2 -
 fs/xfs/xfs_sysctl.h  |    6 ++
 fs/xfs/xfs_sysfs.c   |   40 ++++++++++++++++
 fs/xfs/xfs_trace.h   |   18 +++++++
 10 files changed, 337 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/xfs_pwork.c
 create mode 100644 fs/xfs/xfs_pwork.h
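
As a rough illustration of the dispatch model this patch introduces — one work
item per AG, each carrying its own start inode, queued and then reaped with the
first nonzero error winning — here is a user-space sketch with pthreads standing
in for the kernel workqueue.  All names and the fixed AG count are illustrative,
not part of the patch:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_AGS	4	/* illustrative; real code reads sb_agcount */

/* Loosely modeled on struct xfs_iwalk_ag: one work item per AG. */
struct walk_item {
	pthread_t	thread;
	unsigned int	agno;
	unsigned long	startino;
	int		error;
};

/* Stand-in for xfs_iwalk_ag_work(): walk one AG, record the result. */
static void *walk_ag_work(void *arg)
{
	struct walk_item *iwag = arg;

	/* A real walker would iterate this AG's inobt records here. */
	iwag->error = 0;
	return NULL;
}

/*
 * Queue one item per AG starting from @startino, then reap them all;
 * the first nonzero error is kept, as with xfs_pwork_ctl.error.
 */
static int walk_threaded(unsigned long startino)
{
	struct walk_item items[NR_AGS];
	int error = 0;

	for (unsigned int agno = 0; agno < NR_AGS; agno++) {
		items[agno].agno = agno;
		items[agno].startino = startino;
		items[agno].error = 0;
		pthread_create(&items[agno].thread, NULL, walk_ag_work,
				&items[agno]);
		startino = 0;	/* later AGs walk from their first inode */
	}
	for (unsigned int agno = 0; agno < NR_AGS; agno++) {
		pthread_join(items[agno].thread, NULL);
		if (items[agno].error && !error)
			error = items[agno].error;
	}
	return error;
}
```

The kernel version differs mainly in that the workqueue caps concurrency at
nr_threads rather than running every AG's worker at once.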


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 74d30ef0dbce..48940a27d4aa 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -84,6 +84,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_message.o \
 				   xfs_mount.o \
 				   xfs_mru_cache.o \
+				   xfs_pwork.o \
 				   xfs_reflink.o \
 				   xfs_stats.o \
 				   xfs_super.o \
diff --git a/fs/xfs/xfs_globals.c b/fs/xfs/xfs_globals.c
index d0d377384120..4f93f2c4dc38 100644
--- a/fs/xfs/xfs_globals.c
+++ b/fs/xfs/xfs_globals.c
@@ -31,6 +31,9 @@ xfs_param_t xfs_params = {
 	.fstrm_timer	= {	1,		30*100,		3600*100},
 	.eofb_timer	= {	1,		300,		3600*24},
 	.cowb_timer	= {	1,		1800,		3600*24},
+#ifdef DEBUG
+	.pwork_threads	= {	0,		0,		NR_CPUS	},
+#endif
 };
 
 struct xfs_globals xfs_globals = {
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index def37347a362..0fe740298981 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -20,6 +20,7 @@
 #include "xfs_icache.h"
 #include "xfs_health.h"
 #include "xfs_trans.h"
+#include "xfs_pwork.h"
 
 /*
  * Walking Inodes in the Filesystem
@@ -45,6 +46,9 @@
  */
 
 struct xfs_iwalk_ag {
+	/* parallel work control data; will be null if single threaded */
+	struct xfs_pwork		pwork;
+
 	struct xfs_mount		*mp;
 	struct xfs_trans		*tp;
 
@@ -181,6 +185,9 @@ xfs_iwalk_ag_recs(
 
 		trace_xfs_iwalk_ag_rec(mp, agno, irec);
 
+		if (xfs_pwork_want_abort(&iwag->pwork))
+			return 0;
+
 		if (iwag->inobt_walk_fn) {
 			error = iwag->inobt_walk_fn(mp, tp, agno, irec,
 					iwag->data);
@@ -192,6 +199,9 @@ xfs_iwalk_ag_recs(
 			continue;
 
 		for (j = 0; j < XFS_INODES_PER_CHUNK; j++) {
+			if (xfs_pwork_want_abort(&iwag->pwork))
+				return 0;
+
 			/* Skip if this inode is free */
 			if (XFS_INOBT_MASK(j) & irec->ir_free)
 				continue;
@@ -386,6 +396,8 @@ xfs_iwalk_ag(
 		struct xfs_inobt_rec_incore	*irec;
 
 		cond_resched();
+		if (xfs_pwork_want_abort(&iwag->pwork))
+			goto out;
 
 		/* Fetch the inobt record. */
 		irec = &iwag->recs[iwag->nr_recs];
@@ -506,6 +518,7 @@ xfs_iwalk(
 		.startino	= startino,
 		.trim_start	= 1,
 		.skip_empty	= 1,
+		.pwork		= XFS_PWORK_SINGLE_THREADED,
 	};
 	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
 	int			error;
@@ -528,6 +541,74 @@ xfs_iwalk(
 	return error;
 }
 
+/* Run per-thread iwalk work. */
+static int
+xfs_iwalk_ag_work(
+	struct xfs_mount	*mp,
+	struct xfs_pwork	*pwork)
+{
+	struct xfs_iwalk_ag	*iwag;
+	int			error;
+
+	iwag = container_of(pwork, struct xfs_iwalk_ag, pwork);
+	if (xfs_pwork_want_abort(pwork))
+		goto out;
+
+	error = xfs_iwalk_alloc(iwag);
+	if (error)
+		goto out;
+
+	error = xfs_iwalk_ag(iwag);
+	xfs_iwalk_free(iwag);
+out:
+	kmem_free(iwag);
+	return error;
+}
+
+/*
+ * Walk all the inodes in the filesystem using multiple threads to process each
+ * AG.
+ */
+int
+xfs_iwalk_threaded(
+	struct xfs_mount	*mp,
+	xfs_ino_t		startino,
+	xfs_iwalk_fn		iwalk_fn,
+	unsigned int		max_prefetch,
+	void			*data)
+{
+	struct xfs_pwork_ctl	pctl;
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
+	unsigned int		nr_threads;
+	int			error;
+
+	ASSERT(agno < mp->m_sb.sb_agcount);
+
+	nr_threads = xfs_pwork_guess_datadev_parallelism(mp);
+	error = xfs_pwork_init(mp, &pctl, xfs_iwalk_ag_work, "xfs_iwalk",
+			nr_threads);
+	if (error)
+		return error;
+
+	for (; agno < mp->m_sb.sb_agcount; agno++) {
+		struct xfs_iwalk_ag	*iwag;
+
+		if (xfs_pwork_ctl_want_abort(&pctl))
+			break;
+
+		iwag = kmem_zalloc(sizeof(struct xfs_iwalk_ag), KM_SLEEP);
+		iwag->mp = mp;
+		iwag->iwalk_fn = iwalk_fn;
+		iwag->data = data;
+		iwag->startino = startino;
+		xfs_iwalk_set_prefetch(iwag, max_prefetch);
+		xfs_pwork_queue(&pctl, &iwag->pwork);
+		startino = XFS_AGINO_TO_INO(mp, agno + 1, 0);
+	}
+
+	return xfs_pwork_destroy(&pctl);
+}
+
 /*
  * Walk all inode btree records in the filesystem starting from @startino.  The
  * @inobt_walk_fn will be called for each btree record, being passed the incore
@@ -549,6 +630,7 @@ xfs_inobt_walk(
 		.inobt_walk_fn	= inobt_walk_fn,
 		.data		= data,
 		.startino	= startino,
+		.pwork		= XFS_PWORK_SINGLE_THREADED,
 	};
 	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
 	int			error;
diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
index 97c1120d4237..56e0dfe1b2ce 100644
--- a/fs/xfs/xfs_iwalk.h
+++ b/fs/xfs/xfs_iwalk.h
@@ -15,6 +15,8 @@ typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
 
 int xfs_iwalk(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t startino,
 		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
+int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino,
+		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
 
 /* Walk all inode btree records in the filesystem starting from @startino. */
 typedef int (*xfs_inobt_walk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
new file mode 100644
index 000000000000..8d0d5f130252
--- /dev/null
+++ b/fs/xfs/xfs_pwork.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_trace.h"
+#include "xfs_sysctl.h"
+#include "xfs_pwork.h"
+
+/*
+ * Parallel Work Queue
+ * ===================
+ *
+ * Abstract away the details of running a large and "obviously" parallelizable
+ * task across multiple CPUs.  Callers initialize the pwork control object with
+ * a desired level of parallelization and a work function.  Next, they embed
+ * struct xfs_pwork in whatever structure they use to pass work context to a
+ * worker thread and queue that pwork.  The work function will be passed the
+ * pwork item when it is run (from process context) and any returned error will
+ * be recorded in xfs_pwork_ctl.error.  Work functions should check for errors
+ * and abort if necessary; the non-zeroness of xfs_pwork_ctl.error does not
+ * stop workqueue item processing.
+ *
+ * This is the rough equivalent of the xfsprogs workqueue code, though we can't
+ * reuse that name here.
+ */
+
+/* Invoke our caller's function. */
+static void
+xfs_pwork_work(
+	struct work_struct	*work)
+{
+	struct xfs_pwork	*pwork;
+	struct xfs_pwork_ctl	*pctl;
+	int			error;
+
+	pwork = container_of(work, struct xfs_pwork, work);
+	pctl = pwork->pctl;
+	error = pctl->work_fn(pctl->mp, pwork);
+	if (error && !pctl->error)
+		pctl->error = error;
+}
+
+/*
+ * Set up control data for parallel work.  @work_fn is the function that will
+ * be called.  @tag will be written into the kernel threads.  @nr_threads is
+ * the level of parallelism desired, or 0 for no limit.
+ */
+int
+xfs_pwork_init(
+	struct xfs_mount	*mp,
+	struct xfs_pwork_ctl	*pctl,
+	xfs_pwork_work_fn	work_fn,
+	const char		*tag,
+	unsigned int		nr_threads)
+{
+#ifdef DEBUG
+	if (xfs_globals.pwork_threads > 0)
+		nr_threads = xfs_globals.pwork_threads;
+	else if (xfs_globals.pwork_threads < 0)
+		nr_threads = 0;
+#endif
+	trace_xfs_pwork_init(mp, nr_threads, current->pid);
+
+	pctl->wq = alloc_workqueue("%s-%d", WQ_FREEZABLE, nr_threads, tag,
+			current->pid);
+	if (!pctl->wq)
+		return -ENOMEM;
+	pctl->work_fn = work_fn;
+	pctl->error = 0;
+	pctl->mp = mp;
+
+	return 0;
+}
+
+/* Queue some parallel work. */
+void
+xfs_pwork_queue(
+	struct xfs_pwork_ctl	*pctl,
+	struct xfs_pwork	*pwork)
+{
+	INIT_WORK(&pwork->work, xfs_pwork_work);
+	pwork->pctl = pctl;
+	queue_work(pctl->wq, &pwork->work);
+}
+
+/* Wait for the work to finish and tear down the control structure. */
+int
+xfs_pwork_destroy(
+	struct xfs_pwork_ctl	*pctl)
+{
+	destroy_workqueue(pctl->wq);
+	pctl->wq = NULL;
+	return pctl->error;
+}
+
+/*
+ * Return the amount of parallelism that the data device can handle, or 0 for
+ * no limit.
+ */
+unsigned int
+xfs_pwork_guess_datadev_parallelism(
+	struct xfs_mount	*mp)
+{
+	struct xfs_buftarg	*btp = mp->m_ddev_targp;
+	int			iomin;
+	int			ioopt;
+
+	if (blk_queue_nonrot(btp->bt_bdev->bd_queue))
+		return num_online_cpus();
+	if (mp->m_sb.sb_width && mp->m_sb.sb_unit)
+		return mp->m_sb.sb_width / mp->m_sb.sb_unit;
+	iomin = bdev_io_min(btp->bt_bdev);
+	ioopt = bdev_io_opt(btp->bt_bdev);
+	if (iomin && ioopt)
+		return ioopt / iomin;
+
+	return 1;
+}
diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h
new file mode 100644
index 000000000000..4cf1a6f48237
--- /dev/null
+++ b/fs/xfs/xfs_pwork.h
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#ifndef __XFS_PWORK_H__
+#define __XFS_PWORK_H__
+
+struct xfs_pwork;
+struct xfs_mount;
+
+typedef int (*xfs_pwork_work_fn)(struct xfs_mount *mp, struct xfs_pwork *pwork);
+
+/*
+ * Parallel work coordination structure.
+ */
+struct xfs_pwork_ctl {
+	struct workqueue_struct	*wq;
+	struct xfs_mount	*mp;
+	xfs_pwork_work_fn	work_fn;
+	int			error;
+};
+
+/*
+ * Embed this parallel work control item inside your own work structure,
+ * then queue work with it.
+ */
+struct xfs_pwork {
+	struct work_struct	work;
+	struct xfs_pwork_ctl	*pctl;
+};
+
+#define XFS_PWORK_SINGLE_THREADED	{ .pctl = NULL }
+
+/* Have we been told to abort? */
+static inline bool
+xfs_pwork_ctl_want_abort(
+	struct xfs_pwork_ctl	*pctl)
+{
+	return pctl && pctl->error;
+}
+
+/* Have we been told to abort? */
+static inline bool
+xfs_pwork_want_abort(
+	struct xfs_pwork	*pwork)
+{
+	return xfs_pwork_ctl_want_abort(pwork->pctl);
+}
+
+int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl,
+		xfs_pwork_work_fn work_fn, const char *tag,
+		unsigned int nr_threads);
+void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork);
+int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl);
+unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp);
+
+#endif /* __XFS_PWORK_H__ */
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 52e8ec0aa064..8004c931c86e 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1304,7 +1304,7 @@ xfs_qm_quotacheck(
 		flags |= XFS_PQUOTA_CHKD;
 	}
 
-	error = xfs_iwalk(mp, NULL, 0, xfs_qm_dqusage_adjust, 0, NULL);
+	error = xfs_iwalk_threaded(mp, 0, xfs_qm_dqusage_adjust, 0, NULL);
 	if (error)
 		goto error_return;
 
diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
index ad7f9be13087..b555e045e2f4 100644
--- a/fs/xfs/xfs_sysctl.h
+++ b/fs/xfs/xfs_sysctl.h
@@ -37,6 +37,9 @@ typedef struct xfs_param {
 	xfs_sysctl_val_t fstrm_timer;	/* Filestream dir-AG assoc'n timeout. */
 	xfs_sysctl_val_t eofb_timer;	/* Interval between eofb scan wakeups */
 	xfs_sysctl_val_t cowb_timer;	/* Interval between cowb scan wakeups */
+#ifdef DEBUG
+	xfs_sysctl_val_t pwork_threads;	/* Parallel workqueue thread count */
+#endif
 } xfs_param_t;
 
 /*
@@ -82,6 +85,9 @@ enum {
 extern xfs_param_t	xfs_params;
 
 struct xfs_globals {
+#ifdef DEBUG
+	int	pwork_threads;		/* parallel workqueue threads */
+#endif
 	int	log_recovery_delay;	/* log recovery delay (secs) */
 	int	mount_delay;		/* mount setup delay (secs) */
 	bool	bug_on_assert;		/* BUG() the kernel on assert failure */
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index cabda13f3c64..910e6b9cb1a7 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -206,11 +206,51 @@ always_cow_show(
 }
 XFS_SYSFS_ATTR_RW(always_cow);
 
+#ifdef DEBUG
+/*
+ * Override how many threads the parallel work queue is allowed to create.
+ * This has to be a debug-only global (instead of an errortag) because one of
+ * the main users of parallel workqueues is mount time quotacheck.
+ */
+STATIC ssize_t
+pwork_threads_store(
+	struct kobject	*kobject,
+	const char	*buf,
+	size_t		count)
+{
+	int		ret;
+	int		val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (val < -1 || val > NR_CPUS)
+		return -EINVAL;
+
+	xfs_globals.pwork_threads = val;
+
+	return count;
+}
+
+STATIC ssize_t
+pwork_threads_show(
+	struct kobject	*kobject,
+	char		*buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.pwork_threads);
+}
+XFS_SYSFS_ATTR_RW(pwork_threads);
+#endif /* DEBUG */
+
 static struct attribute *xfs_dbg_attrs[] = {
 	ATTR_LIST(bug_on_assert),
 	ATTR_LIST(log_recovery_delay),
 	ATTR_LIST(mount_delay),
 	ATTR_LIST(always_cow),
+#ifdef DEBUG
+	ATTR_LIST(pwork_threads),
+#endif
 	NULL,
 };
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f9bb1d50bc0e..658cbade1998 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3556,6 +3556,24 @@ TRACE_EVENT(xfs_iwalk_ag_rec,
 		  __entry->startino, __entry->freemask)
 )
 
+TRACE_EVENT(xfs_pwork_init,
+	TP_PROTO(struct xfs_mount *mp, unsigned int nr_threads, pid_t pid),
+	TP_ARGS(mp, nr_threads, pid),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, nr_threads)
+		__field(pid_t, pid)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->nr_threads = nr_threads;
+		__entry->pid = pid;
+	),
+	TP_printk("dev %d:%d nr_threads %u pid %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->nr_threads, __entry->pid)
+)
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH


* [PATCH 14/14] xfs: poll waiting for quotacheck
  2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
                   ` (12 preceding siblings ...)
  2019-06-12  6:48 ` [PATCH 13/14] xfs: multithreaded iwalk implementation Darrick J. Wong
@ 2019-06-12  6:49 ` Darrick J. Wong
  2019-06-14 14:07   ` Brian Foster
  13 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-12  6:49 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a pwork destroy function that uses polling instead of
uninterruptible sleep to wait for work items to finish so that we can
touch the softlockup watchdog.  IOWs, gross hack.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iwalk.c |    3 +++
 fs/xfs/xfs_iwalk.h |    3 ++-
 fs/xfs/xfs_pwork.c |   19 +++++++++++++++++++
 fs/xfs/xfs_pwork.h |    3 +++
 fs/xfs/xfs_qm.c    |    2 +-
 5 files changed, 28 insertions(+), 2 deletions(-)
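
The polling variant added here can be modeled in user space with an atomic
counter: each worker decrements it as it finishes, and the waiter checks it on
a timeout instead of sleeping uninterruptibly, so it has a chance to pet the
watchdog between checks.  In this sketch nanosleep() stands in for the one
second wait_event_timeout() and the watchdog touch is only a comment; it is an
illustrative analogue, not kernel code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Illustrative analogue of the new xfs_pwork_ctl counter. */
static atomic_int nr_work;

/* Stand-in for xfs_pwork_work(): finish the item, drop the count. */
static void *work_fn(void *arg)
{
	(void)arg;
	atomic_fetch_sub(&nr_work, 1);	/* kernel: atomic_dec + wake_up */
	return NULL;
}

/*
 * Stand-in for xfs_pwork_poll(): instead of an uninterruptible wait,
 * recheck the counter on a timeout so the caller stays watchdog-safe.
 */
static void pwork_poll(void)
{
	struct timespec tick = { 0, 1000000 };	/* 1ms; the kernel uses HZ */

	while (atomic_load(&nr_work) != 0) {
		/* kernel code calls touch_softlockup_watchdog() here */
		nanosleep(&tick, NULL);
	}
}

/* Queue @n items, poll for completion, return the final counter. */
static int run_polled(int n)
{
	pthread_t threads[64];

	atomic_store(&nr_work, n);
	for (int i = 0; i < n; i++)
		pthread_create(&threads[i], NULL, work_fn, NULL);
	pwork_poll();		/* returns only once every item finished */
	for (int i = 0; i < n; i++)
		pthread_join(threads[i], NULL);
	return atomic_load(&nr_work);
}
```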


diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index 0fe740298981..f10688cfb917 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -575,6 +575,7 @@ xfs_iwalk_threaded(
 	xfs_ino_t		startino,
 	xfs_iwalk_fn		iwalk_fn,
 	unsigned int		max_prefetch,
+	bool			polled,
 	void			*data)
 {
 	struct xfs_pwork_ctl	pctl;
@@ -606,6 +607,8 @@ xfs_iwalk_threaded(
 		startino = XFS_AGINO_TO_INO(mp, agno + 1, 0);
 	}
 
+	if (polled)
+		xfs_pwork_poll(&pctl);
 	return xfs_pwork_destroy(&pctl);
 }
 
diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
index 56e0dfe1b2ce..202bca4c9c02 100644
--- a/fs/xfs/xfs_iwalk.h
+++ b/fs/xfs/xfs_iwalk.h
@@ -16,7 +16,8 @@ typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
 int xfs_iwalk(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t startino,
 		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
 int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino,
-		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
+		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, bool poll,
+		void *data);
 
 /* Walk all inode btree records in the filesystem starting from @startino. */
 typedef int (*xfs_inobt_walk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
index 8d0d5f130252..c2f02b710b8c 100644
--- a/fs/xfs/xfs_pwork.c
+++ b/fs/xfs/xfs_pwork.c
@@ -13,6 +13,7 @@
 #include "xfs_trace.h"
 #include "xfs_sysctl.h"
 #include "xfs_pwork.h"
+#include <linux/nmi.h>
 
 /*
  * Parallel Work Queue
@@ -46,6 +47,8 @@ xfs_pwork_work(
 	error = pctl->work_fn(pctl->mp, pwork);
 	if (error && !pctl->error)
 		pctl->error = error;
+	atomic_dec(&pctl->nr_work);
+	wake_up(&pctl->poll_wait);
 }
 
 /*
@@ -76,6 +79,8 @@ xfs_pwork_init(
 	pctl->work_fn = work_fn;
 	pctl->error = 0;
 	pctl->mp = mp;
+	atomic_set(&pctl->nr_work, 0);
+	init_waitqueue_head(&pctl->poll_wait);
 
 	return 0;
 }
@@ -88,6 +93,7 @@ xfs_pwork_queue(
 {
 	INIT_WORK(&pwork->work, xfs_pwork_work);
 	pwork->pctl = pctl;
+	atomic_inc(&pctl->nr_work);
 	queue_work(pctl->wq, &pwork->work);
 }
 
@@ -101,6 +107,19 @@ xfs_pwork_destroy(
 	return pctl->error;
 }
 
+/*
+ * Wait for the work to finish by polling completion status and touch the soft
+ * lockup watchdog.  This is for callers such as mount which hold locks.
+ */
+void
+xfs_pwork_poll(
+	struct xfs_pwork_ctl	*pctl)
+{
+	while (wait_event_timeout(pctl->poll_wait,
+				atomic_read(&pctl->nr_work) == 0, HZ) == 0)
+		touch_softlockup_watchdog();
+}
+
 /*
  * Return the amount of parallelism that the data device can handle, or 0 for
  * no limit.
diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h
index 4cf1a6f48237..ff93873df8d3 100644
--- a/fs/xfs/xfs_pwork.h
+++ b/fs/xfs/xfs_pwork.h
@@ -18,6 +18,8 @@ struct xfs_pwork_ctl {
 	struct workqueue_struct	*wq;
 	struct xfs_mount	*mp;
 	xfs_pwork_work_fn	work_fn;
+	struct wait_queue_head	poll_wait;
+	atomic_t		nr_work;
 	int			error;
 };
 
@@ -53,6 +55,7 @@ int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl,
 		unsigned int nr_threads);
 void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork);
 int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl);
+void xfs_pwork_poll(struct xfs_pwork_ctl *pctl);
 unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp);
 
 #endif /* __XFS_PWORK_H__ */
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 8004c931c86e..8bb902125403 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1304,7 +1304,7 @@ xfs_qm_quotacheck(
 		flags |= XFS_PQUOTA_CHKD;
 	}
 
-	error = xfs_iwalk_threaded(mp, 0, xfs_qm_dqusage_adjust, 0, NULL);
+	error = xfs_iwalk_threaded(mp, 0, xfs_qm_dqusage_adjust, 0, true, NULL);
 	if (error)
 		goto error_return;
 


* Re: [PATCH 01/14] xfs: create iterator error codes
  2019-06-12  6:47 ` [PATCH 01/14] xfs: create iterator error codes Darrick J. Wong
@ 2019-06-13 16:24   ` Brian Foster
  0 siblings, 0 replies; 33+ messages in thread
From: Brian Foster @ 2019-06-13 16:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 11, 2019 at 11:47:37PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Currently, xfs doesn't have generic error codes defined for "stop
> iterating"; we just reuse the XFS_BTREE_QUERY_* return values.  This
> looks a little weird if we're not actually iterating a btree index.
> Before we start adding more iterators, we should create general
> XFS_ITER_{CONTINUE,ABORT} return values and define the XFS_BTREE_QUERY_*
> ones from that.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Have you given any thought to just replacing
XFS_BTREE_QUERY_RANGE_[ABORT|CONTINUE] with the generic ITER variants
and using the latter wherever applicable?

This patch looks fine either way:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_alloc.c  |    2 +-
>  fs/xfs/libxfs/xfs_btree.h  |    4 ++--
>  fs/xfs/libxfs/xfs_shared.h |    6 ++++++
>  fs/xfs/scrub/agheader.c    |    4 ++--
>  fs/xfs/scrub/repair.c      |    4 ++--
>  fs/xfs/xfs_dquot.c         |    2 +-
>  6 files changed, 14 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index a9ff3cf82cce..b9eb3a8aeaf9 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3146,7 +3146,7 @@ xfs_alloc_has_record(
>  
>  /*
>   * Walk all the blocks in the AGFL.  The @walk_fn can return any negative
> - * error code or XFS_BTREE_QUERY_RANGE_ABORT.
> + * error code or XFS_ITER_*.
>   */
>  int
>  xfs_agfl_walk(
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index e3b3e9dce5da..94530766dd30 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -469,8 +469,8 @@ uint xfs_btree_compute_maxlevels(uint *limits, unsigned long len);
>  unsigned long long xfs_btree_calc_size(uint *limits, unsigned long long len);
>  
>  /* return codes */
> -#define XFS_BTREE_QUERY_RANGE_CONTINUE	0	/* keep iterating */
> -#define XFS_BTREE_QUERY_RANGE_ABORT	1	/* stop iterating */
> +#define XFS_BTREE_QUERY_RANGE_CONTINUE	(XFS_ITER_CONTINUE) /* keep iterating */
> +#define XFS_BTREE_QUERY_RANGE_ABORT	(XFS_ITER_ABORT)    /* stop iterating */
>  typedef int (*xfs_btree_query_range_fn)(struct xfs_btree_cur *cur,
>  		union xfs_btree_rec *rec, void *priv);
>  
> diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
> index 4e909791aeac..fa788139dfe3 100644
> --- a/fs/xfs/libxfs/xfs_shared.h
> +++ b/fs/xfs/libxfs/xfs_shared.h
> @@ -136,4 +136,10 @@ void xfs_symlink_local_to_remote(struct xfs_trans *tp, struct xfs_buf *bp,
>  				 struct xfs_inode *ip, struct xfs_ifork *ifp);
>  xfs_failaddr_t xfs_symlink_shortform_verify(struct xfs_inode *ip);
>  
> +/* Keep iterating the data structure. */
> +#define XFS_ITER_CONTINUE	(0)
> +
> +/* Stop iterating the data structure. */
> +#define XFS_ITER_ABORT		(1)
> +
>  #endif /* __XFS_SHARED_H__ */
> diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
> index adaeabdefdd3..1d5361f9ebfc 100644
> --- a/fs/xfs/scrub/agheader.c
> +++ b/fs/xfs/scrub/agheader.c
> @@ -646,7 +646,7 @@ xchk_agfl_block(
>  	xchk_agfl_block_xref(sc, agbno);
>  
>  	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> -		return XFS_BTREE_QUERY_RANGE_ABORT;
> +		return XFS_ITER_ABORT;
>  
>  	return 0;
>  }
> @@ -737,7 +737,7 @@ xchk_agfl(
>  	/* Check the blocks in the AGFL. */
>  	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(sc->sa.agf_bp),
>  			sc->sa.agfl_bp, xchk_agfl_block, &sai);
> -	if (error == XFS_BTREE_QUERY_RANGE_ABORT) {
> +	if (error == XFS_ITER_ABORT) {
>  		error = 0;
>  		goto out_free;
>  	}
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index eb358f0f5e0a..e2a352c1bad7 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -672,7 +672,7 @@ xrep_findroot_agfl_walk(
>  {
>  	xfs_agblock_t		*agbno = priv;
>  
> -	return (*agbno == bno) ? XFS_BTREE_QUERY_RANGE_ABORT : 0;
> +	return (*agbno == bno) ? XFS_ITER_ABORT : 0;
>  }
>  
>  /* Does this block match the btree information passed in? */
> @@ -702,7 +702,7 @@ xrep_findroot_block(
>  	if (owner == XFS_RMAP_OWN_AG) {
>  		error = xfs_agfl_walk(mp, ri->agf, ri->agfl_bp,
>  				xrep_findroot_agfl_walk, &agbno);
> -		if (error == XFS_BTREE_QUERY_RANGE_ABORT)
> +		if (error == XFS_ITER_ABORT)
>  			return 0;
>  		if (error)
>  			return error;
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index a1af984e4913..8674551c5e98 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -1243,7 +1243,7 @@ xfs_qm_exit(void)
>  /*
>   * Iterate every dquot of a particular type.  The caller must ensure that the
>   * particular quota type is active.  iter_fn can return negative error codes,
> - * or XFS_BTREE_QUERY_RANGE_ABORT to indicate that it wants to stop iterating.
> + * or XFS_ITER_ABORT to indicate that it wants to stop iterating.
>   */
>  int
>  xfs_qm_dqiterate(
> 


* Re: [PATCH 02/14] xfs: create simplified inode walk function
  2019-06-12  6:47 ` [PATCH 02/14] xfs: create simplified inode walk function Darrick J. Wong
@ 2019-06-13 16:27   ` Brian Foster
  2019-06-13 18:06     ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Foster @ 2019-06-13 16:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 11, 2019 at 11:47:44PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create a new iterator function to simplify walking inodes in an XFS
> filesystem.  This new iterator will replace the existing open-coded
> walking that goes on in various places.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile                  |    1 
>  fs/xfs/libxfs/xfs_ialloc_btree.c |   36 +++
>  fs/xfs/libxfs/xfs_ialloc_btree.h |    3 
>  fs/xfs/xfs_itable.c              |    5 
>  fs/xfs/xfs_itable.h              |    8 +
>  fs/xfs/xfs_iwalk.c               |  418 ++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_iwalk.h               |   19 ++
>  fs/xfs/xfs_trace.h               |   40 ++++
>  8 files changed, 524 insertions(+), 6 deletions(-)
>  create mode 100644 fs/xfs/xfs_iwalk.c
>  create mode 100644 fs/xfs/xfs_iwalk.h
> 
> 
...
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> new file mode 100644
> index 000000000000..49289588413f
> --- /dev/null
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -0,0 +1,418 @@
...
> +/* Allocate memory for a walk. */
> +STATIC int
> +xfs_iwalk_alloc(
> +	struct xfs_iwalk_ag	*iwag)
> +{
> +	size_t			size;
> +
> +	ASSERT(iwag->recs == NULL);
> +	iwag->nr_recs = 0;
> +
> +	/* Allocate a prefetch buffer for inobt records. */
> +	size = iwag->sz_recs * sizeof(struct xfs_inobt_rec_incore);
> +	iwag->recs = kmem_alloc(size, KM_MAYFAIL);
> +	if (iwag->recs == NULL)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +/* Free memory we allocated for a walk. */
> +STATIC void
> +xfs_iwalk_free(
> +	struct xfs_iwalk_ag	*iwag)
> +{
> +	kmem_free(iwag->recs);

It might be a good idea to ->recs = NULL here since the alloc call
asserts for that (if any future code happens to free and realloc the
recs buffer for whatever reason).
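
In other words, something like this user-space sketch of the NULL-after-free
pattern (illustrative names, not the patch's code):

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the iwalk context. */
struct walker {
	int	*recs;	/* prefetch buffer; NULL when unallocated */
};

static int walker_alloc(struct walker *w, size_t n)
{
	assert(w->recs == NULL);	/* mirrors the ASSERT in xfs_iwalk_alloc */
	w->recs = calloc(n, sizeof(*w->recs));
	return w->recs ? 0 : -1;
}

static void walker_free(struct walker *w)
{
	free(w->recs);
	w->recs = NULL;	/* keeps the alloc-side assert valid on free/realloc */
}
```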

> +}
> +
...
> +/* Walk all inodes in a single AG, from @iwag->startino to the end of the AG. */
> +STATIC int
> +xfs_iwalk_ag(
> +	struct xfs_iwalk_ag		*iwag)
> +{
> +	struct xfs_mount		*mp = iwag->mp;
> +	struct xfs_trans		*tp = iwag->tp;
> +	struct xfs_buf			*agi_bp = NULL;
> +	struct xfs_btree_cur		*cur = NULL;
> +	xfs_agnumber_t			agno;
> +	xfs_agino_t			agino;
> +	int				has_more;
> +	int				error = 0;
> +
> +	/* Set up our cursor at the right place in the inode btree. */
> +	agno = XFS_INO_TO_AGNO(mp, iwag->startino);
> +	agino = XFS_INO_TO_AGINO(mp, iwag->startino);
> +	error = xfs_iwalk_ag_start(iwag, agno, agino, &cur, &agi_bp, &has_more);
> +
> +	while (!error && has_more) {
> +		struct xfs_inobt_rec_incore	*irec;
> +
> +		cond_resched();
> +
> +		/* Fetch the inobt record. */
> +		irec = &iwag->recs[iwag->nr_recs];
> +		error = xfs_inobt_get_rec(cur, irec, &has_more);
> +		if (error || !has_more)
> +			break;
> +
> +		/* No allocated inodes in this chunk; skip it. */
> +		if (irec->ir_freecount == irec->ir_count) {
> +			error = xfs_btree_increment(cur, 0, &has_more);
> +			if (error)
> +				break;
> +			continue;
> +		}
> +
> +		/*
> +		 * Start readahead for this inode chunk in anticipation of
> +		 * walking the inodes.
> +		 */
> +		xfs_bulkstat_ichunk_ra(mp, agno, irec);
> +
> +		/*
> +		 * If there's space in the buffer for more records, increment
> +		 * the btree cursor and grab more.
> +		 */
> +		if (++iwag->nr_recs < iwag->sz_recs) {
> +			error = xfs_btree_increment(cur, 0, &has_more);
> +			if (error || !has_more)
> +				break;
> +			continue;
> +		}
> +
> +		/*
> +		 * Otherwise, we need to save cursor state and run the callback
> +		 * function on the cached records.  The run_callbacks function
> +		 * is supposed to return a cursor pointing to the record where
> +		 * we would be if we had been able to increment like above.
> +		 */
> +		has_more = true;

has_more should always be true if we get here, right? If so, it's
probably better to replace this assignment with ASSERT(has_more).

> +		error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp,
> +				&has_more);
> +	}
> +
> +	if (iwag->nr_recs == 0 || error)
> +		goto out;
> +
> +	/* Walk the unprocessed records in the cache. */
> +	error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp, &has_more);
> +
> +out:
> +	xfs_iwalk_del_inobt(tp, &cur, &agi_bp, error);
> +	return error;
> +}
> +
> +/*
> + * Given the number of inodes to prefetch, set the number of inobt records that
> + * we cache in memory, which controls the number of inodes we try to read
> + * ahead.
> + */
> +static inline void
> +xfs_iwalk_set_prefetch(
> +	struct xfs_iwalk_ag	*iwag,
> +	unsigned int		max_prefetch)
> +{
> +	/*
> +	 * Default to 4096 bytes' worth of inobt records; this should be plenty
> +	 * of inodes to read ahead.  This number was chosen so that the cache
> +	 * is never more than a single memory page and the amount of inode
> +	 * readahead is limited to 16k inodes regardless of CPU:
> +	 *
> +	 * 4096 bytes / 16 bytes per inobt record = 256 inobt records
> +	 * 256 inobt records * 64 inodes per record = 16384 inodes
> +	 * 16384 inodes * 512 bytes per inode(?) = 8MB of inode readahead
> +	 */
> +	iwag->sz_recs = 4096 / sizeof(struct xfs_inobt_rec_incore);
> +

So we decided not to preserve current readahead behavior in this patch?

> +	/*
> +	 * If the caller gives us a desired prefetch amount, round it up to
> +	 * an even inode chunk and cap it as defined previously.
> +	 */
> +	if (max_prefetch) {
> +		unsigned int	nr;
> +
> +		nr = round_up(max_prefetch, XFS_INODES_PER_CHUNK) /
> +				XFS_INODES_PER_CHUNK;
> +		iwag->sz_recs = min_t(unsigned int, iwag->sz_recs, nr);

This is comparing the record count calculated above with a value derived
from max_prefetch, which the rounding just above suggests is in inodes.
BTW, could we add a one-line /* prefetch in inodes */ comment on the
max_prefetch parameter line at the top of the function?

Aside from those nits the rest looks good to me.

Brian

> +	}
> +
> +	/*
> +	 * Allocate enough space to prefetch at least two records so that we
> +	 * can cache both the inobt record where the iwalk started and the next
> +	 * record.  This simplifies the AG inode walk loop setup code.
> +	 */
> +	iwag->sz_recs = max_t(unsigned int, iwag->sz_recs, 2);
> +}
> +
> +/*
> + * Walk all inodes in the filesystem starting from @startino.  The @iwalk_fn
> + * will be called for each allocated inode, being passed the inode's number and
> + * @data.  @max_prefetch controls how many inobt records' worth of inodes we
> + * try to readahead.
> + */
> +int
> +xfs_iwalk(
> +	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
> +	xfs_ino_t		startino,
> +	xfs_iwalk_fn		iwalk_fn,
> +	unsigned int		max_prefetch,
> +	void			*data)
> +{
> +	struct xfs_iwalk_ag	iwag = {
> +		.mp		= mp,
> +		.tp		= tp,
> +		.iwalk_fn	= iwalk_fn,
> +		.data		= data,
> +		.startino	= startino,
> +	};
> +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> +	int			error;
> +
> +	ASSERT(agno < mp->m_sb.sb_agcount);
> +
> +	xfs_iwalk_set_prefetch(&iwag, max_prefetch);
> +	error = xfs_iwalk_alloc(&iwag);
> +	if (error)
> +		return error;
> +
> +	for (; agno < mp->m_sb.sb_agcount; agno++) {
> +		error = xfs_iwalk_ag(&iwag);
> +		if (error)
> +			break;
> +		iwag.startino = XFS_AGINO_TO_INO(mp, agno + 1, 0);
> +	}
> +
> +	xfs_iwalk_free(&iwag);
> +	return error;
> +}
> diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
> new file mode 100644
> index 000000000000..9e762e31dadc
> --- /dev/null
> +++ b/fs/xfs/xfs_iwalk.h
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#ifndef __XFS_IWALK_H__
> +#define __XFS_IWALK_H__
> +
> +/* Walk all inodes in the filesystem starting from @startino. */
> +typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
> +			    xfs_ino_t ino, void *data);
> +/* Return values for xfs_iwalk_fn. */
> +#define XFS_IWALK_CONTINUE	(XFS_ITER_CONTINUE)
> +#define XFS_IWALK_ABORT		(XFS_ITER_ABORT)
> +
> +int xfs_iwalk(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t startino,
> +		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
> +
> +#endif /* __XFS_IWALK_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 2464ea351f83..f9bb1d50bc0e 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3516,6 +3516,46 @@ DEFINE_EVENT(xfs_inode_corrupt_class, name,	\
>  DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_sick);
>  DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_healthy);
>  
> +TRACE_EVENT(xfs_iwalk_ag,
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
> +		 xfs_agino_t startino),
> +	TP_ARGS(mp, agno, startino),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agino_t, startino)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->agno = agno;
> +		__entry->startino = startino;
> +	),
> +	TP_printk("dev %d:%d agno %d startino %u",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
> +		  __entry->startino)
> +)
> +
> +TRACE_EVENT(xfs_iwalk_ag_rec,
> +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
> +		 struct xfs_inobt_rec_incore *irec),
> +	TP_ARGS(mp, agno, irec),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agino_t, startino)
> +		__field(uint64_t, freemask)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->agno = agno;
> +		__entry->startino = irec->ir_startino;
> +		__entry->freemask = irec->ir_free;
> +	),
> +	TP_printk("dev %d:%d agno %d startino %u freemask 0x%llx",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
> +		  __entry->startino, __entry->freemask)
> +)
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/14] xfs: remove unnecessary includes of xfs_itable.h
  2019-06-12  6:48 ` [PATCH 05/14] xfs: remove unnecessary includes of xfs_itable.h Darrick J. Wong
@ 2019-06-13 16:27   ` Brian Foster
  0 siblings, 0 replies; 33+ messages in thread
From: Brian Foster @ 2019-06-13 16:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 11, 2019 at 11:48:03PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Don't include xfs_itable.h in files that don't need it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/scrub/common.c |    1 -
>  fs/xfs/scrub/dir.c    |    1 -
>  fs/xfs/scrub/scrub.c  |    1 -
>  fs/xfs/xfs_trace.c    |    1 -
>  4 files changed, 4 deletions(-)
> 
> 
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 973aa59975e3..561d7e818e8b 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -17,7 +17,6 @@
>  #include "xfs_sb.h"
>  #include "xfs_inode.h"
>  #include "xfs_icache.h"
> -#include "xfs_itable.h"
>  #include "xfs_alloc.h"
>  #include "xfs_alloc_btree.h"
>  #include "xfs_bmap.h"
> diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
> index a38a22785a1a..9018ca4aba64 100644
> --- a/fs/xfs/scrub/dir.c
> +++ b/fs/xfs/scrub/dir.c
> @@ -17,7 +17,6 @@
>  #include "xfs_sb.h"
>  #include "xfs_inode.h"
>  #include "xfs_icache.h"
> -#include "xfs_itable.h"
>  #include "xfs_da_format.h"
>  #include "xfs_da_btree.h"
>  #include "xfs_dir2.h"
> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index f630389ee176..5689a33e999c 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c
> @@ -17,7 +17,6 @@
>  #include "xfs_sb.h"
>  #include "xfs_inode.h"
>  #include "xfs_icache.h"
> -#include "xfs_itable.h"
>  #include "xfs_alloc.h"
>  #include "xfs_alloc_btree.h"
>  #include "xfs_bmap.h"
> diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> index cb6489c22cad..f555a3c560b9 100644
> --- a/fs/xfs/xfs_trace.c
> +++ b/fs/xfs/xfs_trace.c
> @@ -16,7 +16,6 @@
>  #include "xfs_btree.h"
>  #include "xfs_da_btree.h"
>  #include "xfs_ialloc.h"
> -#include "xfs_itable.h"
>  #include "xfs_alloc.h"
>  #include "xfs_bmap.h"
>  #include "xfs_attr.h"
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure
  2019-06-12  6:48 ` [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure Darrick J. Wong
@ 2019-06-13 16:31   ` Brian Foster
  2019-06-13 18:12     ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Foster @ 2019-06-13 16:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 11, 2019 at 11:48:09PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create a new ibulk structure incore to help us deal with bulk inode stat
> state tracking and then convert the bulkstat code to use the new iwalk
> iterator.  This disentangles inode walking from bulk stat control for
> simpler code and enables us to isolate the formatter functions to the
> ioctl handling code.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_ioctl.c   |   70 ++++++--
>  fs/xfs/xfs_ioctl.h   |    5 +
>  fs/xfs/xfs_ioctl32.c |   93 ++++++-----
>  fs/xfs/xfs_itable.c  |  431 ++++++++++++++++----------------------------------
>  fs/xfs/xfs_itable.h  |   79 ++++-----
>  5 files changed, 272 insertions(+), 406 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index 814ffe6fbab7..5d1c143bac18 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
...
> @@ -284,38 +266,59 @@ xfs_compat_ioc_bulkstat(
>  		return -EFAULT;
>  	bulkreq.ocount = compat_ptr(addr);
>  
> -	if (copy_from_user(&inlast, bulkreq.lastip, sizeof(__s64)))
> +	if (copy_from_user(&lastino, bulkreq.lastip, sizeof(__s64)))
>  		return -EFAULT;
> +	breq.startino = lastino + 1;
>  

Spurious assignment?

> -	if ((count = bulkreq.icount) <= 0)
> +	if (bulkreq.icount <= 0)
>  		return -EINVAL;
>  
>  	if (bulkreq.ubuffer == NULL)
>  		return -EINVAL;
>  
> +	breq.ubuffer = bulkreq.ubuffer;
> +	breq.icount = bulkreq.icount;
> +
...
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 3ca1c454afe6..58e411e11d6c 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -14,47 +14,68 @@
...
> +STATIC int
>  xfs_bulkstat_one_int(
> -	struct xfs_mount	*mp,		/* mount point for filesystem */
> -	xfs_ino_t		ino,		/* inode to get data for */
> -	void __user		*buffer,	/* buffer to place output in */
> -	int			ubsize,		/* size of buffer */
> -	bulkstat_one_fmt_pf	formatter,	/* formatter, copy to user */
> -	int			*ubused,	/* bytes used by me */
> -	int			*stat)		/* BULKSTAT_RV_... */
> +	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
> +	xfs_ino_t		ino,
> +	void			*data)

There's no need for a void pointer here given the current usage. We
might as well pass this as bc (and let the caller cast it, if
necessary).

That said, it also looks like the only reason we have the
xfs_bulkstat_iwalk wrapper caller of this function is to filter out
certain error values. If those errors are needed for the single inode
case, we could stick something in the bc to toggle that invalid inode
filtering behavior and eliminate the need for the wrapper entirely
(which would pass _one_int() into the iwalk infra directly and require
retaining the void pointer).

>  {
> +	struct xfs_bstat_chunk	*bc = data;
>  	struct xfs_icdinode	*dic;		/* dinode core info pointer */
>  	struct xfs_inode	*ip;		/* incore inode pointer */
>  	struct inode		*inode;
> -	struct xfs_bstat	*buf;		/* return buffer */
> -	int			error = 0;	/* error value */
> +	struct xfs_bstat	*buf = bc->buf;
> +	int			error = -EINVAL;
>  
> -	*stat = BULKSTAT_RV_NOTHING;
> +	if (xfs_internal_inum(mp, ino))
> +		goto out_advance;
>  
> -	if (!buffer || xfs_internal_inum(mp, ino))
> -		return -EINVAL;
> -
> -	buf = kmem_zalloc(sizeof(*buf), KM_SLEEP | KM_MAYFAIL);
> -	if (!buf)
> -		return -ENOMEM;
> -
> -	error = xfs_iget(mp, NULL, ino,
> +	error = xfs_iget(mp, tp, ino,
>  			 (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED),
>  			 XFS_ILOCK_SHARED, &ip);
> +	if (error == -ENOENT || error == -EINVAL)
> +		goto out_advance;
>  	if (error)
> -		goto out_free;
> +		goto out;
>  
>  	ASSERT(ip != NULL);
>  	ASSERT(ip->i_imap.im_blkno != 0);
> @@ -119,43 +140,56 @@ xfs_bulkstat_one_int(
>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  	xfs_irele(ip);
>  
> -	error = formatter(buffer, ubsize, ubused, buf);
> -	if (!error)
> -		*stat = BULKSTAT_RV_DIDONE;
> +	error = bc->formatter(bc->breq, buf);
> +	if (error == XFS_IBULK_BUFFER_FULL) {
> +		error = XFS_IWALK_ABORT;

Related to the earlier patch: is there a need for IBULK_BUFFER_FULL if
the only user converts it to the generic abort error?

Most of these comments are minor/aesthetic, so:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +		goto out_advance;
> +	}
> +	if (error)
> +		goto out;
>  
> - out_free:
> -	kmem_free(buf);
> +out_advance:
> +	/*
> +	 * Advance the cursor to the inode that comes after the one we just
> +	 * looked at.  We want the caller to move along if the bulkstat
> +	 * information was copied successfully; if we tried to grab the inode
> +	 * but it's no longer allocated; or if it's internal metadata.
> +	 */
> +	bc->breq->startino = ino + 1;
> +out:
>  	return error;
>  }
>  
> -/* Return 0 on success or positive error */
> -STATIC int
> -xfs_bulkstat_one_fmt(
> -	void			__user *ubuffer,
> -	int			ubsize,
> -	int			*ubused,
> -	const xfs_bstat_t	*buffer)
> -{
> -	if (ubsize < sizeof(*buffer))
> -		return -ENOMEM;
> -	if (copy_to_user(ubuffer, buffer, sizeof(*buffer)))
> -		return -EFAULT;
> -	if (ubused)
> -		*ubused = sizeof(*buffer);
> -	return 0;
> -}
> -
> +/* Bulkstat a single inode. */
>  int
>  xfs_bulkstat_one(
> -	xfs_mount_t	*mp,		/* mount point for filesystem */
> -	xfs_ino_t	ino,		/* inode number to get data for */
> -	void		__user *buffer,	/* buffer to place output in */
> -	int		ubsize,		/* size of buffer */
> -	int		*ubused,	/* bytes used by me */
> -	int		*stat)		/* BULKSTAT_RV_... */
> +	struct xfs_ibulk	*breq,
> +	bulkstat_one_fmt_pf	formatter)
>  {
> -	return xfs_bulkstat_one_int(mp, ino, buffer, ubsize,
> -				    xfs_bulkstat_one_fmt, ubused, stat);
> +	struct xfs_bstat_chunk	bc = {
> +		.formatter	= formatter,
> +		.breq		= breq,
> +	};
> +	int			error;
> +
> +	ASSERT(breq->icount == 1);
> +
> +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> +	if (!bc.buf)
> +		return -ENOMEM;
> +
> +	error = xfs_bulkstat_one_int(breq->mp, NULL, breq->startino, &bc);
> +
> +	kmem_free(bc.buf);
> +
> +	/*
> +	 * If we reported one inode to userspace then we abort because we hit
> +	 * the end of the buffer.  Don't leak that back to userspace.
> +	 */
> +	if (error == XFS_IWALK_ABORT)
> +		error = 0;
> +
> +	return error;
>  }
>  
>  /*
> @@ -251,256 +285,69 @@ xfs_bulkstat_grab_ichunk(
>  
>  #define XFS_BULKSTAT_UBLEFT(ubleft)	((ubleft) >= statstruct_size)
>  
> -struct xfs_bulkstat_agichunk {
> -	char		__user **ac_ubuffer;/* pointer into user's buffer */
> -	int		ac_ubleft;	/* bytes left in user's buffer */
> -	int		ac_ubelem;	/* spaces used in user's buffer */
> -};
> -
> -/*
> - * Process inodes in chunk with a pointer to a formatter function
> - * that will iget the inode and fill in the appropriate structure.
> - */
>  static int
> -xfs_bulkstat_ag_ichunk(
> -	struct xfs_mount		*mp,
> -	xfs_agnumber_t			agno,
> -	struct xfs_inobt_rec_incore	*irbp,
> -	bulkstat_one_pf			formatter,
> -	size_t				statstruct_size,
> -	struct xfs_bulkstat_agichunk	*acp,
> -	xfs_agino_t			*last_agino)
> +xfs_bulkstat_iwalk(
> +	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
> +	xfs_ino_t		ino,
> +	void			*data)
>  {
> -	char				__user **ubufp = acp->ac_ubuffer;
> -	int				chunkidx;
> -	int				error = 0;
> -	xfs_agino_t			agino = irbp->ir_startino;
> -
> -	for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK;
> -	     chunkidx++, agino++) {
> -		int		fmterror;
> -		int		ubused;
> -
> -		/* inode won't fit in buffer, we are done */
> -		if (acp->ac_ubleft < statstruct_size)
> -			break;
> -
> -		/* Skip if this inode is free */
> -		if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free)
> -			continue;
> -
> -		/* Get the inode and fill in a single buffer */
> -		ubused = statstruct_size;
> -		error = formatter(mp, XFS_AGINO_TO_INO(mp, agno, agino),
> -				  *ubufp, acp->ac_ubleft, &ubused, &fmterror);
> -
> -		if (fmterror == BULKSTAT_RV_GIVEUP ||
> -		    (error && error != -ENOENT && error != -EINVAL)) {
> -			acp->ac_ubleft = 0;
> -			ASSERT(error);
> -			break;
> -		}
> -
> -		/* be careful not to leak error if at end of chunk */
> -		if (fmterror == BULKSTAT_RV_NOTHING || error) {
> -			error = 0;
> -			continue;
> -		}
> -
> -		*ubufp += ubused;
> -		acp->ac_ubleft -= ubused;
> -		acp->ac_ubelem++;
> -	}
> -
> -	/*
> -	 * Post-update *last_agino. At this point, agino will always point one
> -	 * inode past the last inode we processed successfully. Hence we
> -	 * substract that inode when setting the *last_agino cursor so that we
> -	 * return the correct cookie to userspace. On the next bulkstat call,
> -	 * the inode under the lastino cookie will be skipped as we have already
> -	 * processed it here.
> -	 */
> -	*last_agino = agino - 1;
> +	int			error;
>  
> +	error = xfs_bulkstat_one_int(mp, tp, ino, data);
> +	/* bulkstat just skips over missing inodes */
> +	if (error == -ENOENT || error == -EINVAL)
> +		return 0;
>  	return error;
>  }
>  
>  /*
> - * Return stat information in bulk (by-inode) for the filesystem.
> + * Check the incoming lastino parameter.
> + *
> + * We allow any inode value that could map to physical space inside the
> + * filesystem because if there are no inodes there, bulkstat moves on to the
> + * next chunk.  In other words, the magic agino value of zero takes us to the
> + * first chunk in the AG, and an agino value past the end of the AG takes us to
> + * the first chunk in the next AG.
> + *
> + * Therefore we can end early if the requested inode is beyond the end of the
> + * filesystem or doesn't map properly.
>   */
> -int					/* error status */
> -xfs_bulkstat(
> -	xfs_mount_t		*mp,	/* mount point for filesystem */
> -	xfs_ino_t		*lastinop, /* last inode returned */
> -	int			*ubcountp, /* size of buffer/count returned */
> -	bulkstat_one_pf		formatter, /* func that'd fill a single buf */
> -	size_t			statstruct_size, /* sizeof struct filling */
> -	char			__user *ubuffer, /* buffer with inode stats */
> -	int			*done)	/* 1 if there are more stats to get */
> +static inline bool
> +xfs_bulkstat_already_done(
> +	struct xfs_mount	*mp,
> +	xfs_ino_t		startino)
>  {
> -	xfs_buf_t		*agbp;	/* agi header buffer */
> -	xfs_agino_t		agino;	/* inode # in allocation group */
> -	xfs_agnumber_t		agno;	/* allocation group number */
> -	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
> -	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
> -	int			nirbuf;	/* size of irbuf */
> -	int			ubcount; /* size of user's buffer */
> -	struct xfs_bulkstat_agichunk ac;
> -	int			error = 0;
> +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> +	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, startino);
>  
> -	/*
> -	 * Get the last inode value, see if there's nothing to do.
> -	 */
> -	agno = XFS_INO_TO_AGNO(mp, *lastinop);
> -	agino = XFS_INO_TO_AGINO(mp, *lastinop);
> -	if (agno >= mp->m_sb.sb_agcount ||
> -	    *lastinop != XFS_AGINO_TO_INO(mp, agno, agino)) {
> -		*done = 1;
> -		*ubcountp = 0;
> -		return 0;
> -	}
> +	return agno >= mp->m_sb.sb_agcount ||
> +	       startino != XFS_AGINO_TO_INO(mp, agno, agino);
> +}
>  
> -	ubcount = *ubcountp; /* statstruct's */
> -	ac.ac_ubuffer = &ubuffer;
> -	ac.ac_ubleft = ubcount * statstruct_size; /* bytes */;
> -	ac.ac_ubelem = 0;
> +/* Return stat information in bulk (by-inode) for the filesystem. */
> +int
> +xfs_bulkstat(
> +	struct xfs_ibulk	*breq,
> +	bulkstat_one_fmt_pf	formatter)
> +{
> +	struct xfs_bstat_chunk	bc = {
> +		.formatter	= formatter,
> +		.breq		= breq,
> +	};
> +	int			error;
>  
> -	*ubcountp = 0;
> -	*done = 0;
> +	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
> +		return 0;
>  
> -	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
> -	if (!irbuf)
> +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> +	if (!bc.buf)
>  		return -ENOMEM;
> -	nirbuf = (PAGE_SIZE * 4) / sizeof(*irbuf);
>  
> -	/*
> -	 * Loop over the allocation groups, starting from the last
> -	 * inode returned; 0 means start of the allocation group.
> -	 */
> -	while (agno < mp->m_sb.sb_agcount) {
> -		struct xfs_inobt_rec_incore	*irbp = irbuf;
> -		struct xfs_inobt_rec_incore	*irbufend = irbuf + nirbuf;
> -		bool				end_of_ag = false;
> -		int				icount = 0;
> -		int				stat;
> +	error = xfs_iwalk(breq->mp, NULL, breq->startino, xfs_bulkstat_iwalk,
> +			breq->icount, &bc);
>  
> -		error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> -		if (error)
> -			break;
> -		/*
> -		 * Allocate and initialize a btree cursor for ialloc btree.
> -		 */
> -		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> -					    XFS_BTNUM_INO);
> -		if (agino > 0) {
> -			/*
> -			 * In the middle of an allocation group, we need to get
> -			 * the remainder of the chunk we're in.
> -			 */
> -			struct xfs_inobt_rec_incore	r;
> -
> -			error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
> -			if (error)
> -				goto del_cursor;
> -			if (icount) {
> -				irbp->ir_startino = r.ir_startino;
> -				irbp->ir_holemask = r.ir_holemask;
> -				irbp->ir_count = r.ir_count;
> -				irbp->ir_freecount = r.ir_freecount;
> -				irbp->ir_free = r.ir_free;
> -				irbp++;
> -			}
> -			/* Increment to the next record */
> -			error = xfs_btree_increment(cur, 0, &stat);
> -		} else {
> -			/* Start of ag.  Lookup the first inode chunk */
> -			error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
> -		}
> -		if (error || stat == 0) {
> -			end_of_ag = true;
> -			goto del_cursor;
> -		}
> -
> -		/*
> -		 * Loop through inode btree records in this ag,
> -		 * until we run out of inodes or space in the buffer.
> -		 */
> -		while (irbp < irbufend && icount < ubcount) {
> -			struct xfs_inobt_rec_incore	r;
> -
> -			error = xfs_inobt_get_rec(cur, &r, &stat);
> -			if (error || stat == 0) {
> -				end_of_ag = true;
> -				goto del_cursor;
> -			}
> -
> -			/*
> -			 * If this chunk has any allocated inodes, save it.
> -			 * Also start read-ahead now for this chunk.
> -			 */
> -			if (r.ir_freecount < r.ir_count) {
> -				xfs_bulkstat_ichunk_ra(mp, agno, &r);
> -				irbp->ir_startino = r.ir_startino;
> -				irbp->ir_holemask = r.ir_holemask;
> -				irbp->ir_count = r.ir_count;
> -				irbp->ir_freecount = r.ir_freecount;
> -				irbp->ir_free = r.ir_free;
> -				irbp++;
> -				icount += r.ir_count - r.ir_freecount;
> -			}
> -			error = xfs_btree_increment(cur, 0, &stat);
> -			if (error || stat == 0) {
> -				end_of_ag = true;
> -				goto del_cursor;
> -			}
> -			cond_resched();
> -		}
> -
> -		/*
> -		 * Drop the btree buffers and the agi buffer as we can't hold any
> -		 * of the locks these represent when calling iget. If there is a
> -		 * pending error, then we are done.
> -		 */
> -del_cursor:
> -		xfs_btree_del_cursor(cur, error);
> -		xfs_buf_relse(agbp);
> -		if (error)
> -			break;
> -		/*
> -		 * Now format all the good inodes into the user's buffer. The
> -		 * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
> -		 * for the next loop iteration.
> -		 */
> -		irbufend = irbp;
> -		for (irbp = irbuf;
> -		     irbp < irbufend && ac.ac_ubleft >= statstruct_size;
> -		     irbp++) {
> -			error = xfs_bulkstat_ag_ichunk(mp, agno, irbp,
> -					formatter, statstruct_size, &ac,
> -					&agino);
> -			if (error)
> -				break;
> -
> -			cond_resched();
> -		}
> -
> -		/*
> -		 * If we've run out of space or had a formatting error, we
> -		 * are now done
> -		 */
> -		if (ac.ac_ubleft < statstruct_size || error)
> -			break;
> -
> -		if (end_of_ag) {
> -			agno++;
> -			agino = 0;
> -		}
> -	}
> -	/*
> -	 * Done, we're either out of filesystem or space to put the data.
> -	 */
> -	kmem_free(irbuf);
> -	*ubcountp = ac.ac_ubelem;
> +	kmem_free(bc.buf);
>  
>  	/*
>  	 * We found some inodes, so clear the error status and return them.
> @@ -509,17 +356,9 @@ xfs_bulkstat(
>  	 * triggered again and propagated to userspace as there will be no
>  	 * formatted inodes in the buffer.
>  	 */
> -	if (ac.ac_ubelem)
> +	if (breq->ocount > 0)
>  		error = 0;
>  
> -	/*
> -	 * If we ran out of filesystem, lastino will point off the end of
> -	 * the filesystem so the next call will return immediately.
> -	 */
> -	*lastinop = XFS_AGINO_TO_INO(mp, agno, agino);
> -	if (agno >= mp->m_sb.sb_agcount)
> -		*done = 1;
> -
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> index 369e3f159d4e..7c5f1df360e6 100644
> --- a/fs/xfs/xfs_itable.h
> +++ b/fs/xfs/xfs_itable.h
> @@ -5,63 +5,46 @@
>  #ifndef __XFS_ITABLE_H__
>  #define	__XFS_ITABLE_H__
>  
> -/*
> - * xfs_bulkstat() is used to fill in xfs_bstat structures as well as dm_stat
> - * structures (by the dmi library). This is a pointer to a formatter function
> - * that will iget the inode and fill in the appropriate structure.
> - * see xfs_bulkstat_one() and xfs_dm_bulkstat_one() in dmapi_xfs.c
> - */
> -typedef int (*bulkstat_one_pf)(struct xfs_mount	*mp,
> -			       xfs_ino_t	ino,
> -			       void		__user *buffer,
> -			       int		ubsize,
> -			       int		*ubused,
> -			       int		*stat);
> +/* In-memory representation of a userspace request for batch inode data. */
> +struct xfs_ibulk {
> +	struct xfs_mount	*mp;
> +	void __user		*ubuffer; /* user output buffer */
> +	xfs_ino_t		startino; /* start with this inode */
> +	unsigned int		icount;   /* number of elements in ubuffer */
> +	unsigned int		ocount;   /* number of records returned */
> +};
> +
> +/* Return value that means we want to abort the walk. */
> +#define XFS_IBULK_ABORT		(XFS_IWALK_ABORT)
> +
> +/* Return value that means the formatting buffer is now full. */
> +#define XFS_IBULK_BUFFER_FULL	(XFS_IBULK_ABORT + 1)
>  
>  /*
> - * Values for stat return value.
> + * Advance the user buffer pointer by one record of the given size.  If the
> + * buffer is now full, return the appropriate error code.
>   */
> -#define BULKSTAT_RV_NOTHING	0
> -#define BULKSTAT_RV_DIDONE	1
> -#define BULKSTAT_RV_GIVEUP	2
> +static inline int
> +xfs_ibulk_advance(
> +	struct xfs_ibulk	*breq,
> +	size_t			bytes)
> +{
> +	char __user		*b = breq->ubuffer;
> +
> +	breq->ubuffer = b + bytes;
> +	breq->ocount++;
> +	return breq->ocount == breq->icount ? XFS_IBULK_BUFFER_FULL : 0;
> +}
>  
>  /*
>   * Return stat information in bulk (by-inode) for the filesystem.
>   */
> -int					/* error status */
> -xfs_bulkstat(
> -	xfs_mount_t	*mp,		/* mount point for filesystem */
> -	xfs_ino_t	*lastino,	/* last inode returned */
> -	int		*count,		/* size of buffer/count returned */
> -	bulkstat_one_pf formatter,	/* func that'd fill a single buf */
> -	size_t		statstruct_size,/* sizeof struct that we're filling */
> -	char		__user *ubuffer,/* buffer with inode stats */
> -	int		*done);		/* 1 if there are more stats to get */
>  
> -typedef int (*bulkstat_one_fmt_pf)(  /* used size in bytes or negative error */
> -	void			__user *ubuffer, /* buffer to write to */
> -	int			ubsize,		 /* remaining user buffer sz */
> -	int			*ubused,	 /* bytes used by formatter */
> -	const xfs_bstat_t	*buffer);        /* buffer to read from */
> +typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
> +		const struct xfs_bstat *bstat);
>  
> -int
> -xfs_bulkstat_one_int(
> -	xfs_mount_t		*mp,
> -	xfs_ino_t		ino,
> -	void			__user *buffer,
> -	int			ubsize,
> -	bulkstat_one_fmt_pf	formatter,
> -	int			*ubused,
> -	int			*stat);
> -
> -int
> -xfs_bulkstat_one(
> -	xfs_mount_t		*mp,
> -	xfs_ino_t		ino,
> -	void			__user *buffer,
> -	int			ubsize,
> -	int			*ubused,
> -	int			*stat);
> +int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> +int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
>  
>  typedef int (*inumbers_fmt_pf)(
>  	void			__user *ubuffer, /* buffer to write to */
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/14] xfs: create simplified inode walk function
  2019-06-13 16:27   ` Brian Foster
@ 2019-06-13 18:06     ` Darrick J. Wong
  2019-06-13 18:07       ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-13 18:06 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Jun 13, 2019 at 12:27:06PM -0400, Brian Foster wrote:
> On Tue, Jun 11, 2019 at 11:47:44PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create a new iterator function to simplify walking inodes in an XFS
> > filesystem.  This new iterator will replace the existing open-coded
> > walking that goes on in various places.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Makefile                  |    1 
> >  fs/xfs/libxfs/xfs_ialloc_btree.c |   36 +++
> >  fs/xfs/libxfs/xfs_ialloc_btree.h |    3 
> >  fs/xfs/xfs_itable.c              |    5 
> >  fs/xfs/xfs_itable.h              |    8 +
> >  fs/xfs/xfs_iwalk.c               |  418 ++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_iwalk.h               |   19 ++
> >  fs/xfs/xfs_trace.h               |   40 ++++
> >  8 files changed, 524 insertions(+), 6 deletions(-)
> >  create mode 100644 fs/xfs/xfs_iwalk.c
> >  create mode 100644 fs/xfs/xfs_iwalk.h
> > 
> > 
> ...
> > diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> > new file mode 100644
> > index 000000000000..49289588413f
> > --- /dev/null
> > +++ b/fs/xfs/xfs_iwalk.c
> > @@ -0,0 +1,418 @@
> ...
> > +/* Allocate memory for a walk. */
> > +STATIC int
> > +xfs_iwalk_alloc(
> > +	struct xfs_iwalk_ag	*iwag)
> > +{
> > +	size_t			size;
> > +
> > +	ASSERT(iwag->recs == NULL);
> > +	iwag->nr_recs = 0;
> > +
> > +	/* Allocate a prefetch buffer for inobt records. */
> > +	size = iwag->sz_recs * sizeof(struct xfs_inobt_rec_incore);
> > +	iwag->recs = kmem_alloc(size, KM_MAYFAIL);
> > +	if (iwag->recs == NULL)
> > +		return -ENOMEM;
> > +
> > +	return 0;
> > +}
> > +
> > +/* Free memory we allocated for a walk. */
> > +STATIC void
> > +xfs_iwalk_free(
> > +	struct xfs_iwalk_ag	*iwag)
> > +{
> > +	kmem_free(iwag->recs);
> 
> It might be a good idea to set ->recs = NULL here since the alloc call
> asserts that it is NULL (in case any future code happens to free and
> realloc the recs buffer for whatever reason).
> 
> > +}
> > +
> ...
> > +/* Walk all inodes in a single AG, from @iwag->startino to the end of the AG. */
> > +STATIC int
> > +xfs_iwalk_ag(
> > +	struct xfs_iwalk_ag		*iwag)
> > +{
> > +	struct xfs_mount		*mp = iwag->mp;
> > +	struct xfs_trans		*tp = iwag->tp;
> > +	struct xfs_buf			*agi_bp = NULL;
> > +	struct xfs_btree_cur		*cur = NULL;
> > +	xfs_agnumber_t			agno;
> > +	xfs_agino_t			agino;
> > +	int				has_more;
> > +	int				error = 0;
> > +
> > +	/* Set up our cursor at the right place in the inode btree. */
> > +	agno = XFS_INO_TO_AGNO(mp, iwag->startino);
> > +	agino = XFS_INO_TO_AGINO(mp, iwag->startino);
> > +	error = xfs_iwalk_ag_start(iwag, agno, agino, &cur, &agi_bp, &has_more);
> > +
> > +	while (!error && has_more) {
> > +		struct xfs_inobt_rec_incore	*irec;
> > +
> > +		cond_resched();
> > +
> > +		/* Fetch the inobt record. */
> > +		irec = &iwag->recs[iwag->nr_recs];
> > +		error = xfs_inobt_get_rec(cur, irec, &has_more);
> > +		if (error || !has_more)
> > +			break;
> > +
> > +		/* No allocated inodes in this chunk; skip it. */
> > +		if (irec->ir_freecount == irec->ir_count) {
> > +			error = xfs_btree_increment(cur, 0, &has_more);
> > +			if (error)
> > +				break;
> > +			continue;
> > +		}
> > +
> > +		/*
> > +		 * Start readahead for this inode chunk in anticipation of
> > +		 * walking the inodes.
> > +		 */
> > +		xfs_bulkstat_ichunk_ra(mp, agno, irec);
> > +
> > +		/*
> > +		 * If there's space in the buffer for more records, increment
> > +		 * the btree cursor and grab more.
> > +		 */
> > +		if (++iwag->nr_recs < iwag->sz_recs) {
> > +			error = xfs_btree_increment(cur, 0, &has_more);
> > +			if (error || !has_more)
> > +				break;
> > +			continue;
> > +		}
> > +
> > +		/*
> > +		 * Otherwise, we need to save cursor state and run the callback
> > +		 * function on the cached records.  The run_callbacks function
> > +		 * is supposed to return a cursor pointing to the record where
> > +		 * we would be if we had been able to increment like above.
> > +		 */
> > +		has_more = true;
> 
> has_more should always be true if we get here right? If so, perhaps
> better to replace this with ASSERT(has_more).
> 
> > +		error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp,
> > +				&has_more);
> > +	}
> > +
> > +	if (iwag->nr_recs == 0 || error)
> > +		goto out;
> > +
> > +	/* Walk the unprocessed records in the cache. */
> > +	error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp, &has_more);
> > +
> > +out:
> > +	xfs_iwalk_del_inobt(tp, &cur, &agi_bp, error);
> > +	return error;
> > +}
> > +
> > +/*
> > + * Given the number of inodes to prefetch, set the number of inobt records that
> > + * we cache in memory, which controls the number of inodes we try to read
> > + * ahead.
> > + */
> > +static inline void
> > +xfs_iwalk_set_prefetch(
> > +	struct xfs_iwalk_ag	*iwag,
> > +	unsigned int		max_prefetch)
> > +{
> > +	/*
> > +	 * Default to 4096 bytes' worth of inobt records; this should be plenty
> > +	 * of inodes to read ahead.  This number was chosen so that the cache
> > +	 * is never more than a single memory page and the amount of inode
> > +	 * readahead is limited to 16k inodes regardless of CPU:
> > +	 *
> > +	 * 4096 bytes / 16 bytes per inobt record = 256 inobt records
> > +	 * 256 inobt records * 64 inodes per record = 16384 inodes
> > +	 * 16384 inodes * 512 bytes per inode(?) = 8MB of inode readahead
> > +	 */
> > +	iwag->sz_recs = 4096 / sizeof(struct xfs_inobt_rec_incore);
> > +
> 
> So we decided not to preserve current readahead behavior in this patch?

I sent this patch before I received your reply. :(

The current version of this patch restores the (4 * PAGE_SIZE) behavior,
and a new patch immediately afterwards replaces it with better logic.
"better" is where we allow prefetch up to 2048 inodes and use the
(admittedly sparse) amount of information gathered so far about average
inode chunk free factors to guess at how many inobt records to cache.
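
For illustration, the sizing rules being discussed (default to one page of
inobt records, round a caller's request up to whole inode chunks, cap it,
and floor the result at two records) can be modeled in plain userspace C.
This is a sketch only: the function name is invented, and the constants
simply mirror the quoted comment (16-byte inobt records, 64 inodes per
chunk), not the final merged heuristic.

```c
#include <assert.h>

/* Illustrative stand-ins for the kernel constants in the quoted code. */
#define XFS_INODES_PER_CHUNK	64
#define INOBT_REC_SIZE		16	/* sizeof(struct xfs_inobt_rec_incore) */

/* Userspace model of the prefetch sizing: returns the record cache size. */
static unsigned int
iwalk_prefetch_recs(unsigned int inode_records)
{
	/* Default: one 4096-byte page of inobt records = 256 records. */
	unsigned int sz_recs = 4096 / INOBT_REC_SIZE;

	if (inode_records) {
		/* Round the request up to whole chunks, convert to records. */
		unsigned int nr = (inode_records + XFS_INODES_PER_CHUNK - 1) /
				XFS_INODES_PER_CHUNK;
		if (nr < sz_recs)
			sz_recs = nr;
	}

	/*
	 * Always cache at least two records: the record where the walk
	 * started plus the next one, which simplifies the AG loop setup.
	 */
	return sz_recs < 2 ? 2 : sz_recs;
}
```

So a request for 100 inodes yields only 2 records' worth of prefetch,
while no preference (0) yields the full 256-record page.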

> > +	/*
> > +	 * If the caller gives us a desired prefetch amount, round it up to
> > +	 * an even inode chunk and cap it as defined previously.
> > +	 */
> > +	if (max_prefetch) {
> > +		unsigned int	nr;
> > +
> > +		nr = round_up(max_prefetch, XFS_INODES_PER_CHUNK) /
> > +				XFS_INODES_PER_CHUNK;
> > +		iwag->sz_recs = min_t(unsigned int, iwag->sz_recs, nr);
> 
> This is comparing the record count calculated above with max_prefetch,
> which the rounding just above suggests is in inodes. BTW, could we add a
> one line /* prefetch in inodes */ comment on the max_prefetch parameter
> line at the top of the function?

I renamed the parameter "inode_records", FWIW.
> 
> Aside from those nits the rest looks good to me.

<nod> Thanks for review!

(Oh, more replies are slowly wandering in...)

--D

> 
> Brian
> 
> > +	}
> > +
> > +	/*
> > +	 * Allocate enough space to prefetch at least two records so that we
> > +	 * can cache both the inobt record where the iwalk started and the next
> > +	 * record.  This simplifies the AG inode walk loop setup code.
> > +	 */
> > +	iwag->sz_recs = max_t(unsigned int, iwag->sz_recs, 2);
> > +}
> > +
> > +/*
> > + * Walk all inodes in the filesystem starting from @startino.  The @iwalk_fn
> > + * will be called for each allocated inode, being passed the inode's number and
> > + * @data.  @max_prefetch controls how many inobt records' worth of inodes we
> > + * try to readahead.
> > + */
> > +int
> > +xfs_iwalk(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_trans	*tp,
> > +	xfs_ino_t		startino,
> > +	xfs_iwalk_fn		iwalk_fn,
> > +	unsigned int		max_prefetch,
> > +	void			*data)
> > +{
> > +	struct xfs_iwalk_ag	iwag = {
> > +		.mp		= mp,
> > +		.tp		= tp,
> > +		.iwalk_fn	= iwalk_fn,
> > +		.data		= data,
> > +		.startino	= startino,
> > +	};
> > +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> > +	int			error;
> > +
> > +	ASSERT(agno < mp->m_sb.sb_agcount);
> > +
> > +	xfs_iwalk_set_prefetch(&iwag, max_prefetch);
> > +	error = xfs_iwalk_alloc(&iwag);
> > +	if (error)
> > +		return error;
> > +
> > +	for (; agno < mp->m_sb.sb_agcount; agno++) {
> > +		error = xfs_iwalk_ag(&iwag);
> > +		if (error)
> > +			break;
> > +		iwag.startino = XFS_AGINO_TO_INO(mp, agno + 1, 0);
> > +	}
> > +
> > +	xfs_iwalk_free(&iwag);
> > +	return error;
> > +}
> > diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
> > new file mode 100644
> > index 000000000000..9e762e31dadc
> > --- /dev/null
> > +++ b/fs/xfs/xfs_iwalk.h
> > @@ -0,0 +1,19 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + */
> > +#ifndef __XFS_IWALK_H__
> > +#define __XFS_IWALK_H__
> > +
> > +/* Walk all inodes in the filesystem starting from @startino. */
> > +typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
> > +			    xfs_ino_t ino, void *data);
> > +/* Return values for xfs_iwalk_fn. */
> > +#define XFS_IWALK_CONTINUE	(XFS_ITER_CONTINUE)
> > +#define XFS_IWALK_ABORT		(XFS_ITER_ABORT)
> > +
> > +int xfs_iwalk(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t startino,
> > +		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
> > +
> > +#endif /* __XFS_IWALK_H__ */
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 2464ea351f83..f9bb1d50bc0e 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3516,6 +3516,46 @@ DEFINE_EVENT(xfs_inode_corrupt_class, name,	\
> >  DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_sick);
> >  DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_healthy);
> >  
> > +TRACE_EVENT(xfs_iwalk_ag,
> > +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
> > +		 xfs_agino_t startino),
> > +	TP_ARGS(mp, agno, startino),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_agnumber_t, agno)
> > +		__field(xfs_agino_t, startino)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = mp->m_super->s_dev;
> > +		__entry->agno = agno;
> > +		__entry->startino = startino;
> > +	),
> > +	TP_printk("dev %d:%d agno %d startino %u",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
> > +		  __entry->startino)
> > +)
> > +
> > +TRACE_EVENT(xfs_iwalk_ag_rec,
> > +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
> > +		 struct xfs_inobt_rec_incore *irec),
> > +	TP_ARGS(mp, agno, irec),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_agnumber_t, agno)
> > +		__field(xfs_agino_t, startino)
> > +		__field(uint64_t, freemask)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = mp->m_super->s_dev;
> > +		__entry->agno = agno;
> > +		__entry->startino = irec->ir_startino;
> > +		__entry->freemask = irec->ir_free;
> > +	),
> > +	TP_printk("dev %d:%d agno %d startino %u freemask 0x%llx",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
> > +		  __entry->startino, __entry->freemask)
> > +)
> > +
> >  #endif /* _TRACE_XFS_H */
> >  
> >  #undef TRACE_INCLUDE_PATH
> > 


* Re: [PATCH 02/14] xfs: create simplified inode walk function
  2019-06-13 18:06     ` Darrick J. Wong
@ 2019-06-13 18:07       ` Darrick J. Wong
  0 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-13 18:07 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Jun 13, 2019 at 11:06:09AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 13, 2019 at 12:27:06PM -0400, Brian Foster wrote:
> > On Tue, Jun 11, 2019 at 11:47:44PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Create a new iterator function to simplify walking inodes in an XFS
> > > filesystem.  This new iterator will replace the existing open-coded
> > > walking that goes on in various places.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/Makefile                  |    1 
> > >  fs/xfs/libxfs/xfs_ialloc_btree.c |   36 +++
> > >  fs/xfs/libxfs/xfs_ialloc_btree.h |    3 
> > >  fs/xfs/xfs_itable.c              |    5 
> > >  fs/xfs/xfs_itable.h              |    8 +
> > >  fs/xfs/xfs_iwalk.c               |  418 ++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_iwalk.h               |   19 ++
> > >  fs/xfs/xfs_trace.h               |   40 ++++
> > >  8 files changed, 524 insertions(+), 6 deletions(-)
> > >  create mode 100644 fs/xfs/xfs_iwalk.c
> > >  create mode 100644 fs/xfs/xfs_iwalk.h
> > > 
> > > 
> > ...
> > > diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> > > new file mode 100644
> > > index 000000000000..49289588413f
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_iwalk.c
> > > @@ -0,0 +1,418 @@
> > ...
> > > +/* Allocate memory for a walk. */
> > > +STATIC int
> > > +xfs_iwalk_alloc(
> > > +	struct xfs_iwalk_ag	*iwag)
> > > +{
> > > +	size_t			size;
> > > +
> > > +	ASSERT(iwag->recs == NULL);
> > > +	iwag->nr_recs = 0;
> > > +
> > > +	/* Allocate a prefetch buffer for inobt records. */
> > > +	size = iwag->sz_recs * sizeof(struct xfs_inobt_rec_incore);
> > > +	iwag->recs = kmem_alloc(size, KM_MAYFAIL);
> > > +	if (iwag->recs == NULL)
> > > +		return -ENOMEM;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* Free memory we allocated for a walk. */
> > > +STATIC void
> > > +xfs_iwalk_free(
> > > +	struct xfs_iwalk_ag	*iwag)
> > > +{
> > > +	kmem_free(iwag->recs);
> > 
> > It might be a good idea to set ->recs = NULL here since the alloc call
> > asserts that it is NULL (in case any future code happens to free and
> > realloc the recs buffer for whatever reason).

Fixed.

> > > +}
> > > +
> > ...
> > > +/* Walk all inodes in a single AG, from @iwag->startino to the end of the AG. */
> > > +STATIC int
> > > +xfs_iwalk_ag(
> > > +	struct xfs_iwalk_ag		*iwag)
> > > +{
> > > +	struct xfs_mount		*mp = iwag->mp;
> > > +	struct xfs_trans		*tp = iwag->tp;
> > > +	struct xfs_buf			*agi_bp = NULL;
> > > +	struct xfs_btree_cur		*cur = NULL;
> > > +	xfs_agnumber_t			agno;
> > > +	xfs_agino_t			agino;
> > > +	int				has_more;
> > > +	int				error = 0;
> > > +
> > > +	/* Set up our cursor at the right place in the inode btree. */
> > > +	agno = XFS_INO_TO_AGNO(mp, iwag->startino);
> > > +	agino = XFS_INO_TO_AGINO(mp, iwag->startino);
> > > +	error = xfs_iwalk_ag_start(iwag, agno, agino, &cur, &agi_bp, &has_more);
> > > +
> > > +	while (!error && has_more) {
> > > +		struct xfs_inobt_rec_incore	*irec;
> > > +
> > > +		cond_resched();
> > > +
> > > +		/* Fetch the inobt record. */
> > > +		irec = &iwag->recs[iwag->nr_recs];
> > > +		error = xfs_inobt_get_rec(cur, irec, &has_more);
> > > +		if (error || !has_more)
> > > +			break;
> > > +
> > > +		/* No allocated inodes in this chunk; skip it. */
> > > +		if (irec->ir_freecount == irec->ir_count) {
> > > +			error = xfs_btree_increment(cur, 0, &has_more);
> > > +			if (error)
> > > +				break;
> > > +			continue;
> > > +		}
> > > +
> > > +		/*
> > > +		 * Start readahead for this inode chunk in anticipation of
> > > +		 * walking the inodes.
> > > +		 */
> > > +		xfs_bulkstat_ichunk_ra(mp, agno, irec);
> > > +
> > > +		/*
> > > +		 * If there's space in the buffer for more records, increment
> > > +		 * the btree cursor and grab more.
> > > +		 */
> > > +		if (++iwag->nr_recs < iwag->sz_recs) {
> > > +			error = xfs_btree_increment(cur, 0, &has_more);
> > > +			if (error || !has_more)
> > > +				break;
> > > +			continue;
> > > +		}
> > > +
> > > +		/*
> > > +		 * Otherwise, we need to save cursor state and run the callback
> > > +		 * function on the cached records.  The run_callbacks function
> > > +		 * is supposed to return a cursor pointing to the record where
> > > +		 * we would be if we had been able to increment like above.
> > > +		 */
> > > +		has_more = true;
> > 
> > has_more should always be true if we get here right? If so, perhaps
> > better to replace this with ASSERT(has_more).

Right; fixed.

> > > +		error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp,
> > > +				&has_more);
> > > +	}
> > > +
> > > +	if (iwag->nr_recs == 0 || error)
> > > +		goto out;
> > > +
> > > +	/* Walk the unprocessed records in the cache. */
> > > +	error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp, &has_more);
> > > +
> > > +out:
> > > +	xfs_iwalk_del_inobt(tp, &cur, &agi_bp, error);
> > > +	return error;
> > > +}
> > > +
> > > +/*
> > > + * Given the number of inodes to prefetch, set the number of inobt records that
> > > + * we cache in memory, which controls the number of inodes we try to read
> > > + * ahead.
> > > + */
> > > +static inline void
> > > +xfs_iwalk_set_prefetch(
> > > +	struct xfs_iwalk_ag	*iwag,
> > > +	unsigned int		max_prefetch)
> > > +{
> > > +	/*
> > > +	 * Default to 4096 bytes' worth of inobt records; this should be plenty
> > > +	 * of inodes to read ahead.  This number was chosen so that the cache
> > > +	 * is never more than a single memory page and the amount of inode
> > > +	 * readahead is limited to 16k inodes regardless of CPU:
> > > +	 *
> > > +	 * 4096 bytes / 16 bytes per inobt record = 256 inobt records
> > > +	 * 256 inobt records * 64 inodes per record = 16384 inodes
> > > +	 * 16384 inodes * 512 bytes per inode(?) = 8MB of inode readahead
> > > +	 */
> > > +	iwag->sz_recs = 4096 / sizeof(struct xfs_inobt_rec_incore);
> > > +
> > 
> > So we decided not to preserve current readahead behavior in this patch?
> 
> I sent this patch before I received your reply. :(

...and hit send before replying to everything.

--D

> The current version of this patch restores the (4 * PAGE_SIZE) behavior,
> and a new patch immediately afterwards replaces it with better logic.
> "better" is where we allow prefetch up to 2048 inodes and use the
> (admittedly sparse) amount of information gathered so far about average
> inode chunk free factors to guess at how many inobt records to cache.
> 
> > > +	/*
> > > +	 * If the caller gives us a desired prefetch amount, round it up to
> > > +	 * an even inode chunk and cap it as defined previously.
> > > +	 */
> > > +	if (max_prefetch) {
> > > +		unsigned int	nr;
> > > +
> > > +		nr = round_up(max_prefetch, XFS_INODES_PER_CHUNK) /
> > > +				XFS_INODES_PER_CHUNK;
> > > +		iwag->sz_recs = min_t(unsigned int, iwag->sz_recs, nr);
> > 
> > This is comparing the record count calculated above with max_prefetch,
> > which the rounding just above suggests is in inodes. BTW, could we add a
> > one line /* prefetch in inodes */ comment on the max_prefetch parameter
> > line at the top of the function?
> 
> I renamed the parameter "inode_records", FWIW.
> > 
> > Aside from those nits the rest looks good to me.
> 
> <nod> Thanks for review!
> 
> (Oh, more replies are slowly wandering in...)
> 
> --D
> 
> > 
> > Brian
> > 
> > > +	}
> > > +
> > > +	/*
> > > +	 * Allocate enough space to prefetch at least two records so that we
> > > +	 * can cache both the inobt record where the iwalk started and the next
> > > +	 * record.  This simplifies the AG inode walk loop setup code.
> > > +	 */
> > > +	iwag->sz_recs = max_t(unsigned int, iwag->sz_recs, 2);
> > > +}
> > > +
> > > +/*
> > > + * Walk all inodes in the filesystem starting from @startino.  The @iwalk_fn
> > > + * will be called for each allocated inode, being passed the inode's number and
> > > + * @data.  @max_prefetch controls how many inobt records' worth of inodes we
> > > + * try to readahead.
> > > + */
> > > +int
> > > +xfs_iwalk(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_trans	*tp,
> > > +	xfs_ino_t		startino,
> > > +	xfs_iwalk_fn		iwalk_fn,
> > > +	unsigned int		max_prefetch,
> > > +	void			*data)
> > > +{
> > > +	struct xfs_iwalk_ag	iwag = {
> > > +		.mp		= mp,
> > > +		.tp		= tp,
> > > +		.iwalk_fn	= iwalk_fn,
> > > +		.data		= data,
> > > +		.startino	= startino,
> > > +	};
> > > +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> > > +	int			error;
> > > +
> > > +	ASSERT(agno < mp->m_sb.sb_agcount);
> > > +
> > > +	xfs_iwalk_set_prefetch(&iwag, max_prefetch);
> > > +	error = xfs_iwalk_alloc(&iwag);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	for (; agno < mp->m_sb.sb_agcount; agno++) {
> > > +		error = xfs_iwalk_ag(&iwag);
> > > +		if (error)
> > > +			break;
> > > +		iwag.startino = XFS_AGINO_TO_INO(mp, agno + 1, 0);
> > > +	}
> > > +
> > > +	xfs_iwalk_free(&iwag);
> > > +	return error;
> > > +}
> > > diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
> > > new file mode 100644
> > > index 000000000000..9e762e31dadc
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_iwalk.h
> > > @@ -0,0 +1,19 @@
> > > +// SPDX-License-Identifier: GPL-2.0+
> > > +/*
> > > + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > + */
> > > +#ifndef __XFS_IWALK_H__
> > > +#define __XFS_IWALK_H__
> > > +
> > > +/* Walk all inodes in the filesystem starting from @startino. */
> > > +typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
> > > +			    xfs_ino_t ino, void *data);
> > > +/* Return values for xfs_iwalk_fn. */
> > > +#define XFS_IWALK_CONTINUE	(XFS_ITER_CONTINUE)
> > > +#define XFS_IWALK_ABORT		(XFS_ITER_ABORT)
> > > +
> > > +int xfs_iwalk(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t startino,
> > > +		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
> > > +
> > > +#endif /* __XFS_IWALK_H__ */
> > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > index 2464ea351f83..f9bb1d50bc0e 100644
> > > --- a/fs/xfs/xfs_trace.h
> > > +++ b/fs/xfs/xfs_trace.h
> > > @@ -3516,6 +3516,46 @@ DEFINE_EVENT(xfs_inode_corrupt_class, name,	\
> > >  DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_sick);
> > >  DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_healthy);
> > >  
> > > +TRACE_EVENT(xfs_iwalk_ag,
> > > +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
> > > +		 xfs_agino_t startino),
> > > +	TP_ARGS(mp, agno, startino),
> > > +	TP_STRUCT__entry(
> > > +		__field(dev_t, dev)
> > > +		__field(xfs_agnumber_t, agno)
> > > +		__field(xfs_agino_t, startino)
> > > +	),
> > > +	TP_fast_assign(
> > > +		__entry->dev = mp->m_super->s_dev;
> > > +		__entry->agno = agno;
> > > +		__entry->startino = startino;
> > > +	),
> > > +	TP_printk("dev %d:%d agno %d startino %u",
> > > +		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
> > > +		  __entry->startino)
> > > +)
> > > +
> > > +TRACE_EVENT(xfs_iwalk_ag_rec,
> > > +	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
> > > +		 struct xfs_inobt_rec_incore *irec),
> > > +	TP_ARGS(mp, agno, irec),
> > > +	TP_STRUCT__entry(
> > > +		__field(dev_t, dev)
> > > +		__field(xfs_agnumber_t, agno)
> > > +		__field(xfs_agino_t, startino)
> > > +		__field(uint64_t, freemask)
> > > +	),
> > > +	TP_fast_assign(
> > > +		__entry->dev = mp->m_super->s_dev;
> > > +		__entry->agno = agno;
> > > +		__entry->startino = irec->ir_startino;
> > > +		__entry->freemask = irec->ir_free;
> > > +	),
> > > +	TP_printk("dev %d:%d agno %d startino %u freemask 0x%llx",
> > > +		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
> > > +		  __entry->startino, __entry->freemask)
> > > +)
> > > +
> > >  #endif /* _TRACE_XFS_H */
> > >  
> > >  #undef TRACE_INCLUDE_PATH
> > > 


* Re: [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure
  2019-06-13 16:31   ` Brian Foster
@ 2019-06-13 18:12     ` Darrick J. Wong
  2019-06-13 23:03       ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-13 18:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Jun 13, 2019 at 12:31:54PM -0400, Brian Foster wrote:
> On Tue, Jun 11, 2019 at 11:48:09PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create a new ibulk structure incore to help us deal with bulk inode stat
> > state tracking and then convert the bulkstat code to use the new iwalk
> > iterator.  This disentangles inode walking from bulk stat control for
> > simpler code and enables us to isolate the formatter functions to the
> > ioctl handling code.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_ioctl.c   |   70 ++++++--
> >  fs/xfs/xfs_ioctl.h   |    5 +
> >  fs/xfs/xfs_ioctl32.c |   93 ++++++-----
> >  fs/xfs/xfs_itable.c  |  431 ++++++++++++++++----------------------------------
> >  fs/xfs/xfs_itable.h  |   79 ++++-----
> >  5 files changed, 272 insertions(+), 406 deletions(-)
> > 
> > 
> ...
> > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > index 814ffe6fbab7..5d1c143bac18 100644
> > --- a/fs/xfs/xfs_ioctl32.c
> > +++ b/fs/xfs/xfs_ioctl32.c
> ...
> > @@ -284,38 +266,59 @@ xfs_compat_ioc_bulkstat(
> >  		return -EFAULT;
> >  	bulkreq.ocount = compat_ptr(addr);
> >  
> > -	if (copy_from_user(&inlast, bulkreq.lastip, sizeof(__s64)))
> > +	if (copy_from_user(&lastino, bulkreq.lastip, sizeof(__s64)))
> >  		return -EFAULT;
> > +	breq.startino = lastino + 1;
> >  
> 
> Spurious assignment?

Fixed.

> > -	if ((count = bulkreq.icount) <= 0)
> > +	if (bulkreq.icount <= 0)
> >  		return -EINVAL;
> >  
> >  	if (bulkreq.ubuffer == NULL)
> >  		return -EINVAL;
> >  
> > +	breq.ubuffer = bulkreq.ubuffer;
> > +	breq.icount = bulkreq.icount;
> > +
> ...
> > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > index 3ca1c454afe6..58e411e11d6c 100644
> > --- a/fs/xfs/xfs_itable.c
> > +++ b/fs/xfs/xfs_itable.c
> > @@ -14,47 +14,68 @@
> ...
> > +STATIC int
> >  xfs_bulkstat_one_int(
> > -	struct xfs_mount	*mp,		/* mount point for filesystem */
> > -	xfs_ino_t		ino,		/* inode to get data for */
> > -	void __user		*buffer,	/* buffer to place output in */
> > -	int			ubsize,		/* size of buffer */
> > -	bulkstat_one_fmt_pf	formatter,	/* formatter, copy to user */
> > -	int			*ubused,	/* bytes used by me */
> > -	int			*stat)		/* BULKSTAT_RV_... */
> > +	struct xfs_mount	*mp,
> > +	struct xfs_trans	*tp,
> > +	xfs_ino_t		ino,
> > +	void			*data)
> 
> There's no need for a void pointer here given the current usage. We
> might as well pass this as bc (and let the caller cast it, if
> necessary).
> 
> That said, it also looks like the only reason we have the
> xfs_bulkstat_iwalk wrapper caller of this function is to filter out
> certain error values. If those errors are needed for the single inode
> case, we could stick something in the bc to toggle that invalid inode
> filtering behavior and eliminate the need for the wrapper entirely
> (which would pass _one_int() into the iwalk infra directly and require
> retaining the void pointer).

Ok, will do.  That'll help declutter the source file.
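
Brian's suggestion above amounts to moving the -ENOENT/-EINVAL filtering
behind a flag in the chunk structure, so the xfs_bulkstat_iwalk wrapper
disappears and _one_int() can be passed to the iwalk infrastructure
directly. A hypothetical userspace sketch of that shape; the field and
function names here are illustrative, not the actual XFS code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-walk control structure; only the flag matters here. */
struct bstat_chunk {
	bool	skip_missing;	/* bulk walk: ignore "inode gone" errors */
};

/* Stand-in for the real iget + format path: pretend inode 42 is gone. */
static int
stat_one_inode(unsigned long long ino)
{
	return ino == 42 ? -ENOENT : 0;
}

/*
 * One function serves both callers: the bulk walk sets skip_missing so
 * freed/invalid inodes are silently skipped, while the single-inode
 * path leaves it clear and sees the raw error.
 */
static int
bulkstat_one_int(unsigned long long ino, void *data)
{
	struct bstat_chunk *bc = data;
	int error = stat_one_inode(ino);

	if (bc->skip_missing && (error == -ENOENT || error == -EINVAL))
		return 0;
	return error;
}
```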

> 
> >  {
> > +	struct xfs_bstat_chunk	*bc = data;
> >  	struct xfs_icdinode	*dic;		/* dinode core info pointer */
> >  	struct xfs_inode	*ip;		/* incore inode pointer */
> >  	struct inode		*inode;
> > -	struct xfs_bstat	*buf;		/* return buffer */
> > -	int			error = 0;	/* error value */
> > +	struct xfs_bstat	*buf = bc->buf;
> > +	int			error = -EINVAL;
> >  
> > -	*stat = BULKSTAT_RV_NOTHING;
> > +	if (xfs_internal_inum(mp, ino))
> > +		goto out_advance;
> >  
> > -	if (!buffer || xfs_internal_inum(mp, ino))
> > -		return -EINVAL;
> > -
> > -	buf = kmem_zalloc(sizeof(*buf), KM_SLEEP | KM_MAYFAIL);
> > -	if (!buf)
> > -		return -ENOMEM;
> > -
> > -	error = xfs_iget(mp, NULL, ino,
> > +	error = xfs_iget(mp, tp, ino,
> >  			 (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED),
> >  			 XFS_ILOCK_SHARED, &ip);
> > +	if (error == -ENOENT || error == -EINVAL)
> > +		goto out_advance;
> >  	if (error)
> > -		goto out_free;
> > +		goto out;
> >  
> >  	ASSERT(ip != NULL);
> >  	ASSERT(ip->i_imap.im_blkno != 0);
> > @@ -119,43 +140,56 @@ xfs_bulkstat_one_int(
> >  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> >  	xfs_irele(ip);
> >  
> > -	error = formatter(buffer, ubsize, ubused, buf);
> > -	if (!error)
> > -		*stat = BULKSTAT_RV_DIDONE;
> > +	error = bc->formatter(bc->breq, buf);
> > +	if (error == XFS_IBULK_BUFFER_FULL) {
> > +		error = XFS_IWALK_ABORT;
> 
> Related to the earlier patch.. is there a need for IBULK_BUFFER_FULL if
> the only user converts it to the generic abort error?

<shrug> I wasn't sure if there was ever going to be a case where the
formatter function wanted to abort for a reason that wasn't a full
buffer... though looking at the bulkstat-v5 patches there aren't any.
I guess I'll just remove BUFFER_FULL, then.
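
With BUFFER_FULL gone, the formatter contract reduces to: copy one record
if it fits, otherwise return the single generic abort value so the walk
stops cleanly. A minimal userspace model of that contract; the names and
the abort constant are placeholders, not the kernel definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IWALK_ABORT	1	/* placeholder for XFS_IWALK_ABORT */

/* Placeholder bulk request: a flat buffer instead of a user pointer. */
struct breq {
	char		*ubuffer;	/* next output position */
	size_t		left;		/* bytes remaining */
	unsigned int	ocount;		/* records copied so far */
};

/*
 * Formatter sketch: one return value covers "stop walking", whatever
 * the reason, so a buffer-full code never leaks back to the caller.
 */
static int
format_one(struct breq *breq, const void *rec, size_t sz)
{
	if (breq->left < sz)
		return IWALK_ABORT;
	memcpy(breq->ubuffer, rec, sz);
	breq->ubuffer += sz;
	breq->left -= sz;
	breq->ocount++;
	return 0;
}
```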

--D

> Most of these comments are minor/aesthetic, so:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > +		goto out_advance;
> > +	}
> > +	if (error)
> > +		goto out;
> >  
> > - out_free:
> > -	kmem_free(buf);
> > +out_advance:
> > +	/*
> > +	 * Advance the cursor to the inode that comes after the one we just
> > +	 * looked at.  We want the caller to move along if the bulkstat
> > +	 * information was copied successfully; if we tried to grab the inode
> > +	 * but it's no longer allocated; or if it's internal metadata.
> > +	 */
> > +	bc->breq->startino = ino + 1;
> > +out:
> >  	return error;
> >  }
> >  
> > -/* Return 0 on success or positive error */
> > -STATIC int
> > -xfs_bulkstat_one_fmt(
> > -	void			__user *ubuffer,
> > -	int			ubsize,
> > -	int			*ubused,
> > -	const xfs_bstat_t	*buffer)
> > -{
> > -	if (ubsize < sizeof(*buffer))
> > -		return -ENOMEM;
> > -	if (copy_to_user(ubuffer, buffer, sizeof(*buffer)))
> > -		return -EFAULT;
> > -	if (ubused)
> > -		*ubused = sizeof(*buffer);
> > -	return 0;
> > -}
> > -
> > +/* Bulkstat a single inode. */
> >  int
> >  xfs_bulkstat_one(
> > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > -	xfs_ino_t	ino,		/* inode number to get data for */
> > -	void		__user *buffer,	/* buffer to place output in */
> > -	int		ubsize,		/* size of buffer */
> > -	int		*ubused,	/* bytes used by me */
> > -	int		*stat)		/* BULKSTAT_RV_... */
> > +	struct xfs_ibulk	*breq,
> > +	bulkstat_one_fmt_pf	formatter)
> >  {
> > -	return xfs_bulkstat_one_int(mp, ino, buffer, ubsize,
> > -				    xfs_bulkstat_one_fmt, ubused, stat);
> > +	struct xfs_bstat_chunk	bc = {
> > +		.formatter	= formatter,
> > +		.breq		= breq,
> > +	};
> > +	int			error;
> > +
> > +	ASSERT(breq->icount == 1);
> > +
> > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > +	if (!bc.buf)
> > +		return -ENOMEM;
> > +
> > +	error = xfs_bulkstat_one_int(breq->mp, NULL, breq->startino, &bc);
> > +
> > +	kmem_free(bc.buf);
> > +
> > +	/*
> > +	 * If we reported one inode to userspace then we abort because we hit
> > +	 * the end of the buffer.  Don't leak that back to userspace.
> > +	 */
> > +	if (error == XFS_IWALK_ABORT)
> > +		error = 0;
> > +
> > +	return error;
> >  }
> >  
> >  /*
> > @@ -251,256 +285,69 @@ xfs_bulkstat_grab_ichunk(
> >  
> >  #define XFS_BULKSTAT_UBLEFT(ubleft)	((ubleft) >= statstruct_size)
> >  
> > -struct xfs_bulkstat_agichunk {
> > -	char		__user **ac_ubuffer;/* pointer into user's buffer */
> > -	int		ac_ubleft;	/* bytes left in user's buffer */
> > -	int		ac_ubelem;	/* spaces used in user's buffer */
> > -};
> > -
> > -/*
> > - * Process inodes in chunk with a pointer to a formatter function
> > - * that will iget the inode and fill in the appropriate structure.
> > - */
> >  static int
> > -xfs_bulkstat_ag_ichunk(
> > -	struct xfs_mount		*mp,
> > -	xfs_agnumber_t			agno,
> > -	struct xfs_inobt_rec_incore	*irbp,
> > -	bulkstat_one_pf			formatter,
> > -	size_t				statstruct_size,
> > -	struct xfs_bulkstat_agichunk	*acp,
> > -	xfs_agino_t			*last_agino)
> > +xfs_bulkstat_iwalk(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_trans	*tp,
> > +	xfs_ino_t		ino,
> > +	void			*data)
> >  {
> > -	char				__user **ubufp = acp->ac_ubuffer;
> > -	int				chunkidx;
> > -	int				error = 0;
> > -	xfs_agino_t			agino = irbp->ir_startino;
> > -
> > -	for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK;
> > -	     chunkidx++, agino++) {
> > -		int		fmterror;
> > -		int		ubused;
> > -
> > -		/* inode won't fit in buffer, we are done */
> > -		if (acp->ac_ubleft < statstruct_size)
> > -			break;
> > -
> > -		/* Skip if this inode is free */
> > -		if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free)
> > -			continue;
> > -
> > -		/* Get the inode and fill in a single buffer */
> > -		ubused = statstruct_size;
> > -		error = formatter(mp, XFS_AGINO_TO_INO(mp, agno, agino),
> > -				  *ubufp, acp->ac_ubleft, &ubused, &fmterror);
> > -
> > -		if (fmterror == BULKSTAT_RV_GIVEUP ||
> > -		    (error && error != -ENOENT && error != -EINVAL)) {
> > -			acp->ac_ubleft = 0;
> > -			ASSERT(error);
> > -			break;
> > -		}
> > -
> > -		/* be careful not to leak error if at end of chunk */
> > -		if (fmterror == BULKSTAT_RV_NOTHING || error) {
> > -			error = 0;
> > -			continue;
> > -		}
> > -
> > -		*ubufp += ubused;
> > -		acp->ac_ubleft -= ubused;
> > -		acp->ac_ubelem++;
> > -	}
> > -
> > -	/*
> > -	 * Post-update *last_agino. At this point, agino will always point one
> > -	 * inode past the last inode we processed successfully. Hence we
> > -	 * substract that inode when setting the *last_agino cursor so that we
> > -	 * return the correct cookie to userspace. On the next bulkstat call,
> > -	 * the inode under the lastino cookie will be skipped as we have already
> > -	 * processed it here.
> > -	 */
> > -	*last_agino = agino - 1;
> > +	int			error;
> >  
> > +	error = xfs_bulkstat_one_int(mp, tp, ino, data);
> > +	/* bulkstat just skips over missing inodes */
> > +	if (error == -ENOENT || error == -EINVAL)
> > +		return 0;
> >  	return error;
> >  }
> >  
> >  /*
> > - * Return stat information in bulk (by-inode) for the filesystem.
> > + * Check the incoming lastino parameter.
> > + *
> > + * We allow any inode value that could map to physical space inside the
> > + * filesystem because if there are no inodes there, bulkstat moves on to the
> > + * next chunk.  In other words, the magic agino value of zero takes us to the
> > + * first chunk in the AG, and an agino value past the end of the AG takes us to
> > + * the first chunk in the next AG.
> > + *
> > + * Therefore we can end early if the requested inode is beyond the end of the
> > + * filesystem or doesn't map properly.
> >   */
> > -int					/* error status */
> > -xfs_bulkstat(
> > -	xfs_mount_t		*mp,	/* mount point for filesystem */
> > -	xfs_ino_t		*lastinop, /* last inode returned */
> > -	int			*ubcountp, /* size of buffer/count returned */
> > -	bulkstat_one_pf		formatter, /* func that'd fill a single buf */
> > -	size_t			statstruct_size, /* sizeof struct filling */
> > -	char			__user *ubuffer, /* buffer with inode stats */
> > -	int			*done)	/* 1 if there are more stats to get */
> > +static inline bool
> > +xfs_bulkstat_already_done(
> > +	struct xfs_mount	*mp,
> > +	xfs_ino_t		startino)
> >  {
> > -	xfs_buf_t		*agbp;	/* agi header buffer */
> > -	xfs_agino_t		agino;	/* inode # in allocation group */
> > -	xfs_agnumber_t		agno;	/* allocation group number */
> > -	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
> > -	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
> > -	int			nirbuf;	/* size of irbuf */
> > -	int			ubcount; /* size of user's buffer */
> > -	struct xfs_bulkstat_agichunk ac;
> > -	int			error = 0;
> > +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> > +	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, startino);
> >  
> > -	/*
> > -	 * Get the last inode value, see if there's nothing to do.
> > -	 */
> > -	agno = XFS_INO_TO_AGNO(mp, *lastinop);
> > -	agino = XFS_INO_TO_AGINO(mp, *lastinop);
> > -	if (agno >= mp->m_sb.sb_agcount ||
> > -	    *lastinop != XFS_AGINO_TO_INO(mp, agno, agino)) {
> > -		*done = 1;
> > -		*ubcountp = 0;
> > -		return 0;
> > -	}
> > +	return agno >= mp->m_sb.sb_agcount ||
> > +	       startino != XFS_AGINO_TO_INO(mp, agno, agino);
> > +}
> >  
> > -	ubcount = *ubcountp; /* statstruct's */
> > -	ac.ac_ubuffer = &ubuffer;
> > -	ac.ac_ubleft = ubcount * statstruct_size; /* bytes */;
> > -	ac.ac_ubelem = 0;
> > +/* Return stat information in bulk (by-inode) for the filesystem. */
> > +int
> > +xfs_bulkstat(
> > +	struct xfs_ibulk	*breq,
> > +	bulkstat_one_fmt_pf	formatter)
> > +{
> > +	struct xfs_bstat_chunk	bc = {
> > +		.formatter	= formatter,
> > +		.breq		= breq,
> > +	};
> > +	int			error;
> >  
> > -	*ubcountp = 0;
> > -	*done = 0;
> > +	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
> > +		return 0;
> >  
> > -	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
> > -	if (!irbuf)
> > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > +	if (!bc.buf)
> >  		return -ENOMEM;
> > -	nirbuf = (PAGE_SIZE * 4) / sizeof(*irbuf);
> >  
> > -	/*
> > -	 * Loop over the allocation groups, starting from the last
> > -	 * inode returned; 0 means start of the allocation group.
> > -	 */
> > -	while (agno < mp->m_sb.sb_agcount) {
> > -		struct xfs_inobt_rec_incore	*irbp = irbuf;
> > -		struct xfs_inobt_rec_incore	*irbufend = irbuf + nirbuf;
> > -		bool				end_of_ag = false;
> > -		int				icount = 0;
> > -		int				stat;
> > +	error = xfs_iwalk(breq->mp, NULL, breq->startino, xfs_bulkstat_iwalk,
> > +			breq->icount, &bc);
> >  
> > -		error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> > -		if (error)
> > -			break;
> > -		/*
> > -		 * Allocate and initialize a btree cursor for ialloc btree.
> > -		 */
> > -		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> > -					    XFS_BTNUM_INO);
> > -		if (agino > 0) {
> > -			/*
> > -			 * In the middle of an allocation group, we need to get
> > -			 * the remainder of the chunk we're in.
> > -			 */
> > -			struct xfs_inobt_rec_incore	r;
> > -
> > -			error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
> > -			if (error)
> > -				goto del_cursor;
> > -			if (icount) {
> > -				irbp->ir_startino = r.ir_startino;
> > -				irbp->ir_holemask = r.ir_holemask;
> > -				irbp->ir_count = r.ir_count;
> > -				irbp->ir_freecount = r.ir_freecount;
> > -				irbp->ir_free = r.ir_free;
> > -				irbp++;
> > -			}
> > -			/* Increment to the next record */
> > -			error = xfs_btree_increment(cur, 0, &stat);
> > -		} else {
> > -			/* Start of ag.  Lookup the first inode chunk */
> > -			error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
> > -		}
> > -		if (error || stat == 0) {
> > -			end_of_ag = true;
> > -			goto del_cursor;
> > -		}
> > -
> > -		/*
> > -		 * Loop through inode btree records in this ag,
> > -		 * until we run out of inodes or space in the buffer.
> > -		 */
> > -		while (irbp < irbufend && icount < ubcount) {
> > -			struct xfs_inobt_rec_incore	r;
> > -
> > -			error = xfs_inobt_get_rec(cur, &r, &stat);
> > -			if (error || stat == 0) {
> > -				end_of_ag = true;
> > -				goto del_cursor;
> > -			}
> > -
> > -			/*
> > -			 * If this chunk has any allocated inodes, save it.
> > -			 * Also start read-ahead now for this chunk.
> > -			 */
> > -			if (r.ir_freecount < r.ir_count) {
> > -				xfs_bulkstat_ichunk_ra(mp, agno, &r);
> > -				irbp->ir_startino = r.ir_startino;
> > -				irbp->ir_holemask = r.ir_holemask;
> > -				irbp->ir_count = r.ir_count;
> > -				irbp->ir_freecount = r.ir_freecount;
> > -				irbp->ir_free = r.ir_free;
> > -				irbp++;
> > -				icount += r.ir_count - r.ir_freecount;
> > -			}
> > -			error = xfs_btree_increment(cur, 0, &stat);
> > -			if (error || stat == 0) {
> > -				end_of_ag = true;
> > -				goto del_cursor;
> > -			}
> > -			cond_resched();
> > -		}
> > -
> > -		/*
> > -		 * Drop the btree buffers and the agi buffer as we can't hold any
> > -		 * of the locks these represent when calling iget. If there is a
> > -		 * pending error, then we are done.
> > -		 */
> > -del_cursor:
> > -		xfs_btree_del_cursor(cur, error);
> > -		xfs_buf_relse(agbp);
> > -		if (error)
> > -			break;
> > -		/*
> > -		 * Now format all the good inodes into the user's buffer. The
> > -		 * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
> > -		 * for the next loop iteration.
> > -		 */
> > -		irbufend = irbp;
> > -		for (irbp = irbuf;
> > -		     irbp < irbufend && ac.ac_ubleft >= statstruct_size;
> > -		     irbp++) {
> > -			error = xfs_bulkstat_ag_ichunk(mp, agno, irbp,
> > -					formatter, statstruct_size, &ac,
> > -					&agino);
> > -			if (error)
> > -				break;
> > -
> > -			cond_resched();
> > -		}
> > -
> > -		/*
> > -		 * If we've run out of space or had a formatting error, we
> > -		 * are now done
> > -		 */
> > -		if (ac.ac_ubleft < statstruct_size || error)
> > -			break;
> > -
> > -		if (end_of_ag) {
> > -			agno++;
> > -			agino = 0;
> > -		}
> > -	}
> > -	/*
> > -	 * Done, we're either out of filesystem or space to put the data.
> > -	 */
> > -	kmem_free(irbuf);
> > -	*ubcountp = ac.ac_ubelem;
> > +	kmem_free(bc.buf);
> >  
> >  	/*
> >  	 * We found some inodes, so clear the error status and return them.
> > @@ -509,17 +356,9 @@ xfs_bulkstat(
> >  	 * triggered again and propagated to userspace as there will be no
> >  	 * formatted inodes in the buffer.
> >  	 */
> > -	if (ac.ac_ubelem)
> > +	if (breq->ocount > 0)
> >  		error = 0;
> >  
> > -	/*
> > -	 * If we ran out of filesystem, lastino will point off the end of
> > -	 * the filesystem so the next call will return immediately.
> > -	 */
> > -	*lastinop = XFS_AGINO_TO_INO(mp, agno, agino);
> > -	if (agno >= mp->m_sb.sb_agcount)
> > -		*done = 1;
> > -
> >  	return error;
> >  }
> >  
> > diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> > index 369e3f159d4e..7c5f1df360e6 100644
> > --- a/fs/xfs/xfs_itable.h
> > +++ b/fs/xfs/xfs_itable.h
> > @@ -5,63 +5,46 @@
> >  #ifndef __XFS_ITABLE_H__
> >  #define	__XFS_ITABLE_H__
> >  
> > -/*
> > - * xfs_bulkstat() is used to fill in xfs_bstat structures as well as dm_stat
> > - * structures (by the dmi library). This is a pointer to a formatter function
> > - * that will iget the inode and fill in the appropriate structure.
> > - * see xfs_bulkstat_one() and xfs_dm_bulkstat_one() in dmapi_xfs.c
> > - */
> > -typedef int (*bulkstat_one_pf)(struct xfs_mount	*mp,
> > -			       xfs_ino_t	ino,
> > -			       void		__user *buffer,
> > -			       int		ubsize,
> > -			       int		*ubused,
> > -			       int		*stat);
> > +/* In-memory representation of a userspace request for batch inode data. */
> > +struct xfs_ibulk {
> > +	struct xfs_mount	*mp;
> > +	void __user		*ubuffer; /* user output buffer */
> > +	xfs_ino_t		startino; /* start with this inode */
> > +	unsigned int		icount;   /* number of elements in ubuffer */
> > +	unsigned int		ocount;   /* number of records returned */
> > +};
> > +
> > +/* Return value that means we want to abort the walk. */
> > +#define XFS_IBULK_ABORT		(XFS_IWALK_ABORT)
> > +
> > +/* Return value that means the formatting buffer is now full. */
> > +#define XFS_IBULK_BUFFER_FULL	(XFS_IBULK_ABORT + 1)
> >  
> >  /*
> > - * Values for stat return value.
> > + * Advance the user buffer pointer by one record of the given size.  If the
> > + * buffer is now full, return the appropriate error code.
> >   */
> > -#define BULKSTAT_RV_NOTHING	0
> > -#define BULKSTAT_RV_DIDONE	1
> > -#define BULKSTAT_RV_GIVEUP	2
> > +static inline int
> > +xfs_ibulk_advance(
> > +	struct xfs_ibulk	*breq,
> > +	size_t			bytes)
> > +{
> > +	char __user		*b = breq->ubuffer;
> > +
> > +	breq->ubuffer = b + bytes;
> > +	breq->ocount++;
> > +	return breq->ocount == breq->icount ? XFS_IBULK_BUFFER_FULL : 0;
> > +}
> >  
> >  /*
> >   * Return stat information in bulk (by-inode) for the filesystem.
> >   */
> > -int					/* error status */
> > -xfs_bulkstat(
> > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > -	xfs_ino_t	*lastino,	/* last inode returned */
> > -	int		*count,		/* size of buffer/count returned */
> > -	bulkstat_one_pf formatter,	/* func that'd fill a single buf */
> > -	size_t		statstruct_size,/* sizeof struct that we're filling */
> > -	char		__user *ubuffer,/* buffer with inode stats */
> > -	int		*done);		/* 1 if there are more stats to get */
> >  
> > -typedef int (*bulkstat_one_fmt_pf)(  /* used size in bytes or negative error */
> > -	void			__user *ubuffer, /* buffer to write to */
> > -	int			ubsize,		 /* remaining user buffer sz */
> > -	int			*ubused,	 /* bytes used by formatter */
> > -	const xfs_bstat_t	*buffer);        /* buffer to read from */
> > +typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
> > +		const struct xfs_bstat *bstat);
> >  
> > -int
> > -xfs_bulkstat_one_int(
> > -	xfs_mount_t		*mp,
> > -	xfs_ino_t		ino,
> > -	void			__user *buffer,
> > -	int			ubsize,
> > -	bulkstat_one_fmt_pf	formatter,
> > -	int			*ubused,
> > -	int			*stat);
> > -
> > -int
> > -xfs_bulkstat_one(
> > -	xfs_mount_t		*mp,
> > -	xfs_ino_t		ino,
> > -	void			__user *buffer,
> > -	int			ubsize,
> > -	int			*ubused,
> > -	int			*stat);
> > +int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > +int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> >  
> >  typedef int (*inumbers_fmt_pf)(
> >  	void			__user *ubuffer, /* buffer to write to */
> > 
* Re: [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure
  2019-06-13 18:12     ` Darrick J. Wong
@ 2019-06-13 23:03       ` Darrick J. Wong
  2019-06-14 11:10         ` Brian Foster
  0 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-13 23:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Jun 13, 2019 at 11:12:06AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 13, 2019 at 12:31:54PM -0400, Brian Foster wrote:
> > On Tue, Jun 11, 2019 at 11:48:09PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Create a new ibulk structure incore to help us deal with bulk inode stat
> > > state tracking and then convert the bulkstat code to use the new iwalk
> > > iterator.  This disentangles inode walking from bulk stat control for
> > > simpler code and enables us to isolate the formatter functions to the
> > > ioctl handling code.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/xfs_ioctl.c   |   70 ++++++--
> > >  fs/xfs/xfs_ioctl.h   |    5 +
> > >  fs/xfs/xfs_ioctl32.c |   93 ++++++-----
> > >  fs/xfs/xfs_itable.c  |  431 ++++++++++++++++----------------------------------
> > >  fs/xfs/xfs_itable.h  |   79 ++++-----
> > >  5 files changed, 272 insertions(+), 406 deletions(-)
> > > 
> > > 
> > ...
> > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > index 814ffe6fbab7..5d1c143bac18 100644
> > > --- a/fs/xfs/xfs_ioctl32.c
> > > +++ b/fs/xfs/xfs_ioctl32.c
> > ...
> > > @@ -284,38 +266,59 @@ xfs_compat_ioc_bulkstat(
> > >  		return -EFAULT;
> > >  	bulkreq.ocount = compat_ptr(addr);
> > >  
> > > -	if (copy_from_user(&inlast, bulkreq.lastip, sizeof(__s64)))
> > > +	if (copy_from_user(&lastino, bulkreq.lastip, sizeof(__s64)))
> > >  		return -EFAULT;
> > > +	breq.startino = lastino + 1;
> > >  
> > 
> > Spurious assignment?
> 
> Fixed.
> 
> > > -	if ((count = bulkreq.icount) <= 0)
> > > +	if (bulkreq.icount <= 0)
> > >  		return -EINVAL;
> > >  
> > >  	if (bulkreq.ubuffer == NULL)
> > >  		return -EINVAL;
> > >  
> > > +	breq.ubuffer = bulkreq.ubuffer;
> > > +	breq.icount = bulkreq.icount;
> > > +
> > ...
> > > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > > index 3ca1c454afe6..58e411e11d6c 100644
> > > --- a/fs/xfs/xfs_itable.c
> > > +++ b/fs/xfs/xfs_itable.c
> > > @@ -14,47 +14,68 @@
> > ...
> > > +STATIC int
> > >  xfs_bulkstat_one_int(
> > > -	struct xfs_mount	*mp,		/* mount point for filesystem */
> > > -	xfs_ino_t		ino,		/* inode to get data for */
> > > -	void __user		*buffer,	/* buffer to place output in */
> > > -	int			ubsize,		/* size of buffer */
> > > -	bulkstat_one_fmt_pf	formatter,	/* formatter, copy to user */
> > > -	int			*ubused,	/* bytes used by me */
> > > -	int			*stat)		/* BULKSTAT_RV_... */
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_trans	*tp,
> > > +	xfs_ino_t		ino,
> > > +	void			*data)
> > 
> > There's no need for a void pointer here given the current usage. We
> > might as well pass this as bc (and let the caller cast it, if
> > necessary).
> > 
> > That said, it also looks like the only reason we have the
> > xfs_bulkstat_iwalk wrapper caller of this function is to filter out
> > certain error values. If those errors are needed for the single inode
> > case, we could stick something in the bc to toggle that invalid inode
> > filtering behavior and eliminate the need for the wrapper entirely
> > (which would pass _one_int() into the iwalk infra directly and require
> > retaining the void pointer).
> 
> Ok, will do.  That'll help declutter the source file.

...or I won't, because gcc complains that the function pointer passed
into xfs_iwalk() has to have a (void *) as the 4th parameter.  It's not
willing to accept one with a (struct xfs_bstat_chunk *).

Sorry about that. :(

--D

> > 
> > >  {
> > > +	struct xfs_bstat_chunk	*bc = data;
> > >  	struct xfs_icdinode	*dic;		/* dinode core info pointer */
> > >  	struct xfs_inode	*ip;		/* incore inode pointer */
> > >  	struct inode		*inode;
> > > -	struct xfs_bstat	*buf;		/* return buffer */
> > > -	int			error = 0;	/* error value */
> > > +	struct xfs_bstat	*buf = bc->buf;
> > > +	int			error = -EINVAL;
> > >  
> > > -	*stat = BULKSTAT_RV_NOTHING;
> > > +	if (xfs_internal_inum(mp, ino))
> > > +		goto out_advance;
> > >  
> > > -	if (!buffer || xfs_internal_inum(mp, ino))
> > > -		return -EINVAL;
> > > -
> > > -	buf = kmem_zalloc(sizeof(*buf), KM_SLEEP | KM_MAYFAIL);
> > > -	if (!buf)
> > > -		return -ENOMEM;
> > > -
> > > -	error = xfs_iget(mp, NULL, ino,
> > > +	error = xfs_iget(mp, tp, ino,
> > >  			 (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED),
> > >  			 XFS_ILOCK_SHARED, &ip);
> > > +	if (error == -ENOENT || error == -EINVAL)
> > > +		goto out_advance;
> > >  	if (error)
> > > -		goto out_free;
> > > +		goto out;
> > >  
> > >  	ASSERT(ip != NULL);
> > >  	ASSERT(ip->i_imap.im_blkno != 0);
> > > @@ -119,43 +140,56 @@ xfs_bulkstat_one_int(
> > >  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > >  	xfs_irele(ip);
> > >  
> > > -	error = formatter(buffer, ubsize, ubused, buf);
> > > -	if (!error)
> > > -		*stat = BULKSTAT_RV_DIDONE;
> > > +	error = bc->formatter(bc->breq, buf);
> > > +	if (error == XFS_IBULK_BUFFER_FULL) {
> > > +		error = XFS_IWALK_ABORT;
> > 
> > Related to the earlier patch.. is there a need for IBULK_BUFFER_FULL if
> > the only user converts it to the generic abort error?
> 
> <shrug> I wasn't sure if there was ever going to be a case where the
> formatter function wanted to abort for a reason that wasn't a full
> buffer... though looking at the bulkstat-v5 patches there aren't any.
> I guess I'll just remove BUFFER_FULL, then.
> 
> --D
> 
> > Most of these comments are minor/aesthetic, so:
> > 
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > 
> > > +		goto out_advance;
> > > +	}
> > > +	if (error)
> > > +		goto out;
> > >  
> > > - out_free:
> > > -	kmem_free(buf);
> > > +out_advance:
> > > +	/*
> > > +	 * Advance the cursor to the inode that comes after the one we just
> > > +	 * looked at.  We want the caller to move along if the bulkstat
> > > +	 * information was copied successfully; if we tried to grab the inode
> > > +	 * but it's no longer allocated; or if it's internal metadata.
> > > +	 */
> > > +	bc->breq->startino = ino + 1;
> > > +out:
> > >  	return error;
> > >  }
> > >  
> > > -/* Return 0 on success or positive error */
> > > -STATIC int
> > > -xfs_bulkstat_one_fmt(
> > > -	void			__user *ubuffer,
> > > -	int			ubsize,
> > > -	int			*ubused,
> > > -	const xfs_bstat_t	*buffer)
> > > -{
> > > -	if (ubsize < sizeof(*buffer))
> > > -		return -ENOMEM;
> > > -	if (copy_to_user(ubuffer, buffer, sizeof(*buffer)))
> > > -		return -EFAULT;
> > > -	if (ubused)
> > > -		*ubused = sizeof(*buffer);
> > > -	return 0;
> > > -}
> > > -
> > > +/* Bulkstat a single inode. */
> > >  int
> > >  xfs_bulkstat_one(
> > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > -	xfs_ino_t	ino,		/* inode number to get data for */
> > > -	void		__user *buffer,	/* buffer to place output in */
> > > -	int		ubsize,		/* size of buffer */
> > > -	int		*ubused,	/* bytes used by me */
> > > -	int		*stat)		/* BULKSTAT_RV_... */
> > > +	struct xfs_ibulk	*breq,
> > > +	bulkstat_one_fmt_pf	formatter)
> > >  {
> > > -	return xfs_bulkstat_one_int(mp, ino, buffer, ubsize,
> > > -				    xfs_bulkstat_one_fmt, ubused, stat);
> > > +	struct xfs_bstat_chunk	bc = {
> > > +		.formatter	= formatter,
> > > +		.breq		= breq,
> > > +	};
> > > +	int			error;
> > > +
> > > +	ASSERT(breq->icount == 1);
> > > +
> > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > +	if (!bc.buf)
> > > +		return -ENOMEM;
> > > +
> > > +	error = xfs_bulkstat_one_int(breq->mp, NULL, breq->startino, &bc);
> > > +
> > > +	kmem_free(bc.buf);
> > > +
> > > +	/*
> > > +	 * If we reported one inode to userspace then we abort because we hit
> > > +	 * the end of the buffer.  Don't leak that back to userspace.
> > > +	 */
> > > +	if (error == XFS_IWALK_ABORT)
> > > +		error = 0;
> > > +
> > > +	return error;
> > >  }
> > >  
> > >  /*
> > > @@ -251,256 +285,69 @@ xfs_bulkstat_grab_ichunk(
> > >  
> > >  #define XFS_BULKSTAT_UBLEFT(ubleft)	((ubleft) >= statstruct_size)
> > >  
> > > -struct xfs_bulkstat_agichunk {
> > > -	char		__user **ac_ubuffer;/* pointer into user's buffer */
> > > -	int		ac_ubleft;	/* bytes left in user's buffer */
> > > -	int		ac_ubelem;	/* spaces used in user's buffer */
> > > -};
> > > -
> > > -/*
> > > - * Process inodes in chunk with a pointer to a formatter function
> > > - * that will iget the inode and fill in the appropriate structure.
> > > - */
> > >  static int
> > > -xfs_bulkstat_ag_ichunk(
> > > -	struct xfs_mount		*mp,
> > > -	xfs_agnumber_t			agno,
> > > -	struct xfs_inobt_rec_incore	*irbp,
> > > -	bulkstat_one_pf			formatter,
> > > -	size_t				statstruct_size,
> > > -	struct xfs_bulkstat_agichunk	*acp,
> > > -	xfs_agino_t			*last_agino)
> > > +xfs_bulkstat_iwalk(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_trans	*tp,
> > > +	xfs_ino_t		ino,
> > > +	void			*data)
> > >  {
> > > -	char				__user **ubufp = acp->ac_ubuffer;
> > > -	int				chunkidx;
> > > -	int				error = 0;
> > > -	xfs_agino_t			agino = irbp->ir_startino;
> > > -
> > > -	for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK;
> > > -	     chunkidx++, agino++) {
> > > -		int		fmterror;
> > > -		int		ubused;
> > > -
> > > -		/* inode won't fit in buffer, we are done */
> > > -		if (acp->ac_ubleft < statstruct_size)
> > > -			break;
> > > -
> > > -		/* Skip if this inode is free */
> > > -		if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free)
> > > -			continue;
> > > -
> > > -		/* Get the inode and fill in a single buffer */
> > > -		ubused = statstruct_size;
> > > -		error = formatter(mp, XFS_AGINO_TO_INO(mp, agno, agino),
> > > -				  *ubufp, acp->ac_ubleft, &ubused, &fmterror);
> > > -
> > > -		if (fmterror == BULKSTAT_RV_GIVEUP ||
> > > -		    (error && error != -ENOENT && error != -EINVAL)) {
> > > -			acp->ac_ubleft = 0;
> > > -			ASSERT(error);
> > > -			break;
> > > -		}
> > > -
> > > -		/* be careful not to leak error if at end of chunk */
> > > -		if (fmterror == BULKSTAT_RV_NOTHING || error) {
> > > -			error = 0;
> > > -			continue;
> > > -		}
> > > -
> > > -		*ubufp += ubused;
> > > -		acp->ac_ubleft -= ubused;
> > > -		acp->ac_ubelem++;
> > > -	}
> > > -
> > > -	/*
> > > -	 * Post-update *last_agino. At this point, agino will always point one
> > > -	 * inode past the last inode we processed successfully. Hence we
> > > -	 * substract that inode when setting the *last_agino cursor so that we
> > > -	 * return the correct cookie to userspace. On the next bulkstat call,
> > > -	 * the inode under the lastino cookie will be skipped as we have already
> > > -	 * processed it here.
> > > -	 */
> > > -	*last_agino = agino - 1;
> > > +	int			error;
> > >  
> > > +	error = xfs_bulkstat_one_int(mp, tp, ino, data);
> > > +	/* bulkstat just skips over missing inodes */
> > > +	if (error == -ENOENT || error == -EINVAL)
> > > +		return 0;
> > >  	return error;
> > >  }
> > >  
> > >  /*
> > > - * Return stat information in bulk (by-inode) for the filesystem.
> > > + * Check the incoming lastino parameter.
> > > + *
> > > + * We allow any inode value that could map to physical space inside the
> > > + * filesystem because if there are no inodes there, bulkstat moves on to the
> > > + * next chunk.  In other words, the magic agino value of zero takes us to the
> > > + * first chunk in the AG, and an agino value past the end of the AG takes us to
> > > + * the first chunk in the next AG.
> > > + *
> > > + * Therefore we can end early if the requested inode is beyond the end of the
> > > + * filesystem or doesn't map properly.
> > >   */
> > > -int					/* error status */
> > > -xfs_bulkstat(
> > > -	xfs_mount_t		*mp,	/* mount point for filesystem */
> > > -	xfs_ino_t		*lastinop, /* last inode returned */
> > > -	int			*ubcountp, /* size of buffer/count returned */
> > > -	bulkstat_one_pf		formatter, /* func that'd fill a single buf */
> > > -	size_t			statstruct_size, /* sizeof struct filling */
> > > -	char			__user *ubuffer, /* buffer with inode stats */
> > > -	int			*done)	/* 1 if there are more stats to get */
> > > +static inline bool
> > > +xfs_bulkstat_already_done(
> > > +	struct xfs_mount	*mp,
> > > +	xfs_ino_t		startino)
> > >  {
> > > -	xfs_buf_t		*agbp;	/* agi header buffer */
> > > -	xfs_agino_t		agino;	/* inode # in allocation group */
> > > -	xfs_agnumber_t		agno;	/* allocation group number */
> > > -	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
> > > -	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
> > > -	int			nirbuf;	/* size of irbuf */
> > > -	int			ubcount; /* size of user's buffer */
> > > -	struct xfs_bulkstat_agichunk ac;
> > > -	int			error = 0;
> > > +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> > > +	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, startino);
> > >  
> > > -	/*
> > > -	 * Get the last inode value, see if there's nothing to do.
> > > -	 */
> > > -	agno = XFS_INO_TO_AGNO(mp, *lastinop);
> > > -	agino = XFS_INO_TO_AGINO(mp, *lastinop);
> > > -	if (agno >= mp->m_sb.sb_agcount ||
> > > -	    *lastinop != XFS_AGINO_TO_INO(mp, agno, agino)) {
> > > -		*done = 1;
> > > -		*ubcountp = 0;
> > > -		return 0;
> > > -	}
> > > +	return agno >= mp->m_sb.sb_agcount ||
> > > +	       startino != XFS_AGINO_TO_INO(mp, agno, agino);
> > > +}
> > >  
> > > -	ubcount = *ubcountp; /* statstruct's */
> > > -	ac.ac_ubuffer = &ubuffer;
> > > -	ac.ac_ubleft = ubcount * statstruct_size; /* bytes */;
> > > -	ac.ac_ubelem = 0;
> > > +/* Return stat information in bulk (by-inode) for the filesystem. */
> > > +int
> > > +xfs_bulkstat(
> > > +	struct xfs_ibulk	*breq,
> > > +	bulkstat_one_fmt_pf	formatter)
> > > +{
> > > +	struct xfs_bstat_chunk	bc = {
> > > +		.formatter	= formatter,
> > > +		.breq		= breq,
> > > +	};
> > > +	int			error;
> > >  
> > > -	*ubcountp = 0;
> > > -	*done = 0;
> > > +	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
> > > +		return 0;
> > >  
> > > -	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
> > > -	if (!irbuf)
> > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > +	if (!bc.buf)
> > >  		return -ENOMEM;
> > > -	nirbuf = (PAGE_SIZE * 4) / sizeof(*irbuf);
> > >  
> > > -	/*
> > > -	 * Loop over the allocation groups, starting from the last
> > > -	 * inode returned; 0 means start of the allocation group.
> > > -	 */
> > > -	while (agno < mp->m_sb.sb_agcount) {
> > > -		struct xfs_inobt_rec_incore	*irbp = irbuf;
> > > -		struct xfs_inobt_rec_incore	*irbufend = irbuf + nirbuf;
> > > -		bool				end_of_ag = false;
> > > -		int				icount = 0;
> > > -		int				stat;
> > > +	error = xfs_iwalk(breq->mp, NULL, breq->startino, xfs_bulkstat_iwalk,
> > > +			breq->icount, &bc);
> > >  
> > > -		error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> > > -		if (error)
> > > -			break;
> > > -		/*
> > > -		 * Allocate and initialize a btree cursor for ialloc btree.
> > > -		 */
> > > -		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> > > -					    XFS_BTNUM_INO);
> > > -		if (agino > 0) {
> > > -			/*
> > > -			 * In the middle of an allocation group, we need to get
> > > -			 * the remainder of the chunk we're in.
> > > -			 */
> > > -			struct xfs_inobt_rec_incore	r;
> > > -
> > > -			error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
> > > -			if (error)
> > > -				goto del_cursor;
> > > -			if (icount) {
> > > -				irbp->ir_startino = r.ir_startino;
> > > -				irbp->ir_holemask = r.ir_holemask;
> > > -				irbp->ir_count = r.ir_count;
> > > -				irbp->ir_freecount = r.ir_freecount;
> > > -				irbp->ir_free = r.ir_free;
> > > -				irbp++;
> > > -			}
> > > -			/* Increment to the next record */
> > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > -		} else {
> > > -			/* Start of ag.  Lookup the first inode chunk */
> > > -			error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
> > > -		}
> > > -		if (error || stat == 0) {
> > > -			end_of_ag = true;
> > > -			goto del_cursor;
> > > -		}
> > > -
> > > -		/*
> > > -		 * Loop through inode btree records in this ag,
> > > -		 * until we run out of inodes or space in the buffer.
> > > -		 */
> > > -		while (irbp < irbufend && icount < ubcount) {
> > > -			struct xfs_inobt_rec_incore	r;
> > > -
> > > -			error = xfs_inobt_get_rec(cur, &r, &stat);
> > > -			if (error || stat == 0) {
> > > -				end_of_ag = true;
> > > -				goto del_cursor;
> > > -			}
> > > -
> > > -			/*
> > > -			 * If this chunk has any allocated inodes, save it.
> > > -			 * Also start read-ahead now for this chunk.
> > > -			 */
> > > -			if (r.ir_freecount < r.ir_count) {
> > > -				xfs_bulkstat_ichunk_ra(mp, agno, &r);
> > > -				irbp->ir_startino = r.ir_startino;
> > > -				irbp->ir_holemask = r.ir_holemask;
> > > -				irbp->ir_count = r.ir_count;
> > > -				irbp->ir_freecount = r.ir_freecount;
> > > -				irbp->ir_free = r.ir_free;
> > > -				irbp++;
> > > -				icount += r.ir_count - r.ir_freecount;
> > > -			}
> > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > -			if (error || stat == 0) {
> > > -				end_of_ag = true;
> > > -				goto del_cursor;
> > > -			}
> > > -			cond_resched();
> > > -		}
> > > -
> > > -		/*
> > > -		 * Drop the btree buffers and the agi buffer as we can't hold any
> > > -		 * of the locks these represent when calling iget. If there is a
> > > -		 * pending error, then we are done.
> > > -		 */
> > > -del_cursor:
> > > -		xfs_btree_del_cursor(cur, error);
> > > -		xfs_buf_relse(agbp);
> > > -		if (error)
> > > -			break;
> > > -		/*
> > > -		 * Now format all the good inodes into the user's buffer. The
> > > -		 * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
> > > -		 * for the next loop iteration.
> > > -		 */
> > > -		irbufend = irbp;
> > > -		for (irbp = irbuf;
> > > -		     irbp < irbufend && ac.ac_ubleft >= statstruct_size;
> > > -		     irbp++) {
> > > -			error = xfs_bulkstat_ag_ichunk(mp, agno, irbp,
> > > -					formatter, statstruct_size, &ac,
> > > -					&agino);
> > > -			if (error)
> > > -				break;
> > > -
> > > -			cond_resched();
> > > -		}
> > > -
> > > -		/*
> > > -		 * If we've run out of space or had a formatting error, we
> > > -		 * are now done
> > > -		 */
> > > -		if (ac.ac_ubleft < statstruct_size || error)
> > > -			break;
> > > -
> > > -		if (end_of_ag) {
> > > -			agno++;
> > > -			agino = 0;
> > > -		}
> > > -	}
> > > -	/*
> > > -	 * Done, we're either out of filesystem or space to put the data.
> > > -	 */
> > > -	kmem_free(irbuf);
> > > -	*ubcountp = ac.ac_ubelem;
> > > +	kmem_free(bc.buf);
> > >  
> > >  	/*
> > >  	 * We found some inodes, so clear the error status and return them.
> > > @@ -509,17 +356,9 @@ xfs_bulkstat(
> > >  	 * triggered again and propagated to userspace as there will be no
> > >  	 * formatted inodes in the buffer.
> > >  	 */
> > > -	if (ac.ac_ubelem)
> > > +	if (breq->ocount > 0)
> > >  		error = 0;
> > >  
> > > -	/*
> > > -	 * If we ran out of filesystem, lastino will point off the end of
> > > -	 * the filesystem so the next call will return immediately.
> > > -	 */
> > > -	*lastinop = XFS_AGINO_TO_INO(mp, agno, agino);
> > > -	if (agno >= mp->m_sb.sb_agcount)
> > > -		*done = 1;
> > > -
> > >  	return error;
> > >  }
> > >  
> > > diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> > > index 369e3f159d4e..7c5f1df360e6 100644
> > > --- a/fs/xfs/xfs_itable.h
> > > +++ b/fs/xfs/xfs_itable.h
> > > @@ -5,63 +5,46 @@
> > >  #ifndef __XFS_ITABLE_H__
> > >  #define	__XFS_ITABLE_H__
> > >  
> > > -/*
> > > - * xfs_bulkstat() is used to fill in xfs_bstat structures as well as dm_stat
> > > - * structures (by the dmi library). This is a pointer to a formatter function
> > > - * that will iget the inode and fill in the appropriate structure.
> > > - * see xfs_bulkstat_one() and xfs_dm_bulkstat_one() in dmapi_xfs.c
> > > - */
> > > -typedef int (*bulkstat_one_pf)(struct xfs_mount	*mp,
> > > -			       xfs_ino_t	ino,
> > > -			       void		__user *buffer,
> > > -			       int		ubsize,
> > > -			       int		*ubused,
> > > -			       int		*stat);
> > > +/* In-memory representation of a userspace request for batch inode data. */
> > > +struct xfs_ibulk {
> > > +	struct xfs_mount	*mp;
> > > +	void __user		*ubuffer; /* user output buffer */
> > > +	xfs_ino_t		startino; /* start with this inode */
> > > +	unsigned int		icount;   /* number of elements in ubuffer */
> > > +	unsigned int		ocount;   /* number of records returned */
> > > +};
> > > +
> > > +/* Return value that means we want to abort the walk. */
> > > +#define XFS_IBULK_ABORT		(XFS_IWALK_ABORT)
> > > +
> > > +/* Return value that means the formatting buffer is now full. */
> > > +#define XFS_IBULK_BUFFER_FULL	(XFS_IBULK_ABORT + 1)
> > >  
> > >  /*
> > > - * Values for stat return value.
> > > + * Advance the user buffer pointer by one record of the given size.  If the
> > > + * buffer is now full, return the appropriate error code.
> > >   */
> > > -#define BULKSTAT_RV_NOTHING	0
> > > -#define BULKSTAT_RV_DIDONE	1
> > > -#define BULKSTAT_RV_GIVEUP	2
> > > +static inline int
> > > +xfs_ibulk_advance(
> > > +	struct xfs_ibulk	*breq,
> > > +	size_t			bytes)
> > > +{
> > > +	char __user		*b = breq->ubuffer;
> > > +
> > > +	breq->ubuffer = b + bytes;
> > > +	breq->ocount++;
> > > +	return breq->ocount == breq->icount ? XFS_IBULK_BUFFER_FULL : 0;
> > > +}
> > >  
> > >  /*
> > >   * Return stat information in bulk (by-inode) for the filesystem.
> > >   */
> > > -int					/* error status */
> > > -xfs_bulkstat(
> > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > -	xfs_ino_t	*lastino,	/* last inode returned */
> > > -	int		*count,		/* size of buffer/count returned */
> > > -	bulkstat_one_pf formatter,	/* func that'd fill a single buf */
> > > -	size_t		statstruct_size,/* sizeof struct that we're filling */
> > > -	char		__user *ubuffer,/* buffer with inode stats */
> > > -	int		*done);		/* 1 if there are more stats to get */
> > >  
> > > -typedef int (*bulkstat_one_fmt_pf)(  /* used size in bytes or negative error */
> > > -	void			__user *ubuffer, /* buffer to write to */
> > > -	int			ubsize,		 /* remaining user buffer sz */
> > > -	int			*ubused,	 /* bytes used by formatter */
> > > -	const xfs_bstat_t	*buffer);        /* buffer to read from */
> > > +typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
> > > +		const struct xfs_bstat *bstat);
> > >  
> > > -int
> > > -xfs_bulkstat_one_int(
> > > -	xfs_mount_t		*mp,
> > > -	xfs_ino_t		ino,
> > > -	void			__user *buffer,
> > > -	int			ubsize,
> > > -	bulkstat_one_fmt_pf	formatter,
> > > -	int			*ubused,
> > > -	int			*stat);
> > > -
> > > -int
> > > -xfs_bulkstat_one(
> > > -	xfs_mount_t		*mp,
> > > -	xfs_ino_t		ino,
> > > -	void			__user *buffer,
> > > -	int			ubsize,
> > > -	int			*ubused,
> > > -	int			*stat);
> > > +int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > > +int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > >  
> > >  typedef int (*inumbers_fmt_pf)(
> > >  	void			__user *ubuffer, /* buffer to write to */
> > > 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure
  2019-06-13 23:03       ` Darrick J. Wong
@ 2019-06-14 11:10         ` Brian Foster
  2019-06-14 16:45           ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Foster @ 2019-06-14 11:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jun 13, 2019 at 04:03:58PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 13, 2019 at 11:12:06AM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 13, 2019 at 12:31:54PM -0400, Brian Foster wrote:
> > > On Tue, Jun 11, 2019 at 11:48:09PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Create a new ibulk structure incore to help us deal with bulk inode stat
> > > > state tracking and then convert the bulkstat code to use the new iwalk
> > > > iterator.  This disentangles inode walking from bulk stat control for
> > > > simpler code and enables us to isolate the formatter functions to the
> > > > ioctl handling code.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/xfs_ioctl.c   |   70 ++++++--
> > > >  fs/xfs/xfs_ioctl.h   |    5 +
> > > >  fs/xfs/xfs_ioctl32.c |   93 ++++++-----
> > > >  fs/xfs/xfs_itable.c  |  431 ++++++++++++++++----------------------------------
> > > >  fs/xfs/xfs_itable.h  |   79 ++++-----
> > > >  5 files changed, 272 insertions(+), 406 deletions(-)
> > > > 
> > > > 
> > > ...
> > > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > > index 814ffe6fbab7..5d1c143bac18 100644
> > > > --- a/fs/xfs/xfs_ioctl32.c
> > > > +++ b/fs/xfs/xfs_ioctl32.c
> > > ...
> > > > @@ -284,38 +266,59 @@ xfs_compat_ioc_bulkstat(
> > > >  		return -EFAULT;
> > > >  	bulkreq.ocount = compat_ptr(addr);
> > > >  
> > > > -	if (copy_from_user(&inlast, bulkreq.lastip, sizeof(__s64)))
> > > > +	if (copy_from_user(&lastino, bulkreq.lastip, sizeof(__s64)))
> > > >  		return -EFAULT;
> > > > +	breq.startino = lastino + 1;
> > > >  
> > > 
> > > Spurious assignment?
> > 
> > Fixed.
> > 
> > > > -	if ((count = bulkreq.icount) <= 0)
> > > > +	if (bulkreq.icount <= 0)
> > > >  		return -EINVAL;
> > > >  
> > > >  	if (bulkreq.ubuffer == NULL)
> > > >  		return -EINVAL;
> > > >  
> > > > +	breq.ubuffer = bulkreq.ubuffer;
> > > > +	breq.icount = bulkreq.icount;
> > > > +
> > > ...
> > > > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > > > index 3ca1c454afe6..58e411e11d6c 100644
> > > > --- a/fs/xfs/xfs_itable.c
> > > > +++ b/fs/xfs/xfs_itable.c
> > > > @@ -14,47 +14,68 @@
> > > ...
> > > > +STATIC int
> > > >  xfs_bulkstat_one_int(
> > > > -	struct xfs_mount	*mp,		/* mount point for filesystem */
> > > > -	xfs_ino_t		ino,		/* inode to get data for */
> > > > -	void __user		*buffer,	/* buffer to place output in */
> > > > -	int			ubsize,		/* size of buffer */
> > > > -	bulkstat_one_fmt_pf	formatter,	/* formatter, copy to user */
> > > > -	int			*ubused,	/* bytes used by me */
> > > > -	int			*stat)		/* BULKSTAT_RV_... */
> > > > +	struct xfs_mount	*mp,
> > > > +	struct xfs_trans	*tp,
> > > > +	xfs_ino_t		ino,
> > > > +	void			*data)
> > > 
> > > There's no need for a void pointer here given the current usage. We
> > > might as well pass this as bc (and let the caller cast it, if
> > > necessary).
> > > 
> > > That said, it also looks like the only reason we have the
> > > xfs_bulkstat_iwalk wrapper around this function is to filter out
> > > certain error values. If those errors are needed for the single-inode
> > > case, we could stick something in the bc to toggle that invalid-inode
> > > filtering behavior and eliminate the need for the wrapper entirely
> > > (which would pass _one_int() into the iwalk infra directly and require
> > > retaining the void pointer).
> > 
> > Ok, will do.  That'll help declutter the source file.
> 
> ...or I won't, because gcc complains that the function pointer passed
> into xfs_iwalk() has to have a (void *) as the 4th parameter.  It's not
> willing to accept one with a (struct xfs_bstat_chunk *).
> 

Hm, I don't follow; this function already takes a void *data parameter,
and we pass bc into xfs_iwalk() as a void *. What am I missing?

Brian

> Sorry about that. :(
> 
> --D
> 
> > > 
> > > >  {
> > > > +	struct xfs_bstat_chunk	*bc = data;
> > > >  	struct xfs_icdinode	*dic;		/* dinode core info pointer */
> > > >  	struct xfs_inode	*ip;		/* incore inode pointer */
> > > >  	struct inode		*inode;
> > > > -	struct xfs_bstat	*buf;		/* return buffer */
> > > > -	int			error = 0;	/* error value */
> > > > +	struct xfs_bstat	*buf = bc->buf;
> > > > +	int			error = -EINVAL;
> > > >  
> > > > -	*stat = BULKSTAT_RV_NOTHING;
> > > > +	if (xfs_internal_inum(mp, ino))
> > > > +		goto out_advance;
> > > >  
> > > > -	if (!buffer || xfs_internal_inum(mp, ino))
> > > > -		return -EINVAL;
> > > > -
> > > > -	buf = kmem_zalloc(sizeof(*buf), KM_SLEEP | KM_MAYFAIL);
> > > > -	if (!buf)
> > > > -		return -ENOMEM;
> > > > -
> > > > -	error = xfs_iget(mp, NULL, ino,
> > > > +	error = xfs_iget(mp, tp, ino,
> > > >  			 (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED),
> > > >  			 XFS_ILOCK_SHARED, &ip);
> > > > +	if (error == -ENOENT || error == -EINVAL)
> > > > +		goto out_advance;
> > > >  	if (error)
> > > > -		goto out_free;
> > > > +		goto out;
> > > >  
> > > >  	ASSERT(ip != NULL);
> > > >  	ASSERT(ip->i_imap.im_blkno != 0);
> > > > @@ -119,43 +140,56 @@ xfs_bulkstat_one_int(
> > > >  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > > >  	xfs_irele(ip);
> > > >  
> > > > -	error = formatter(buffer, ubsize, ubused, buf);
> > > > -	if (!error)
> > > > -		*stat = BULKSTAT_RV_DIDONE;
> > > > +	error = bc->formatter(bc->breq, buf);
> > > > +	if (error == XFS_IBULK_BUFFER_FULL) {
> > > > +		error = XFS_IWALK_ABORT;
> > > 
> > > Related to the earlier patch... is there a need for IBULK_BUFFER_FULL if
> > > the only user converts it to the generic abort error?
> > 
> > <shrug> I wasn't sure if there was ever going to be a case where the
> > formatter function wanted to abort for a reason that wasn't a full
> > buffer... though looking at the bulkstat-v5 patches there aren't any.
> > I guess I'll just remove BUFFER_FULL, then.
> > 
> > --D
> > 
> > > Most of these comments are minor/aesthetic, so:
> > > 
> > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > 
> > > > +		goto out_advance;
> > > > +	}
> > > > +	if (error)
> > > > +		goto out;
> > > >  
> > > > - out_free:
> > > > -	kmem_free(buf);
> > > > +out_advance:
> > > > +	/*
> > > > +	 * Advance the cursor to the inode that comes after the one we just
> > > > +	 * looked at.  We want the caller to move along if the bulkstat
> > > > +	 * information was copied successfully; if we tried to grab the inode
> > > > +	 * but it's no longer allocated; or if it's internal metadata.
> > > > +	 */
> > > > +	bc->breq->startino = ino + 1;
> > > > +out:
> > > >  	return error;
> > > >  }
> > > >  
> > > > -/* Return 0 on success or positive error */
> > > > -STATIC int
> > > > -xfs_bulkstat_one_fmt(
> > > > -	void			__user *ubuffer,
> > > > -	int			ubsize,
> > > > -	int			*ubused,
> > > > -	const xfs_bstat_t	*buffer)
> > > > -{
> > > > -	if (ubsize < sizeof(*buffer))
> > > > -		return -ENOMEM;
> > > > -	if (copy_to_user(ubuffer, buffer, sizeof(*buffer)))
> > > > -		return -EFAULT;
> > > > -	if (ubused)
> > > > -		*ubused = sizeof(*buffer);
> > > > -	return 0;
> > > > -}
> > > > -
> > > > +/* Bulkstat a single inode. */
> > > >  int
> > > >  xfs_bulkstat_one(
> > > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > > -	xfs_ino_t	ino,		/* inode number to get data for */
> > > > -	void		__user *buffer,	/* buffer to place output in */
> > > > -	int		ubsize,		/* size of buffer */
> > > > -	int		*ubused,	/* bytes used by me */
> > > > -	int		*stat)		/* BULKSTAT_RV_... */
> > > > +	struct xfs_ibulk	*breq,
> > > > +	bulkstat_one_fmt_pf	formatter)
> > > >  {
> > > > -	return xfs_bulkstat_one_int(mp, ino, buffer, ubsize,
> > > > -				    xfs_bulkstat_one_fmt, ubused, stat);
> > > > +	struct xfs_bstat_chunk	bc = {
> > > > +		.formatter	= formatter,
> > > > +		.breq		= breq,
> > > > +	};
> > > > +	int			error;
> > > > +
> > > > +	ASSERT(breq->icount == 1);
> > > > +
> > > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > > +	if (!bc.buf)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	error = xfs_bulkstat_one_int(breq->mp, NULL, breq->startino, &bc);
> > > > +
> > > > +	kmem_free(bc.buf);
> > > > +
> > > > +	/*
> > > > +	 * If we reported one inode to userspace then we abort because we hit
> > > > +	 * the end of the buffer.  Don't leak that back to userspace.
> > > > +	 */
> > > > +	if (error == XFS_IWALK_ABORT)
> > > > +		error = 0;
> > > > +
> > > > +	return error;
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -251,256 +285,69 @@ xfs_bulkstat_grab_ichunk(
> > > >  
> > > >  #define XFS_BULKSTAT_UBLEFT(ubleft)	((ubleft) >= statstruct_size)
> > > >  
> > > > -struct xfs_bulkstat_agichunk {
> > > > -	char		__user **ac_ubuffer;/* pointer into user's buffer */
> > > > -	int		ac_ubleft;	/* bytes left in user's buffer */
> > > > -	int		ac_ubelem;	/* spaces used in user's buffer */
> > > > -};
> > > > -
> > > > -/*
> > > > - * Process inodes in chunk with a pointer to a formatter function
> > > > - * that will iget the inode and fill in the appropriate structure.
> > > > - */
> > > >  static int
> > > > -xfs_bulkstat_ag_ichunk(
> > > > -	struct xfs_mount		*mp,
> > > > -	xfs_agnumber_t			agno,
> > > > -	struct xfs_inobt_rec_incore	*irbp,
> > > > -	bulkstat_one_pf			formatter,
> > > > -	size_t				statstruct_size,
> > > > -	struct xfs_bulkstat_agichunk	*acp,
> > > > -	xfs_agino_t			*last_agino)
> > > > +xfs_bulkstat_iwalk(
> > > > +	struct xfs_mount	*mp,
> > > > +	struct xfs_trans	*tp,
> > > > +	xfs_ino_t		ino,
> > > > +	void			*data)
> > > >  {
> > > > -	char				__user **ubufp = acp->ac_ubuffer;
> > > > -	int				chunkidx;
> > > > -	int				error = 0;
> > > > -	xfs_agino_t			agino = irbp->ir_startino;
> > > > -
> > > > -	for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK;
> > > > -	     chunkidx++, agino++) {
> > > > -		int		fmterror;
> > > > -		int		ubused;
> > > > -
> > > > -		/* inode won't fit in buffer, we are done */
> > > > -		if (acp->ac_ubleft < statstruct_size)
> > > > -			break;
> > > > -
> > > > -		/* Skip if this inode is free */
> > > > -		if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free)
> > > > -			continue;
> > > > -
> > > > -		/* Get the inode and fill in a single buffer */
> > > > -		ubused = statstruct_size;
> > > > -		error = formatter(mp, XFS_AGINO_TO_INO(mp, agno, agino),
> > > > -				  *ubufp, acp->ac_ubleft, &ubused, &fmterror);
> > > > -
> > > > -		if (fmterror == BULKSTAT_RV_GIVEUP ||
> > > > -		    (error && error != -ENOENT && error != -EINVAL)) {
> > > > -			acp->ac_ubleft = 0;
> > > > -			ASSERT(error);
> > > > -			break;
> > > > -		}
> > > > -
> > > > -		/* be careful not to leak error if at end of chunk */
> > > > -		if (fmterror == BULKSTAT_RV_NOTHING || error) {
> > > > -			error = 0;
> > > > -			continue;
> > > > -		}
> > > > -
> > > > -		*ubufp += ubused;
> > > > -		acp->ac_ubleft -= ubused;
> > > > -		acp->ac_ubelem++;
> > > > -	}
> > > > -
> > > > -	/*
> > > > -	 * Post-update *last_agino. At this point, agino will always point one
> > > > -	 * inode past the last inode we processed successfully. Hence we
> > > > -	 * subtract that inode when setting the *last_agino cursor so that we
> > > > -	 * return the correct cookie to userspace. On the next bulkstat call,
> > > > -	 * the inode under the lastino cookie will be skipped as we have already
> > > > -	 * processed it here.
> > > > -	 */
> > > > -	*last_agino = agino - 1;
> > > > +	int			error;
> > > >  
> > > > +	error = xfs_bulkstat_one_int(mp, tp, ino, data);
> > > > +	/* bulkstat just skips over missing inodes */
> > > > +	if (error == -ENOENT || error == -EINVAL)
> > > > +		return 0;
> > > >  	return error;
> > > >  }
> > > >  
> > > >  /*
> > > > - * Return stat information in bulk (by-inode) for the filesystem.
> > > > + * Check the incoming lastino parameter.
> > > > + *
> > > > + * We allow any inode value that could map to physical space inside the
> > > > + * filesystem because if there are no inodes there, bulkstat moves on to the
> > > > + * next chunk.  In other words, the magic agino value of zero takes us to the
> > > > + * first chunk in the AG, and an agino value past the end of the AG takes us to
> > > > + * the first chunk in the next AG.
> > > > + *
> > > > + * Therefore we can end early if the requested inode is beyond the end of the
> > > > + * filesystem or doesn't map properly.
> > > >   */
> > > > -int					/* error status */
> > > > -xfs_bulkstat(
> > > > -	xfs_mount_t		*mp,	/* mount point for filesystem */
> > > > -	xfs_ino_t		*lastinop, /* last inode returned */
> > > > -	int			*ubcountp, /* size of buffer/count returned */
> > > > -	bulkstat_one_pf		formatter, /* func that'd fill a single buf */
> > > > -	size_t			statstruct_size, /* sizeof struct filling */
> > > > -	char			__user *ubuffer, /* buffer with inode stats */
> > > > -	int			*done)	/* 1 if there are more stats to get */
> > > > +static inline bool
> > > > +xfs_bulkstat_already_done(
> > > > +	struct xfs_mount	*mp,
> > > > +	xfs_ino_t		startino)
> > > >  {
> > > > -	xfs_buf_t		*agbp;	/* agi header buffer */
> > > > -	xfs_agino_t		agino;	/* inode # in allocation group */
> > > > -	xfs_agnumber_t		agno;	/* allocation group number */
> > > > -	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
> > > > -	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
> > > > -	int			nirbuf;	/* size of irbuf */
> > > > -	int			ubcount; /* size of user's buffer */
> > > > -	struct xfs_bulkstat_agichunk ac;
> > > > -	int			error = 0;
> > > > +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> > > > +	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, startino);
> > > >  
> > > > -	/*
> > > > -	 * Get the last inode value, see if there's nothing to do.
> > > > -	 */
> > > > -	agno = XFS_INO_TO_AGNO(mp, *lastinop);
> > > > -	agino = XFS_INO_TO_AGINO(mp, *lastinop);
> > > > -	if (agno >= mp->m_sb.sb_agcount ||
> > > > -	    *lastinop != XFS_AGINO_TO_INO(mp, agno, agino)) {
> > > > -		*done = 1;
> > > > -		*ubcountp = 0;
> > > > -		return 0;
> > > > -	}
> > > > +	return agno >= mp->m_sb.sb_agcount ||
> > > > +	       startino != XFS_AGINO_TO_INO(mp, agno, agino);
> > > > +}
> > > >  
> > > > -	ubcount = *ubcountp; /* statstruct's */
> > > > -	ac.ac_ubuffer = &ubuffer;
> > > > -	ac.ac_ubleft = ubcount * statstruct_size; /* bytes */;
> > > > -	ac.ac_ubelem = 0;
> > > > +/* Return stat information in bulk (by-inode) for the filesystem. */
> > > > +int
> > > > +xfs_bulkstat(
> > > > +	struct xfs_ibulk	*breq,
> > > > +	bulkstat_one_fmt_pf	formatter)
> > > > +{
> > > > +	struct xfs_bstat_chunk	bc = {
> > > > +		.formatter	= formatter,
> > > > +		.breq		= breq,
> > > > +	};
> > > > +	int			error;
> > > >  
> > > > -	*ubcountp = 0;
> > > > -	*done = 0;
> > > > +	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
> > > > +		return 0;
> > > >  
> > > > -	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
> > > > -	if (!irbuf)
> > > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > > +	if (!bc.buf)
> > > >  		return -ENOMEM;
> > > > -	nirbuf = (PAGE_SIZE * 4) / sizeof(*irbuf);
> > > >  
> > > > -	/*
> > > > -	 * Loop over the allocation groups, starting from the last
> > > > -	 * inode returned; 0 means start of the allocation group.
> > > > -	 */
> > > > -	while (agno < mp->m_sb.sb_agcount) {
> > > > -		struct xfs_inobt_rec_incore	*irbp = irbuf;
> > > > -		struct xfs_inobt_rec_incore	*irbufend = irbuf + nirbuf;
> > > > -		bool				end_of_ag = false;
> > > > -		int				icount = 0;
> > > > -		int				stat;
> > > > +	error = xfs_iwalk(breq->mp, NULL, breq->startino, xfs_bulkstat_iwalk,
> > > > +			breq->icount, &bc);
> > > >  
> > > > -		error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> > > > -		if (error)
> > > > -			break;
> > > > -		/*
> > > > -		 * Allocate and initialize a btree cursor for ialloc btree.
> > > > -		 */
> > > > -		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> > > > -					    XFS_BTNUM_INO);
> > > > -		if (agino > 0) {
> > > > -			/*
> > > > -			 * In the middle of an allocation group, we need to get
> > > > -			 * the remainder of the chunk we're in.
> > > > -			 */
> > > > -			struct xfs_inobt_rec_incore	r;
> > > > -
> > > > -			error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
> > > > -			if (error)
> > > > -				goto del_cursor;
> > > > -			if (icount) {
> > > > -				irbp->ir_startino = r.ir_startino;
> > > > -				irbp->ir_holemask = r.ir_holemask;
> > > > -				irbp->ir_count = r.ir_count;
> > > > -				irbp->ir_freecount = r.ir_freecount;
> > > > -				irbp->ir_free = r.ir_free;
> > > > -				irbp++;
> > > > -			}
> > > > -			/* Increment to the next record */
> > > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > > -		} else {
> > > > -			/* Start of ag.  Lookup the first inode chunk */
> > > > -			error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
> > > > -		}
> > > > -		if (error || stat == 0) {
> > > > -			end_of_ag = true;
> > > > -			goto del_cursor;
> > > > -		}
> > > > -
> > > > -		/*
> > > > -		 * Loop through inode btree records in this ag,
> > > > -		 * until we run out of inodes or space in the buffer.
> > > > -		 */
> > > > -		while (irbp < irbufend && icount < ubcount) {
> > > > -			struct xfs_inobt_rec_incore	r;
> > > > -
> > > > -			error = xfs_inobt_get_rec(cur, &r, &stat);
> > > > -			if (error || stat == 0) {
> > > > -				end_of_ag = true;
> > > > -				goto del_cursor;
> > > > -			}
> > > > -
> > > > -			/*
> > > > -			 * If this chunk has any allocated inodes, save it.
> > > > -			 * Also start read-ahead now for this chunk.
> > > > -			 */
> > > > -			if (r.ir_freecount < r.ir_count) {
> > > > -				xfs_bulkstat_ichunk_ra(mp, agno, &r);
> > > > -				irbp->ir_startino = r.ir_startino;
> > > > -				irbp->ir_holemask = r.ir_holemask;
> > > > -				irbp->ir_count = r.ir_count;
> > > > -				irbp->ir_freecount = r.ir_freecount;
> > > > -				irbp->ir_free = r.ir_free;
> > > > -				irbp++;
> > > > -				icount += r.ir_count - r.ir_freecount;
> > > > -			}
> > > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > > -			if (error || stat == 0) {
> > > > -				end_of_ag = true;
> > > > -				goto del_cursor;
> > > > -			}
> > > > -			cond_resched();
> > > > -		}
> > > > -
> > > > -		/*
> > > > -		 * Drop the btree buffers and the agi buffer as we can't hold any
> > > > -		 * of the locks these represent when calling iget. If there is a
> > > > -		 * pending error, then we are done.
> > > > -		 */
> > > > -del_cursor:
> > > > -		xfs_btree_del_cursor(cur, error);
> > > > -		xfs_buf_relse(agbp);
> > > > -		if (error)
> > > > -			break;
> > > > -		/*
> > > > -		 * Now format all the good inodes into the user's buffer. The
> > > > -		 * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
> > > > -		 * for the next loop iteration.
> > > > -		 */
> > > > -		irbufend = irbp;
> > > > -		for (irbp = irbuf;
> > > > -		     irbp < irbufend && ac.ac_ubleft >= statstruct_size;
> > > > -		     irbp++) {
> > > > -			error = xfs_bulkstat_ag_ichunk(mp, agno, irbp,
> > > > -					formatter, statstruct_size, &ac,
> > > > -					&agino);
> > > > -			if (error)
> > > > -				break;
> > > > -
> > > > -			cond_resched();
> > > > -		}
> > > > -
> > > > -		/*
> > > > -		 * If we've run out of space or had a formatting error, we
> > > > -		 * are now done
> > > > -		 */
> > > > -		if (ac.ac_ubleft < statstruct_size || error)
> > > > -			break;
> > > > -
> > > > -		if (end_of_ag) {
> > > > -			agno++;
> > > > -			agino = 0;
> > > > -		}
> > > > -	}
> > > > -	/*
> > > > -	 * Done, we're either out of filesystem or space to put the data.
> > > > -	 */
> > > > -	kmem_free(irbuf);
> > > > -	*ubcountp = ac.ac_ubelem;
> > > > +	kmem_free(bc.buf);
> > > >  
> > > >  	/*
> > > >  	 * We found some inodes, so clear the error status and return them.
> > > > @@ -509,17 +356,9 @@ xfs_bulkstat(
> > > >  	 * triggered again and propagated to userspace as there will be no
> > > >  	 * formatted inodes in the buffer.
> > > >  	 */
> > > > -	if (ac.ac_ubelem)
> > > > +	if (breq->ocount > 0)
> > > >  		error = 0;
> > > >  
> > > > -	/*
> > > > -	 * If we ran out of filesystem, lastino will point off the end of
> > > > -	 * the filesystem so the next call will return immediately.
> > > > -	 */
> > > > -	*lastinop = XFS_AGINO_TO_INO(mp, agno, agino);
> > > > -	if (agno >= mp->m_sb.sb_agcount)
> > > > -		*done = 1;
> > > > -
> > > >  	return error;
> > > >  }
> > > >  
> > > > diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> > > > index 369e3f159d4e..7c5f1df360e6 100644
> > > > --- a/fs/xfs/xfs_itable.h
> > > > +++ b/fs/xfs/xfs_itable.h
> > > > @@ -5,63 +5,46 @@
> > > >  #ifndef __XFS_ITABLE_H__
> > > >  #define	__XFS_ITABLE_H__
> > > >  
> > > > -/*
> > > > - * xfs_bulkstat() is used to fill in xfs_bstat structures as well as dm_stat
> > > > - * structures (by the dmi library). This is a pointer to a formatter function
> > > > - * that will iget the inode and fill in the appropriate structure.
> > > > - * see xfs_bulkstat_one() and xfs_dm_bulkstat_one() in dmapi_xfs.c
> > > > - */
> > > > -typedef int (*bulkstat_one_pf)(struct xfs_mount	*mp,
> > > > -			       xfs_ino_t	ino,
> > > > -			       void		__user *buffer,
> > > > -			       int		ubsize,
> > > > -			       int		*ubused,
> > > > -			       int		*stat);
> > > > +/* In-memory representation of a userspace request for batch inode data. */
> > > > +struct xfs_ibulk {
> > > > +	struct xfs_mount	*mp;
> > > > +	void __user		*ubuffer; /* user output buffer */
> > > > +	xfs_ino_t		startino; /* start with this inode */
> > > > +	unsigned int		icount;   /* number of elements in ubuffer */
> > > > +	unsigned int		ocount;   /* number of records returned */
> > > > +};
> > > > +
> > > > +/* Return value that means we want to abort the walk. */
> > > > +#define XFS_IBULK_ABORT		(XFS_IWALK_ABORT)
> > > > +
> > > > +/* Return value that means the formatting buffer is now full. */
> > > > +#define XFS_IBULK_BUFFER_FULL	(XFS_IBULK_ABORT + 1)
> > > >  
> > > >  /*
> > > > - * Values for stat return value.
> > > > + * Advance the user buffer pointer by one record of the given size.  If the
> > > > + * buffer is now full, return the appropriate error code.
> > > >   */
> > > > -#define BULKSTAT_RV_NOTHING	0
> > > > -#define BULKSTAT_RV_DIDONE	1
> > > > -#define BULKSTAT_RV_GIVEUP	2
> > > > +static inline int
> > > > +xfs_ibulk_advance(
> > > > +	struct xfs_ibulk	*breq,
> > > > +	size_t			bytes)
> > > > +{
> > > > +	char __user		*b = breq->ubuffer;
> > > > +
> > > > +	breq->ubuffer = b + bytes;
> > > > +	breq->ocount++;
> > > > +	return breq->ocount == breq->icount ? XFS_IBULK_BUFFER_FULL : 0;
> > > > +}
> > > >  
> > > >  /*
> > > >   * Return stat information in bulk (by-inode) for the filesystem.
> > > >   */
> > > > -int					/* error status */
> > > > -xfs_bulkstat(
> > > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > > -	xfs_ino_t	*lastino,	/* last inode returned */
> > > > -	int		*count,		/* size of buffer/count returned */
> > > > -	bulkstat_one_pf formatter,	/* func that'd fill a single buf */
> > > > -	size_t		statstruct_size,/* sizeof struct that we're filling */
> > > > -	char		__user *ubuffer,/* buffer with inode stats */
> > > > -	int		*done);		/* 1 if there are more stats to get */
> > > >  
> > > > -typedef int (*bulkstat_one_fmt_pf)(  /* used size in bytes or negative error */
> > > > -	void			__user *ubuffer, /* buffer to write to */
> > > > -	int			ubsize,		 /* remaining user buffer sz */
> > > > -	int			*ubused,	 /* bytes used by formatter */
> > > > -	const xfs_bstat_t	*buffer);        /* buffer to read from */
> > > > +typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
> > > > +		const struct xfs_bstat *bstat);
> > > >  
> > > > -int
> > > > -xfs_bulkstat_one_int(
> > > > -	xfs_mount_t		*mp,
> > > > -	xfs_ino_t		ino,
> > > > -	void			__user *buffer,
> > > > -	int			ubsize,
> > > > -	bulkstat_one_fmt_pf	formatter,
> > > > -	int			*ubused,
> > > > -	int			*stat);
> > > > -
> > > > -int
> > > > -xfs_bulkstat_one(
> > > > -	xfs_mount_t		*mp,
> > > > -	xfs_ino_t		ino,
> > > > -	void			__user *buffer,
> > > > -	int			ubsize,
> > > > -	int			*ubused,
> > > > -	int			*stat);
> > > > +int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > > > +int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > > >  
> > > >  typedef int (*inumbers_fmt_pf)(
> > > >  	void			__user *ubuffer, /* buffer to write to */
> > > > 


* Re: [PATCH 10/14] xfs: refactor xfs_iwalk_grab_ichunk
  2019-06-12  6:48 ` [PATCH 10/14] xfs: refactor xfs_iwalk_grab_ichunk Darrick J. Wong
@ 2019-06-14 14:04   ` Brian Foster
  0 siblings, 0 replies; 33+ messages in thread
From: Brian Foster @ 2019-06-14 14:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 11, 2019 at 11:48:36PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In preparation for reusing the iwalk code for the inogrp walking code
> (aka INUMBERS), move the initial inobt lookup and retrieval code out of
> xfs_iwalk_grab_ichunk so that we call the masking code only when we need
> to trim out the inodes that came before the cursor in the inobt record
> (aka BULKSTAT).
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_iwalk.c |   79 ++++++++++++++++++++++++++--------------------------
>  1 file changed, 39 insertions(+), 40 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index a2102fa94ff5..8c4d7e59f86a 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -98,43 +98,17 @@ xfs_iwalk_ichunk_ra(
>  }
>  
>  /*
> - * Lookup the inode chunk that the given @agino lives in and then get the
> - * record if we found the chunk.  Set the bits in @irec's free mask that
> - * correspond to the inodes before @agino so that we skip them.  This is how we
> - * restart an inode walk that was interrupted in the middle of an inode record.
> + * Set the bits in @irec's free mask that correspond to the inodes before
> + * @agino so that we skip them.  This is how we restart an inode walk that was
> + * interrupted in the middle of an inode record.
>   */
> -STATIC int
> -xfs_iwalk_grab_ichunk(
> -	struct xfs_btree_cur		*cur,	/* btree cursor */
> +STATIC void
> +xfs_iwalk_adjust_start(
>  	xfs_agino_t			agino,	/* starting inode of chunk */
> -	int				*icount,/* return # of inodes grabbed */
>  	struct xfs_inobt_rec_incore	*irec)	/* btree record */
>  {
>  	int				idx;	/* index into inode chunk */
> -	int				stat;
>  	int				i;
> -	int				error = 0;
> -
> -	/* Lookup the inode chunk that this inode lives in */
> -	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &stat);
> -	if (error)
> -		return error;
> -	if (!stat) {
> -		*icount = 0;
> -		return error;
> -	}
> -
> -	/* Get the record, should always work */
> -	error = xfs_inobt_get_rec(cur, irec, &stat);
> -	if (error)
> -		return error;
> -	XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, stat == 1);
> -
> -	/* Check if the record contains the inode in request */
> -	if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino) {
> -		*icount = 0;
> -		return 0;
> -	}
>  
>  	idx = agino - irec->ir_startino;
>  
> @@ -149,8 +123,6 @@ xfs_iwalk_grab_ichunk(
>  	}
>  
>  	irec->ir_free |= xfs_inobt_maskn(0, idx);
> -	*icount = irec->ir_count - irec->ir_freecount;
> -	return 0;
>  }
>  
>  /* Allocate memory for a walk. */
> @@ -258,7 +230,7 @@ xfs_iwalk_ag_start(
>  {
>  	struct xfs_mount	*mp = iwag->mp;
>  	struct xfs_trans	*tp = iwag->tp;
> -	int			icount;
> +	struct xfs_inobt_rec_incore *irec;
>  	int			error;
>  
>  	/* Set up a fresh cursor and empty the inobt cache. */
> @@ -274,15 +246,40 @@ xfs_iwalk_ag_start(
>  	/*
>  	 * Otherwise, we have to grab the inobt record where we left off, stuff
>  	 * the record into our cache, and then see if there are more records.
> -	 * We require a lookup cache of at least two elements so that we don't
> -	 * have to deal with tearing down the cursor to walk the records.
> +	 * We require a lookup cache of at least two elements so that the
> +	 * caller doesn't have to deal with tearing down the cursor to walk the
> +	 * records.
>  	 */
> -	error = xfs_iwalk_grab_ichunk(*curpp, agino, &icount,
> -			&iwag->recs[iwag->nr_recs]);
> +	error = xfs_inobt_lookup(*curpp, agino, XFS_LOOKUP_LE, has_more);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * If the LE lookup at @agino yields no records, jump ahead to the
> +	 * inobt cursor increment to see if there are more records to process.
> +	 */
> +	if (!*has_more)
> +		goto out_advance;
> +
> +	/* Get the record, should always work */
> +	irec = &iwag->recs[iwag->nr_recs];
> +	error = xfs_inobt_get_rec(*curpp, irec, has_more);
>  	if (error)
>  		return error;
> -	if (icount)
> -		iwag->nr_recs++;
> +	XFS_WANT_CORRUPTED_RETURN(mp, *has_more == 1);
> +
> +	/*
> +	 * If the LE lookup yielded an inobt record before the cursor position,
> +	 * skip it and see if there's another one after it.
> +	 */
> +	if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino)
> +		goto out_advance;
> +
> +	/*
> +	 * If agino fell in the middle of the inode record, make it look like
> +	 * the inodes up to agino are free so that we don't return them again.
> +	 */
> +	xfs_iwalk_adjust_start(agino, irec);
>  
>  	/*
>  	 * set_prefetch is supposed to give us a large enough inobt record
> @@ -290,8 +287,10 @@ xfs_iwalk_ag_start(
>  	 * body can cache a record without having to check for cache space
>  	 * until after it reads an inobt record.
>  	 */
> +	iwag->nr_recs++;
>  	ASSERT(iwag->nr_recs < iwag->sz_recs);
>  
> +out_advance:
>  	return xfs_btree_increment(*curpp, 0, has_more);
>  }
>  
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 11/14] xfs: refactor iwalk code to handle walking inobt records
  2019-06-12  6:48 ` [PATCH 11/14] xfs: refactor iwalk code to handle walking inobt records Darrick J. Wong
@ 2019-06-14 14:04   ` Brian Foster
  0 siblings, 0 replies; 33+ messages in thread
From: Brian Foster @ 2019-06-14 14:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 11, 2019 at 11:48:42PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Refactor xfs_iwalk_ag_start and xfs_iwalk_ag so that the bits that are
> particular to bulkstat (trimming the start irec, starting inode
> readahead, and skipping empty groups) can be controlled via flags in the
> iwag structure.
> 
> This enables us to add a new function to walk all inobt records which
> will be used for the new INUMBERS implementation in the next patch.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_iwalk.c |   75 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/xfs/xfs_iwalk.h |   12 ++++++++
>  2 files changed, 84 insertions(+), 3 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
> index 9e762e31dadc..97c1120d4237 100644
> --- a/fs/xfs/xfs_iwalk.h
> +++ b/fs/xfs/xfs_iwalk.h
> @@ -16,4 +16,16 @@ typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
>  int xfs_iwalk(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t startino,
>  		xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data);
>  
> +/* Walk all inode btree records in the filesystem starting from @startino. */
> +typedef int (*xfs_inobt_walk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
> +				 xfs_agnumber_t agno,
> +				 const struct xfs_inobt_rec_incore *irec,
> +				 void *data);
> +/* Return value (for xfs_inobt_walk_fn) that aborts the walk immediately. */
> +#define XFS_INOBT_WALK_ABORT	(XFS_IWALK_ABORT)
> +

Similar comment here around the need for a special-case abort error. I
assume we could just use IWALK_ABORT. That aside, this all looks pretty
good to me. Thanks for the cleanup:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +int xfs_inobt_walk(struct xfs_mount *mp, struct xfs_trans *tp,
> +		xfs_ino_t startino, xfs_inobt_walk_fn inobt_walk_fn,
> +		unsigned int max_prefetch, void *data);
> +
>  #endif /* __XFS_IWALK_H__ */
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 12/14] xfs: refactor INUMBERS to use iwalk functions
  2019-06-12  6:48 ` [PATCH 12/14] xfs: refactor INUMBERS to use iwalk functions Darrick J. Wong
@ 2019-06-14 14:05   ` Brian Foster
  0 siblings, 0 replies; 33+ messages in thread
From: Brian Foster @ 2019-06-14 14:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 11, 2019 at 11:48:49PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Now that we have generic functions to walk inode records, refactor the
> INUMBERS implementation to use it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Modulo the error code stuff:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_ioctl.c   |   20 ++++--
>  fs/xfs/xfs_ioctl.h   |    2 +
>  fs/xfs/xfs_ioctl32.c |   35 ++++-------
>  fs/xfs/xfs_itable.c  |  166 +++++++++++++++++++-------------------------------
>  fs/xfs/xfs_itable.h  |   22 +------
>  5 files changed, 95 insertions(+), 150 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 60595e61f2a6..04b661ff0799 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -733,6 +733,16 @@ xfs_bulkstat_one_fmt(
>  	return xfs_ibulk_advance(breq, sizeof(struct xfs_bstat));
>  }
>  
> +int
> +xfs_inumbers_fmt(
> +	struct xfs_ibulk	*breq,
> +	const struct xfs_inogrp	*igrp)
> +{
> +	if (copy_to_user(breq->ubuffer, igrp, sizeof(*igrp)))
> +		return -EFAULT;
> +	return xfs_ibulk_advance(breq, sizeof(struct xfs_inogrp));
> +}
> +
>  STATIC int
>  xfs_ioc_bulkstat(
>  	xfs_mount_t		*mp,
> @@ -783,13 +793,9 @@ xfs_ioc_bulkstat(
>  	 * in filesystem".
>  	 */
>  	if (cmd == XFS_IOC_FSINUMBERS) {
> -		int	count = breq.icount;
> -
> -		breq.startino = lastino;
> -		error = xfs_inumbers(mp, &breq.startino, &count,
> -					bulkreq.ubuffer, xfs_inumbers_fmt);
> -		breq.ocount = count;
> -		lastino = breq.startino;
> +		breq.startino = lastino ? lastino + 1 : 0;
> +		error = xfs_inumbers(&breq, xfs_inumbers_fmt);
> +		lastino = breq.startino - 1;
>  	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE) {
>  		breq.startino = lastino;
>  		breq.icount = 1;
> diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h
> index f32c8aadfeba..fb303eaa8863 100644
> --- a/fs/xfs/xfs_ioctl.h
> +++ b/fs/xfs/xfs_ioctl.h
> @@ -79,7 +79,9 @@ xfs_set_dmattrs(
>  
>  struct xfs_ibulk;
>  struct xfs_bstat;
> +struct xfs_inogrp;
>  
>  int xfs_bulkstat_one_fmt(struct xfs_ibulk *breq, const struct xfs_bstat *bstat);
> +int xfs_inumbers_fmt(struct xfs_ibulk *breq, const struct xfs_inogrp *igrp);
>  
>  #endif
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index 5d1c143bac18..3ca8ff9d4ac7 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -87,22 +87,17 @@ xfs_compat_growfs_rt_copyin(
>  
>  STATIC int
>  xfs_inumbers_fmt_compat(
> -	void			__user *ubuffer,
> -	const struct xfs_inogrp	*buffer,
> -	long			count,
> -	long			*written)
> +	struct xfs_ibulk	*breq,
> +	const struct xfs_inogrp	*igrp)
>  {
> -	compat_xfs_inogrp_t	__user *p32 = ubuffer;
> -	long			i;
> +	struct compat_xfs_inogrp __user *p32 = breq->ubuffer;
>  
> -	for (i = 0; i < count; i++) {
> -		if (put_user(buffer[i].xi_startino,   &p32[i].xi_startino) ||
> -		    put_user(buffer[i].xi_alloccount, &p32[i].xi_alloccount) ||
> -		    put_user(buffer[i].xi_allocmask,  &p32[i].xi_allocmask))
> -			return -EFAULT;
> -	}
> -	*written = count * sizeof(*p32);
> -	return 0;
> +	if (put_user(igrp->xi_startino,   &p32->xi_startino) ||
> +	    put_user(igrp->xi_alloccount, &p32->xi_alloccount) ||
> +	    put_user(igrp->xi_allocmask,  &p32->xi_allocmask))
> +		return -EFAULT;
> +
> +	return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_inogrp));
>  }
>  
>  #else
> @@ -228,7 +223,7 @@ xfs_compat_ioc_bulkstat(
>  	 * to userpace memory via bulkreq.ubuffer.  Normally the compat
>  	 * functions and structure size are the correct ones to use ...
>  	 */
> -	inumbers_fmt_pf inumbers_func = xfs_inumbers_fmt_compat;
> +	inumbers_fmt_pf		inumbers_func = xfs_inumbers_fmt_compat;
>  	bulkstat_one_fmt_pf	bs_one_func = xfs_bulkstat_one_fmt_compat;
>  
>  #ifdef CONFIG_X86_X32
> @@ -291,13 +286,9 @@ xfs_compat_ioc_bulkstat(
>  	 * in filesystem".
>  	 */
>  	if (cmd == XFS_IOC_FSINUMBERS_32) {
> -		int	count = breq.icount;
> -
> -		breq.startino = lastino;
> -		error = xfs_inumbers(mp, &breq.startino, &count,
> -				bulkreq.ubuffer, inumbers_func);
> -		breq.ocount = count;
> -		lastino = breq.startino;
> +		breq.startino = lastino ? lastino + 1 : 0;
> +		error = xfs_inumbers(&breq, inumbers_func);
> +		lastino = breq.startino - 1;
>  	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE_32) {
>  		breq.startino = lastino;
>  		breq.icount = 1;
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 1b3c9feb5f6f..b2f640ecb507 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -269,121 +269,83 @@ xfs_bulkstat(
>  	return error;
>  }
>  
> -int
> -xfs_inumbers_fmt(
> -	void			__user *ubuffer, /* buffer to write to */
> -	const struct xfs_inogrp	*buffer,	/* buffer to read from */
> -	long			count,		/* # of elements to read */
> -	long			*written)	/* # of bytes written */
> +struct xfs_inumbers_chunk {
> +	inumbers_fmt_pf		formatter;
> +	struct xfs_ibulk	*breq;
> +};
> +
> +/*
> + * INUMBERS
> + * ========
> + * This is how we export inode btree records to userspace, so that XFS tools
> + * can figure out where inodes are allocated.
> + */
> +
> +/*
> + * Format the inode group structure and report it somewhere.
> + *
> + * Similar to xfs_bulkstat_one_int, lastino is the inode cursor as we walk
> + * through the filesystem so we move it forward unless there was a runtime
> + * error.  If the formatter tells us the buffer is now full we also move the
> + * cursor forward and abort the walk.
> + */
> +STATIC int
> +xfs_inumbers_walk(
> +	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
> +	xfs_agnumber_t		agno,
> +	const struct xfs_inobt_rec_incore *irec,
> +	void			*data)
>  {
> -	if (copy_to_user(ubuffer, buffer, count * sizeof(*buffer)))
> -		return -EFAULT;
> -	*written = count * sizeof(*buffer);
> -	return 0;
> +	struct xfs_inogrp	inogrp = {
> +		.xi_startino	= XFS_AGINO_TO_INO(mp, agno, irec->ir_startino),
> +		.xi_alloccount	= irec->ir_count - irec->ir_freecount,
> +		.xi_allocmask	= ~irec->ir_free,
> +	};
> +	struct xfs_inumbers_chunk *ic = data;
> +	xfs_agino_t		agino;
> +	int			error;
> +
> +	error = ic->formatter(ic->breq, &inogrp);
> +	if (error && error != XFS_IBULK_BUFFER_FULL)
> +		return error;
> +	if (error == XFS_IBULK_BUFFER_FULL)
> +		error = XFS_INOBT_WALK_ABORT;
> +
> +	agino = irec->ir_startino + XFS_INODES_PER_CHUNK;
> +	ic->breq->startino = XFS_AGINO_TO_INO(mp, agno, agino);
> +	return error;
>  }
>  
>  /*
>   * Return inode number table for the filesystem.
>   */
> -int					/* error status */
> +int
>  xfs_inumbers(
> -	struct xfs_mount	*mp,/* mount point for filesystem */
> -	xfs_ino_t		*lastino,/* last inode returned */
> -	int			*count,/* size of buffer/count returned */
> -	void			__user *ubuffer,/* buffer with inode descriptions */
> +	struct xfs_ibulk	*breq,
>  	inumbers_fmt_pf		formatter)
>  {
> -	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, *lastino);
> -	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, *lastino);
> -	struct xfs_btree_cur	*cur = NULL;
> -	struct xfs_buf		*agbp = NULL;
> -	struct xfs_inogrp	*buffer;
> -	int			bcount;
> -	int			left = *count;
> -	int			bufidx = 0;
> +	struct xfs_inumbers_chunk ic = {
> +		.formatter	= formatter,
> +		.breq		= breq,
> +	};
>  	int			error = 0;
>  
> -	*count = 0;
> -	if (agno >= mp->m_sb.sb_agcount ||
> -	    *lastino != XFS_AGINO_TO_INO(mp, agno, agino))
> -		return error;
> +	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
> +		return 0;
>  
> -	bcount = min(left, (int)(PAGE_SIZE / sizeof(*buffer)));
> -	buffer = kmem_zalloc(bcount * sizeof(*buffer), KM_SLEEP);
> -	do {
> -		struct xfs_inobt_rec_incore	r;
> -		int				stat;
> -
> -		if (!agbp) {
> -			error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> -			if (error)
> -				break;
> -
> -			cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> -						    XFS_BTNUM_INO);
> -			error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_GE,
> -						 &stat);
> -			if (error)
> -				break;
> -			if (!stat)
> -				goto next_ag;
> -		}
> -
> -		error = xfs_inobt_get_rec(cur, &r, &stat);
> -		if (error)
> -			break;
> -		if (!stat)
> -			goto next_ag;
> -
> -		agino = r.ir_startino + XFS_INODES_PER_CHUNK - 1;
> -		buffer[bufidx].xi_startino =
> -			XFS_AGINO_TO_INO(mp, agno, r.ir_startino);
> -		buffer[bufidx].xi_alloccount = r.ir_count - r.ir_freecount;
> -		buffer[bufidx].xi_allocmask = ~r.ir_free;
> -		if (++bufidx == bcount) {
> -			long	written;
> -
> -			error = formatter(ubuffer, buffer, bufidx, &written);
> -			if (error)
> -				break;
> -			ubuffer += written;
> -			*count += bufidx;
> -			bufidx = 0;
> -		}
> -		if (!--left)
> -			break;
> -
> -		error = xfs_btree_increment(cur, 0, &stat);
> -		if (error)
> -			break;
> -		if (stat)
> -			continue;
> -
> -next_ag:
> -		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> -		cur = NULL;
> -		xfs_buf_relse(agbp);
> -		agbp = NULL;
> -		agino = 0;
> -		agno++;
> -	} while (agno < mp->m_sb.sb_agcount);
> -
> -	if (!error) {
> -		if (bufidx) {
> -			long	written;
> -
> -			error = formatter(ubuffer, buffer, bufidx, &written);
> -			if (!error)
> -				*count += bufidx;
> -		}
> -		*lastino = XFS_AGINO_TO_INO(mp, agno, agino);
> -	}
> +	error = xfs_inobt_walk(breq->mp, NULL, breq->startino,
> +			xfs_inumbers_walk, breq->icount, &ic);
>  
> -	kmem_free(buffer);
> -	if (cur)
> -		xfs_btree_del_cursor(cur, error);
> -	if (agbp)
> -		xfs_buf_relse(agbp);
> +	/*
> +	 * We found some inode groups, so clear the error status and return
> +	 * them.  The lastino pointer will point directly at the inode that
> +	 * triggered any error that occurred, so on the next call the error
> +	 * will be triggered again and propagated to userspace as there will be
> +	 * no formatted inode groups in the buffer.
> +	 */
> +	if (breq->ocount > 0)
> +		error = 0;
>  
>  	return error;
>  }
> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> index 328a161b8898..1e1a5bb9fd9f 100644
> --- a/fs/xfs/xfs_itable.h
> +++ b/fs/xfs/xfs_itable.h
> @@ -46,25 +46,9 @@ typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
>  int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
>  int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
>  
> -typedef int (*inumbers_fmt_pf)(
> -	void			__user *ubuffer, /* buffer to write to */
> -	const xfs_inogrp_t	*buffer,	/* buffer to read from */
> -	long			count,		/* # of elements to read */
> -	long			*written);	/* # of bytes written */
> +typedef int (*inumbers_fmt_pf)(struct xfs_ibulk *breq,
> +		const struct xfs_inogrp *igrp);
>  
> -int
> -xfs_inumbers_fmt(
> -	void			__user *ubuffer, /* buffer to write to */
> -	const xfs_inogrp_t	*buffer,	/* buffer to read from */
> -	long			count,		/* # of elements to read */
> -	long			*written);	/* # of bytes written */
> -
> -int					/* error status */
> -xfs_inumbers(
> -	xfs_mount_t		*mp,	/* mount point for filesystem */
> -	xfs_ino_t		*last,	/* last inode returned */
> -	int			*count,	/* size of buffer/count returned */
> -	void			__user *buffer, /* buffer with inode info */
> -	inumbers_fmt_pf		formatter);
> +int xfs_inumbers(struct xfs_ibulk *breq, inumbers_fmt_pf formatter);
>  
>  #endif	/* __XFS_ITABLE_H__ */
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 13/14] xfs: multithreaded iwalk implementation
  2019-06-12  6:48 ` [PATCH 13/14] xfs: multithreaded iwalk implementation Darrick J. Wong
@ 2019-06-14 14:06   ` Brian Foster
  2019-06-18 18:17     ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Foster @ 2019-06-14 14:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 11, 2019 at 11:48:55PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create a parallel iwalk implementation and switch quotacheck to use it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

The mechanism bits all look pretty good to me. A couple quick nits
below. Otherwise I'll reserve further comment until we work out the
whole heuristic bit.

>  fs/xfs/Makefile      |    1 
>  fs/xfs/xfs_globals.c |    3 +
>  fs/xfs/xfs_iwalk.c   |   82 +++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_iwalk.h   |    2 +
>  fs/xfs/xfs_pwork.c   |  126 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_pwork.h   |   58 +++++++++++++++++++++++
>  fs/xfs/xfs_qm.c      |    2 -
>  fs/xfs/xfs_sysctl.h  |    6 ++
>  fs/xfs/xfs_sysfs.c   |   40 ++++++++++++++++
>  fs/xfs/xfs_trace.h   |   18 +++++++
>  10 files changed, 337 insertions(+), 1 deletion(-)
>  create mode 100644 fs/xfs/xfs_pwork.c
>  create mode 100644 fs/xfs/xfs_pwork.h
> 
> 
...
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index def37347a362..0fe740298981 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
...
> @@ -528,6 +541,74 @@ xfs_iwalk(
>  	return error;
>  }
>  
> +/* Run per-thread iwalk work. */
> +static int
> +xfs_iwalk_ag_work(
> +	struct xfs_mount	*mp,
> +	struct xfs_pwork	*pwork)
> +{
> +	struct xfs_iwalk_ag	*iwag;
> +	int			error;
> +
> +	iwag = container_of(pwork, struct xfs_iwalk_ag, pwork);
> +	if (xfs_pwork_want_abort(pwork))
> +		goto out;

Warning here for uninitialized use of error.

> +
> +	error = xfs_iwalk_alloc(iwag);
> +	if (error)
> +		goto out;
> +
> +	error = xfs_iwalk_ag(iwag);
> +	xfs_iwalk_free(iwag);
> +out:
> +	kmem_free(iwag);
> +	return error;
> +}
> +
...
> diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
> new file mode 100644
> index 000000000000..8d0d5f130252
> --- /dev/null
> +++ b/fs/xfs/xfs_pwork.c
> @@ -0,0 +1,126 @@
...
> +int
> +xfs_pwork_init(
> +	struct xfs_mount	*mp,
> +	struct xfs_pwork_ctl	*pctl,
> +	xfs_pwork_work_fn	work_fn,
> +	const char		*tag,
> +	unsigned int		nr_threads)
> +{
> +#ifdef DEBUG
> +	if (xfs_globals.pwork_threads > 0)
> +		nr_threads = xfs_globals.pwork_threads;
> +	else if (xfs_globals.pwork_threads < 0)
> +		nr_threads = 0;

Can we not just have pwork_threads >= 0 means nr_threads =
pwork_threads, else we rely on the heuristic?

Brian
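For illustration, here is a sketch of what that simplification could look
like; the helper name and the "non-negative means explicit count" reading
are my assumptions, not code from the patch:

```c
/*
 * Hypothetical helper for the suggestion above: any non-negative value
 * of the debug knob is taken as an explicit thread count (0 meaning
 * "no override of the workqueue"), and only a negative value falls
 * back to the parallelism heuristic.
 */
static unsigned int
pick_nr_threads(int pwork_threads, unsigned int heuristic)
{
	if (pwork_threads >= 0)
		return pwork_threads;	/* explicit override from sysfs */
	return heuristic;		/* negative: use the guess */
}
```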

> +#endif
> +	trace_xfs_pwork_init(mp, nr_threads, current->pid);
> +
> +	pctl->wq = alloc_workqueue("%s-%d", WQ_FREEZABLE, nr_threads, tag,
> +			current->pid);
> +	if (!pctl->wq)
> +		return -ENOMEM;
> +	pctl->work_fn = work_fn;
> +	pctl->error = 0;
> +	pctl->mp = mp;
> +
> +	return 0;
> +}
> +
> +/* Queue some parallel work. */
> +void
> +xfs_pwork_queue(
> +	struct xfs_pwork_ctl	*pctl,
> +	struct xfs_pwork	*pwork)
> +{
> +	INIT_WORK(&pwork->work, xfs_pwork_work);
> +	pwork->pctl = pctl;
> +	queue_work(pctl->wq, &pwork->work);
> +}
> +
> +/* Wait for the work to finish and tear down the control structure. */
> +int
> +xfs_pwork_destroy(
> +	struct xfs_pwork_ctl	*pctl)
> +{
> +	destroy_workqueue(pctl->wq);
> +	pctl->wq = NULL;
> +	return pctl->error;
> +}
> +
> +/*
> + * Return the amount of parallelism that the data device can handle, or 0 for
> + * no limit.
> + */
> +unsigned int
> +xfs_pwork_guess_datadev_parallelism(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_buftarg	*btp = mp->m_ddev_targp;
> +	int			iomin;
> +	int			ioopt;
> +
> +	if (blk_queue_nonrot(btp->bt_bdev->bd_queue))
> +		return num_online_cpus();
> +	if (mp->m_sb.sb_width && mp->m_sb.sb_unit)
> +		return mp->m_sb.sb_width / mp->m_sb.sb_unit;
> +	iomin = bdev_io_min(btp->bt_bdev);
> +	ioopt = bdev_io_opt(btp->bt_bdev);
> +	if (iomin && ioopt)
> +		return ioopt / iomin;
> +
> +	return 1;
> +}
> diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h
> new file mode 100644
> index 000000000000..4cf1a6f48237
> --- /dev/null
> +++ b/fs/xfs/xfs_pwork.h
> @@ -0,0 +1,58 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#ifndef __XFS_PWORK_H__
> +#define __XFS_PWORK_H__
> +
> +struct xfs_pwork;
> +struct xfs_mount;
> +
> +typedef int (*xfs_pwork_work_fn)(struct xfs_mount *mp, struct xfs_pwork *pwork);
> +
> +/*
> + * Parallel work coordination structure.
> + */
> +struct xfs_pwork_ctl {
> +	struct workqueue_struct	*wq;
> +	struct xfs_mount	*mp;
> +	xfs_pwork_work_fn	work_fn;
> +	int			error;
> +};
> +
> +/*
> + * Embed this parallel work control item inside your own work structure,
> + * then queue work with it.
> + */
> +struct xfs_pwork {
> +	struct work_struct	work;
> +	struct xfs_pwork_ctl	*pctl;
> +};
> +
> +#define XFS_PWORK_SINGLE_THREADED	{ .pctl = NULL }
> +
> +/* Have we been told to abort? */
> +static inline bool
> +xfs_pwork_ctl_want_abort(
> +	struct xfs_pwork_ctl	*pctl)
> +{
> +	return pctl && pctl->error;
> +}
> +
> +/* Have we been told to abort? */
> +static inline bool
> +xfs_pwork_want_abort(
> +	struct xfs_pwork	*pwork)
> +{
> +	return xfs_pwork_ctl_want_abort(pwork->pctl);
> +}
> +
> +int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl,
> +		xfs_pwork_work_fn work_fn, const char *tag,
> +		unsigned int nr_threads);
> +void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork);
> +int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl);
> +unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp);
> +
> +#endif /* __XFS_PWORK_H__ */
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index 52e8ec0aa064..8004c931c86e 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -1304,7 +1304,7 @@ xfs_qm_quotacheck(
>  		flags |= XFS_PQUOTA_CHKD;
>  	}
>  
> -	error = xfs_iwalk(mp, NULL, 0, xfs_qm_dqusage_adjust, 0, NULL);
> +	error = xfs_iwalk_threaded(mp, 0, xfs_qm_dqusage_adjust, 0, NULL);
>  	if (error)
>  		goto error_return;
>  
> diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
> index ad7f9be13087..b555e045e2f4 100644
> --- a/fs/xfs/xfs_sysctl.h
> +++ b/fs/xfs/xfs_sysctl.h
> @@ -37,6 +37,9 @@ typedef struct xfs_param {
>  	xfs_sysctl_val_t fstrm_timer;	/* Filestream dir-AG assoc'n timeout. */
>  	xfs_sysctl_val_t eofb_timer;	/* Interval between eofb scan wakeups */
>  	xfs_sysctl_val_t cowb_timer;	/* Interval between cowb scan wakeups */
> +#ifdef DEBUG
> +	xfs_sysctl_val_t pwork_threads;	/* Parallel workqueue thread count */
> +#endif
>  } xfs_param_t;
>  
>  /*
> @@ -82,6 +85,9 @@ enum {
>  extern xfs_param_t	xfs_params;
>  
>  struct xfs_globals {
> +#ifdef DEBUG
> +	int	pwork_threads;		/* parallel workqueue threads */
> +#endif
>  	int	log_recovery_delay;	/* log recovery delay (secs) */
>  	int	mount_delay;		/* mount setup delay (secs) */
>  	bool	bug_on_assert;		/* BUG() the kernel on assert failure */
> diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> index cabda13f3c64..910e6b9cb1a7 100644
> --- a/fs/xfs/xfs_sysfs.c
> +++ b/fs/xfs/xfs_sysfs.c
> @@ -206,11 +206,51 @@ always_cow_show(
>  }
>  XFS_SYSFS_ATTR_RW(always_cow);
>  
> +#ifdef DEBUG
> +/*
> + * Override how many threads the parallel work queue is allowed to create.
> + * This has to be a debug-only global (instead of an errortag) because one of
> + * the main users of parallel workqueues is mount time quotacheck.
> + */
> +STATIC ssize_t
> +pwork_threads_store(
> +	struct kobject	*kobject,
> +	const char	*buf,
> +	size_t		count)
> +{
> +	int		ret;
> +	int		val;
> +
> +	ret = kstrtoint(buf, 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	if (val < 0 || val > NR_CPUS)
> +		return -EINVAL;
> +
> +	xfs_globals.pwork_threads = val;
> +
> +	return count;
> +}
> +
> +STATIC ssize_t
> +pwork_threads_show(
> +	struct kobject	*kobject,
> +	char		*buf)
> +{
> +	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.pwork_threads);
> +}
> +XFS_SYSFS_ATTR_RW(pwork_threads);
> +#endif /* DEBUG */
> +
>  static struct attribute *xfs_dbg_attrs[] = {
>  	ATTR_LIST(bug_on_assert),
>  	ATTR_LIST(log_recovery_delay),
>  	ATTR_LIST(mount_delay),
>  	ATTR_LIST(always_cow),
> +#ifdef DEBUG
> +	ATTR_LIST(pwork_threads),
> +#endif
>  	NULL,
>  };
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index f9bb1d50bc0e..658cbade1998 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3556,6 +3556,24 @@ TRACE_EVENT(xfs_iwalk_ag_rec,
>  		  __entry->startino, __entry->freemask)
>  )
>  
> +TRACE_EVENT(xfs_pwork_init,
> +	TP_PROTO(struct xfs_mount *mp, unsigned int nr_threads, pid_t pid),
> +	TP_ARGS(mp, nr_threads, pid),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(unsigned int, nr_threads)
> +		__field(pid_t, pid)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->nr_threads = nr_threads;
> +		__entry->pid = pid;
> +	),
> +	TP_printk("dev %d:%d nr_threads %u pid %u",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->nr_threads, __entry->pid)
> +)
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 14/14] xfs: poll waiting for quotacheck
  2019-06-12  6:49 ` [PATCH 14/14] xfs: poll waiting for quotacheck Darrick J. Wong
@ 2019-06-14 14:07   ` Brian Foster
  0 siblings, 0 replies; 33+ messages in thread
From: Brian Foster @ 2019-06-14 14:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jun 11, 2019 at 11:49:02PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create a pwork destroy function that uses polling instead of
> uninterruptible sleep to wait for work items to finish so that we can
> touch the softlockup watchdog.  IOWs, gross hack.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_iwalk.c |    3 +++
>  fs/xfs/xfs_iwalk.h |    3 ++-
>  fs/xfs/xfs_pwork.c |   19 +++++++++++++++++++
>  fs/xfs/xfs_pwork.h |    3 +++
>  fs/xfs/xfs_qm.c    |    2 +-
>  5 files changed, 28 insertions(+), 2 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
> index 8d0d5f130252..c2f02b710b8c 100644
> --- a/fs/xfs/xfs_pwork.c
> +++ b/fs/xfs/xfs_pwork.c
> @@ -13,6 +13,7 @@
>  #include "xfs_trace.h"
>  #include "xfs_sysctl.h"
>  #include "xfs_pwork.h"
> +#include <linux/nmi.h>
>  
>  /*
>   * Parallel Work Queue
> @@ -46,6 +47,8 @@ xfs_pwork_work(
>  	error = pctl->work_fn(pctl->mp, pwork);
>  	if (error && !pctl->error)
>  		pctl->error = error;
> +	atomic_dec(&pctl->nr_work);
> +	wake_up(&pctl->poll_wait);

We could use atomic_dec_and_test() here to avoid some unnecessary
wakeups. With that fixed up:

Reviewed-by: Brian Foster <bfoster@redhat.com>
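As a userspace C11 analogue of that suggestion (the kernel code would call
atomic_dec_and_test() directly; this sketch only demonstrates the "wake
only on the final decrement" pattern, with an assumed helper name):

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Mimics atomic_dec_and_test(): decrement the outstanding-work count
 * and return true only if this decrement dropped it to zero, so the
 * caller issues a single wakeup instead of one per completed item.
 */
static bool
pwork_dec_and_test(atomic_int *nr_work)
{
	/* atomic_fetch_sub() returns the value prior to the subtraction */
	return atomic_fetch_sub(nr_work, 1) == 1;
}
```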

>  }
>  
>  /*
> @@ -76,6 +79,8 @@ xfs_pwork_init(
>  	pctl->work_fn = work_fn;
>  	pctl->error = 0;
>  	pctl->mp = mp;
> +	atomic_set(&pctl->nr_work, 0);
> +	init_waitqueue_head(&pctl->poll_wait);
>  
>  	return 0;
>  }
> @@ -88,6 +93,7 @@ xfs_pwork_queue(
>  {
>  	INIT_WORK(&pwork->work, xfs_pwork_work);
>  	pwork->pctl = pctl;
> +	atomic_inc(&pctl->nr_work);
>  	queue_work(pctl->wq, &pwork->work);
>  }
>  
> @@ -101,6 +107,19 @@ xfs_pwork_destroy(
>  	return pctl->error;
>  }
>  
> +/*
> + * Wait for the work to finish by polling completion status and touch the soft
> + * lockup watchdog.  This is for callers such as mount which hold locks.
> + */
> +void
> +xfs_pwork_poll(
> +	struct xfs_pwork_ctl	*pctl)
> +{
> +	while (wait_event_timeout(pctl->poll_wait,
> +				atomic_read(&pctl->nr_work) == 0, HZ) == 0)
> +		touch_softlockup_watchdog();
> +}
> +
>  /*
>   * Return the amount of parallelism that the data device can handle, or 0 for
>   * no limit.
> diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h
> index 4cf1a6f48237..ff93873df8d3 100644
> --- a/fs/xfs/xfs_pwork.h
> +++ b/fs/xfs/xfs_pwork.h
> @@ -18,6 +18,8 @@ struct xfs_pwork_ctl {
>  	struct workqueue_struct	*wq;
>  	struct xfs_mount	*mp;
>  	xfs_pwork_work_fn	work_fn;
> +	struct wait_queue_head	poll_wait;
> +	atomic_t		nr_work;
>  	int			error;
>  };
>  
> @@ -53,6 +55,7 @@ int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl,
>  		unsigned int nr_threads);
>  void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork);
>  int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl);
> +void xfs_pwork_poll(struct xfs_pwork_ctl *pctl);
>  unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp);
>  
>  #endif /* __XFS_PWORK_H__ */
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index 8004c931c86e..8bb902125403 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -1304,7 +1304,7 @@ xfs_qm_quotacheck(
>  		flags |= XFS_PQUOTA_CHKD;
>  	}
>  
> -	error = xfs_iwalk_threaded(mp, 0, xfs_qm_dqusage_adjust, 0, NULL);
> +	error = xfs_iwalk_threaded(mp, 0, xfs_qm_dqusage_adjust, 0, true, NULL);
>  	if (error)
>  		goto error_return;
>  
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure
  2019-06-14 11:10         ` Brian Foster
@ 2019-06-14 16:45           ` Darrick J. Wong
  2019-07-02 11:42             ` Brian Foster
  0 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-14 16:45 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jun 14, 2019 at 07:10:12AM -0400, Brian Foster wrote:
> On Thu, Jun 13, 2019 at 04:03:58PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 13, 2019 at 11:12:06AM -0700, Darrick J. Wong wrote:
> > > On Thu, Jun 13, 2019 at 12:31:54PM -0400, Brian Foster wrote:
> > > > On Tue, Jun 11, 2019 at 11:48:09PM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > 
> > > > > Create a new ibulk structure incore to help us deal with bulk inode stat
> > > > > state tracking and then convert the bulkstat code to use the new iwalk
> > > > > iterator.  This disentangles inode walking from bulk stat control for
> > > > > simpler code and enables us to isolate the formatter functions to the
> > > > > ioctl handling code.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > ---
> > > > >  fs/xfs/xfs_ioctl.c   |   70 ++++++--
> > > > >  fs/xfs/xfs_ioctl.h   |    5 +
> > > > >  fs/xfs/xfs_ioctl32.c |   93 ++++++-----
> > > > >  fs/xfs/xfs_itable.c  |  431 ++++++++++++++++----------------------------------
> > > > >  fs/xfs/xfs_itable.h  |   79 ++++-----
> > > > >  5 files changed, 272 insertions(+), 406 deletions(-)
> > > > > 
> > > > > 
> > > > ...
> > > > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > > > index 814ffe6fbab7..5d1c143bac18 100644
> > > > > --- a/fs/xfs/xfs_ioctl32.c
> > > > > +++ b/fs/xfs/xfs_ioctl32.c
> > > > ...
> > > > > @@ -284,38 +266,59 @@ xfs_compat_ioc_bulkstat(
> > > > >  		return -EFAULT;
> > > > >  	bulkreq.ocount = compat_ptr(addr);
> > > > >  
> > > > > -	if (copy_from_user(&inlast, bulkreq.lastip, sizeof(__s64)))
> > > > > +	if (copy_from_user(&lastino, bulkreq.lastip, sizeof(__s64)))
> > > > >  		return -EFAULT;
> > > > > +	breq.startino = lastino + 1;
> > > > >  
> > > > 
> > > > Spurious assignment?
> > > 
> > > Fixed.
> > > 
> > > > > -	if ((count = bulkreq.icount) <= 0)
> > > > > +	if (bulkreq.icount <= 0)
> > > > >  		return -EINVAL;
> > > > >  
> > > > >  	if (bulkreq.ubuffer == NULL)
> > > > >  		return -EINVAL;
> > > > >  
> > > > > +	breq.ubuffer = bulkreq.ubuffer;
> > > > > +	breq.icount = bulkreq.icount;
> > > > > +
> > > > ...
> > > > > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > > > > index 3ca1c454afe6..58e411e11d6c 100644
> > > > > --- a/fs/xfs/xfs_itable.c
> > > > > +++ b/fs/xfs/xfs_itable.c
> > > > > @@ -14,47 +14,68 @@
> > > > ...
> > > > > +STATIC int
> > > > >  xfs_bulkstat_one_int(
> > > > > -	struct xfs_mount	*mp,		/* mount point for filesystem */
> > > > > -	xfs_ino_t		ino,		/* inode to get data for */
> > > > > -	void __user		*buffer,	/* buffer to place output in */
> > > > > -	int			ubsize,		/* size of buffer */
> > > > > -	bulkstat_one_fmt_pf	formatter,	/* formatter, copy to user */
> > > > > -	int			*ubused,	/* bytes used by me */
> > > > > -	int			*stat)		/* BULKSTAT_RV_... */
> > > > > +	struct xfs_mount	*mp,
> > > > > +	struct xfs_trans	*tp,
> > > > > +	xfs_ino_t		ino,
> > > > > +	void			*data)
> > > > 
> > > > There's no need for a void pointer here given the current usage. We
> > > > might as well pass this as bc (and let the caller cast it, if
> > > > necessary).
> > > > 
> > > > That said, it also looks like the only reason we have the
> > > > xfs_bulkstat_iwalk wrapper caller of this function is to filter out
> > > > certain error values. If those errors are needed for the single inode
> > > > case, we could stick something in the bc to toggle that invalid inode
> > > > filtering behavior and eliminate the need for the wrapper entirely
> > > > (which would pass _one_int() into the iwalk infra directly and require
> > > > retaining the void pointer).
> > > 
> > > Ok, will do.  That'll help declutter the source file.
> > 
> > ...or I won't, because gcc complains that the function pointer passed
> > into xfs_iwalk() has to have a (void *) as the 4th parameter.  It's not
> > willing to accept one with a (struct xfs_bstat_chunk *).
> > 
> 
> Hm, I don't follow: this function already takes a void *data parameter
> and we pass bc into xfs_iwalk() as a void *. What am I missing?

typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
			    xfs_ino_t ino, void *data);

gcc doesn't like it if the signature of bulkstat_one_int doesn't match
xfs_iwalk_fn exactly, even if the only difference is a void pointer vs.
a structure pointer.

--D

> 
> Brian
> 
> > Sorry about that. :(
> > 
> > --D
> > 
> > > > 
> > > > >  {
> > > > > +	struct xfs_bstat_chunk	*bc = data;
> > > > >  	struct xfs_icdinode	*dic;		/* dinode core info pointer */
> > > > >  	struct xfs_inode	*ip;		/* incore inode pointer */
> > > > >  	struct inode		*inode;
> > > > > -	struct xfs_bstat	*buf;		/* return buffer */
> > > > > -	int			error = 0;	/* error value */
> > > > > +	struct xfs_bstat	*buf = bc->buf;
> > > > > +	int			error = -EINVAL;
> > > > >  
> > > > > -	*stat = BULKSTAT_RV_NOTHING;
> > > > > +	if (xfs_internal_inum(mp, ino))
> > > > > +		goto out_advance;
> > > > >  
> > > > > -	if (!buffer || xfs_internal_inum(mp, ino))
> > > > > -		return -EINVAL;
> > > > > -
> > > > > -	buf = kmem_zalloc(sizeof(*buf), KM_SLEEP | KM_MAYFAIL);
> > > > > -	if (!buf)
> > > > > -		return -ENOMEM;
> > > > > -
> > > > > -	error = xfs_iget(mp, NULL, ino,
> > > > > +	error = xfs_iget(mp, tp, ino,
> > > > >  			 (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED),
> > > > >  			 XFS_ILOCK_SHARED, &ip);
> > > > > +	if (error == -ENOENT || error == -EINVAL)
> > > > > +		goto out_advance;
> > > > >  	if (error)
> > > > > -		goto out_free;
> > > > > +		goto out;
> > > > >  
> > > > >  	ASSERT(ip != NULL);
> > > > >  	ASSERT(ip->i_imap.im_blkno != 0);
> > > > > @@ -119,43 +140,56 @@ xfs_bulkstat_one_int(
> > > > >  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > > > >  	xfs_irele(ip);
> > > > >  
> > > > > -	error = formatter(buffer, ubsize, ubused, buf);
> > > > > -	if (!error)
> > > > > -		*stat = BULKSTAT_RV_DIDONE;
> > > > > +	error = bc->formatter(bc->breq, buf);
> > > > > +	if (error == XFS_IBULK_BUFFER_FULL) {
> > > > > +		error = XFS_IWALK_ABORT;
> > > > 
> > > > Related to the earlier patch.. is there a need for IBULK_BUFFER_FULL if
> > > > the only user converts it to the generic abort error?
> > > 
> > > <shrug> I wasn't sure if there was ever going to be a case where the
> > > formatter function wanted to abort for a reason that wasn't a full
> > > buffer... though looking at the bulkstat-v5 patches there aren't any.
> > > I guess I'll just remove BUFFER_FULL, then.
> > > 
> > > --D
> > > 
> > > > Most of these comments are minor/aesthetic, so:
> > > > 
> > > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > > 
> > > > > +		goto out_advance;
> > > > > +	}
> > > > > +	if (error)
> > > > > +		goto out;
> > > > >  
> > > > > - out_free:
> > > > > -	kmem_free(buf);
> > > > > +out_advance:
> > > > > +	/*
> > > > > +	 * Advance the cursor to the inode that comes after the one we just
> > > > > +	 * looked at.  We want the caller to move along if the bulkstat
> > > > > +	 * information was copied successfully; if we tried to grab the inode
> > > > > +	 * but it's no longer allocated; or if it's internal metadata.
> > > > > +	 */
> > > > > +	bc->breq->startino = ino + 1;
> > > > > +out:
> > > > >  	return error;
> > > > >  }
> > > > >  
> > > > > -/* Return 0 on success or positive error */
> > > > > -STATIC int
> > > > > -xfs_bulkstat_one_fmt(
> > > > > -	void			__user *ubuffer,
> > > > > -	int			ubsize,
> > > > > -	int			*ubused,
> > > > > -	const xfs_bstat_t	*buffer)
> > > > > -{
> > > > > -	if (ubsize < sizeof(*buffer))
> > > > > -		return -ENOMEM;
> > > > > -	if (copy_to_user(ubuffer, buffer, sizeof(*buffer)))
> > > > > -		return -EFAULT;
> > > > > -	if (ubused)
> > > > > -		*ubused = sizeof(*buffer);
> > > > > -	return 0;
> > > > > -}
> > > > > -
> > > > > +/* Bulkstat a single inode. */
> > > > >  int
> > > > >  xfs_bulkstat_one(
> > > > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > > > -	xfs_ino_t	ino,		/* inode number to get data for */
> > > > > -	void		__user *buffer,	/* buffer to place output in */
> > > > > -	int		ubsize,		/* size of buffer */
> > > > > -	int		*ubused,	/* bytes used by me */
> > > > > -	int		*stat)		/* BULKSTAT_RV_... */
> > > > > +	struct xfs_ibulk	*breq,
> > > > > +	bulkstat_one_fmt_pf	formatter)
> > > > >  {
> > > > > -	return xfs_bulkstat_one_int(mp, ino, buffer, ubsize,
> > > > > -				    xfs_bulkstat_one_fmt, ubused, stat);
> > > > > +	struct xfs_bstat_chunk	bc = {
> > > > > +		.formatter	= formatter,
> > > > > +		.breq		= breq,
> > > > > +	};
> > > > > +	int			error;
> > > > > +
> > > > > +	ASSERT(breq->icount == 1);
> > > > > +
> > > > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > > > +	if (!bc.buf)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	error = xfs_bulkstat_one_int(breq->mp, NULL, breq->startino, &bc);
> > > > > +
> > > > > +	kmem_free(bc.buf);
> > > > > +
> > > > > +	/*
> > > > > +	 * If we reported one inode to userspace then we abort because we hit
> > > > > +	 * the end of the buffer.  Don't leak that back to userspace.
> > > > > +	 */
> > > > > +	if (error == XFS_IWALK_ABORT)
> > > > > +		error = 0;
> > > > > +
> > > > > +	return error;
> > > > >  }
> > > > >  
> > > > >  /*
> > > > > @@ -251,256 +285,69 @@ xfs_bulkstat_grab_ichunk(
> > > > >  
> > > > >  #define XFS_BULKSTAT_UBLEFT(ubleft)	((ubleft) >= statstruct_size)
> > > > >  
> > > > > -struct xfs_bulkstat_agichunk {
> > > > > -	char		__user **ac_ubuffer;/* pointer into user's buffer */
> > > > > -	int		ac_ubleft;	/* bytes left in user's buffer */
> > > > > -	int		ac_ubelem;	/* spaces used in user's buffer */
> > > > > -};
> > > > > -
> > > > > -/*
> > > > > - * Process inodes in chunk with a pointer to a formatter function
> > > > > - * that will iget the inode and fill in the appropriate structure.
> > > > > - */
> > > > >  static int
> > > > > -xfs_bulkstat_ag_ichunk(
> > > > > -	struct xfs_mount		*mp,
> > > > > -	xfs_agnumber_t			agno,
> > > > > -	struct xfs_inobt_rec_incore	*irbp,
> > > > > -	bulkstat_one_pf			formatter,
> > > > > -	size_t				statstruct_size,
> > > > > -	struct xfs_bulkstat_agichunk	*acp,
> > > > > -	xfs_agino_t			*last_agino)
> > > > > +xfs_bulkstat_iwalk(
> > > > > +	struct xfs_mount	*mp,
> > > > > +	struct xfs_trans	*tp,
> > > > > +	xfs_ino_t		ino,
> > > > > +	void			*data)
> > > > >  {
> > > > > -	char				__user **ubufp = acp->ac_ubuffer;
> > > > > -	int				chunkidx;
> > > > > -	int				error = 0;
> > > > > -	xfs_agino_t			agino = irbp->ir_startino;
> > > > > -
> > > > > -	for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK;
> > > > > -	     chunkidx++, agino++) {
> > > > > -		int		fmterror;
> > > > > -		int		ubused;
> > > > > -
> > > > > -		/* inode won't fit in buffer, we are done */
> > > > > -		if (acp->ac_ubleft < statstruct_size)
> > > > > -			break;
> > > > > -
> > > > > -		/* Skip if this inode is free */
> > > > > -		if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free)
> > > > > -			continue;
> > > > > -
> > > > > -		/* Get the inode and fill in a single buffer */
> > > > > -		ubused = statstruct_size;
> > > > > -		error = formatter(mp, XFS_AGINO_TO_INO(mp, agno, agino),
> > > > > -				  *ubufp, acp->ac_ubleft, &ubused, &fmterror);
> > > > > -
> > > > > -		if (fmterror == BULKSTAT_RV_GIVEUP ||
> > > > > -		    (error && error != -ENOENT && error != -EINVAL)) {
> > > > > -			acp->ac_ubleft = 0;
> > > > > -			ASSERT(error);
> > > > > -			break;
> > > > > -		}
> > > > > -
> > > > > -		/* be careful not to leak error if at end of chunk */
> > > > > -		if (fmterror == BULKSTAT_RV_NOTHING || error) {
> > > > > -			error = 0;
> > > > > -			continue;
> > > > > -		}
> > > > > -
> > > > > -		*ubufp += ubused;
> > > > > -		acp->ac_ubleft -= ubused;
> > > > > -		acp->ac_ubelem++;
> > > > > -	}
> > > > > -
> > > > > -	/*
> > > > > -	 * Post-update *last_agino. At this point, agino will always point one
> > > > > -	 * inode past the last inode we processed successfully. Hence we
> > > > > -	 * substract that inode when setting the *last_agino cursor so that we
> > > > > -	 * return the correct cookie to userspace. On the next bulkstat call,
> > > > > -	 * the inode under the lastino cookie will be skipped as we have already
> > > > > -	 * processed it here.
> > > > > -	 */
> > > > > -	*last_agino = agino - 1;
> > > > > +	int			error;
> > > > >  
> > > > > +	error = xfs_bulkstat_one_int(mp, tp, ino, data);
> > > > > +	/* bulkstat just skips over missing inodes */
> > > > > +	if (error == -ENOENT || error == -EINVAL)
> > > > > +		return 0;
> > > > >  	return error;
> > > > >  }
> > > > >  
> > > > >  /*
> > > > > - * Return stat information in bulk (by-inode) for the filesystem.
> > > > > + * Check the incoming lastino parameter.
> > > > > + *
> > > > > + * We allow any inode value that could map to physical space inside the
> > > > > + * filesystem because if there are no inodes there, bulkstat moves on to the
> > > > > + * next chunk.  In other words, the magic agino value of zero takes us to the
> > > > > + * first chunk in the AG, and an agino value past the end of the AG takes us to
> > > > > + * the first chunk in the next AG.
> > > > > + *
> > > > > + * Therefore we can end early if the requested inode is beyond the end of the
> > > > > + * filesystem or doesn't map properly.
> > > > >   */
> > > > > -int					/* error status */
> > > > > -xfs_bulkstat(
> > > > > -	xfs_mount_t		*mp,	/* mount point for filesystem */
> > > > > -	xfs_ino_t		*lastinop, /* last inode returned */
> > > > > -	int			*ubcountp, /* size of buffer/count returned */
> > > > > -	bulkstat_one_pf		formatter, /* func that'd fill a single buf */
> > > > > -	size_t			statstruct_size, /* sizeof struct filling */
> > > > > -	char			__user *ubuffer, /* buffer with inode stats */
> > > > > -	int			*done)	/* 1 if there are more stats to get */
> > > > > +static inline bool
> > > > > +xfs_bulkstat_already_done(
> > > > > +	struct xfs_mount	*mp,
> > > > > +	xfs_ino_t		startino)
> > > > >  {
> > > > > -	xfs_buf_t		*agbp;	/* agi header buffer */
> > > > > -	xfs_agino_t		agino;	/* inode # in allocation group */
> > > > > -	xfs_agnumber_t		agno;	/* allocation group number */
> > > > > -	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
> > > > > -	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
> > > > > -	int			nirbuf;	/* size of irbuf */
> > > > > -	int			ubcount; /* size of user's buffer */
> > > > > -	struct xfs_bulkstat_agichunk ac;
> > > > > -	int			error = 0;
> > > > > +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> > > > > +	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, startino);
> > > > >  
> > > > > -	/*
> > > > > -	 * Get the last inode value, see if there's nothing to do.
> > > > > -	 */
> > > > > -	agno = XFS_INO_TO_AGNO(mp, *lastinop);
> > > > > -	agino = XFS_INO_TO_AGINO(mp, *lastinop);
> > > > > -	if (agno >= mp->m_sb.sb_agcount ||
> > > > > -	    *lastinop != XFS_AGINO_TO_INO(mp, agno, agino)) {
> > > > > -		*done = 1;
> > > > > -		*ubcountp = 0;
> > > > > -		return 0;
> > > > > -	}
> > > > > +	return agno >= mp->m_sb.sb_agcount ||
> > > > > +	       startino != XFS_AGINO_TO_INO(mp, agno, agino);
> > > > > +}
> > > > >  
> > > > > -	ubcount = *ubcountp; /* statstruct's */
> > > > > -	ac.ac_ubuffer = &ubuffer;
> > > > > -	ac.ac_ubleft = ubcount * statstruct_size; /* bytes */;
> > > > > -	ac.ac_ubelem = 0;
> > > > > +/* Return stat information in bulk (by-inode) for the filesystem. */
> > > > > +int
> > > > > +xfs_bulkstat(
> > > > > +	struct xfs_ibulk	*breq,
> > > > > +	bulkstat_one_fmt_pf	formatter)
> > > > > +{
> > > > > +	struct xfs_bstat_chunk	bc = {
> > > > > +		.formatter	= formatter,
> > > > > +		.breq		= breq,
> > > > > +	};
> > > > > +	int			error;
> > > > >  
> > > > > -	*ubcountp = 0;
> > > > > -	*done = 0;
> > > > > +	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
> > > > > +		return 0;
> > > > >  
> > > > > -	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
> > > > > -	if (!irbuf)
> > > > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > > > +	if (!bc.buf)
> > > > >  		return -ENOMEM;
> > > > > -	nirbuf = (PAGE_SIZE * 4) / sizeof(*irbuf);
> > > > >  
> > > > > -	/*
> > > > > -	 * Loop over the allocation groups, starting from the last
> > > > > -	 * inode returned; 0 means start of the allocation group.
> > > > > -	 */
> > > > > -	while (agno < mp->m_sb.sb_agcount) {
> > > > > -		struct xfs_inobt_rec_incore	*irbp = irbuf;
> > > > > -		struct xfs_inobt_rec_incore	*irbufend = irbuf + nirbuf;
> > > > > -		bool				end_of_ag = false;
> > > > > -		int				icount = 0;
> > > > > -		int				stat;
> > > > > +	error = xfs_iwalk(breq->mp, NULL, breq->startino, xfs_bulkstat_iwalk,
> > > > > +			breq->icount, &bc);
> > > > >  
> > > > > -		error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> > > > > -		if (error)
> > > > > -			break;
> > > > > -		/*
> > > > > -		 * Allocate and initialize a btree cursor for ialloc btree.
> > > > > -		 */
> > > > > -		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> > > > > -					    XFS_BTNUM_INO);
> > > > > -		if (agino > 0) {
> > > > > -			/*
> > > > > -			 * In the middle of an allocation group, we need to get
> > > > > -			 * the remainder of the chunk we're in.
> > > > > -			 */
> > > > > -			struct xfs_inobt_rec_incore	r;
> > > > > -
> > > > > -			error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
> > > > > -			if (error)
> > > > > -				goto del_cursor;
> > > > > -			if (icount) {
> > > > > -				irbp->ir_startino = r.ir_startino;
> > > > > -				irbp->ir_holemask = r.ir_holemask;
> > > > > -				irbp->ir_count = r.ir_count;
> > > > > -				irbp->ir_freecount = r.ir_freecount;
> > > > > -				irbp->ir_free = r.ir_free;
> > > > > -				irbp++;
> > > > > -			}
> > > > > -			/* Increment to the next record */
> > > > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > > > -		} else {
> > > > > -			/* Start of ag.  Lookup the first inode chunk */
> > > > > -			error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
> > > > > -		}
> > > > > -		if (error || stat == 0) {
> > > > > -			end_of_ag = true;
> > > > > -			goto del_cursor;
> > > > > -		}
> > > > > -
> > > > > -		/*
> > > > > -		 * Loop through inode btree records in this ag,
> > > > > -		 * until we run out of inodes or space in the buffer.
> > > > > -		 */
> > > > > -		while (irbp < irbufend && icount < ubcount) {
> > > > > -			struct xfs_inobt_rec_incore	r;
> > > > > -
> > > > > -			error = xfs_inobt_get_rec(cur, &r, &stat);
> > > > > -			if (error || stat == 0) {
> > > > > -				end_of_ag = true;
> > > > > -				goto del_cursor;
> > > > > -			}
> > > > > -
> > > > > -			/*
> > > > > -			 * If this chunk has any allocated inodes, save it.
> > > > > -			 * Also start read-ahead now for this chunk.
> > > > > -			 */
> > > > > -			if (r.ir_freecount < r.ir_count) {
> > > > > -				xfs_bulkstat_ichunk_ra(mp, agno, &r);
> > > > > -				irbp->ir_startino = r.ir_startino;
> > > > > -				irbp->ir_holemask = r.ir_holemask;
> > > > > -				irbp->ir_count = r.ir_count;
> > > > > -				irbp->ir_freecount = r.ir_freecount;
> > > > > -				irbp->ir_free = r.ir_free;
> > > > > -				irbp++;
> > > > > -				icount += r.ir_count - r.ir_freecount;
> > > > > -			}
> > > > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > > > -			if (error || stat == 0) {
> > > > > -				end_of_ag = true;
> > > > > -				goto del_cursor;
> > > > > -			}
> > > > > -			cond_resched();
> > > > > -		}
> > > > > -
> > > > > -		/*
> > > > > -		 * Drop the btree buffers and the agi buffer as we can't hold any
> > > > > -		 * of the locks these represent when calling iget. If there is a
> > > > > -		 * pending error, then we are done.
> > > > > -		 */
> > > > > -del_cursor:
> > > > > -		xfs_btree_del_cursor(cur, error);
> > > > > -		xfs_buf_relse(agbp);
> > > > > -		if (error)
> > > > > -			break;
> > > > > -		/*
> > > > > -		 * Now format all the good inodes into the user's buffer. The
> > > > > -		 * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
> > > > > -		 * for the next loop iteration.
> > > > > -		 */
> > > > > -		irbufend = irbp;
> > > > > -		for (irbp = irbuf;
> > > > > -		     irbp < irbufend && ac.ac_ubleft >= statstruct_size;
> > > > > -		     irbp++) {
> > > > > -			error = xfs_bulkstat_ag_ichunk(mp, agno, irbp,
> > > > > -					formatter, statstruct_size, &ac,
> > > > > -					&agino);
> > > > > -			if (error)
> > > > > -				break;
> > > > > -
> > > > > -			cond_resched();
> > > > > -		}
> > > > > -
> > > > > -		/*
> > > > > -		 * If we've run out of space or had a formatting error, we
> > > > > -		 * are now done
> > > > > -		 */
> > > > > -		if (ac.ac_ubleft < statstruct_size || error)
> > > > > -			break;
> > > > > -
> > > > > -		if (end_of_ag) {
> > > > > -			agno++;
> > > > > -			agino = 0;
> > > > > -		}
> > > > > -	}
> > > > > -	/*
> > > > > -	 * Done, we're either out of filesystem or space to put the data.
> > > > > -	 */
> > > > > -	kmem_free(irbuf);
> > > > > -	*ubcountp = ac.ac_ubelem;
> > > > > +	kmem_free(bc.buf);
> > > > >  
> > > > >  	/*
> > > > >  	 * We found some inodes, so clear the error status and return them.
> > > > > @@ -509,17 +356,9 @@ xfs_bulkstat(
> > > > >  	 * triggered again and propagated to userspace as there will be no
> > > > >  	 * formatted inodes in the buffer.
> > > > >  	 */
> > > > > -	if (ac.ac_ubelem)
> > > > > +	if (breq->ocount > 0)
> > > > >  		error = 0;
> > > > >  
> > > > > -	/*
> > > > > -	 * If we ran out of filesystem, lastino will point off the end of
> > > > > -	 * the filesystem so the next call will return immediately.
> > > > > -	 */
> > > > > -	*lastinop = XFS_AGINO_TO_INO(mp, agno, agino);
> > > > > -	if (agno >= mp->m_sb.sb_agcount)
> > > > > -		*done = 1;
> > > > > -
> > > > >  	return error;
> > > > >  }
> > > > >  
> > > > > diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> > > > > index 369e3f159d4e..7c5f1df360e6 100644
> > > > > --- a/fs/xfs/xfs_itable.h
> > > > > +++ b/fs/xfs/xfs_itable.h
> > > > > @@ -5,63 +5,46 @@
> > > > >  #ifndef __XFS_ITABLE_H__
> > > > >  #define	__XFS_ITABLE_H__
> > > > >  
> > > > > -/*
> > > > > - * xfs_bulkstat() is used to fill in xfs_bstat structures as well as dm_stat
> > > > > - * structures (by the dmi library). This is a pointer to a formatter function
> > > > > - * that will iget the inode and fill in the appropriate structure.
> > > > > - * see xfs_bulkstat_one() and xfs_dm_bulkstat_one() in dmapi_xfs.c
> > > > > - */
> > > > > -typedef int (*bulkstat_one_pf)(struct xfs_mount	*mp,
> > > > > -			       xfs_ino_t	ino,
> > > > > -			       void		__user *buffer,
> > > > > -			       int		ubsize,
> > > > > -			       int		*ubused,
> > > > > -			       int		*stat);
> > > > > +/* In-memory representation of a userspace request for batch inode data. */
> > > > > +struct xfs_ibulk {
> > > > > +	struct xfs_mount	*mp;
> > > > > +	void __user		*ubuffer; /* user output buffer */
> > > > > +	xfs_ino_t		startino; /* start with this inode */
> > > > > +	unsigned int		icount;   /* number of elements in ubuffer */
> > > > > +	unsigned int		ocount;   /* number of records returned */
> > > > > +};
> > > > > +
> > > > > +/* Return value that means we want to abort the walk. */
> > > > > +#define XFS_IBULK_ABORT		(XFS_IWALK_ABORT)
> > > > > +
> > > > > +/* Return value that means the formatting buffer is now full. */
> > > > > +#define XFS_IBULK_BUFFER_FULL	(XFS_IBULK_ABORT + 1)
> > > > >  
> > > > >  /*
> > > > > - * Values for stat return value.
> > > > > + * Advance the user buffer pointer by one record of the given size.  If the
> > > > > + * buffer is now full, return the appropriate error code.
> > > > >   */
> > > > > -#define BULKSTAT_RV_NOTHING	0
> > > > > -#define BULKSTAT_RV_DIDONE	1
> > > > > -#define BULKSTAT_RV_GIVEUP	2
> > > > > +static inline int
> > > > > +xfs_ibulk_advance(
> > > > > +	struct xfs_ibulk	*breq,
> > > > > +	size_t			bytes)
> > > > > +{
> > > > > +	char __user		*b = breq->ubuffer;
> > > > > +
> > > > > +	breq->ubuffer = b + bytes;
> > > > > +	breq->ocount++;
> > > > > +	return breq->ocount == breq->icount ? XFS_IBULK_BUFFER_FULL : 0;
> > > > > +}
> > > > >  
> > > > >  /*
> > > > >   * Return stat information in bulk (by-inode) for the filesystem.
> > > > >   */
> > > > > -int					/* error status */
> > > > > -xfs_bulkstat(
> > > > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > > > -	xfs_ino_t	*lastino,	/* last inode returned */
> > > > > -	int		*count,		/* size of buffer/count returned */
> > > > > -	bulkstat_one_pf formatter,	/* func that'd fill a single buf */
> > > > > -	size_t		statstruct_size,/* sizeof struct that we're filling */
> > > > > -	char		__user *ubuffer,/* buffer with inode stats */
> > > > > -	int		*done);		/* 1 if there are more stats to get */
> > > > >  
> > > > > -typedef int (*bulkstat_one_fmt_pf)(  /* used size in bytes or negative error */
> > > > > -	void			__user *ubuffer, /* buffer to write to */
> > > > > -	int			ubsize,		 /* remaining user buffer sz */
> > > > > -	int			*ubused,	 /* bytes used by formatter */
> > > > > -	const xfs_bstat_t	*buffer);        /* buffer to read from */
> > > > > +typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
> > > > > +		const struct xfs_bstat *bstat);
> > > > >  
> > > > > -int
> > > > > -xfs_bulkstat_one_int(
> > > > > -	xfs_mount_t		*mp,
> > > > > -	xfs_ino_t		ino,
> > > > > -	void			__user *buffer,
> > > > > -	int			ubsize,
> > > > > -	bulkstat_one_fmt_pf	formatter,
> > > > > -	int			*ubused,
> > > > > -	int			*stat);
> > > > > -
> > > > > -int
> > > > > -xfs_bulkstat_one(
> > > > > -	xfs_mount_t		*mp,
> > > > > -	xfs_ino_t		ino,
> > > > > -	void			__user *buffer,
> > > > > -	int			ubsize,
> > > > > -	int			*ubused,
> > > > > -	int			*stat);
> > > > > +int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > > > > +int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > > > >  
> > > > >  typedef int (*inumbers_fmt_pf)(
> > > > >  	void			__user *ubuffer, /* buffer to write to */
> > > > > 

* Re: [PATCH 13/14] xfs: multithreaded iwalk implementation
  2019-06-14 14:06   ` Brian Foster
@ 2019-06-18 18:17     ` Darrick J. Wong
  0 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2019-06-18 18:17 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jun 14, 2019 at 10:06:31AM -0400, Brian Foster wrote:
> On Tue, Jun 11, 2019 at 11:48:55PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create a parallel iwalk implementation and switch quotacheck to use it.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> The mechanism bits all look pretty good to me. A couple quick nits
> below. Otherwise I'll reserve further comment until we work out the
> whole heuristic bit.

<nod>

> >  fs/xfs/Makefile      |    1 
> >  fs/xfs/xfs_globals.c |    3 +
> >  fs/xfs/xfs_iwalk.c   |   82 +++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_iwalk.h   |    2 +
> >  fs/xfs/xfs_pwork.c   |  126 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_pwork.h   |   58 +++++++++++++++++++++++
> >  fs/xfs/xfs_qm.c      |    2 -
> >  fs/xfs/xfs_sysctl.h  |    6 ++
> >  fs/xfs/xfs_sysfs.c   |   40 ++++++++++++++++
> >  fs/xfs/xfs_trace.h   |   18 +++++++
> >  10 files changed, 337 insertions(+), 1 deletion(-)
> >  create mode 100644 fs/xfs/xfs_pwork.c
> >  create mode 100644 fs/xfs/xfs_pwork.h
> > 
> > 
> ...
> > diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> > index def37347a362..0fe740298981 100644
> > --- a/fs/xfs/xfs_iwalk.c
> > +++ b/fs/xfs/xfs_iwalk.c
> ...
> > @@ -528,6 +541,74 @@ xfs_iwalk(
> >  	return error;
> >  }
> >  
> > +/* Run per-thread iwalk work. */
> > +static int
> > +xfs_iwalk_ag_work(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_pwork	*pwork)
> > +{
> > +	struct xfs_iwalk_ag	*iwag;
> > +	int			error;
> > +
> > +	iwag = container_of(pwork, struct xfs_iwalk_ag, pwork);
> > +	if (xfs_pwork_want_abort(pwork))
> > +		goto out;
> 
> Warning here for uninitialized use of error.

Fixed; thanks.

> > +
> > +	error = xfs_iwalk_alloc(iwag);
> > +	if (error)
> > +		goto out;
> > +
> > +	error = xfs_iwalk_ag(iwag);
> > +	xfs_iwalk_free(iwag);
> > +out:
> > +	kmem_free(iwag);
> > +	return error;
> > +}
> > +
> ...
> > diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
> > new file mode 100644
> > index 000000000000..8d0d5f130252
> > --- /dev/null
> > +++ b/fs/xfs/xfs_pwork.c
> > @@ -0,0 +1,126 @@
> ...
> > +int
> > +xfs_pwork_init(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_pwork_ctl	*pctl,
> > +	xfs_pwork_work_fn	work_fn,
> > +	const char		*tag,
> > +	unsigned int		nr_threads)
> > +{
> > +#ifdef DEBUG
> > +	if (xfs_globals.pwork_threads > 0)
> > +		nr_threads = xfs_globals.pwork_threads;
> > +	else if (xfs_globals.pwork_threads < 0)
> > +		nr_threads = 0;
> 
> Can we not just have pwork_threads >= 0 mean nr_threads =
> pwork_threads, and otherwise rely on the heuristic?

Ok.

--D

> Brian
> 
> > +#endif
> > +	trace_xfs_pwork_init(mp, nr_threads, current->pid);
> > +
> > +	pctl->wq = alloc_workqueue("%s-%d", WQ_FREEZABLE, nr_threads, tag,
> > +			current->pid);
> > +	if (!pctl->wq)
> > +		return -ENOMEM;
> > +	pctl->work_fn = work_fn;
> > +	pctl->error = 0;
> > +	pctl->mp = mp;
> > +
> > +	return 0;
> > +}
> > +
> > +/* Queue some parallel work. */
> > +void
> > +xfs_pwork_queue(
> > +	struct xfs_pwork_ctl	*pctl,
> > +	struct xfs_pwork	*pwork)
> > +{
> > +	INIT_WORK(&pwork->work, xfs_pwork_work);
> > +	pwork->pctl = pctl;
> > +	queue_work(pctl->wq, &pwork->work);
> > +}
> > +
> > +/* Wait for the work to finish and tear down the control structure. */
> > +int
> > +xfs_pwork_destroy(
> > +	struct xfs_pwork_ctl	*pctl)
> > +{
> > +	destroy_workqueue(pctl->wq);
> > +	pctl->wq = NULL;
> > +	return pctl->error;
> > +}
> > +
> > +/*
> > + * Return the amount of parallelism that the data device can handle, or 0 for
> > + * no limit.
> > + */
> > +unsigned int
> > +xfs_pwork_guess_datadev_parallelism(
> > +	struct xfs_mount	*mp)
> > +{
> > +	struct xfs_buftarg	*btp = mp->m_ddev_targp;
> > +	int			iomin;
> > +	int			ioopt;
> > +
> > +	if (blk_queue_nonrot(btp->bt_bdev->bd_queue))
> > +		return num_online_cpus();
> > +	if (mp->m_sb.sb_width && mp->m_sb.sb_unit)
> > +		return mp->m_sb.sb_width / mp->m_sb.sb_unit;
> > +	iomin = bdev_io_min(btp->bt_bdev);
> > +	ioopt = bdev_io_opt(btp->bt_bdev);
> > +	if (iomin && ioopt)
> > +		return ioopt / iomin;
> > +
> > +	return 1;
> > +}
> > diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h
> > new file mode 100644
> > index 000000000000..4cf1a6f48237
> > --- /dev/null
> > +++ b/fs/xfs/xfs_pwork.h
> > @@ -0,0 +1,58 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + */
> > +#ifndef __XFS_PWORK_H__
> > +#define __XFS_PWORK_H__
> > +
> > +struct xfs_pwork;
> > +struct xfs_mount;
> > +
> > +typedef int (*xfs_pwork_work_fn)(struct xfs_mount *mp, struct xfs_pwork *pwork);
> > +
> > +/*
> > + * Parallel work coordination structure.
> > + */
> > +struct xfs_pwork_ctl {
> > +	struct workqueue_struct	*wq;
> > +	struct xfs_mount	*mp;
> > +	xfs_pwork_work_fn	work_fn;
> > +	int			error;
> > +};
> > +
> > +/*
> > + * Embed this parallel work control item inside your own work structure,
> > + * then queue work with it.
> > + */
> > +struct xfs_pwork {
> > +	struct work_struct	work;
> > +	struct xfs_pwork_ctl	*pctl;
> > +};
> > +
> > +#define XFS_PWORK_SINGLE_THREADED	{ .pctl = NULL }
> > +
> > +/* Have we been told to abort? */
> > +static inline bool
> > +xfs_pwork_ctl_want_abort(
> > +	struct xfs_pwork_ctl	*pctl)
> > +{
> > +	return pctl && pctl->error;
> > +}
> > +
> > +/* Have we been told to abort? */
> > +static inline bool
> > +xfs_pwork_want_abort(
> > +	struct xfs_pwork	*pwork)
> > +{
> > +	return xfs_pwork_ctl_want_abort(pwork->pctl);
> > +}
> > +
> > +int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl,
> > +		xfs_pwork_work_fn work_fn, const char *tag,
> > +		unsigned int nr_threads);
> > +void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork);
> > +int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl);
> > +unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp);
> > +
> > +#endif /* __XFS_PWORK_H__ */
> > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > index 52e8ec0aa064..8004c931c86e 100644
> > --- a/fs/xfs/xfs_qm.c
> > +++ b/fs/xfs/xfs_qm.c
> > @@ -1304,7 +1304,7 @@ xfs_qm_quotacheck(
> >  		flags |= XFS_PQUOTA_CHKD;
> >  	}
> >  
> > -	error = xfs_iwalk(mp, NULL, 0, xfs_qm_dqusage_adjust, 0, NULL);
> > +	error = xfs_iwalk_threaded(mp, 0, xfs_qm_dqusage_adjust, 0, NULL);
> >  	if (error)
> >  		goto error_return;
> >  
> > diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
> > index ad7f9be13087..b555e045e2f4 100644
> > --- a/fs/xfs/xfs_sysctl.h
> > +++ b/fs/xfs/xfs_sysctl.h
> > @@ -37,6 +37,9 @@ typedef struct xfs_param {
> >  	xfs_sysctl_val_t fstrm_timer;	/* Filestream dir-AG assoc'n timeout. */
> >  	xfs_sysctl_val_t eofb_timer;	/* Interval between eofb scan wakeups */
> >  	xfs_sysctl_val_t cowb_timer;	/* Interval between cowb scan wakeups */
> > +#ifdef DEBUG
> > +	xfs_sysctl_val_t pwork_threads;	/* Parallel workqueue thread count */
> > +#endif
> >  } xfs_param_t;
> >  
> >  /*
> > @@ -82,6 +85,9 @@ enum {
> >  extern xfs_param_t	xfs_params;
> >  
> >  struct xfs_globals {
> > +#ifdef DEBUG
> > +	int	pwork_threads;		/* parallel workqueue threads */
> > +#endif
> >  	int	log_recovery_delay;	/* log recovery delay (secs) */
> >  	int	mount_delay;		/* mount setup delay (secs) */
> >  	bool	bug_on_assert;		/* BUG() the kernel on assert failure */
> > diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> > index cabda13f3c64..910e6b9cb1a7 100644
> > --- a/fs/xfs/xfs_sysfs.c
> > +++ b/fs/xfs/xfs_sysfs.c
> > @@ -206,11 +206,51 @@ always_cow_show(
> >  }
> >  XFS_SYSFS_ATTR_RW(always_cow);
> >  
> > +#ifdef DEBUG
> > +/*
> > + * Override how many threads the parallel work queue is allowed to create.
> > + * This has to be a debug-only global (instead of an errortag) because one of
> > + * the main users of parallel workqueues is mount time quotacheck.
> > + */
> > +STATIC ssize_t
> > +pwork_threads_store(
> > +	struct kobject	*kobject,
> > +	const char	*buf,
> > +	size_t		count)
> > +{
> > +	int		ret;
> > +	int		val;
> > +
> > +	ret = kstrtoint(buf, 0, &val);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (val < 0 || val > NR_CPUS)
> > +		return -EINVAL;
> > +
> > +	xfs_globals.pwork_threads = val;
> > +
> > +	return count;
> > +}
> > +
> > +STATIC ssize_t
> > +pwork_threads_show(
> > +	struct kobject	*kobject,
> > +	char		*buf)
> > +{
> > +	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.pwork_threads);
> > +}
> > +XFS_SYSFS_ATTR_RW(pwork_threads);
> > +#endif /* DEBUG */
> > +
> >  static struct attribute *xfs_dbg_attrs[] = {
> >  	ATTR_LIST(bug_on_assert),
> >  	ATTR_LIST(log_recovery_delay),
> >  	ATTR_LIST(mount_delay),
> >  	ATTR_LIST(always_cow),
> > +#ifdef DEBUG
> > +	ATTR_LIST(pwork_threads),
> > +#endif
> >  	NULL,
> >  };
> >  
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index f9bb1d50bc0e..658cbade1998 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3556,6 +3556,24 @@ TRACE_EVENT(xfs_iwalk_ag_rec,
> >  		  __entry->startino, __entry->freemask)
> >  )
> >  
> > +TRACE_EVENT(xfs_pwork_init,
> > +	TP_PROTO(struct xfs_mount *mp, unsigned int nr_threads, pid_t pid),
> > +	TP_ARGS(mp, nr_threads, pid),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(unsigned int, nr_threads)
> > +		__field(pid_t, pid)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = mp->m_super->s_dev;
> > +		__entry->nr_threads = nr_threads;
> > +		__entry->pid = pid;
> > +	),
> > +	TP_printk("dev %d:%d nr_threads %u pid %u",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __entry->nr_threads, __entry->pid)
> > +)
> > +
> >  #endif /* _TRACE_XFS_H */
> >  
> >  #undef TRACE_INCLUDE_PATH
> > 


* Re: [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure
  2019-06-14 16:45           ` Darrick J. Wong
@ 2019-07-02 11:42             ` Brian Foster
  2019-07-02 15:33               ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Foster @ 2019-07-02 11:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Jun 14, 2019 at 09:45:10AM -0700, Darrick J. Wong wrote:
> On Fri, Jun 14, 2019 at 07:10:12AM -0400, Brian Foster wrote:
> > On Thu, Jun 13, 2019 at 04:03:58PM -0700, Darrick J. Wong wrote:
> > > On Thu, Jun 13, 2019 at 11:12:06AM -0700, Darrick J. Wong wrote:
> > > > On Thu, Jun 13, 2019 at 12:31:54PM -0400, Brian Foster wrote:
> > > > > On Tue, Jun 11, 2019 at 11:48:09PM -0700, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > 
> > > > > > Create a new ibulk structure incore to help us deal with bulk inode stat
> > > > > > state tracking and then convert the bulkstat code to use the new iwalk
> > > > > > iterator.  This disentangles inode walking from bulk stat control for
> > > > > > simpler code and enables us to isolate the formatter functions to the
> > > > > > ioctl handling code.
> > > > > > 
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > ---
> > > > > >  fs/xfs/xfs_ioctl.c   |   70 ++++++--
> > > > > >  fs/xfs/xfs_ioctl.h   |    5 +
> > > > > >  fs/xfs/xfs_ioctl32.c |   93 ++++++-----
> > > > > >  fs/xfs/xfs_itable.c  |  431 ++++++++++++++++----------------------------------
> > > > > >  fs/xfs/xfs_itable.h  |   79 ++++-----
> > > > > >  5 files changed, 272 insertions(+), 406 deletions(-)
> > > > > > 
> > > > > > 
> > > > > ...
> > > > > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > > > > index 814ffe6fbab7..5d1c143bac18 100644
> > > > > > --- a/fs/xfs/xfs_ioctl32.c
> > > > > > +++ b/fs/xfs/xfs_ioctl32.c
> > > > > ...
> > > > > > @@ -284,38 +266,59 @@ xfs_compat_ioc_bulkstat(
> > > > > >  		return -EFAULT;
> > > > > >  	bulkreq.ocount = compat_ptr(addr);
> > > > > >  
> > > > > > -	if (copy_from_user(&inlast, bulkreq.lastip, sizeof(__s64)))
> > > > > > +	if (copy_from_user(&lastino, bulkreq.lastip, sizeof(__s64)))
> > > > > >  		return -EFAULT;
> > > > > > +	breq.startino = lastino + 1;
> > > > > >  
> > > > > 
> > > > > Spurious assignment?
> > > > 
> > > > Fixed.
> > > > 
> > > > > > -	if ((count = bulkreq.icount) <= 0)
> > > > > > +	if (bulkreq.icount <= 0)
> > > > > >  		return -EINVAL;
> > > > > >  
> > > > > >  	if (bulkreq.ubuffer == NULL)
> > > > > >  		return -EINVAL;
> > > > > >  
> > > > > > +	breq.ubuffer = bulkreq.ubuffer;
> > > > > > +	breq.icount = bulkreq.icount;
> > > > > > +
> > > > > ...
> > > > > > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > > > > > index 3ca1c454afe6..58e411e11d6c 100644
> > > > > > --- a/fs/xfs/xfs_itable.c
> > > > > > +++ b/fs/xfs/xfs_itable.c
> > > > > > @@ -14,47 +14,68 @@
> > > > > ...
> > > > > > +STATIC int
> > > > > >  xfs_bulkstat_one_int(
> > > > > > -	struct xfs_mount	*mp,		/* mount point for filesystem */
> > > > > > -	xfs_ino_t		ino,		/* inode to get data for */
> > > > > > -	void __user		*buffer,	/* buffer to place output in */
> > > > > > -	int			ubsize,		/* size of buffer */
> > > > > > -	bulkstat_one_fmt_pf	formatter,	/* formatter, copy to user */
> > > > > > -	int			*ubused,	/* bytes used by me */
> > > > > > -	int			*stat)		/* BULKSTAT_RV_... */
> > > > > > +	struct xfs_mount	*mp,
> > > > > > +	struct xfs_trans	*tp,
> > > > > > +	xfs_ino_t		ino,
> > > > > > +	void			*data)
> > > > > 
> > > > > There's no need for a void pointer here given the current usage. We
> > > > > might as well pass this as bc (and let the caller cast it, if
> > > > > necessary).
> > > > > 
> > > > > That said, it also looks like the only reason we have the
> > > > > xfs_bulkstat_iwalk wrapper caller of this function is to filter out
> > > > > certain error values. If those errors are needed for the single inode
> > > > > case, we could stick something in the bc to toggle that invalid inode
> > > > > filtering behavior and eliminate the need for the wrapper entirely
> > > > > (which would pass _one_int() into the iwalk infra directly and require
> > > > > retaining the void pointer).
> > > > 
> > > > Ok, will do.  That'll help declutter the source file.
> > > 
> > > ...or I won't, because gcc complains that the function pointer passed
> > > into xfs_iwalk() has to have a (void *) as the 4th parameter.  It's not
> > > willing to accept one with a (struct xfs_bstat_chunk *).
> > > 
> > 
> > Hm I don't follow, this function already takes a void *data parameter
> > and we pass bc into xfs_iwalk() as a void*. What am I missing?
> 
> typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
> 			    xfs_ino_t ino, void *data);
> 
> gcc doesn't like it if the signature of bulkstat_one_int doesn't match
> xfs_iwalk_fn exactly, even if the only difference is a void pointer vs.
> a structure pointer.
> 

Sure, but I was just suggesting to do one or the other. There's no
reason for _one_int() to have a void pointer in the current code, but
the better cleanup IMO is to find a way to just pass _one_int() (with
its current signature) to xfs_iwalk().

Brian

> --D
> 
> > 
> > Brian
> > 
> > > Sorry about that. :(
> > > 
> > > --D
> > > 
> > > > > 
> > > > > >  {
> > > > > > +	struct xfs_bstat_chunk	*bc = data;
> > > > > >  	struct xfs_icdinode	*dic;		/* dinode core info pointer */
> > > > > >  	struct xfs_inode	*ip;		/* incore inode pointer */
> > > > > >  	struct inode		*inode;
> > > > > > -	struct xfs_bstat	*buf;		/* return buffer */
> > > > > > -	int			error = 0;	/* error value */
> > > > > > +	struct xfs_bstat	*buf = bc->buf;
> > > > > > +	int			error = -EINVAL;
> > > > > >  
> > > > > > -	*stat = BULKSTAT_RV_NOTHING;
> > > > > > +	if (xfs_internal_inum(mp, ino))
> > > > > > +		goto out_advance;
> > > > > >  
> > > > > > -	if (!buffer || xfs_internal_inum(mp, ino))
> > > > > > -		return -EINVAL;
> > > > > > -
> > > > > > -	buf = kmem_zalloc(sizeof(*buf), KM_SLEEP | KM_MAYFAIL);
> > > > > > -	if (!buf)
> > > > > > -		return -ENOMEM;
> > > > > > -
> > > > > > -	error = xfs_iget(mp, NULL, ino,
> > > > > > +	error = xfs_iget(mp, tp, ino,
> > > > > >  			 (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED),
> > > > > >  			 XFS_ILOCK_SHARED, &ip);
> > > > > > +	if (error == -ENOENT || error == -EINVAL)
> > > > > > +		goto out_advance;
> > > > > >  	if (error)
> > > > > > -		goto out_free;
> > > > > > +		goto out;
> > > > > >  
> > > > > >  	ASSERT(ip != NULL);
> > > > > >  	ASSERT(ip->i_imap.im_blkno != 0);
> > > > > > @@ -119,43 +140,56 @@ xfs_bulkstat_one_int(
> > > > > >  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > > > > >  	xfs_irele(ip);
> > > > > >  
> > > > > > -	error = formatter(buffer, ubsize, ubused, buf);
> > > > > > -	if (!error)
> > > > > > -		*stat = BULKSTAT_RV_DIDONE;
> > > > > > +	error = bc->formatter(bc->breq, buf);
> > > > > > +	if (error == XFS_IBULK_BUFFER_FULL) {
> > > > > > +		error = XFS_IWALK_ABORT;
> > > > > 
> > > > > Related to the earlier patch.. is there a need for IBULK_BUFFER_FULL if
> > > > > the only user converts it to the generic abort error?
> > > > 
> > > > <shrug> I wasn't sure if there was ever going to be a case where the
> > > > formatter function wanted to abort for a reason that wasn't a full
> > > > buffer... though looking at the bulkstat-v5 patches there aren't any.
> > > > I guess I'll just remove BUFFER_FULL, then.
> > > > 
> > > > --D
> > > > 
> > > > > Most of these comments are minor/aesthetic, so:
> > > > > 
> > > > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > > > 
> > > > > > +		goto out_advance;
> > > > > > +	}
> > > > > > +	if (error)
> > > > > > +		goto out;
> > > > > >  
> > > > > > - out_free:
> > > > > > -	kmem_free(buf);
> > > > > > +out_advance:
> > > > > > +	/*
> > > > > > +	 * Advance the cursor to the inode that comes after the one we just
> > > > > > +	 * looked at.  We want the caller to move along if the bulkstat
> > > > > > +	 * information was copied successfully; if we tried to grab the inode
> > > > > > +	 * but it's no longer allocated; or if it's internal metadata.
> > > > > > +	 */
> > > > > > +	bc->breq->startino = ino + 1;
> > > > > > +out:
> > > > > >  	return error;
> > > > > >  }
> > > > > >  
> > > > > > -/* Return 0 on success or positive error */
> > > > > > -STATIC int
> > > > > > -xfs_bulkstat_one_fmt(
> > > > > > -	void			__user *ubuffer,
> > > > > > -	int			ubsize,
> > > > > > -	int			*ubused,
> > > > > > -	const xfs_bstat_t	*buffer)
> > > > > > -{
> > > > > > -	if (ubsize < sizeof(*buffer))
> > > > > > -		return -ENOMEM;
> > > > > > -	if (copy_to_user(ubuffer, buffer, sizeof(*buffer)))
> > > > > > -		return -EFAULT;
> > > > > > -	if (ubused)
> > > > > > -		*ubused = sizeof(*buffer);
> > > > > > -	return 0;
> > > > > > -}
> > > > > > -
> > > > > > +/* Bulkstat a single inode. */
> > > > > >  int
> > > > > >  xfs_bulkstat_one(
> > > > > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > > > > -	xfs_ino_t	ino,		/* inode number to get data for */
> > > > > > -	void		__user *buffer,	/* buffer to place output in */
> > > > > > -	int		ubsize,		/* size of buffer */
> > > > > > -	int		*ubused,	/* bytes used by me */
> > > > > > -	int		*stat)		/* BULKSTAT_RV_... */
> > > > > > +	struct xfs_ibulk	*breq,
> > > > > > +	bulkstat_one_fmt_pf	formatter)
> > > > > >  {
> > > > > > -	return xfs_bulkstat_one_int(mp, ino, buffer, ubsize,
> > > > > > -				    xfs_bulkstat_one_fmt, ubused, stat);
> > > > > > +	struct xfs_bstat_chunk	bc = {
> > > > > > +		.formatter	= formatter,
> > > > > > +		.breq		= breq,
> > > > > > +	};
> > > > > > +	int			error;
> > > > > > +
> > > > > > +	ASSERT(breq->icount == 1);
> > > > > > +
> > > > > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > > > > +	if (!bc.buf)
> > > > > > +		return -ENOMEM;
> > > > > > +
> > > > > > +	error = xfs_bulkstat_one_int(breq->mp, NULL, breq->startino, &bc);
> > > > > > +
> > > > > > +	kmem_free(bc.buf);
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * If we reported one inode to userspace then we abort because we hit
> > > > > > +	 * the end of the buffer.  Don't leak that back to userspace.
> > > > > > +	 */
> > > > > > +	if (error == XFS_IWALK_ABORT)
> > > > > > +		error = 0;
> > > > > > +
> > > > > > +	return error;
> > > > > >  }
> > > > > >  
> > > > > >  /*
> > > > > > @@ -251,256 +285,69 @@ xfs_bulkstat_grab_ichunk(
> > > > > >  
> > > > > >  #define XFS_BULKSTAT_UBLEFT(ubleft)	((ubleft) >= statstruct_size)
> > > > > >  
> > > > > > -struct xfs_bulkstat_agichunk {
> > > > > > -	char		__user **ac_ubuffer;/* pointer into user's buffer */
> > > > > > -	int		ac_ubleft;	/* bytes left in user's buffer */
> > > > > > -	int		ac_ubelem;	/* spaces used in user's buffer */
> > > > > > -};
> > > > > > -
> > > > > > -/*
> > > > > > - * Process inodes in chunk with a pointer to a formatter function
> > > > > > - * that will iget the inode and fill in the appropriate structure.
> > > > > > - */
> > > > > >  static int
> > > > > > -xfs_bulkstat_ag_ichunk(
> > > > > > -	struct xfs_mount		*mp,
> > > > > > -	xfs_agnumber_t			agno,
> > > > > > -	struct xfs_inobt_rec_incore	*irbp,
> > > > > > -	bulkstat_one_pf			formatter,
> > > > > > -	size_t				statstruct_size,
> > > > > > -	struct xfs_bulkstat_agichunk	*acp,
> > > > > > -	xfs_agino_t			*last_agino)
> > > > > > +xfs_bulkstat_iwalk(
> > > > > > +	struct xfs_mount	*mp,
> > > > > > +	struct xfs_trans	*tp,
> > > > > > +	xfs_ino_t		ino,
> > > > > > +	void			*data)
> > > > > >  {
> > > > > > -	char				__user **ubufp = acp->ac_ubuffer;
> > > > > > -	int				chunkidx;
> > > > > > -	int				error = 0;
> > > > > > -	xfs_agino_t			agino = irbp->ir_startino;
> > > > > > -
> > > > > > -	for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK;
> > > > > > -	     chunkidx++, agino++) {
> > > > > > -		int		fmterror;
> > > > > > -		int		ubused;
> > > > > > -
> > > > > > -		/* inode won't fit in buffer, we are done */
> > > > > > -		if (acp->ac_ubleft < statstruct_size)
> > > > > > -			break;
> > > > > > -
> > > > > > -		/* Skip if this inode is free */
> > > > > > -		if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free)
> > > > > > -			continue;
> > > > > > -
> > > > > > -		/* Get the inode and fill in a single buffer */
> > > > > > -		ubused = statstruct_size;
> > > > > > -		error = formatter(mp, XFS_AGINO_TO_INO(mp, agno, agino),
> > > > > > -				  *ubufp, acp->ac_ubleft, &ubused, &fmterror);
> > > > > > -
> > > > > > -		if (fmterror == BULKSTAT_RV_GIVEUP ||
> > > > > > -		    (error && error != -ENOENT && error != -EINVAL)) {
> > > > > > -			acp->ac_ubleft = 0;
> > > > > > -			ASSERT(error);
> > > > > > -			break;
> > > > > > -		}
> > > > > > -
> > > > > > -		/* be careful not to leak error if at end of chunk */
> > > > > > -		if (fmterror == BULKSTAT_RV_NOTHING || error) {
> > > > > > -			error = 0;
> > > > > > -			continue;
> > > > > > -		}
> > > > > > -
> > > > > > -		*ubufp += ubused;
> > > > > > -		acp->ac_ubleft -= ubused;
> > > > > > -		acp->ac_ubelem++;
> > > > > > -	}
> > > > > > -
> > > > > > -	/*
> > > > > > -	 * Post-update *last_agino. At this point, agino will always point one
> > > > > > -	 * inode past the last inode we processed successfully. Hence we
> > > > > > -	 * substract that inode when setting the *last_agino cursor so that we
> > > > > > -	 * return the correct cookie to userspace. On the next bulkstat call,
> > > > > > -	 * the inode under the lastino cookie will be skipped as we have already
> > > > > > -	 * processed it here.
> > > > > > -	 */
> > > > > > -	*last_agino = agino - 1;
> > > > > > +	int			error;
> > > > > >  
> > > > > > +	error = xfs_bulkstat_one_int(mp, tp, ino, data);
> > > > > > +	/* bulkstat just skips over missing inodes */
> > > > > > +	if (error == -ENOENT || error == -EINVAL)
> > > > > > +		return 0;
> > > > > >  	return error;
> > > > > >  }
> > > > > >  
> > > > > >  /*
> > > > > > - * Return stat information in bulk (by-inode) for the filesystem.
> > > > > > + * Check the incoming lastino parameter.
> > > > > > + *
> > > > > > + * We allow any inode value that could map to physical space inside the
> > > > > > + * filesystem because if there are no inodes there, bulkstat moves on to the
> > > > > > + * next chunk.  In other words, the magic agino value of zero takes us to the
> > > > > > + * first chunk in the AG, and an agino value past the end of the AG takes us to
> > > > > > + * the first chunk in the next AG.
> > > > > > + *
> > > > > > + * Therefore we can end early if the requested inode is beyond the end of the
> > > > > > + * filesystem or doesn't map properly.
> > > > > >   */
> > > > > > -int					/* error status */
> > > > > > -xfs_bulkstat(
> > > > > > -	xfs_mount_t		*mp,	/* mount point for filesystem */
> > > > > > -	xfs_ino_t		*lastinop, /* last inode returned */
> > > > > > -	int			*ubcountp, /* size of buffer/count returned */
> > > > > > -	bulkstat_one_pf		formatter, /* func that'd fill a single buf */
> > > > > > -	size_t			statstruct_size, /* sizeof struct filling */
> > > > > > -	char			__user *ubuffer, /* buffer with inode stats */
> > > > > > -	int			*done)	/* 1 if there are more stats to get */
> > > > > > +static inline bool
> > > > > > +xfs_bulkstat_already_done(
> > > > > > +	struct xfs_mount	*mp,
> > > > > > +	xfs_ino_t		startino)
> > > > > >  {
> > > > > > -	xfs_buf_t		*agbp;	/* agi header buffer */
> > > > > > -	xfs_agino_t		agino;	/* inode # in allocation group */
> > > > > > -	xfs_agnumber_t		agno;	/* allocation group number */
> > > > > > -	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
> > > > > > -	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
> > > > > > -	int			nirbuf;	/* size of irbuf */
> > > > > > -	int			ubcount; /* size of user's buffer */
> > > > > > -	struct xfs_bulkstat_agichunk ac;
> > > > > > -	int			error = 0;
> > > > > > +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> > > > > > +	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, startino);
> > > > > >  
> > > > > > -	/*
> > > > > > -	 * Get the last inode value, see if there's nothing to do.
> > > > > > -	 */
> > > > > > -	agno = XFS_INO_TO_AGNO(mp, *lastinop);
> > > > > > -	agino = XFS_INO_TO_AGINO(mp, *lastinop);
> > > > > > -	if (agno >= mp->m_sb.sb_agcount ||
> > > > > > -	    *lastinop != XFS_AGINO_TO_INO(mp, agno, agino)) {
> > > > > > -		*done = 1;
> > > > > > -		*ubcountp = 0;
> > > > > > -		return 0;
> > > > > > -	}
> > > > > > +	return agno >= mp->m_sb.sb_agcount ||
> > > > > > +	       startino != XFS_AGINO_TO_INO(mp, agno, agino);
> > > > > > +}
> > > > > >  
> > > > > > -	ubcount = *ubcountp; /* statstruct's */
> > > > > > -	ac.ac_ubuffer = &ubuffer;
> > > > > > -	ac.ac_ubleft = ubcount * statstruct_size; /* bytes */;
> > > > > > -	ac.ac_ubelem = 0;
> > > > > > +/* Return stat information in bulk (by-inode) for the filesystem. */
> > > > > > +int
> > > > > > +xfs_bulkstat(
> > > > > > +	struct xfs_ibulk	*breq,
> > > > > > +	bulkstat_one_fmt_pf	formatter)
> > > > > > +{
> > > > > > +	struct xfs_bstat_chunk	bc = {
> > > > > > +		.formatter	= formatter,
> > > > > > +		.breq		= breq,
> > > > > > +	};
> > > > > > +	int			error;
> > > > > >  
> > > > > > -	*ubcountp = 0;
> > > > > > -	*done = 0;
> > > > > > +	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
> > > > > > +		return 0;
> > > > > >  
> > > > > > -	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
> > > > > > -	if (!irbuf)
> > > > > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > > > > +	if (!bc.buf)
> > > > > >  		return -ENOMEM;
> > > > > > -	nirbuf = (PAGE_SIZE * 4) / sizeof(*irbuf);
> > > > > >  
> > > > > > -	/*
> > > > > > -	 * Loop over the allocation groups, starting from the last
> > > > > > -	 * inode returned; 0 means start of the allocation group.
> > > > > > -	 */
> > > > > > -	while (agno < mp->m_sb.sb_agcount) {
> > > > > > -		struct xfs_inobt_rec_incore	*irbp = irbuf;
> > > > > > -		struct xfs_inobt_rec_incore	*irbufend = irbuf + nirbuf;
> > > > > > -		bool				end_of_ag = false;
> > > > > > -		int				icount = 0;
> > > > > > -		int				stat;
> > > > > > +	error = xfs_iwalk(breq->mp, NULL, breq->startino, xfs_bulkstat_iwalk,
> > > > > > +			breq->icount, &bc);
> > > > > >  
> > > > > > -		error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> > > > > > -		if (error)
> > > > > > -			break;
> > > > > > -		/*
> > > > > > -		 * Allocate and initialize a btree cursor for ialloc btree.
> > > > > > -		 */
> > > > > > -		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> > > > > > -					    XFS_BTNUM_INO);
> > > > > > -		if (agino > 0) {
> > > > > > -			/*
> > > > > > -			 * In the middle of an allocation group, we need to get
> > > > > > -			 * the remainder of the chunk we're in.
> > > > > > -			 */
> > > > > > -			struct xfs_inobt_rec_incore	r;
> > > > > > -
> > > > > > -			error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
> > > > > > -			if (error)
> > > > > > -				goto del_cursor;
> > > > > > -			if (icount) {
> > > > > > -				irbp->ir_startino = r.ir_startino;
> > > > > > -				irbp->ir_holemask = r.ir_holemask;
> > > > > > -				irbp->ir_count = r.ir_count;
> > > > > > -				irbp->ir_freecount = r.ir_freecount;
> > > > > > -				irbp->ir_free = r.ir_free;
> > > > > > -				irbp++;
> > > > > > -			}
> > > > > > -			/* Increment to the next record */
> > > > > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > > > > -		} else {
> > > > > > -			/* Start of ag.  Lookup the first inode chunk */
> > > > > > -			error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
> > > > > > -		}
> > > > > > -		if (error || stat == 0) {
> > > > > > -			end_of_ag = true;
> > > > > > -			goto del_cursor;
> > > > > > -		}
> > > > > > -
> > > > > > -		/*
> > > > > > -		 * Loop through inode btree records in this ag,
> > > > > > -		 * until we run out of inodes or space in the buffer.
> > > > > > -		 */
> > > > > > -		while (irbp < irbufend && icount < ubcount) {
> > > > > > -			struct xfs_inobt_rec_incore	r;
> > > > > > -
> > > > > > -			error = xfs_inobt_get_rec(cur, &r, &stat);
> > > > > > -			if (error || stat == 0) {
> > > > > > -				end_of_ag = true;
> > > > > > -				goto del_cursor;
> > > > > > -			}
> > > > > > -
> > > > > > -			/*
> > > > > > -			 * If this chunk has any allocated inodes, save it.
> > > > > > -			 * Also start read-ahead now for this chunk.
> > > > > > -			 */
> > > > > > -			if (r.ir_freecount < r.ir_count) {
> > > > > > -				xfs_bulkstat_ichunk_ra(mp, agno, &r);
> > > > > > -				irbp->ir_startino = r.ir_startino;
> > > > > > -				irbp->ir_holemask = r.ir_holemask;
> > > > > > -				irbp->ir_count = r.ir_count;
> > > > > > -				irbp->ir_freecount = r.ir_freecount;
> > > > > > -				irbp->ir_free = r.ir_free;
> > > > > > -				irbp++;
> > > > > > -				icount += r.ir_count - r.ir_freecount;
> > > > > > -			}
> > > > > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > > > > -			if (error || stat == 0) {
> > > > > > -				end_of_ag = true;
> > > > > > -				goto del_cursor;
> > > > > > -			}
> > > > > > -			cond_resched();
> > > > > > -		}
> > > > > > -
> > > > > > -		/*
> > > > > > -		 * Drop the btree buffers and the agi buffer as we can't hold any
> > > > > > -		 * of the locks these represent when calling iget. If there is a
> > > > > > -		 * pending error, then we are done.
> > > > > > -		 */
> > > > > > -del_cursor:
> > > > > > -		xfs_btree_del_cursor(cur, error);
> > > > > > -		xfs_buf_relse(agbp);
> > > > > > -		if (error)
> > > > > > -			break;
> > > > > > -		/*
> > > > > > -		 * Now format all the good inodes into the user's buffer. The
> > > > > > -		 * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
> > > > > > -		 * for the next loop iteration.
> > > > > > -		 */
> > > > > > -		irbufend = irbp;
> > > > > > -		for (irbp = irbuf;
> > > > > > -		     irbp < irbufend && ac.ac_ubleft >= statstruct_size;
> > > > > > -		     irbp++) {
> > > > > > -			error = xfs_bulkstat_ag_ichunk(mp, agno, irbp,
> > > > > > -					formatter, statstruct_size, &ac,
> > > > > > -					&agino);
> > > > > > -			if (error)
> > > > > > -				break;
> > > > > > -
> > > > > > -			cond_resched();
> > > > > > -		}
> > > > > > -
> > > > > > -		/*
> > > > > > -		 * If we've run out of space or had a formatting error, we
> > > > > > -		 * are now done
> > > > > > -		 */
> > > > > > -		if (ac.ac_ubleft < statstruct_size || error)
> > > > > > -			break;
> > > > > > -
> > > > > > -		if (end_of_ag) {
> > > > > > -			agno++;
> > > > > > -			agino = 0;
> > > > > > -		}
> > > > > > -	}
> > > > > > -	/*
> > > > > > -	 * Done, we're either out of filesystem or space to put the data.
> > > > > > -	 */
> > > > > > -	kmem_free(irbuf);
> > > > > > -	*ubcountp = ac.ac_ubelem;
> > > > > > +	kmem_free(bc.buf);
> > > > > >  
> > > > > >  	/*
> > > > > >  	 * We found some inodes, so clear the error status and return them.
> > > > > > @@ -509,17 +356,9 @@ xfs_bulkstat(
> > > > > >  	 * triggered again and propagated to userspace as there will be no
> > > > > >  	 * formatted inodes in the buffer.
> > > > > >  	 */
> > > > > > -	if (ac.ac_ubelem)
> > > > > > +	if (breq->ocount > 0)
> > > > > >  		error = 0;
> > > > > >  
> > > > > > -	/*
> > > > > > -	 * If we ran out of filesystem, lastino will point off the end of
> > > > > > -	 * the filesystem so the next call will return immediately.
> > > > > > -	 */
> > > > > > -	*lastinop = XFS_AGINO_TO_INO(mp, agno, agino);
> > > > > > -	if (agno >= mp->m_sb.sb_agcount)
> > > > > > -		*done = 1;
> > > > > > -
> > > > > >  	return error;
> > > > > >  }
> > > > > >  
> > > > > > diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> > > > > > index 369e3f159d4e..7c5f1df360e6 100644
> > > > > > --- a/fs/xfs/xfs_itable.h
> > > > > > +++ b/fs/xfs/xfs_itable.h
> > > > > > @@ -5,63 +5,46 @@
> > > > > >  #ifndef __XFS_ITABLE_H__
> > > > > >  #define	__XFS_ITABLE_H__
> > > > > >  
> > > > > > -/*
> > > > > > - * xfs_bulkstat() is used to fill in xfs_bstat structures as well as dm_stat
> > > > > > - * structures (by the dmi library). This is a pointer to a formatter function
> > > > > > - * that will iget the inode and fill in the appropriate structure.
> > > > > > - * see xfs_bulkstat_one() and xfs_dm_bulkstat_one() in dmapi_xfs.c
> > > > > > - */
> > > > > > -typedef int (*bulkstat_one_pf)(struct xfs_mount	*mp,
> > > > > > -			       xfs_ino_t	ino,
> > > > > > -			       void		__user *buffer,
> > > > > > -			       int		ubsize,
> > > > > > -			       int		*ubused,
> > > > > > -			       int		*stat);
> > > > > > +/* In-memory representation of a userspace request for batch inode data. */
> > > > > > +struct xfs_ibulk {
> > > > > > +	struct xfs_mount	*mp;
> > > > > > +	void __user		*ubuffer; /* user output buffer */
> > > > > > +	xfs_ino_t		startino; /* start with this inode */
> > > > > > +	unsigned int		icount;   /* number of elements in ubuffer */
> > > > > > +	unsigned int		ocount;   /* number of records returned */
> > > > > > +};
> > > > > > +
> > > > > > +/* Return value that means we want to abort the walk. */
> > > > > > +#define XFS_IBULK_ABORT		(XFS_IWALK_ABORT)
> > > > > > +
> > > > > > +/* Return value that means the formatting buffer is now full. */
> > > > > > +#define XFS_IBULK_BUFFER_FULL	(XFS_IBULK_ABORT + 1)
> > > > > >  
> > > > > >  /*
> > > > > > - * Values for stat return value.
> > > > > > + * Advance the user buffer pointer by one record of the given size.  If the
> > > > > > + * buffer is now full, return the appropriate error code.
> > > > > >   */
> > > > > > -#define BULKSTAT_RV_NOTHING	0
> > > > > > -#define BULKSTAT_RV_DIDONE	1
> > > > > > -#define BULKSTAT_RV_GIVEUP	2
> > > > > > +static inline int
> > > > > > +xfs_ibulk_advance(
> > > > > > +	struct xfs_ibulk	*breq,
> > > > > > +	size_t			bytes)
> > > > > > +{
> > > > > > +	char __user		*b = breq->ubuffer;
> > > > > > +
> > > > > > +	breq->ubuffer = b + bytes;
> > > > > > +	breq->ocount++;
> > > > > > +	return breq->ocount == breq->icount ? XFS_IBULK_BUFFER_FULL : 0;
> > > > > > +}
> > > > > >  
> > > > > >  /*
> > > > > >   * Return stat information in bulk (by-inode) for the filesystem.
> > > > > >   */
> > > > > > -int					/* error status */
> > > > > > -xfs_bulkstat(
> > > > > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > > > > -	xfs_ino_t	*lastino,	/* last inode returned */
> > > > > > -	int		*count,		/* size of buffer/count returned */
> > > > > > -	bulkstat_one_pf formatter,	/* func that'd fill a single buf */
> > > > > > -	size_t		statstruct_size,/* sizeof struct that we're filling */
> > > > > > -	char		__user *ubuffer,/* buffer with inode stats */
> > > > > > -	int		*done);		/* 1 if there are more stats to get */
> > > > > >  
> > > > > > -typedef int (*bulkstat_one_fmt_pf)(  /* used size in bytes or negative error */
> > > > > > -	void			__user *ubuffer, /* buffer to write to */
> > > > > > -	int			ubsize,		 /* remaining user buffer sz */
> > > > > > -	int			*ubused,	 /* bytes used by formatter */
> > > > > > -	const xfs_bstat_t	*buffer);        /* buffer to read from */
> > > > > > +typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
> > > > > > +		const struct xfs_bstat *bstat);
> > > > > >  
> > > > > > -int
> > > > > > -xfs_bulkstat_one_int(
> > > > > > -	xfs_mount_t		*mp,
> > > > > > -	xfs_ino_t		ino,
> > > > > > -	void			__user *buffer,
> > > > > > -	int			ubsize,
> > > > > > -	bulkstat_one_fmt_pf	formatter,
> > > > > > -	int			*ubused,
> > > > > > -	int			*stat);
> > > > > > -
> > > > > > -int
> > > > > > -xfs_bulkstat_one(
> > > > > > -	xfs_mount_t		*mp,
> > > > > > -	xfs_ino_t		ino,
> > > > > > -	void			__user *buffer,
> > > > > > -	int			ubsize,
> > > > > > -	int			*ubused,
> > > > > > -	int			*stat);
> > > > > > +int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > > > > > +int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > > > > >  
> > > > > >  typedef int (*inumbers_fmt_pf)(
> > > > > >  	void			__user *ubuffer, /* buffer to write to */
> > > > > > 


* Re: [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure
  2019-07-02 11:42             ` Brian Foster
@ 2019-07-02 15:33               ` Darrick J. Wong
  0 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2019-07-02 15:33 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Jul 02, 2019 at 07:42:05AM -0400, Brian Foster wrote:
> On Fri, Jun 14, 2019 at 09:45:10AM -0700, Darrick J. Wong wrote:
> > On Fri, Jun 14, 2019 at 07:10:12AM -0400, Brian Foster wrote:
> > > On Thu, Jun 13, 2019 at 04:03:58PM -0700, Darrick J. Wong wrote:
> > > > On Thu, Jun 13, 2019 at 11:12:06AM -0700, Darrick J. Wong wrote:
> > > > > On Thu, Jun 13, 2019 at 12:31:54PM -0400, Brian Foster wrote:
> > > > > > On Tue, Jun 11, 2019 at 11:48:09PM -0700, Darrick J. Wong wrote:
> > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > 
> > > > > > > Create a new ibulk structure incore to help us deal with bulk inode stat
> > > > > > > state tracking and then convert the bulkstat code to use the new iwalk
> > > > > > > iterator.  This disentangles inode walking from bulk stat control for
> > > > > > > simpler code and enables us to isolate the formatter functions to the
> > > > > > > ioctl handling code.
> > > > > > > 
> > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > ---
> > > > > > >  fs/xfs/xfs_ioctl.c   |   70 ++++++--
> > > > > > >  fs/xfs/xfs_ioctl.h   |    5 +
> > > > > > >  fs/xfs/xfs_ioctl32.c |   93 ++++++-----
> > > > > > >  fs/xfs/xfs_itable.c  |  431 ++++++++++++++++----------------------------------
> > > > > > >  fs/xfs/xfs_itable.h  |   79 ++++-----
> > > > > > >  5 files changed, 272 insertions(+), 406 deletions(-)
> > > > > > > 
> > > > > > > 
> > > > > > ...
> > > > > > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > > > > > index 814ffe6fbab7..5d1c143bac18 100644
> > > > > > > --- a/fs/xfs/xfs_ioctl32.c
> > > > > > > +++ b/fs/xfs/xfs_ioctl32.c
> > > > > > ...
> > > > > > > @@ -284,38 +266,59 @@ xfs_compat_ioc_bulkstat(
> > > > > > >  		return -EFAULT;
> > > > > > >  	bulkreq.ocount = compat_ptr(addr);
> > > > > > >  
> > > > > > > -	if (copy_from_user(&inlast, bulkreq.lastip, sizeof(__s64)))
> > > > > > > +	if (copy_from_user(&lastino, bulkreq.lastip, sizeof(__s64)))
> > > > > > >  		return -EFAULT;
> > > > > > > +	breq.startino = lastino + 1;
> > > > > > >  
> > > > > > 
> > > > > > Spurious assignment?
> > > > > 
> > > > > Fixed.
> > > > > 
> > > > > > > -	if ((count = bulkreq.icount) <= 0)
> > > > > > > +	if (bulkreq.icount <= 0)
> > > > > > >  		return -EINVAL;
> > > > > > >  
> > > > > > >  	if (bulkreq.ubuffer == NULL)
> > > > > > >  		return -EINVAL;
> > > > > > >  
> > > > > > > +	breq.ubuffer = bulkreq.ubuffer;
> > > > > > > +	breq.icount = bulkreq.icount;
> > > > > > > +
> > > > > > ...
> > > > > > > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > > > > > > index 3ca1c454afe6..58e411e11d6c 100644
> > > > > > > --- a/fs/xfs/xfs_itable.c
> > > > > > > +++ b/fs/xfs/xfs_itable.c
> > > > > > > @@ -14,47 +14,68 @@
> > > > > > ...
> > > > > > > +STATIC int
> > > > > > >  xfs_bulkstat_one_int(
> > > > > > > -	struct xfs_mount	*mp,		/* mount point for filesystem */
> > > > > > > -	xfs_ino_t		ino,		/* inode to get data for */
> > > > > > > -	void __user		*buffer,	/* buffer to place output in */
> > > > > > > -	int			ubsize,		/* size of buffer */
> > > > > > > -	bulkstat_one_fmt_pf	formatter,	/* formatter, copy to user */
> > > > > > > -	int			*ubused,	/* bytes used by me */
> > > > > > > -	int			*stat)		/* BULKSTAT_RV_... */
> > > > > > > +	struct xfs_mount	*mp,
> > > > > > > +	struct xfs_trans	*tp,
> > > > > > > +	xfs_ino_t		ino,
> > > > > > > +	void			*data)
> > > > > > 
> > > > > > There's no need for a void pointer here given the current usage. We
> > > > > > might as well pass this as bc (and let the caller cast it, if
> > > > > > necessary).
> > > > > > 
> > > > > > That said, it also looks like the only reason we have the
> > > > > > xfs_bulkstat_iwalk wrapper caller of this function is to filter out
> > > > > > certain error values. If those errors are needed for the single inode
> > > > > > case, we could stick something in the bc to toggle that invalid inode
> > > > > > filtering behavior and eliminate the need for the wrapper entirely
> > > > > > (which would pass _one_int() into the iwalk infra directly and require
> > > > > > retaining the void pointer).
> > > > > 
> > > > > Ok, will do.  That'll help declutter the source file.
> > > > 
> > > > ...or I won't, because gcc complains that the function pointer passed
> > > > into xfs_iwalk() has to have a (void *) as the 4th parameter.  It's not
> > > > willing to accept one with a (struct xfs_bstat_chunk *).
> > > > 
> > > 
> > > Hm I don't follow, this function already takes a void *data parameter
> > > and we pass bc into xfs_iwalk() as a void*. What am I missing?
> > 
> > typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
> > 			    xfs_ino_t ino, void *data);
> > 
> > gcc doesn't like it if the signature of bulkstat_one_int doesn't match
> > xfs_iwalk_fn exactly, even if the only difference is a void pointer vs.
> > a structure pointer.
> > 
> 
> Sure, but I was just suggesting to do one or the other. There's no
> reason for _one_int() to have a void pointer in the current code, but
> the better cleanup IMO is to find a way to just pass _one_int() (with
> its current signature) to xfs_iwalk().

Hmm, yeah, xfs_bstat_chunk just needs to grow a skip_missing flag that
would mask the EINVAL/ENOENT return from _one_int.  Ok, I'll work on
that.

--D

> Brian
> 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > Sorry about that. :(
> > > > 
> > > > --D
> > > > 
> > > > > > 
> > > > > > >  {
> > > > > > > +	struct xfs_bstat_chunk	*bc = data;
> > > > > > >  	struct xfs_icdinode	*dic;		/* dinode core info pointer */
> > > > > > >  	struct xfs_inode	*ip;		/* incore inode pointer */
> > > > > > >  	struct inode		*inode;
> > > > > > > -	struct xfs_bstat	*buf;		/* return buffer */
> > > > > > > -	int			error = 0;	/* error value */
> > > > > > > +	struct xfs_bstat	*buf = bc->buf;
> > > > > > > +	int			error = -EINVAL;
> > > > > > >  
> > > > > > > -	*stat = BULKSTAT_RV_NOTHING;
> > > > > > > +	if (xfs_internal_inum(mp, ino))
> > > > > > > +		goto out_advance;
> > > > > > >  
> > > > > > > -	if (!buffer || xfs_internal_inum(mp, ino))
> > > > > > > -		return -EINVAL;
> > > > > > > -
> > > > > > > -	buf = kmem_zalloc(sizeof(*buf), KM_SLEEP | KM_MAYFAIL);
> > > > > > > -	if (!buf)
> > > > > > > -		return -ENOMEM;
> > > > > > > -
> > > > > > > -	error = xfs_iget(mp, NULL, ino,
> > > > > > > +	error = xfs_iget(mp, tp, ino,
> > > > > > >  			 (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED),
> > > > > > >  			 XFS_ILOCK_SHARED, &ip);
> > > > > > > +	if (error == -ENOENT || error == -EINVAL)
> > > > > > > +		goto out_advance;
> > > > > > >  	if (error)
> > > > > > > -		goto out_free;
> > > > > > > +		goto out;
> > > > > > >  
> > > > > > >  	ASSERT(ip != NULL);
> > > > > > >  	ASSERT(ip->i_imap.im_blkno != 0);
> > > > > > > @@ -119,43 +140,56 @@ xfs_bulkstat_one_int(
> > > > > > >  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > > > > > >  	xfs_irele(ip);
> > > > > > >  
> > > > > > > -	error = formatter(buffer, ubsize, ubused, buf);
> > > > > > > -	if (!error)
> > > > > > > -		*stat = BULKSTAT_RV_DIDONE;
> > > > > > > +	error = bc->formatter(bc->breq, buf);
> > > > > > > +	if (error == XFS_IBULK_BUFFER_FULL) {
> > > > > > > +		error = XFS_IWALK_ABORT;
> > > > > > 
> > > > > > Related to the earlier patch.. is there a need for IBULK_BUFFER_FULL if
> > > > > > the only user converts it to the generic abort error?
> > > > > 
> > > > > <shrug> I wasn't sure if there was ever going to be a case where the
> > > > > formatter function wanted to abort for a reason that wasn't a full
> > > > > buffer... though looking at the bulkstat-v5 patches there aren't any.
> > > > > I guess I'll just remove BUFFER_FULL, then.
> > > > > 
> > > > > --D
> > > > > 
> > > > > > Most of these comments are minor/aesthetic, so:
> > > > > > 
> > > > > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > > > > 
> > > > > > > +		goto out_advance;
> > > > > > > +	}
> > > > > > > +	if (error)
> > > > > > > +		goto out;
> > > > > > >  
> > > > > > > - out_free:
> > > > > > > -	kmem_free(buf);
> > > > > > > +out_advance:
> > > > > > > +	/*
> > > > > > > +	 * Advance the cursor to the inode that comes after the one we just
> > > > > > > +	 * looked at.  We want the caller to move along if the bulkstat
> > > > > > > +	 * information was copied successfully; if we tried to grab the inode
> > > > > > > +	 * but it's no longer allocated; or if it's internal metadata.
> > > > > > > +	 */
> > > > > > > +	bc->breq->startino = ino + 1;
> > > > > > > +out:
> > > > > > >  	return error;
> > > > > > >  }
> > > > > > >  
> > > > > > > -/* Return 0 on success or positive error */
> > > > > > > -STATIC int
> > > > > > > -xfs_bulkstat_one_fmt(
> > > > > > > -	void			__user *ubuffer,
> > > > > > > -	int			ubsize,
> > > > > > > -	int			*ubused,
> > > > > > > -	const xfs_bstat_t	*buffer)
> > > > > > > -{
> > > > > > > -	if (ubsize < sizeof(*buffer))
> > > > > > > -		return -ENOMEM;
> > > > > > > -	if (copy_to_user(ubuffer, buffer, sizeof(*buffer)))
> > > > > > > -		return -EFAULT;
> > > > > > > -	if (ubused)
> > > > > > > -		*ubused = sizeof(*buffer);
> > > > > > > -	return 0;
> > > > > > > -}
> > > > > > > -
> > > > > > > +/* Bulkstat a single inode. */
> > > > > > >  int
> > > > > > >  xfs_bulkstat_one(
> > > > > > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > > > > > -	xfs_ino_t	ino,		/* inode number to get data for */
> > > > > > > -	void		__user *buffer,	/* buffer to place output in */
> > > > > > > -	int		ubsize,		/* size of buffer */
> > > > > > > -	int		*ubused,	/* bytes used by me */
> > > > > > > -	int		*stat)		/* BULKSTAT_RV_... */
> > > > > > > +	struct xfs_ibulk	*breq,
> > > > > > > +	bulkstat_one_fmt_pf	formatter)
> > > > > > >  {
> > > > > > > -	return xfs_bulkstat_one_int(mp, ino, buffer, ubsize,
> > > > > > > -				    xfs_bulkstat_one_fmt, ubused, stat);
> > > > > > > +	struct xfs_bstat_chunk	bc = {
> > > > > > > +		.formatter	= formatter,
> > > > > > > +		.breq		= breq,
> > > > > > > +	};
> > > > > > > +	int			error;
> > > > > > > +
> > > > > > > +	ASSERT(breq->icount == 1);
> > > > > > > +
> > > > > > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > > > > > +	if (!bc.buf)
> > > > > > > +		return -ENOMEM;
> > > > > > > +
> > > > > > > +	error = xfs_bulkstat_one_int(breq->mp, NULL, breq->startino, &bc);
> > > > > > > +
> > > > > > > +	kmem_free(bc.buf);
> > > > > > > +
> > > > > > > +	/*
> > > > > > > +	 * If we reported one inode to userspace then we abort because we hit
> > > > > > > +	 * the end of the buffer.  Don't leak that back to userspace.
> > > > > > > +	 */
> > > > > > > +	if (error == XFS_IWALK_ABORT)
> > > > > > > +		error = 0;
> > > > > > > +
> > > > > > > +	return error;
> > > > > > >  }
> > > > > > >  
> > > > > > >  /*
> > > > > > > @@ -251,256 +285,69 @@ xfs_bulkstat_grab_ichunk(
> > > > > > >  
> > > > > > >  #define XFS_BULKSTAT_UBLEFT(ubleft)	((ubleft) >= statstruct_size)
> > > > > > >  
> > > > > > > -struct xfs_bulkstat_agichunk {
> > > > > > > -	char		__user **ac_ubuffer;/* pointer into user's buffer */
> > > > > > > -	int		ac_ubleft;	/* bytes left in user's buffer */
> > > > > > > -	int		ac_ubelem;	/* spaces used in user's buffer */
> > > > > > > -};
> > > > > > > -
> > > > > > > -/*
> > > > > > > - * Process inodes in chunk with a pointer to a formatter function
> > > > > > > - * that will iget the inode and fill in the appropriate structure.
> > > > > > > - */
> > > > > > >  static int
> > > > > > > -xfs_bulkstat_ag_ichunk(
> > > > > > > -	struct xfs_mount		*mp,
> > > > > > > -	xfs_agnumber_t			agno,
> > > > > > > -	struct xfs_inobt_rec_incore	*irbp,
> > > > > > > -	bulkstat_one_pf			formatter,
> > > > > > > -	size_t				statstruct_size,
> > > > > > > -	struct xfs_bulkstat_agichunk	*acp,
> > > > > > > -	xfs_agino_t			*last_agino)
> > > > > > > +xfs_bulkstat_iwalk(
> > > > > > > +	struct xfs_mount	*mp,
> > > > > > > +	struct xfs_trans	*tp,
> > > > > > > +	xfs_ino_t		ino,
> > > > > > > +	void			*data)
> > > > > > >  {
> > > > > > > -	char				__user **ubufp = acp->ac_ubuffer;
> > > > > > > -	int				chunkidx;
> > > > > > > -	int				error = 0;
> > > > > > > -	xfs_agino_t			agino = irbp->ir_startino;
> > > > > > > -
> > > > > > > -	for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK;
> > > > > > > -	     chunkidx++, agino++) {
> > > > > > > -		int		fmterror;
> > > > > > > -		int		ubused;
> > > > > > > -
> > > > > > > -		/* inode won't fit in buffer, we are done */
> > > > > > > -		if (acp->ac_ubleft < statstruct_size)
> > > > > > > -			break;
> > > > > > > -
> > > > > > > -		/* Skip if this inode is free */
> > > > > > > -		if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free)
> > > > > > > -			continue;
> > > > > > > -
> > > > > > > -		/* Get the inode and fill in a single buffer */
> > > > > > > -		ubused = statstruct_size;
> > > > > > > -		error = formatter(mp, XFS_AGINO_TO_INO(mp, agno, agino),
> > > > > > > -				  *ubufp, acp->ac_ubleft, &ubused, &fmterror);
> > > > > > > -
> > > > > > > -		if (fmterror == BULKSTAT_RV_GIVEUP ||
> > > > > > > -		    (error && error != -ENOENT && error != -EINVAL)) {
> > > > > > > -			acp->ac_ubleft = 0;
> > > > > > > -			ASSERT(error);
> > > > > > > -			break;
> > > > > > > -		}
> > > > > > > -
> > > > > > > -		/* be careful not to leak error if at end of chunk */
> > > > > > > -		if (fmterror == BULKSTAT_RV_NOTHING || error) {
> > > > > > > -			error = 0;
> > > > > > > -			continue;
> > > > > > > -		}
> > > > > > > -
> > > > > > > -		*ubufp += ubused;
> > > > > > > -		acp->ac_ubleft -= ubused;
> > > > > > > -		acp->ac_ubelem++;
> > > > > > > -	}
> > > > > > > -
> > > > > > > -	/*
> > > > > > > -	 * Post-update *last_agino. At this point, agino will always point one
> > > > > > > -	 * inode past the last inode we processed successfully. Hence we
> > > > > > > -	 * substract that inode when setting the *last_agino cursor so that we
> > > > > > > -	 * return the correct cookie to userspace. On the next bulkstat call,
> > > > > > > -	 * the inode under the lastino cookie will be skipped as we have already
> > > > > > > -	 * processed it here.
> > > > > > > -	 */
> > > > > > > -	*last_agino = agino - 1;
> > > > > > > +	int			error;
> > > > > > >  
> > > > > > > +	error = xfs_bulkstat_one_int(mp, tp, ino, data);
> > > > > > > +	/* bulkstat just skips over missing inodes */
> > > > > > > +	if (error == -ENOENT || error == -EINVAL)
> > > > > > > +		return 0;
> > > > > > >  	return error;
> > > > > > >  }
> > > > > > >  
> > > > > > >  /*
> > > > > > > - * Return stat information in bulk (by-inode) for the filesystem.
> > > > > > > + * Check the incoming lastino parameter.
> > > > > > > + *
> > > > > > > + * We allow any inode value that could map to physical space inside the
> > > > > > > + * filesystem because if there are no inodes there, bulkstat moves on to the
> > > > > > > + * next chunk.  In other words, the magic agino value of zero takes us to the
> > > > > > > + * first chunk in the AG, and an agino value past the end of the AG takes us to
> > > > > > > + * the first chunk in the next AG.
> > > > > > > + *
> > > > > > > + * Therefore we can end early if the requested inode is beyond the end of the
> > > > > > > + * filesystem or doesn't map properly.
> > > > > > >   */
> > > > > > > -int					/* error status */
> > > > > > > -xfs_bulkstat(
> > > > > > > -	xfs_mount_t		*mp,	/* mount point for filesystem */
> > > > > > > -	xfs_ino_t		*lastinop, /* last inode returned */
> > > > > > > -	int			*ubcountp, /* size of buffer/count returned */
> > > > > > > -	bulkstat_one_pf		formatter, /* func that'd fill a single buf */
> > > > > > > -	size_t			statstruct_size, /* sizeof struct filling */
> > > > > > > -	char			__user *ubuffer, /* buffer with inode stats */
> > > > > > > -	int			*done)	/* 1 if there are more stats to get */
> > > > > > > +static inline bool
> > > > > > > +xfs_bulkstat_already_done(
> > > > > > > +	struct xfs_mount	*mp,
> > > > > > > +	xfs_ino_t		startino)
> > > > > > >  {
> > > > > > > -	xfs_buf_t		*agbp;	/* agi header buffer */
> > > > > > > -	xfs_agino_t		agino;	/* inode # in allocation group */
> > > > > > > -	xfs_agnumber_t		agno;	/* allocation group number */
> > > > > > > -	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
> > > > > > > -	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
> > > > > > > -	int			nirbuf;	/* size of irbuf */
> > > > > > > -	int			ubcount; /* size of user's buffer */
> > > > > > > -	struct xfs_bulkstat_agichunk ac;
> > > > > > > -	int			error = 0;
> > > > > > > +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> > > > > > > +	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, startino);
> > > > > > >  
> > > > > > > -	/*
> > > > > > > -	 * Get the last inode value, see if there's nothing to do.
> > > > > > > -	 */
> > > > > > > -	agno = XFS_INO_TO_AGNO(mp, *lastinop);
> > > > > > > -	agino = XFS_INO_TO_AGINO(mp, *lastinop);
> > > > > > > -	if (agno >= mp->m_sb.sb_agcount ||
> > > > > > > -	    *lastinop != XFS_AGINO_TO_INO(mp, agno, agino)) {
> > > > > > > -		*done = 1;
> > > > > > > -		*ubcountp = 0;
> > > > > > > -		return 0;
> > > > > > > -	}
> > > > > > > +	return agno >= mp->m_sb.sb_agcount ||
> > > > > > > +	       startino != XFS_AGINO_TO_INO(mp, agno, agino);
> > > > > > > +}
> > > > > > >  
> > > > > > > -	ubcount = *ubcountp; /* statstruct's */
> > > > > > > -	ac.ac_ubuffer = &ubuffer;
> > > > > > > -	ac.ac_ubleft = ubcount * statstruct_size; /* bytes */;
> > > > > > > -	ac.ac_ubelem = 0;
> > > > > > > +/* Return stat information in bulk (by-inode) for the filesystem. */
> > > > > > > +int
> > > > > > > +xfs_bulkstat(
> > > > > > > +	struct xfs_ibulk	*breq,
> > > > > > > +	bulkstat_one_fmt_pf	formatter)
> > > > > > > +{
> > > > > > > +	struct xfs_bstat_chunk	bc = {
> > > > > > > +		.formatter	= formatter,
> > > > > > > +		.breq		= breq,
> > > > > > > +	};
> > > > > > > +	int			error;
> > > > > > >  
> > > > > > > -	*ubcountp = 0;
> > > > > > > -	*done = 0;
> > > > > > > +	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
> > > > > > > +		return 0;
> > > > > > >  
> > > > > > > -	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
> > > > > > > -	if (!irbuf)
> > > > > > > +	bc.buf = kmem_zalloc(sizeof(struct xfs_bstat), KM_SLEEP | KM_MAYFAIL);
> > > > > > > +	if (!bc.buf)
> > > > > > >  		return -ENOMEM;
> > > > > > > -	nirbuf = (PAGE_SIZE * 4) / sizeof(*irbuf);
> > > > > > >  
> > > > > > > -	/*
> > > > > > > -	 * Loop over the allocation groups, starting from the last
> > > > > > > -	 * inode returned; 0 means start of the allocation group.
> > > > > > > -	 */
> > > > > > > -	while (agno < mp->m_sb.sb_agcount) {
> > > > > > > -		struct xfs_inobt_rec_incore	*irbp = irbuf;
> > > > > > > -		struct xfs_inobt_rec_incore	*irbufend = irbuf + nirbuf;
> > > > > > > -		bool				end_of_ag = false;
> > > > > > > -		int				icount = 0;
> > > > > > > -		int				stat;
> > > > > > > +	error = xfs_iwalk(breq->mp, NULL, breq->startino, xfs_bulkstat_iwalk,
> > > > > > > +			breq->icount, &bc);
> > > > > > >  
> > > > > > > -		error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> > > > > > > -		if (error)
> > > > > > > -			break;
> > > > > > > -		/*
> > > > > > > -		 * Allocate and initialize a btree cursor for ialloc btree.
> > > > > > > -		 */
> > > > > > > -		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> > > > > > > -					    XFS_BTNUM_INO);
> > > > > > > -		if (agino > 0) {
> > > > > > > -			/*
> > > > > > > -			 * In the middle of an allocation group, we need to get
> > > > > > > -			 * the remainder of the chunk we're in.
> > > > > > > -			 */
> > > > > > > -			struct xfs_inobt_rec_incore	r;
> > > > > > > -
> > > > > > > -			error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
> > > > > > > -			if (error)
> > > > > > > -				goto del_cursor;
> > > > > > > -			if (icount) {
> > > > > > > -				irbp->ir_startino = r.ir_startino;
> > > > > > > -				irbp->ir_holemask = r.ir_holemask;
> > > > > > > -				irbp->ir_count = r.ir_count;
> > > > > > > -				irbp->ir_freecount = r.ir_freecount;
> > > > > > > -				irbp->ir_free = r.ir_free;
> > > > > > > -				irbp++;
> > > > > > > -			}
> > > > > > > -			/* Increment to the next record */
> > > > > > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > > > > > -		} else {
> > > > > > > -			/* Start of ag.  Lookup the first inode chunk */
> > > > > > > -			error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
> > > > > > > -		}
> > > > > > > -		if (error || stat == 0) {
> > > > > > > -			end_of_ag = true;
> > > > > > > -			goto del_cursor;
> > > > > > > -		}
> > > > > > > -
> > > > > > > -		/*
> > > > > > > -		 * Loop through inode btree records in this ag,
> > > > > > > -		 * until we run out of inodes or space in the buffer.
> > > > > > > -		 */
> > > > > > > -		while (irbp < irbufend && icount < ubcount) {
> > > > > > > -			struct xfs_inobt_rec_incore	r;
> > > > > > > -
> > > > > > > -			error = xfs_inobt_get_rec(cur, &r, &stat);
> > > > > > > -			if (error || stat == 0) {
> > > > > > > -				end_of_ag = true;
> > > > > > > -				goto del_cursor;
> > > > > > > -			}
> > > > > > > -
> > > > > > > -			/*
> > > > > > > -			 * If this chunk has any allocated inodes, save it.
> > > > > > > -			 * Also start read-ahead now for this chunk.
> > > > > > > -			 */
> > > > > > > -			if (r.ir_freecount < r.ir_count) {
> > > > > > > -				xfs_bulkstat_ichunk_ra(mp, agno, &r);
> > > > > > > -				irbp->ir_startino = r.ir_startino;
> > > > > > > -				irbp->ir_holemask = r.ir_holemask;
> > > > > > > -				irbp->ir_count = r.ir_count;
> > > > > > > -				irbp->ir_freecount = r.ir_freecount;
> > > > > > > -				irbp->ir_free = r.ir_free;
> > > > > > > -				irbp++;
> > > > > > > -				icount += r.ir_count - r.ir_freecount;
> > > > > > > -			}
> > > > > > > -			error = xfs_btree_increment(cur, 0, &stat);
> > > > > > > -			if (error || stat == 0) {
> > > > > > > -				end_of_ag = true;
> > > > > > > -				goto del_cursor;
> > > > > > > -			}
> > > > > > > -			cond_resched();
> > > > > > > -		}
> > > > > > > -
> > > > > > > -		/*
> > > > > > > -		 * Drop the btree buffers and the agi buffer as we can't hold any
> > > > > > > -		 * of the locks these represent when calling iget. If there is a
> > > > > > > -		 * pending error, then we are done.
> > > > > > > -		 */
> > > > > > > -del_cursor:
> > > > > > > -		xfs_btree_del_cursor(cur, error);
> > > > > > > -		xfs_buf_relse(agbp);
> > > > > > > -		if (error)
> > > > > > > -			break;
> > > > > > > -		/*
> > > > > > > -		 * Now format all the good inodes into the user's buffer. The
> > > > > > > -		 * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
> > > > > > > -		 * for the next loop iteration.
> > > > > > > -		 */
> > > > > > > -		irbufend = irbp;
> > > > > > > -		for (irbp = irbuf;
> > > > > > > -		     irbp < irbufend && ac.ac_ubleft >= statstruct_size;
> > > > > > > -		     irbp++) {
> > > > > > > -			error = xfs_bulkstat_ag_ichunk(mp, agno, irbp,
> > > > > > > -					formatter, statstruct_size, &ac,
> > > > > > > -					&agino);
> > > > > > > -			if (error)
> > > > > > > -				break;
> > > > > > > -
> > > > > > > -			cond_resched();
> > > > > > > -		}
> > > > > > > -
> > > > > > > -		/*
> > > > > > > -		 * If we've run out of space or had a formatting error, we
> > > > > > > -		 * are now done
> > > > > > > -		 */
> > > > > > > -		if (ac.ac_ubleft < statstruct_size || error)
> > > > > > > -			break;
> > > > > > > -
> > > > > > > -		if (end_of_ag) {
> > > > > > > -			agno++;
> > > > > > > -			agino = 0;
> > > > > > > -		}
> > > > > > > -	}
> > > > > > > -	/*
> > > > > > > -	 * Done, we're either out of filesystem or space to put the data.
> > > > > > > -	 */
> > > > > > > -	kmem_free(irbuf);
> > > > > > > -	*ubcountp = ac.ac_ubelem;
> > > > > > > +	kmem_free(bc.buf);
> > > > > > >  
> > > > > > >  	/*
> > > > > > >  	 * We found some inodes, so clear the error status and return them.
> > > > > > > @@ -509,17 +356,9 @@ xfs_bulkstat(
> > > > > > >  	 * triggered again and propagated to userspace as there will be no
> > > > > > >  	 * formatted inodes in the buffer.
> > > > > > >  	 */
> > > > > > > -	if (ac.ac_ubelem)
> > > > > > > +	if (breq->ocount > 0)
> > > > > > >  		error = 0;
> > > > > > >  
> > > > > > > -	/*
> > > > > > > -	 * If we ran out of filesystem, lastino will point off the end of
> > > > > > > -	 * the filesystem so the next call will return immediately.
> > > > > > > -	 */
> > > > > > > -	*lastinop = XFS_AGINO_TO_INO(mp, agno, agino);
> > > > > > > -	if (agno >= mp->m_sb.sb_agcount)
> > > > > > > -		*done = 1;
> > > > > > > -
> > > > > > >  	return error;
> > > > > > >  }
> > > > > > >  
> > > > > > > diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> > > > > > > index 369e3f159d4e..7c5f1df360e6 100644
> > > > > > > --- a/fs/xfs/xfs_itable.h
> > > > > > > +++ b/fs/xfs/xfs_itable.h
> > > > > > > @@ -5,63 +5,46 @@
> > > > > > >  #ifndef __XFS_ITABLE_H__
> > > > > > >  #define	__XFS_ITABLE_H__
> > > > > > >  
> > > > > > > -/*
> > > > > > > - * xfs_bulkstat() is used to fill in xfs_bstat structures as well as dm_stat
> > > > > > > - * structures (by the dmi library). This is a pointer to a formatter function
> > > > > > > - * that will iget the inode and fill in the appropriate structure.
> > > > > > > - * see xfs_bulkstat_one() and xfs_dm_bulkstat_one() in dmapi_xfs.c
> > > > > > > - */
> > > > > > > -typedef int (*bulkstat_one_pf)(struct xfs_mount	*mp,
> > > > > > > -			       xfs_ino_t	ino,
> > > > > > > -			       void		__user *buffer,
> > > > > > > -			       int		ubsize,
> > > > > > > -			       int		*ubused,
> > > > > > > -			       int		*stat);
> > > > > > > +/* In-memory representation of a userspace request for batch inode data. */
> > > > > > > +struct xfs_ibulk {
> > > > > > > +	struct xfs_mount	*mp;
> > > > > > > +	void __user		*ubuffer; /* user output buffer */
> > > > > > > +	xfs_ino_t		startino; /* start with this inode */
> > > > > > > +	unsigned int		icount;   /* number of elements in ubuffer */
> > > > > > > +	unsigned int		ocount;   /* number of records returned */
> > > > > > > +};
> > > > > > > +
> > > > > > > +/* Return value that means we want to abort the walk. */
> > > > > > > +#define XFS_IBULK_ABORT		(XFS_IWALK_ABORT)
> > > > > > > +
> > > > > > > +/* Return value that means the formatting buffer is now full. */
> > > > > > > +#define XFS_IBULK_BUFFER_FULL	(XFS_IBULK_ABORT + 1)
> > > > > > >  
> > > > > > >  /*
> > > > > > > - * Values for stat return value.
> > > > > > > + * Advance the user buffer pointer by one record of the given size.  If the
> > > > > > > + * buffer is now full, return the appropriate error code.
> > > > > > >   */
> > > > > > > -#define BULKSTAT_RV_NOTHING	0
> > > > > > > -#define BULKSTAT_RV_DIDONE	1
> > > > > > > -#define BULKSTAT_RV_GIVEUP	2
> > > > > > > +static inline int
> > > > > > > +xfs_ibulk_advance(
> > > > > > > +	struct xfs_ibulk	*breq,
> > > > > > > +	size_t			bytes)
> > > > > > > +{
> > > > > > > +	char __user		*b = breq->ubuffer;
> > > > > > > +
> > > > > > > +	breq->ubuffer = b + bytes;
> > > > > > > +	breq->ocount++;
> > > > > > > +	return breq->ocount == breq->icount ? XFS_IBULK_BUFFER_FULL : 0;
> > > > > > > +}
> > > > > > >  
> > > > > > >  /*
> > > > > > >   * Return stat information in bulk (by-inode) for the filesystem.
> > > > > > >   */
> > > > > > > -int					/* error status */
> > > > > > > -xfs_bulkstat(
> > > > > > > -	xfs_mount_t	*mp,		/* mount point for filesystem */
> > > > > > > -	xfs_ino_t	*lastino,	/* last inode returned */
> > > > > > > -	int		*count,		/* size of buffer/count returned */
> > > > > > > -	bulkstat_one_pf formatter,	/* func that'd fill a single buf */
> > > > > > > -	size_t		statstruct_size,/* sizeof struct that we're filling */
> > > > > > > -	char		__user *ubuffer,/* buffer with inode stats */
> > > > > > > -	int		*done);		/* 1 if there are more stats to get */
> > > > > > >  
> > > > > > > -typedef int (*bulkstat_one_fmt_pf)(  /* used size in bytes or negative error */
> > > > > > > -	void			__user *ubuffer, /* buffer to write to */
> > > > > > > -	int			ubsize,		 /* remaining user buffer sz */
> > > > > > > -	int			*ubused,	 /* bytes used by formatter */
> > > > > > > -	const xfs_bstat_t	*buffer);        /* buffer to read from */
> > > > > > > +typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
> > > > > > > +		const struct xfs_bstat *bstat);
> > > > > > >  
> > > > > > > -int
> > > > > > > -xfs_bulkstat_one_int(
> > > > > > > -	xfs_mount_t		*mp,
> > > > > > > -	xfs_ino_t		ino,
> > > > > > > -	void			__user *buffer,
> > > > > > > -	int			ubsize,
> > > > > > > -	bulkstat_one_fmt_pf	formatter,
> > > > > > > -	int			*ubused,
> > > > > > > -	int			*stat);
> > > > > > > -
> > > > > > > -int
> > > > > > > -xfs_bulkstat_one(
> > > > > > > -	xfs_mount_t		*mp,
> > > > > > > -	xfs_ino_t		ino,
> > > > > > > -	void			__user *buffer,
> > > > > > > -	int			ubsize,
> > > > > > > -	int			*ubused,
> > > > > > > -	int			*stat);
> > > > > > > +int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > > > > > > +int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
> > > > > > >  
> > > > > > >  typedef int (*inumbers_fmt_pf)(
> > > > > > >  	void			__user *ubuffer, /* buffer to write to */
> > > > > > > 


Thread overview: 33+ messages
-- links below jump to the message on this page --
2019-06-12  6:47 [PATCH v5 00/14] xfs: refactor and improve inode iteration Darrick J. Wong
2019-06-12  6:47 ` [PATCH 01/14] xfs: create iterator error codes Darrick J. Wong
2019-06-13 16:24   ` Brian Foster
2019-06-12  6:47 ` [PATCH 02/14] xfs: create simplified inode walk function Darrick J. Wong
2019-06-13 16:27   ` Brian Foster
2019-06-13 18:06     ` Darrick J. Wong
2019-06-13 18:07       ` Darrick J. Wong
2019-06-12  6:47 ` [PATCH 03/14] xfs: convert quotacheck to use the new iwalk functions Darrick J. Wong
2019-06-12  6:47 ` [PATCH 04/14] xfs: bulkstat should copy lastip whenever userspace supplies one Darrick J. Wong
2019-06-12  6:48 ` [PATCH 05/14] xfs: remove unnecessary includes of xfs_itable.h Darrick J. Wong
2019-06-13 16:27   ` Brian Foster
2019-06-12  6:48 ` [PATCH 06/14] xfs: convert bulkstat to new iwalk infrastructure Darrick J. Wong
2019-06-13 16:31   ` Brian Foster
2019-06-13 18:12     ` Darrick J. Wong
2019-06-13 23:03       ` Darrick J. Wong
2019-06-14 11:10         ` Brian Foster
2019-06-14 16:45           ` Darrick J. Wong
2019-07-02 11:42             ` Brian Foster
2019-07-02 15:33               ` Darrick J. Wong
2019-06-12  6:48 ` [PATCH 07/14] xfs: move bulkstat ichunk helpers to iwalk code Darrick J. Wong
2019-06-12  6:48 ` [PATCH 08/14] xfs: change xfs_iwalk_grab_ichunk to use startino, not lastino Darrick J. Wong
2019-06-12  6:48 ` [PATCH 09/14] xfs: clean up long conditionals in xfs_iwalk_ichunk_ra Darrick J. Wong
2019-06-12  6:48 ` [PATCH 10/14] xfs: refactor xfs_iwalk_grab_ichunk Darrick J. Wong
2019-06-14 14:04   ` Brian Foster
2019-06-12  6:48 ` [PATCH 11/14] xfs: refactor iwalk code to handle walking inobt records Darrick J. Wong
2019-06-14 14:04   ` Brian Foster
2019-06-12  6:48 ` [PATCH 12/14] xfs: refactor INUMBERS to use iwalk functions Darrick J. Wong
2019-06-14 14:05   ` Brian Foster
2019-06-12  6:48 ` [PATCH 13/14] xfs: multithreaded iwalk implementation Darrick J. Wong
2019-06-14 14:06   ` Brian Foster
2019-06-18 18:17     ` Darrick J. Wong
2019-06-12  6:49 ` [PATCH 14/14] xfs: poll waiting for quotacheck Darrick J. Wong
2019-06-14 14:07   ` Brian Foster
