Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCHSET v3 0/6] xfs: clean up incore inode walk functions
@ 2021-03-26  0:21 Darrick J. Wong
  2021-03-26  0:21 ` [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes Darrick J. Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-26  0:21 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, linux-xfs, hch

Hi all,

This series reduces the number of indirect function calls when we want
to iterate the incore inode radix tree, and reduces the number of
arguments that must be passed to the walk function.

I made a few observations about incore inode radix tree walks -- the one
caller (blockgc) that cares about radix tree tags is internal to
xfs_icache.c, and there's a 1:1 mapping between that tag and the
iterator function.  Furthermore, the only other caller (quotaoff) is the
only caller to supply a nonzero flags argument, it never specifies a
radix tree tag, and it only walks inodes that have VFS state.

The first patch moves quotaoff to walk the vfs inode list to drop all
attached dquots, which frees us to remove the iter_flags argument and
the indirect function calls from xfs_inode_walk.

Next, we merge the code that walks reclaimable inodes into
xfs_inode_walk and refactor all the code that sets and clears the radix
tree tags in the perag structure and the perag tree itself.

This series is a prerequisite for the next patchset, since deferred
inode inactivation will add another inode radix tree tag and iterator
function to go with it.

v2: walk the vfs inode list when running quotaoff instead of the radix
    tree, then rework the (now completely internal) inode walk function
    to take the tag as the main parameter.
v3: merge the reclaim loop into xfs_inode_walk, then consolidate the
    radix tree tagging functions

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=inode-walk-cleanups-5.13
---
 fs/xfs/libxfs/xfs_sb.c   |    2 
 fs/xfs/libxfs/xfs_sb.h   |    4 
 fs/xfs/xfs_icache.c      |  377 ++++++++++++++++++----------------------------
 fs/xfs/xfs_icache.h      |   18 +-
 fs/xfs/xfs_qm_syscalls.c |   56 ++++---
 fs/xfs/xfs_super.c       |    2 
 fs/xfs/xfs_trace.h       |    6 -
 7 files changed, 194 insertions(+), 271 deletions(-)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes
  2021-03-26  0:21 [PATCHSET v3 0/6] xfs: clean up incore inode walk functions Darrick J. Wong
@ 2021-03-26  0:21 ` Darrick J. Wong
  2021-03-30  0:44   ` Dave Chinner
  2021-03-26  0:21 ` [PATCH 2/6] xfs: remove iter_flags parameter from xfs_inode_walk_* Darrick J. Wong
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-26  0:21 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, linux-xfs, hch

From: Christoph Hellwig <hch@lst.de>

Using xfs_inode_walk in xfs_qm_dqrele_all_inodes is complete overkill,
given that function simplify wants to iterate all live inodes known
to the VFS.  Just iterate over the s_inodes list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c      |    2 +-
 fs/xfs/xfs_icache.h      |    2 ++
 fs/xfs/xfs_qm_syscalls.c |   56 +++++++++++++++++++++++++++++-----------------
 3 files changed, 38 insertions(+), 22 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 3c81daca0e9a..9a307bb738bd 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -235,7 +235,7 @@ xfs_inode_clear_reclaim_tag(
 	xfs_perag_clear_reclaim_tag(pag);
 }
 
-static void
+void
 xfs_inew_wait(
 	struct xfs_inode	*ip)
 {
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index d1fddb152420..d70d93c2bb10 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -78,4 +78,6 @@ int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
 void xfs_blockgc_stop(struct xfs_mount *mp);
 void xfs_blockgc_start(struct xfs_mount *mp);
 
+void xfs_inew_wait(struct xfs_inode *ip);
+
 #endif
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 11f1e2fbf22f..76efae956fa8 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -748,41 +748,27 @@ xfs_qm_scall_getquota_next(
 	return error;
 }
 
-STATIC int
+static void
 xfs_dqrele_inode(
 	struct xfs_inode	*ip,
-	void			*args)
+	uint			flags)
 {
-	uint			*flags = args;
-
-	/* skip quota inodes */
-	if (ip == ip->i_mount->m_quotainfo->qi_uquotaip ||
-	    ip == ip->i_mount->m_quotainfo->qi_gquotaip ||
-	    ip == ip->i_mount->m_quotainfo->qi_pquotaip) {
-		ASSERT(ip->i_udquot == NULL);
-		ASSERT(ip->i_gdquot == NULL);
-		ASSERT(ip->i_pdquot == NULL);
-		return 0;
-	}
-
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	if ((*flags & XFS_UQUOTA_ACCT) && ip->i_udquot) {
+	if ((flags & XFS_UQUOTA_ACCT) && ip->i_udquot) {
 		xfs_qm_dqrele(ip->i_udquot);
 		ip->i_udquot = NULL;
 	}
-	if ((*flags & XFS_GQUOTA_ACCT) && ip->i_gdquot) {
+	if ((flags & XFS_GQUOTA_ACCT) && ip->i_gdquot) {
 		xfs_qm_dqrele(ip->i_gdquot);
 		ip->i_gdquot = NULL;
 	}
-	if ((*flags & XFS_PQUOTA_ACCT) && ip->i_pdquot) {
+	if ((flags & XFS_PQUOTA_ACCT) && ip->i_pdquot) {
 		xfs_qm_dqrele(ip->i_pdquot);
 		ip->i_pdquot = NULL;
 	}
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return 0;
 }
 
-
 /*
  * Go thru all the inodes in the file system, releasing their dquots.
  *
@@ -794,7 +780,35 @@ xfs_qm_dqrele_all_inodes(
 	struct xfs_mount	*mp,
 	uint			flags)
 {
+	struct super_block	*sb = mp->m_super;
+	struct inode		*inode, *old_inode = NULL;
+
 	ASSERT(mp->m_quotainfo);
-	xfs_inode_walk(mp, XFS_INODE_WALK_INEW_WAIT, xfs_dqrele_inode,
-			&flags, XFS_ICI_NO_TAG);
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		struct xfs_inode	*ip = XFS_I(inode);
+
+		if (XFS_FORCED_SHUTDOWN(mp))
+			break;
+		if (xfs_is_quota_inode(&mp->m_sb, ip->i_ino))
+			continue;
+		if (xfs_iflags_test(ip, XFS_IRECLAIMABLE | XFS_IRECLAIM))
+			continue;
+		if (!igrab(inode))
+			continue;
+		spin_unlock(&sb->s_inode_list_lock);
+
+		iput(old_inode);
+		old_inode = inode;
+
+		if (xfs_iflags_test(ip, XFS_INEW))
+			xfs_inew_wait(ip);
+		xfs_dqrele_inode(ip, flags);
+
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+
+	iput(old_inode);
 }


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/6] xfs: remove iter_flags parameter from xfs_inode_walk_*
  2021-03-26  0:21 [PATCHSET v3 0/6] xfs: clean up incore inode walk functions Darrick J. Wong
  2021-03-26  0:21 ` [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes Darrick J. Wong
@ 2021-03-26  0:21 ` Darrick J. Wong
  2021-03-26  6:04   ` Christoph Hellwig
  2021-03-26  0:21 ` [PATCH 3/6] xfs: remove indirect calls from xfs_inode_walk{,_ag} Darrick J. Wong
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-26  0:21 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

The sole iter_flags is XFS_INODE_WALK_INEW_WAIT, and there are no users.
Remove the flag, and the parameter, and all the code that used it.
Since there are no longer any external callers of xfs_inode_walk, make
it static.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c |   22 +++++++---------------
 fs/xfs/xfs_icache.h |    9 ---------
 2 files changed, 7 insertions(+), 24 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 9a307bb738bd..16392e96be91 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -730,11 +730,9 @@ xfs_icache_inode_is_allocated(
  */
 STATIC bool
 xfs_inode_walk_ag_grab(
-	struct xfs_inode	*ip,
-	int			flags)
+	struct xfs_inode	*ip)
 {
 	struct inode		*inode = VFS_I(ip);
-	bool			newinos = !!(flags & XFS_INODE_WALK_INEW_WAIT);
 
 	ASSERT(rcu_read_lock_held());
 
@@ -744,8 +742,7 @@ xfs_inode_walk_ag_grab(
 		goto out_unlock_noent;
 
 	/* avoid new or reclaimable inodes. Leave for reclaim code to flush */
-	if ((!newinos && __xfs_iflags_test(ip, XFS_INEW)) ||
-	    __xfs_iflags_test(ip, XFS_IRECLAIMABLE | XFS_IRECLAIM))
+	if (__xfs_iflags_test(ip, XFS_INEW | XFS_IRECLAIMABLE | XFS_IRECLAIM))
 		goto out_unlock_noent;
 	spin_unlock(&ip->i_flags_lock);
 
@@ -772,7 +769,6 @@ xfs_inode_walk_ag_grab(
 STATIC int
 xfs_inode_walk_ag(
 	struct xfs_perag	*pag,
-	int			iter_flags,
 	int			(*execute)(struct xfs_inode *ip, void *args),
 	void			*args,
 	int			tag)
@@ -818,7 +814,7 @@ xfs_inode_walk_ag(
 		for (i = 0; i < nr_found; i++) {
 			struct xfs_inode *ip = batch[i];
 
-			if (done || !xfs_inode_walk_ag_grab(ip, iter_flags))
+			if (done || !xfs_inode_walk_ag_grab(ip))
 				batch[i] = NULL;
 
 			/*
@@ -846,9 +842,6 @@ xfs_inode_walk_ag(
 		for (i = 0; i < nr_found; i++) {
 			if (!batch[i])
 				continue;
-			if ((iter_flags & XFS_INODE_WALK_INEW_WAIT) &&
-			    xfs_iflags_test(batch[i], XFS_INEW))
-				xfs_inew_wait(batch[i]);
 			error = execute(batch[i], args);
 			xfs_irele(batch[i]);
 			if (error == -EAGAIN) {
@@ -890,10 +883,9 @@ xfs_inode_walk_get_perag(
  * Call the @execute function on all incore inodes matching the radix tree
  * @tag.
  */
-int
+static int
 xfs_inode_walk(
 	struct xfs_mount	*mp,
-	int			iter_flags,
 	int			(*execute)(struct xfs_inode *ip, void *args),
 	void			*args,
 	int			tag)
@@ -906,7 +898,7 @@ xfs_inode_walk(
 	ag = 0;
 	while ((pag = xfs_inode_walk_get_perag(mp, ag, tag))) {
 		ag = pag->pag_agno + 1;
-		error = xfs_inode_walk_ag(pag, iter_flags, execute, args, tag);
+		error = xfs_inode_walk_ag(pag, execute, args, tag);
 		xfs_perag_put(pag);
 		if (error) {
 			last_error = error;
@@ -1618,7 +1610,7 @@ xfs_blockgc_worker(
 
 	if (!sb_start_write_trylock(mp->m_super))
 		return;
-	error = xfs_inode_walk_ag(pag, 0, xfs_blockgc_scan_inode, NULL,
+	error = xfs_inode_walk_ag(pag, xfs_blockgc_scan_inode, NULL,
 			XFS_ICI_BLOCKGC_TAG);
 	if (error)
 		xfs_info(mp, "AG %u preallocation gc worker failed, err=%d",
@@ -1637,7 +1629,7 @@ xfs_blockgc_free_space(
 {
 	trace_xfs_blockgc_free_space(mp, eofb, _RET_IP_);
 
-	return xfs_inode_walk(mp, 0, xfs_blockgc_scan_inode, eofb,
+	return xfs_inode_walk(mp, xfs_blockgc_scan_inode, eofb,
 			XFS_ICI_BLOCKGC_TAG);
 }
 
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index d70d93c2bb10..d52c041093a3 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -34,11 +34,6 @@ struct xfs_eofblocks {
 #define XFS_IGET_DONTCACHE	0x4
 #define XFS_IGET_INCORE		0x8	/* don't read from disk or reinit */
 
-/*
- * flags for AG inode iterator
- */
-#define XFS_INODE_WALK_INEW_WAIT	0x1	/* wait on new inodes */
-
 int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
 	     uint flags, uint lock_flags, xfs_inode_t **ipp);
 
@@ -68,10 +63,6 @@ void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip);
 
 void xfs_blockgc_worker(struct work_struct *work);
 
-int xfs_inode_walk(struct xfs_mount *mp, int iter_flags,
-	int (*execute)(struct xfs_inode *ip, void *args),
-	void *args, int tag);
-
 int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
 				  xfs_ino_t ino, bool *inuse);
 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/6] xfs: remove indirect calls from xfs_inode_walk{,_ag}
  2021-03-26  0:21 [PATCHSET v3 0/6] xfs: clean up incore inode walk functions Darrick J. Wong
  2021-03-26  0:21 ` [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes Darrick J. Wong
  2021-03-26  0:21 ` [PATCH 2/6] xfs: remove iter_flags parameter from xfs_inode_walk_* Darrick J. Wong
@ 2021-03-26  0:21 ` Darrick J. Wong
  2021-03-26  6:08   ` Christoph Hellwig
  2021-03-26  0:21 ` [PATCH 4/6] xfs: pass struct xfs_eofblocks to the inode scan callback Darrick J. Wong
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-26  0:21 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

It turns out that there is a 1:1 mapping between the execute and tag
parameters that are passed to xfs_inode_walk_ag:

	xfs_blockgc_scan_inode <=> XFS_ICI_BLOCKGC_TAG

Since the only user of the inode walk function is the blockgc code, we
don't need the tag parameter or the execute function pointer.  The inode
deferred inactivation changes in the next series will add a second
tag:function pair, so we'll leave the tag parameter for now.

For the price of a forward static declaration, we can eliminate the
indirect function call.  This likely has a negligible impact on
performance (since the execute function runs transactions), but it also
simplifies the function signature.

Radix tree tags are unsigned ints, so fix the type usage for all those
tags.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_sb.c |    2 +
 fs/xfs/libxfs/xfs_sb.h |    4 +-
 fs/xfs/xfs_icache.c    |   77 ++++++++++++++++++++----------------------------
 3 files changed, 35 insertions(+), 48 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 60e6d255e5e2..f72f6d7fef33 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -61,7 +61,7 @@ struct xfs_perag *
 xfs_perag_get_tag(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		first,
-	int			tag)
+	unsigned int		tag)
 {
 	struct xfs_perag	*pag;
 	int			found;
diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h
index f79f9dc632b6..e5f1c2d879eb 100644
--- a/fs/xfs/libxfs/xfs_sb.h
+++ b/fs/xfs/libxfs/xfs_sb.h
@@ -17,8 +17,8 @@ struct xfs_perag;
  * perag get/put wrappers for ref counting
  */
 extern struct xfs_perag *xfs_perag_get(struct xfs_mount *, xfs_agnumber_t);
-extern struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *, xfs_agnumber_t,
-					   int tag);
+struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *mp, xfs_agnumber_t agno,
+		unsigned int tag);
 extern void	xfs_perag_put(struct xfs_perag *pag);
 extern int	xfs_initialize_perag_data(struct xfs_mount *, xfs_agnumber_t);
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 16392e96be91..f4c4f6e15d77 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -26,6 +26,9 @@
 
 #include <linux/iversion.h>
 
+/* Forward declarations to reduce indirect calls */
+static int xfs_blockgc_scan_inode(struct xfs_inode *ip, void *args);
+
 /*
  * Allocate and initialise an xfs_inode.
  */
@@ -723,13 +726,17 @@ xfs_icache_inode_is_allocated(
  */
 #define XFS_LOOKUP_BATCH	32
 
+/* Don't try to run block gc on an inode that's in any of these states. */
+#define XFS_BLOCKGC_INELIGIBLE_IFLAGS	(XFS_INEW | \
+					 XFS_IRECLAIMABLE | \
+					 XFS_IRECLAIM)
 /*
- * Decide if the given @ip is eligible to be a part of the inode walk, and
- * grab it if so.  Returns true if it's ready to go or false if we should just
- * ignore it.
+ * Decide if the given @ip is eligible for garbage collection of speculative
+ * preallocations, and grab it if so.  Returns true if it's ready to go or
+ * false if we should just ignore it.
  */
 STATIC bool
-xfs_inode_walk_ag_grab(
+xfs_blockgc_grab(
 	struct xfs_inode	*ip)
 {
 	struct inode		*inode = VFS_I(ip);
@@ -741,8 +748,7 @@ xfs_inode_walk_ag_grab(
 	if (!ip->i_ino)
 		goto out_unlock_noent;
 
-	/* avoid new or reclaimable inodes. Leave for reclaim code to flush */
-	if (__xfs_iflags_test(ip, XFS_INEW | XFS_IRECLAIMABLE | XFS_IRECLAIM))
+	if (__xfs_iflags_test(ip, XFS_BLOCKGC_INELIGIBLE_IFLAGS))
 		goto out_unlock_noent;
 	spin_unlock(&ip->i_flags_lock);
 
@@ -763,15 +769,14 @@ xfs_inode_walk_ag_grab(
 }
 
 /*
- * For a given per-AG structure @pag, grab, @execute, and rele all incore
- * inodes with the given radix tree @tag.
+ * For a given per-AG structure @pag, grab, execute a tag specific function,
+ * and release all incore inodes with the given radix tree @tag.
  */
 STATIC int
 xfs_inode_walk_ag(
 	struct xfs_perag	*pag,
-	int			(*execute)(struct xfs_inode *ip, void *args),
-	void			*args,
-	int			tag)
+	unsigned int		tag,
+	void			*args)
 {
 	struct xfs_mount	*mp = pag->pag_mount;
 	uint32_t		first_index;
@@ -780,6 +785,8 @@ xfs_inode_walk_ag(
 	bool			done;
 	int			nr_found;
 
+	ASSERT(tag == XFS_ICI_BLOCKGC_TAG);
+
 restart:
 	done = false;
 	skipped = 0;
@@ -792,16 +799,9 @@ xfs_inode_walk_ag(
 
 		rcu_read_lock();
 
-		if (tag == XFS_ICI_NO_TAG)
-			nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
-					(void **)batch, first_index,
-					XFS_LOOKUP_BATCH);
-		else
-			nr_found = radix_tree_gang_lookup_tag(
-					&pag->pag_ici_root,
-					(void **) batch, first_index,
-					XFS_LOOKUP_BATCH, tag);
-
+		nr_found = radix_tree_gang_lookup_tag(&pag->pag_ici_root,
+				(void **)batch, first_index, XFS_LOOKUP_BATCH,
+				tag);
 		if (!nr_found) {
 			rcu_read_unlock();
 			break;
@@ -814,7 +814,7 @@ xfs_inode_walk_ag(
 		for (i = 0; i < nr_found; i++) {
 			struct xfs_inode *ip = batch[i];
 
-			if (done || !xfs_inode_walk_ag_grab(ip))
+			if (done || !xfs_blockgc_grab(ip))
 				batch[i] = NULL;
 
 			/*
@@ -842,7 +842,7 @@ xfs_inode_walk_ag(
 		for (i = 0; i < nr_found; i++) {
 			if (!batch[i])
 				continue;
-			error = execute(batch[i], args);
+			error = xfs_blockgc_scan_inode(batch[i], args);
 			xfs_irele(batch[i]);
 			if (error == -EAGAIN) {
 				skipped++;
@@ -867,38 +867,27 @@ xfs_inode_walk_ag(
 	return last_error;
 }
 
-/* Fetch the next (possibly tagged) per-AG structure. */
-static inline struct xfs_perag *
-xfs_inode_walk_get_perag(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
-	int			tag)
-{
-	if (tag == XFS_ICI_NO_TAG)
-		return xfs_perag_get(mp, agno);
-	return xfs_perag_get_tag(mp, agno, tag);
-}
-
 /*
- * Call the @execute function on all incore inodes matching the radix tree
+ * Call a tag-specific function on all incore inodes matching the radix tree
  * @tag.
  */
 static int
 xfs_inode_walk(
 	struct xfs_mount	*mp,
-	int			(*execute)(struct xfs_inode *ip, void *args),
-	void			*args,
-	int			tag)
+	unsigned int		tag,
+	void			*args)
 {
 	struct xfs_perag	*pag;
 	int			error = 0;
 	int			last_error = 0;
 	xfs_agnumber_t		ag;
 
+	ASSERT(tag == XFS_ICI_BLOCKGC_TAG);
+
 	ag = 0;
-	while ((pag = xfs_inode_walk_get_perag(mp, ag, tag))) {
+	while ((pag = xfs_perag_get_tag(mp, ag, tag))) {
 		ag = pag->pag_agno + 1;
-		error = xfs_inode_walk_ag(pag, execute, args, tag);
+		error = xfs_inode_walk_ag(pag, tag, args);
 		xfs_perag_put(pag);
 		if (error) {
 			last_error = error;
@@ -1610,8 +1599,7 @@ xfs_blockgc_worker(
 
 	if (!sb_start_write_trylock(mp->m_super))
 		return;
-	error = xfs_inode_walk_ag(pag, xfs_blockgc_scan_inode, NULL,
-			XFS_ICI_BLOCKGC_TAG);
+	error = xfs_inode_walk_ag(pag, XFS_ICI_BLOCKGC_TAG, NULL);
 	if (error)
 		xfs_info(mp, "AG %u preallocation gc worker failed, err=%d",
 				pag->pag_agno, error);
@@ -1629,8 +1617,7 @@ xfs_blockgc_free_space(
 {
 	trace_xfs_blockgc_free_space(mp, eofb, _RET_IP_);
 
-	return xfs_inode_walk(mp, xfs_blockgc_scan_inode, eofb,
-			XFS_ICI_BLOCKGC_TAG);
+	return xfs_inode_walk(mp, XFS_ICI_BLOCKGC_TAG, eofb);
 }
 
 /*


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 4/6] xfs: pass struct xfs_eofblocks to the inode scan callback
  2021-03-26  0:21 [PATCHSET v3 0/6] xfs: clean up incore inode walk functions Darrick J. Wong
                   ` (2 preceding siblings ...)
  2021-03-26  0:21 ` [PATCH 3/6] xfs: remove indirect calls from xfs_inode_walk{,_ag} Darrick J. Wong
@ 2021-03-26  0:21 ` Darrick J. Wong
  2021-03-26  6:09   ` Christoph Hellwig
  2021-03-26  0:21 ` [PATCH 5/6] xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag Darrick J. Wong
  2021-03-26  0:21 ` [PATCH 6/6] xfs: refactor per-AG inode tagging functions Darrick J. Wong
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-26  0:21 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Pass a pointer to the actual eofb structure around the inode scanner
functions instead of a void pointer, now that none of the functions is
used as a callback.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c |   23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index f4c4f6e15d77..b02b4b349ee9 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -27,7 +27,8 @@
 #include <linux/iversion.h>
 
 /* Forward declarations to reduce indirect calls */
-static int xfs_blockgc_scan_inode(struct xfs_inode *ip, void *args);
+static int xfs_blockgc_scan_inode(struct xfs_inode *ip,
+		struct xfs_eofblocks *eofb);
 
 /*
  * Allocate and initialise an xfs_inode.
@@ -776,7 +777,7 @@ STATIC int
 xfs_inode_walk_ag(
 	struct xfs_perag	*pag,
 	unsigned int		tag,
-	void			*args)
+	struct xfs_eofblocks	*eofb)
 {
 	struct xfs_mount	*mp = pag->pag_mount;
 	uint32_t		first_index;
@@ -842,7 +843,7 @@ xfs_inode_walk_ag(
 		for (i = 0; i < nr_found; i++) {
 			if (!batch[i])
 				continue;
-			error = xfs_blockgc_scan_inode(batch[i], args);
+			error = xfs_blockgc_scan_inode(batch[i], eofb);
 			xfs_irele(batch[i]);
 			if (error == -EAGAIN) {
 				skipped++;
@@ -875,7 +876,7 @@ static int
 xfs_inode_walk(
 	struct xfs_mount	*mp,
 	unsigned int		tag,
-	void			*args)
+	struct xfs_eofblocks	*eofb)
 {
 	struct xfs_perag	*pag;
 	int			error = 0;
@@ -887,7 +888,7 @@ xfs_inode_walk(
 	ag = 0;
 	while ((pag = xfs_perag_get_tag(mp, ag, tag))) {
 		ag = pag->pag_agno + 1;
-		error = xfs_inode_walk_ag(pag, tag, args);
+		error = xfs_inode_walk_ag(pag, tag, eofb);
 		xfs_perag_put(pag);
 		if (error) {
 			last_error = error;
@@ -1266,10 +1267,9 @@ xfs_reclaim_worker(
 STATIC int
 xfs_inode_free_eofblocks(
 	struct xfs_inode	*ip,
-	void			*args,
+	struct xfs_eofblocks	*eofb,
 	unsigned int		*lockflags)
 {
-	struct xfs_eofblocks	*eofb = args;
 	bool			wait;
 
 	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
@@ -1473,10 +1473,9 @@ xfs_prep_free_cowblocks(
 STATIC int
 xfs_inode_free_cowblocks(
 	struct xfs_inode	*ip,
-	void			*args,
+	struct xfs_eofblocks	*eofb,
 	unsigned int		*lockflags)
 {
-	struct xfs_eofblocks	*eofb = args;
 	bool			wait;
 	int			ret = 0;
 
@@ -1571,16 +1570,16 @@ xfs_blockgc_start(
 static int
 xfs_blockgc_scan_inode(
 	struct xfs_inode	*ip,
-	void			*args)
+	struct xfs_eofblocks	*eofb)
 {
 	unsigned int		lockflags = 0;
 	int			error;
 
-	error = xfs_inode_free_eofblocks(ip, args, &lockflags);
+	error = xfs_inode_free_eofblocks(ip, eofb, &lockflags);
 	if (error)
 		goto unlock;
 
-	error = xfs_inode_free_cowblocks(ip, args, &lockflags);
+	error = xfs_inode_free_cowblocks(ip, eofb, &lockflags);
 unlock:
 	if (lockflags)
 		xfs_iunlock(ip, lockflags);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 5/6] xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag
  2021-03-26  0:21 [PATCHSET v3 0/6] xfs: clean up incore inode walk functions Darrick J. Wong
                   ` (3 preceding siblings ...)
  2021-03-26  0:21 ` [PATCH 4/6] xfs: pass struct xfs_eofblocks to the inode scan callback Darrick J. Wong
@ 2021-03-26  0:21 ` Darrick J. Wong
  2021-03-26  6:30   ` Christoph Hellwig
  2021-03-26  0:21 ` [PATCH 6/6] xfs: refactor per-AG inode tagging functions Darrick J. Wong
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-26  0:21 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Merge these two inode walk loops together, since they're pretty similar
now.  Get rid of XFS_ICI_NO_TAG since nobody uses it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c |  154 ++++++++++++++++-----------------------------------
 fs/xfs/xfs_icache.h |    5 +-
 2 files changed, 52 insertions(+), 107 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index b02b4b349ee9..2b25fe679b0e 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -29,6 +29,8 @@
 /* Forward declarations to reduce indirect calls */
 static int xfs_blockgc_scan_inode(struct xfs_inode *ip,
 		struct xfs_eofblocks *eofb);
+static bool xfs_reclaim_inode_grab(struct xfs_inode *ip);
+static void xfs_reclaim_inode(struct xfs_inode *ip, struct xfs_perag *pag);
 
 /*
  * Allocate and initialise an xfs_inode.
@@ -769,6 +771,21 @@ xfs_blockgc_grab(
 	return false;
 }
 
+static inline bool
+selected_for_walk(
+	unsigned int		tag,
+	struct xfs_inode	*ip)
+{
+	switch (tag) {
+	case XFS_ICI_BLOCKGC_TAG:
+		return xfs_blockgc_grab(ip);
+	case XFS_ICI_RECLAIM_TAG:
+		return xfs_reclaim_inode_grab(ip);
+	default:
+		return false;
+	}
+}
+
 /*
  * For a given per-AG structure @pag, grab, execute a tag specific function,
  * and release all incore inodes with the given radix tree @tag.
@@ -786,12 +803,14 @@ xfs_inode_walk_ag(
 	bool			done;
 	int			nr_found;
 
-	ASSERT(tag == XFS_ICI_BLOCKGC_TAG);
+	ASSERT(tag < RADIX_TREE_MAX_TAGS);
 
 restart:
 	done = false;
 	skipped = 0;
 	first_index = 0;
+	if (tag == XFS_ICI_RECLAIM_TAG)
+		first_index = READ_ONCE(pag->pag_ici_reclaim_cursor);
 	nr_found = 0;
 	do {
 		struct xfs_inode *batch[XFS_LOOKUP_BATCH];
@@ -804,6 +823,7 @@ xfs_inode_walk_ag(
 				(void **)batch, first_index, XFS_LOOKUP_BATCH,
 				tag);
 		if (!nr_found) {
+			done = true;
 			rcu_read_unlock();
 			break;
 		}
@@ -815,7 +835,7 @@ xfs_inode_walk_ag(
 		for (i = 0; i < nr_found; i++) {
 			struct xfs_inode *ip = batch[i];
 
-			if (done || !xfs_blockgc_grab(ip))
+			if (done || !selected_for_walk(tag, ip))
 				batch[i] = NULL;
 
 			/*
@@ -843,8 +863,16 @@ xfs_inode_walk_ag(
 		for (i = 0; i < nr_found; i++) {
 			if (!batch[i])
 				continue;
-			error = xfs_blockgc_scan_inode(batch[i], eofb);
-			xfs_irele(batch[i]);
+			switch (tag) {
+			case XFS_ICI_BLOCKGC_TAG:
+				error = xfs_blockgc_scan_inode(batch[i], eofb);
+				xfs_irele(batch[i]);
+				break;
+			case XFS_ICI_RECLAIM_TAG:
+				xfs_reclaim_inode(batch[i], pag);
+				error = 0;
+				break;
+			}
 			if (error == -EAGAIN) {
 				skipped++;
 				continue;
@@ -858,9 +886,19 @@ xfs_inode_walk_ag(
 			break;
 
 		cond_resched();
-
+		if (tag == XFS_ICI_RECLAIM_TAG && eofb) {
+			eofb->nr_to_scan -= XFS_LOOKUP_BATCH;
+			if (eofb->nr_to_scan < 0)
+				break;
+		}
 	} while (nr_found && !done);
 
+	if (tag == XFS_ICI_RECLAIM_TAG) {
+		if (done)
+			first_index = 0;
+		WRITE_ONCE(pag->pag_ici_reclaim_cursor, first_index);
+	}
+
 	if (skipped) {
 		delay(1);
 		goto restart;
@@ -883,7 +921,7 @@ xfs_inode_walk(
 	int			last_error = 0;
 	xfs_agnumber_t		ag;
 
-	ASSERT(tag == XFS_ICI_BLOCKGC_TAG);
+	ASSERT(tag < RADIX_TREE_MAX_TAGS);
 
 	ag = 0;
 	while ((pag = xfs_perag_get_tag(mp, ag, tag))) {
@@ -1027,108 +1065,13 @@ xfs_reclaim_inode(
 	xfs_iflags_clear(ip, XFS_IRECLAIM);
 }
 
-/*
- * Walk the AGs and reclaim the inodes in them. Even if the filesystem is
- * corrupted, we still want to try to reclaim all the inodes. If we don't,
- * then a shut down during filesystem unmount reclaim walk leak all the
- * unreclaimed inodes.
- *
- * Returns non-zero if any AGs or inodes were skipped in the reclaim pass
- * so that callers that want to block until all dirty inodes are written back
- * and reclaimed can sanely loop.
- */
-static void
-xfs_reclaim_inodes_ag(
-	struct xfs_mount	*mp,
-	int			*nr_to_scan)
-{
-	struct xfs_perag	*pag;
-	xfs_agnumber_t		ag = 0;
-
-	while ((pag = xfs_perag_get_tag(mp, ag, XFS_ICI_RECLAIM_TAG))) {
-		unsigned long	first_index = 0;
-		int		done = 0;
-		int		nr_found = 0;
-
-		ag = pag->pag_agno + 1;
-
-		first_index = READ_ONCE(pag->pag_ici_reclaim_cursor);
-		do {
-			struct xfs_inode *batch[XFS_LOOKUP_BATCH];
-			int	i;
-
-			rcu_read_lock();
-			nr_found = radix_tree_gang_lookup_tag(
-					&pag->pag_ici_root,
-					(void **)batch, first_index,
-					XFS_LOOKUP_BATCH,
-					XFS_ICI_RECLAIM_TAG);
-			if (!nr_found) {
-				done = 1;
-				rcu_read_unlock();
-				break;
-			}
-
-			/*
-			 * Grab the inodes before we drop the lock. if we found
-			 * nothing, nr == 0 and the loop will be skipped.
-			 */
-			for (i = 0; i < nr_found; i++) {
-				struct xfs_inode *ip = batch[i];
-
-				if (done || !xfs_reclaim_inode_grab(ip))
-					batch[i] = NULL;
-
-				/*
-				 * Update the index for the next lookup. Catch
-				 * overflows into the next AG range which can
-				 * occur if we have inodes in the last block of
-				 * the AG and we are currently pointing to the
-				 * last inode.
-				 *
-				 * Because we may see inodes that are from the
-				 * wrong AG due to RCU freeing and
-				 * reallocation, only update the index if it
-				 * lies in this AG. It was a race that lead us
-				 * to see this inode, so another lookup from
-				 * the same index will not find it again.
-				 */
-				if (XFS_INO_TO_AGNO(mp, ip->i_ino) !=
-								pag->pag_agno)
-					continue;
-				first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
-				if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino))
-					done = 1;
-			}
-
-			/* unlock now we've grabbed the inodes. */
-			rcu_read_unlock();
-
-			for (i = 0; i < nr_found; i++) {
-				if (batch[i])
-					xfs_reclaim_inode(batch[i], pag);
-			}
-
-			*nr_to_scan -= XFS_LOOKUP_BATCH;
-			cond_resched();
-		} while (nr_found && !done && *nr_to_scan > 0);
-
-		if (done)
-			first_index = 0;
-		WRITE_ONCE(pag->pag_ici_reclaim_cursor, first_index);
-		xfs_perag_put(pag);
-	}
-}
-
 void
 xfs_reclaim_inodes(
 	struct xfs_mount	*mp)
 {
-	int		nr_to_scan = INT_MAX;
-
 	while (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
 		xfs_ail_push_all_sync(mp->m_ail);
-		xfs_reclaim_inodes_ag(mp, &nr_to_scan);
+		xfs_inode_walk(mp, XFS_ICI_RECLAIM_TAG, NULL);
 	}
 }
 
@@ -1144,11 +1087,13 @@ xfs_reclaim_inodes_nr(
 	struct xfs_mount	*mp,
 	int			nr_to_scan)
 {
+	struct xfs_eofblocks	eofb = { .nr_to_scan = nr_to_scan };
+
 	/* kick background reclaimer and push the AIL */
 	xfs_reclaim_work_queue(mp);
 	xfs_ail_push_all(mp->m_ail);
 
-	xfs_reclaim_inodes_ag(mp, &nr_to_scan);
+	xfs_inode_walk(mp, XFS_ICI_RECLAIM_TAG, &eofb);
 	return 0;
 }
 
@@ -1258,9 +1203,8 @@ xfs_reclaim_worker(
 {
 	struct xfs_mount *mp = container_of(to_delayed_work(work),
 					struct xfs_mount, m_reclaim_work);
-	int		nr_to_scan = INT_MAX;
 
-	xfs_reclaim_inodes_ag(mp, &nr_to_scan);
+	xfs_inode_walk(mp, XFS_ICI_RECLAIM_TAG, NULL);
 	xfs_reclaim_work_queue(mp);
 }
 
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index d52c041093a3..bde7bab84230 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -15,13 +15,14 @@ struct xfs_eofblocks {
 	kgid_t		eof_gid;
 	prid_t		eof_prid;
 	__u64		eof_min_file_size;
+
+	/* Number of inodes to scan, currently limited to reclaim */
+	int		nr_to_scan;
 };
 
 /*
  * tags for inode radix tree
  */
-#define XFS_ICI_NO_TAG		(-1)	/* special flag for an untagged lookup
-					   in xfs_inode_walk */
 #define XFS_ICI_RECLAIM_TAG	0	/* inode is to be reclaimed */
 /* Inode has speculative preallocations (posteof or cow) to clean. */
 #define XFS_ICI_BLOCKGC_TAG	1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 6/6] xfs: refactor per-AG inode tagging functions
  2021-03-26  0:21 [PATCHSET v3 0/6] xfs: clean up incore inode walk functions Darrick J. Wong
                   ` (4 preceding siblings ...)
  2021-03-26  0:21 ` [PATCH 5/6] xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag Darrick J. Wong
@ 2021-03-26  0:21 ` Darrick J. Wong
  2021-03-26  6:48   ` Christoph Hellwig
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-26  0:21 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

In preparation for adding another incore inode tree tag, refactor the
code that sets and clears tags from the per-AG inode tree and the tree
of per-AG structures, and remove the open-coded versions used by the
blockgc code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c |  127 ++++++++++++++++++++++++---------------------------
 fs/xfs/xfs_icache.h |    2 -
 fs/xfs/xfs_super.c  |    2 -
 fs/xfs/xfs_trace.h  |    6 +-
 4 files changed, 65 insertions(+), 72 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 2b25fe679b0e..4c124bc98f39 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -29,6 +29,7 @@
 /* Forward declarations to reduce indirect calls */
 static int xfs_blockgc_scan_inode(struct xfs_inode *ip,
 		struct xfs_eofblocks *eofb);
+static inline void xfs_blockgc_queue(struct xfs_perag *pag);
 static bool xfs_reclaim_inode_grab(struct xfs_inode *ip);
 static void xfs_reclaim_inode(struct xfs_inode *ip, struct xfs_perag *pag);
 
@@ -163,46 +164,78 @@ xfs_reclaim_work_queue(
 	rcu_read_unlock();
 }
 
+/* Set a tag on both the AG incore inode tree and the AG radix tree. */
 static void
-xfs_perag_set_reclaim_tag(
-	struct xfs_perag	*pag)
+xfs_perag_set_ici_tag(
+	struct xfs_perag	*pag,
+	xfs_agino_t		agino,
+	unsigned int		tag)
 {
 	struct xfs_mount	*mp = pag->pag_mount;
+	bool			was_tagged;
 
 	lockdep_assert_held(&pag->pag_ici_lock);
-	if (pag->pag_ici_reclaimable++)
+
+	was_tagged = radix_tree_tagged(&pag->pag_ici_root, tag);
+	radix_tree_tag_set(&pag->pag_ici_root, agino, tag);
+
+	if (tag == XFS_ICI_RECLAIM_TAG)
+		pag->pag_ici_reclaimable++;
+
+	if (was_tagged)
 		return;
 
-	/* propagate the reclaim tag up into the perag radix tree */
+	/* propagate the tag up into the perag radix tree */
 	spin_lock(&mp->m_perag_lock);
-	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
-			   XFS_ICI_RECLAIM_TAG);
+	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno, tag);
 	spin_unlock(&mp->m_perag_lock);
 
-	/* schedule periodic background inode reclaim */
-	xfs_reclaim_work_queue(mp);
+	/* start background work */
+	switch (tag) {
+	case XFS_ICI_RECLAIM_TAG:
+		xfs_reclaim_work_queue(mp);
+		break;
+	case XFS_ICI_BLOCKGC_TAG:
+		xfs_blockgc_queue(pag);
+		break;
+	}
 
-	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
+	trace_xfs_perag_set_ici_tag(mp, pag->pag_agno, tag, _RET_IP_);
 }
 
+/* Clear a tag on both the AG incore inode tree and the AG radix tree. */
 static void
-xfs_perag_clear_reclaim_tag(
-	struct xfs_perag	*pag)
+xfs_perag_clear_ici_tag(
+	struct xfs_perag	*pag,
+	xfs_agino_t		agino,
+	unsigned int		tag)
 {
 	struct xfs_mount	*mp = pag->pag_mount;
 
 	lockdep_assert_held(&pag->pag_ici_lock);
-	if (--pag->pag_ici_reclaimable)
+
+	/*
+	 * Reclaim can signal (with a null agino) that it cleared its own tag
+	 * by removing the inode from the radix tree.
+	 */
+	if (agino != NULLAGINO)
+		radix_tree_tag_clear(&pag->pag_ici_root, agino, tag);
+	else
+		ASSERT(tag == XFS_ICI_RECLAIM_TAG);
+
+	if (tag == XFS_ICI_RECLAIM_TAG)
+		pag->pag_ici_reclaimable--;
+
+	if (radix_tree_tagged(&pag->pag_ici_root, tag))
 		return;
 
-	/* clear the reclaim tag from the perag radix tree */
+	/* clear the tag from the perag radix tree */
 	spin_lock(&mp->m_perag_lock);
-	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
-			     XFS_ICI_RECLAIM_TAG);
+	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno, tag);
 	spin_unlock(&mp->m_perag_lock);
-	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
-}
 
+	trace_xfs_perag_clear_ici_tag(mp, pag->pag_agno, tag, _RET_IP_);
+}
 
 /*
  * We set the inode flag atomically with the radix tree tag.
@@ -210,7 +243,7 @@ xfs_perag_clear_reclaim_tag(
  * can go away.
  */
 void
-xfs_inode_set_reclaim_tag(
+xfs_inode_destroy(
 	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
@@ -220,9 +253,8 @@ xfs_inode_set_reclaim_tag(
 	spin_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
 
-	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
-			   XFS_ICI_RECLAIM_TAG);
-	xfs_perag_set_reclaim_tag(pag);
+	xfs_perag_set_ici_tag(pag, XFS_INO_TO_AGINO(mp, ip->i_ino),
+			XFS_ICI_RECLAIM_TAG);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 
 	spin_unlock(&ip->i_flags_lock);
@@ -230,17 +262,6 @@ xfs_inode_set_reclaim_tag(
 	xfs_perag_put(pag);
 }
 
-STATIC void
-xfs_inode_clear_reclaim_tag(
-	struct xfs_perag	*pag,
-	xfs_ino_t		ino)
-{
-	radix_tree_tag_clear(&pag->pag_ici_root,
-			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
-			     XFS_ICI_RECLAIM_TAG);
-	xfs_perag_clear_reclaim_tag(pag);
-}
-
 void
 xfs_inew_wait(
 	struct xfs_inode	*ip)
@@ -439,7 +460,9 @@ xfs_iget_cache_hit(
 		 */
 		ip->i_flags &= ~XFS_IRECLAIM_RESET_FLAGS;
 		ip->i_flags |= XFS_INEW;
-		xfs_inode_clear_reclaim_tag(pag, ip->i_ino);
+		xfs_perag_clear_ici_tag(pag,
+				XFS_INO_TO_AGINO(pag->pag_mount, ino),
+				XFS_ICI_RECLAIM_TAG);
 		inode->i_state = I_NEW;
 		ip->i_sick = 0;
 		ip->i_checked = 0;
@@ -1038,7 +1061,7 @@ xfs_reclaim_inode(
 	if (!radix_tree_delete(&pag->pag_ici_root,
 				XFS_INO_TO_AGINO(ip->i_mount, ino)))
 		ASSERT(0);
-	xfs_perag_clear_reclaim_tag(pag);
+	xfs_perag_clear_ici_tag(pag, NULLAGINO, XFS_ICI_RECLAIM_TAG);
 	spin_unlock(&pag->pag_ici_lock);
 
 	/*
@@ -1274,7 +1297,6 @@ xfs_blockgc_set_iflag(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_perag	*pag;
-	int			tagged;
 
 	ASSERT((iflag & ~(XFS_IEOFBLOCKS | XFS_ICOWBLOCKS)) == 0);
 
@@ -1291,24 +1313,8 @@ xfs_blockgc_set_iflag(
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
 	spin_lock(&pag->pag_ici_lock);
 
-	tagged = radix_tree_tagged(&pag->pag_ici_root, XFS_ICI_BLOCKGC_TAG);
-	radix_tree_tag_set(&pag->pag_ici_root,
-			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
-			   XFS_ICI_BLOCKGC_TAG);
-	if (!tagged) {
-		/* propagate the blockgc tag up into the perag radix tree */
-		spin_lock(&ip->i_mount->m_perag_lock);
-		radix_tree_tag_set(&ip->i_mount->m_perag_tree,
-				   XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
-				   XFS_ICI_BLOCKGC_TAG);
-		spin_unlock(&ip->i_mount->m_perag_lock);
-
-		/* kick off background trimming */
-		xfs_blockgc_queue(pag);
-
-		trace_xfs_perag_set_blockgc(ip->i_mount, pag->pag_agno, -1,
-				_RET_IP_);
-	}
+	xfs_perag_set_ici_tag(pag, XFS_INO_TO_AGINO(mp, ip->i_ino),
+			XFS_ICI_BLOCKGC_TAG);
 
 	spin_unlock(&pag->pag_ici_lock);
 	xfs_perag_put(pag);
@@ -1344,19 +1350,8 @@ xfs_blockgc_clear_iflag(
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
 	spin_lock(&pag->pag_ici_lock);
 
-	radix_tree_tag_clear(&pag->pag_ici_root,
-			     XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
-			     XFS_ICI_BLOCKGC_TAG);
-	if (!radix_tree_tagged(&pag->pag_ici_root, XFS_ICI_BLOCKGC_TAG)) {
-		/* clear the blockgc tag from the perag radix tree */
-		spin_lock(&ip->i_mount->m_perag_lock);
-		radix_tree_tag_clear(&ip->i_mount->m_perag_tree,
-				     XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
-				     XFS_ICI_BLOCKGC_TAG);
-		spin_unlock(&ip->i_mount->m_perag_lock);
-		trace_xfs_perag_clear_blockgc(ip->i_mount, pag->pag_agno, -1,
-				_RET_IP_);
-	}
+	xfs_perag_clear_ici_tag(pag, XFS_INO_TO_AGINO(mp, ip->i_ino),
+			XFS_ICI_BLOCKGC_TAG);
 
 	spin_unlock(&pag->pag_ici_lock);
 	xfs_perag_put(pag);
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index bde7bab84230..987267797fc4 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -48,7 +48,7 @@ void xfs_reclaim_inodes(struct xfs_mount *mp);
 int xfs_reclaim_inodes_count(struct xfs_mount *mp);
 long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
-void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
+void xfs_inode_destroy(struct xfs_inode *ip);
 
 int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp,
 		struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index a2dab05332ac..f5bbfd13a956 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -667,7 +667,7 @@ xfs_fs_destroy_inode(
 	 * reclaim path handles this more efficiently than we can here, so
 	 * simply let background reclaim tear down all inodes.
 	 */
-	xfs_inode_set_reclaim_tag(ip);
+	xfs_inode_destroy(ip);
 }
 
 static void
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 808ae337b222..a929ebef89ec 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -153,10 +153,8 @@ DEFINE_EVENT(xfs_perag_class, name,	\
 DEFINE_PERAG_REF_EVENT(xfs_perag_get);
 DEFINE_PERAG_REF_EVENT(xfs_perag_get_tag);
 DEFINE_PERAG_REF_EVENT(xfs_perag_put);
-DEFINE_PERAG_REF_EVENT(xfs_perag_set_reclaim);
-DEFINE_PERAG_REF_EVENT(xfs_perag_clear_reclaim);
-DEFINE_PERAG_REF_EVENT(xfs_perag_set_blockgc);
-DEFINE_PERAG_REF_EVENT(xfs_perag_clear_blockgc);
+DEFINE_PERAG_REF_EVENT(xfs_perag_set_ici_tag);
+DEFINE_PERAG_REF_EVENT(xfs_perag_clear_ici_tag);
 
 DECLARE_EVENT_CLASS(xfs_ag_class,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno),


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] xfs: remove iter_flags parameter from xfs_inode_walk_*
  2021-03-26  0:21 ` [PATCH 2/6] xfs: remove iter_flags parameter from xfs_inode_walk_* Darrick J. Wong
@ 2021-03-26  6:04   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2021-03-26  6:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

On Thu, Mar 25, 2021 at 05:21:24PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The sole iter_flags is XFS_INODE_WALK_INEW_WAIT, and there are no users.
> Remove the flag, and the parameter, and all the code that used it.
> Since there are no longer any external callers of xfs_inode_walk, make
> it static.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] xfs: remove indirect calls from xfs_inode_walk{,_ag}
  2021-03-26  0:21 ` [PATCH 3/6] xfs: remove indirect calls from xfs_inode_walk{,_ag} Darrick J. Wong
@ 2021-03-26  6:08   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2021-03-26  6:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

> +/* Forward declarations to reduce indirect calls */
> +static int xfs_blockgc_scan_inode(struct xfs_inode *ip, void *args);

While we try to avoid forward declarations I don't think we really need
a comment justifying it.

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] xfs: pass struct xfs_eofblocks to the inode scan callback
  2021-03-26  0:21 ` [PATCH 4/6] xfs: pass struct xfs_eofblocks to the inode scan callback Darrick J. Wong
@ 2021-03-26  6:09   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2021-03-26  6:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

On Thu, Mar 25, 2021 at 05:21:35PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Pass a pointer to the actual eofb structure around the inode scanner
> functions instead of a void pointer, now that none of the functions is
> used as a callback.

Looks so good that it could be from me :)

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag
  2021-03-26  0:21 ` [PATCH 5/6] xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag Darrick J. Wong
@ 2021-03-26  6:30   ` Christoph Hellwig
  2021-03-26 16:07     ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2021-03-26  6:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

On Thu, Mar 25, 2021 at 05:21:40PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Merge these two inode walk loops together, since they're pretty similar
> now.  Get rid of XFS_ICI_NO_TAG since nobody uses it.

The laster user of XFS_ICI_NO_TAG was quotoff, and the last reference
was removed in "xfs: remove indirect calls from xfs_inode_walk{,_ag}".
So I think it should be dropped there, or even better in a prep patch
removing all the XFS_ICI_NO_TAG code before that one.

> +static inline bool
> +selected_for_walk(
> +	unsigned int		tag,
> +	struct xfs_inode	*ip)
> +{
> +	switch (tag) {
> +	case XFS_ICI_BLOCKGC_TAG:
> +		return xfs_blockgc_grab(ip);
> +	case XFS_ICI_RECLAIM_TAG:
> +		return xfs_reclaim_inode_grab(ip);
> +	default:
> +		return false;
> +	}
> +}

Maybe name ths something that starts with xfs_ and ends with _grab?

>   * and release all incore inodes with the given radix tree @tag.
> @@ -786,12 +803,14 @@ xfs_inode_walk_ag(
>  	bool			done;
>  	int			nr_found;
>  
> -	ASSERT(tag == XFS_ICI_BLOCKGC_TAG);
> +	ASSERT(tag < RADIX_TREE_MAX_TAGS);
>  
>  restart:
>  	done = false;
>  	skipped = 0;
>  	first_index = 0;
> +	if (tag == XFS_ICI_RECLAIM_TAG)
> +		first_index = READ_ONCE(pag->pag_ici_reclaim_cursor);

if / else to make this clear?

>  		for (i = 0; i < nr_found; i++) {
>  			if (!batch[i])
>  				continue;
> -			error = xfs_blockgc_scan_inode(batch[i], eofb);
> -			xfs_irele(batch[i]);
> +			switch (tag) {
> +			case XFS_ICI_BLOCKGC_TAG:
> +				error = xfs_blockgc_scan_inode(batch[i], eofb);
> +				xfs_irele(batch[i]);
> +				break;
> +			case XFS_ICI_RECLAIM_TAG:
> +				xfs_reclaim_inode(batch[i], pag);
> +				error = 0;

Maybe move the irele into xfs_blockgc_scan_inode to make the calling
conventions more similar?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] xfs: refactor per-AG inode tagging functions
  2021-03-26  0:21 ` [PATCH 6/6] xfs: refactor per-AG inode tagging functions Darrick J. Wong
@ 2021-03-26  6:48   ` Christoph Hellwig
  2021-03-26 16:34     ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2021-03-26  6:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

On Thu, Mar 25, 2021 at 05:21:46PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> In preparation for adding another incore inode tree tag, refactor the
> code that sets and clears tags from the per-AG inode tree and the tree
> of per-AG structures, and remove the open-coded versions used by the
> blockgc code.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_icache.c |  127 ++++++++++++++++++++++++---------------------------
>  fs/xfs/xfs_icache.h |    2 -
>  fs/xfs/xfs_super.c  |    2 -
>  fs/xfs/xfs_trace.h  |    6 +-
>  4 files changed, 65 insertions(+), 72 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 2b25fe679b0e..4c124bc98f39 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -29,6 +29,7 @@
>  /* Forward declarations to reduce indirect calls */
>  static int xfs_blockgc_scan_inode(struct xfs_inode *ip,
>  		struct xfs_eofblocks *eofb);
> +static inline void xfs_blockgc_queue(struct xfs_perag *pag);
>  static bool xfs_reclaim_inode_grab(struct xfs_inode *ip);
>  static void xfs_reclaim_inode(struct xfs_inode *ip, struct xfs_perag *pag);
>  
> @@ -163,46 +164,78 @@ xfs_reclaim_work_queue(
>  	rcu_read_unlock();
>  }
>  
> +/* Set a tag on both the AG incore inode tree and the AG radix tree. */
>  static void
> +xfs_perag_set_ici_tag(
> +	struct xfs_perag	*pag,
> +	xfs_agino_t		agino,
> +	unsigned int		tag)

Looking at the callers - I think the logic to lookup the pag and set the
inode flag should also go in here.  Currently only xfs_inode_destroy
nests i_Flags log inside the pag_ici_lock, but I don't see how that
would harm the xfs_blockgc_set_iflag case.  I suspect the unlocked
check in xfs_blockgc_set_iflag would harm in the reclaim case either.

>  void
> +xfs_inode_destroy(

I find this new name a little confusing.  What about
xfs_inode_mark_reclaimable?

But overall this new scheme looks nice to me.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag
  2021-03-26  6:30   ` Christoph Hellwig
@ 2021-03-26 16:07     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-26 16:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, Mar 26, 2021 at 06:30:56AM +0000, Christoph Hellwig wrote:
> On Thu, Mar 25, 2021 at 05:21:40PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Merge these two inode walk loops together, since they're pretty similar
> > now.  Get rid of XFS_ICI_NO_TAG since nobody uses it.
> 
> The laster user of XFS_ICI_NO_TAG was quotoff, and the last reference
> was removed in "xfs: remove indirect calls from xfs_inode_walk{,_ag}".
> So I think it should be dropped there, or even better in a prep patch
> removing all the XFS_ICI_NO_TAG code before that one.

Ok, moved to patch 3.

> > +static inline bool
> > +selected_for_walk(
> > +	unsigned int		tag,
> > +	struct xfs_inode	*ip)
> > +{
> > +	switch (tag) {
> > +	case XFS_ICI_BLOCKGC_TAG:
> > +		return xfs_blockgc_grab(ip);
> > +	case XFS_ICI_RECLAIM_TAG:
> > +		return xfs_reclaim_inode_grab(ip);
> > +	default:
> > +		return false;
> > +	}
> > +}
> 
> Maybe name ths something that starts with xfs_ and ends with _grab?

xfs_grabbed_for_walk?

> >   * and release all incore inodes with the given radix tree @tag.
> > @@ -786,12 +803,14 @@ xfs_inode_walk_ag(
> >  	bool			done;
> >  	int			nr_found;
> >  
> > -	ASSERT(tag == XFS_ICI_BLOCKGC_TAG);
> > +	ASSERT(tag < RADIX_TREE_MAX_TAGS);
> >  
> >  restart:
> >  	done = false;
> >  	skipped = 0;
> >  	first_index = 0;
> > +	if (tag == XFS_ICI_RECLAIM_TAG)
> > +		first_index = READ_ONCE(pag->pag_ici_reclaim_cursor);
> 
> if / else to make this clear?

Done.

> >  		for (i = 0; i < nr_found; i++) {
> >  			if (!batch[i])
> >  				continue;
> > -			error = xfs_blockgc_scan_inode(batch[i], eofb);
> > -			xfs_irele(batch[i]);
> > +			switch (tag) {
> > +			case XFS_ICI_BLOCKGC_TAG:
> > +				error = xfs_blockgc_scan_inode(batch[i], eofb);
> > +				xfs_irele(batch[i]);
> > +				break;
> > +			case XFS_ICI_RECLAIM_TAG:
> > +				xfs_reclaim_inode(batch[i], pag);
> > +				error = 0;
> 
> Maybe move the irele into xfs_blockgc_scan_inode to make the calling
> conventions more similar?

Ok.  I'll also fix the off-by-one error in the nr_to_scan check.

--D

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] xfs: refactor per-AG inode tagging functions
  2021-03-26  6:48   ` Christoph Hellwig
@ 2021-03-26 16:34     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-26 16:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, Mar 26, 2021 at 06:48:09AM +0000, Christoph Hellwig wrote:
> On Thu, Mar 25, 2021 at 05:21:46PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > In preparation for adding another incore inode tree tag, refactor the
> > code that sets and clears tags from the per-AG inode tree and the tree
> > of per-AG structures, and remove the open-coded versions used by the
> > blockgc code.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_icache.c |  127 ++++++++++++++++++++++++---------------------------
> >  fs/xfs/xfs_icache.h |    2 -
> >  fs/xfs/xfs_super.c  |    2 -
> >  fs/xfs/xfs_trace.h  |    6 +-
> >  4 files changed, 65 insertions(+), 72 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 2b25fe679b0e..4c124bc98f39 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -29,6 +29,7 @@
> >  /* Forward declarations to reduce indirect calls */
> >  static int xfs_blockgc_scan_inode(struct xfs_inode *ip,
> >  		struct xfs_eofblocks *eofb);
> > +static inline void xfs_blockgc_queue(struct xfs_perag *pag);
> >  static bool xfs_reclaim_inode_grab(struct xfs_inode *ip);
> >  static void xfs_reclaim_inode(struct xfs_inode *ip, struct xfs_perag *pag);
> >  
> > @@ -163,46 +164,78 @@ xfs_reclaim_work_queue(
> >  	rcu_read_unlock();
> >  }
> >  
> > +/* Set a tag on both the AG incore inode tree and the AG radix tree. */
> >  static void
> > +xfs_perag_set_ici_tag(
> > +	struct xfs_perag	*pag,
> > +	xfs_agino_t		agino,
> > +	unsigned int		tag)
> 
> Looking at the callers - I think the logic to lookup the pag and set the
> inode flag should also go in here.

I deliberately didn't do that here because of what happens in the
deferred inactivation patch.  After calling xfs_inactive, we have to
transition the inode from INACTIVATING to RECLAIMABLE (along with the
radix tree tags) without anybody being able to see intermediate state:

	/*
	 * Move the inode from the inactivation phase to the reclamation phase
	 * by clearing both inactivation inode state flags and marking the
	 * inode reclaimable.  Schedule background reclaim to run later.
	 */
	spin_lock(&pag->pag_ici_lock);
	spin_lock(&ip->i_flags_lock);

	ip->i_flags &= ~(XFS_NEED_INACTIVE | XFS_INACTIVATING);
	ip->i_flags |= XFS_IRECLAIMABLE;

	xfs_perag_clear_ici_tag(pag, agino, XFS_ICI_INODEGC_TAG);
	xfs_perag_set_ici_tag(pag, agino, XFS_ICI_RECLAIM_TAG);

	spin_unlock(&ip->i_flags_lock);
	spin_unlock(&pag->pag_ici_lock);

Which is why the xfs_perag_*_ici_tag callers are left in charge of
looking up the pag and taking locks as needed.

> Currently only xfs_inode_destroy
> nests i_Flags log inside the pag_ici_lock, but I don't see how that
> would harm the xfs_blockgc_set_iflag case.

The other wart is that IEOFBLOCKS and ICOWBLOCKS share the same radix
tree tag, which complicates the clearing logic, and I thought it best
to let the callers deal with that.

> I suspect the unlocked
> check in xfs_blockgc_set_iflag would harm in the reclaim case either.

"wouldn't"?

> 
> >  void
> > +xfs_inode_destroy(
> 
> I find this new name a little confusing.  What about
> xfs_inode_mark_reclaimable?

Fixed.

> But overall this new scheme looks nice to me.

Thanks!

--D

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes
  2021-03-26  0:21 ` [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes Darrick J. Wong
@ 2021-03-30  0:44   ` Dave Chinner
  2021-03-30  2:36     ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2021-03-30  0:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch

On Thu, Mar 25, 2021 at 05:21:18PM -0700, Darrick J. Wong wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Using xfs_inode_walk in xfs_qm_dqrele_all_inodes is complete overkill,
> given that function simplify wants to iterate all live inodes known
> to the VFS.  Just iterate over the s_inodes list.

I'm not sure that assertion is true. We attach dquots during inode
inactivation after the VFS has removed the inode from the s_inodes
list and evicted the inode. Hence there is a window between the
inode being removed from the sb->s_inodes lists and it being marked
XFS_IRECLAIMABLE where we can attach dquots to the inode.

Indeed, an inode marked XFS_IRECLAIMABLE that has gone through
evict -> destroy -> inactive -> nlink != 0 -> xfs_free_ eofblocks()
can have referenced dquots attached to it and require dqrele() to be
called to release them.

Hence I think that xfs_qm_dqrele_all_inodes() is broken if all it is
doing is walking vfs referenced inodes, because it doesn't actually
release the dquots attached to reclaimable inodes. If this did
actually release all dquots, then there wouldn't be a need for the
xfs_qm_dqdetach() call in xfs_reclaim_inode() just before it's
handed to RCU to be freed....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes
  2021-03-30  0:44   ` Dave Chinner
@ 2021-03-30  2:36     ` Darrick J. Wong
  2021-03-30  3:07       ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-30  2:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs, hch

On Tue, Mar 30, 2021 at 11:44:07AM +1100, Dave Chinner wrote:
> On Thu, Mar 25, 2021 at 05:21:18PM -0700, Darrick J. Wong wrote:
> > From: Christoph Hellwig <hch@lst.de>
> > 
> > Using xfs_inode_walk in xfs_qm_dqrele_all_inodes is complete overkill,
> > given that function simplify wants to iterate all live inodes known
> > to the VFS.  Just iterate over the s_inodes list.
> 
> I'm not sure that assertion is true. We attach dquots during inode
> inactivation after the VFS has removed the inode from the s_inodes
> list and evicted the inode. Hence there is a window between the
> inode being removed from the sb->s_inodes lists and it being marked
> XFS_IRECLAIMABLE where we can attach dquots to the inode.
> 
> Indeed, an inode marked XFS_IRECLAIMABLE that has gone through
> evict -> destroy -> inactive -> nlink != 0 -> xfs_free_ eofblocks()
> can have referenced dquots attached to it and require dqrele() to be
> called to release them.

Why do the dquots need to remain attached after destroy_inode?  We can
easily reattach them during inactivation (v3 did this), and I don't know
why an inode needs dquots once we're through making metadata updates.

> Hence I think that xfs_qm_dqrele_all_inodes() is broken if all it is
> doing is walking vfs referenced inodes, because it doesn't actually
> release the dquots attached to reclaimable inodes. If this did
> actually release all dquots, then there wouldn't be a need for the
> xfs_qm_dqdetach() call in xfs_reclaim_inode() just before it's
> handed to RCU to be freed....

Why does it work now, then?  The current code /also/ leaves the dquots
attached to reclaimable inodes, and the quotaoff scan ignores
IRECLAIMABLE inodes.  Has it simply been the case that the dqpurge spins
until reclaim runs, and reclaim gets run quickly enough (or quotaoff runs
infrequently enough) that nobody's complained?

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes
  2021-03-30  2:36     ` Darrick J. Wong
@ 2021-03-30  3:07       ` Dave Chinner
  2021-03-30  4:06         ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2021-03-30  3:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch

On Mon, Mar 29, 2021 at 07:36:56PM -0700, Darrick J. Wong wrote:
> On Tue, Mar 30, 2021 at 11:44:07AM +1100, Dave Chinner wrote:
> > On Thu, Mar 25, 2021 at 05:21:18PM -0700, Darrick J. Wong wrote:
> > > From: Christoph Hellwig <hch@lst.de>
> > > 
> > > Using xfs_inode_walk in xfs_qm_dqrele_all_inodes is complete overkill,
> > > given that function simplify wants to iterate all live inodes known
> > > to the VFS.  Just iterate over the s_inodes list.
> > 
> > I'm not sure that assertion is true. We attach dquots during inode
> > inactivation after the VFS has removed the inode from the s_inodes
> > list and evicted the inode. Hence there is a window between the
> > inode being removed from the sb->s_inodes lists and it being marked
> > XFS_IRECLAIMABLE where we can attach dquots to the inode.
> > 
> > Indeed, an inode marked XFS_IRECLAIMABLE that has gone through
> > evict -> destroy -> inactive -> nlink != 0 -> xfs_free_ eofblocks()
> > can have referenced dquots attached to it and require dqrele() to be
> > called to release them.
> 
> Why do the dquots need to remain attached after destroy_inode?

They don't. But that's not the problem here.

> We can
> easily reattach them during inactivation (v3 did this), and I don't know
> why an inode needs dquots once we're through making metadata updates.

Yes, they get re-attached for truncation, attr removal, EOF block
freeing, etc. Only on the unlinked inode path in inactivation do
they get removed once all the work tha tmodifies the dquots is done.

But many of the paths don't detach them again because they are
multi-use.  e.g xfs_free_eofblocks() will attach dquots, but doesn't
detatch them because it's called from more placed than than the
inactivation path.

I'm sure this can all be cleaned up, but I *really* don't like the
idea of a "walk all XFS inodes" scan that actually only walks the
inodes with VFS references and not -all XFS inodes-.

And there's other problems with doing sb->s_inodes list walks -
namely the global lock. While we are doing this walk (might be tens
of millions of inodes!) we can hold the s_inode_list_lock for a long
time and we cannot instantiate new inodes or evict inodes to/from
the cache while that lock is held. The XFS inode walk is lockless
and we don't hold off anything to do wiht cache instantiation and
freeing, so it has less impact on the running system.

If everything is clean and don't block on locks anywhere, the
s_inodes list walk needs a cond_resched() in it. Again, tens
(hundreds) of millions of inodes can be on that list mean it can
hold the CPU for a long time.

Next, igrab() takes a reference to the inode which will mark them
referenced. THis walk grabs every inode in the filesysetm cache,
so marks them all referenced and makes it harder to reclaim them
under memory pressure. This perturbs working set behaviour.

inode list walks and igrab/iput don't come for free - they perturb
the working set, LRU orders, cause lock contention, long tail
latencies, etc. The XFS inode cache walk might not be the prettiest
thing, but it doesn't have any of these nasty side effects.

So, in general, I don't think we should be adding new inode list
walks anywhere, not even deep in XFS where nobody else might care...

> > Hence I think that xfs_qm_dqrele_all_inodes() is broken if all it is
> > doing is walking vfs referenced inodes, because it doesn't actually
> > release the dquots attached to reclaimable inodes. If this did
> > actually release all dquots, then there wouldn't be a need for the
> > xfs_qm_dqdetach() call in xfs_reclaim_inode() just before it's
> > handed to RCU to be freed....
> 
> Why does it work now, then?  The current code /also/ leaves the dquots
> attached to reclaimable inodes, and the quotaoff scan ignores
> IRECLAIMABLE inodes.

Luck, I think.

> Has it simply been the case that the dqpurge spins
> until reclaim runs, and reclaim gets run quickly enough (or quotaoff runs
> infrequently enough) that nobody's complained?

Yup, that's my assumption - quotaoff is rare, inode reclaim runs
every 5s - and so we haven't noticed it because nobody has looked
closely at how dquots vs inode reclaim works recently...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes
  2021-03-30  3:07       ` Dave Chinner
@ 2021-03-30  4:06         ` Darrick J. Wong
  2021-03-31  1:34           ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-30  4:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs, hch

On Tue, Mar 30, 2021 at 02:07:47PM +1100, Dave Chinner wrote:
> On Mon, Mar 29, 2021 at 07:36:56PM -0700, Darrick J. Wong wrote:
> > On Tue, Mar 30, 2021 at 11:44:07AM +1100, Dave Chinner wrote:
> > > On Thu, Mar 25, 2021 at 05:21:18PM -0700, Darrick J. Wong wrote:
> > > > From: Christoph Hellwig <hch@lst.de>
> > > > 
> > > > Using xfs_inode_walk in xfs_qm_dqrele_all_inodes is complete overkill,
> > > > given that function simplify wants to iterate all live inodes known
> > > > to the VFS.  Just iterate over the s_inodes list.
> > > 
> > > I'm not sure that assertion is true. We attach dquots during inode
> > > inactivation after the VFS has removed the inode from the s_inodes
> > > list and evicted the inode. Hence there is a window between the
> > > inode being removed from the sb->s_inodes lists and it being marked
> > > XFS_IRECLAIMABLE where we can attach dquots to the inode.
> > > 
> > > Indeed, an inode marked XFS_IRECLAIMABLE that has gone through
> > > evict -> destroy -> inactive -> nlink != 0 -> xfs_free_ eofblocks()
> > > can have referenced dquots attached to it and require dqrele() to be
> > > called to release them.
> > 
> > Why do the dquots need to remain attached after destroy_inode?
> 
> They don't. But that's not the problem here.

Actually, they do need to remain attached nowadays, because COW blocks
are accounted as incore dquot reservations so we can't let the dquots
drop until the COW fork gets cleaned out.

Granted I guess I did have a patch that changed the dquot lifecycle so
that they would stay in memory after the refcount dropped to zero, even
if they had incore reservations.

...and now I finally see the plot twist that turns this into the
*Fourth* part of Yet Another Quota Restructuring.  This time I get to
reimplement quotaoff! :P

> > We can
> > easily reattach them during inactivation (v3 did this), and I don't know
> > why an inode needs dquots once we're through making metadata updates.
> 
> Yes, they get re-attached for truncation, attr removal, EOF block
> freeing, etc. Only on the unlinked inode path in inactivation do
> they get removed once all the work tha tmodifies the dquots is done.
> 
> But many of the paths don't detach them again because they are
> multi-use.  e.g xfs_free_eofblocks() will attach dquots, but doesn't
> detatch them because it's called from more placed than than the
> inactivation path.
> 
> I'm sure this can all be cleaned up, but I *really* don't like the
> idea of a "walk all XFS inodes" scan that actually only walks the
> inodes with VFS references and not -all XFS inodes-.
> 
> And there's other problems with doing sb->s_inodes list walks -
> namely the global lock. While we are doing this walk (might be tens
> of millions of inodes!) we can hold the s_inode_list_lock for a long
> time and we cannot instantiate new inodes or evict inodes to/from
> the cache while that lock is held. The XFS inode walk is lockless
> and we don't hold off anything to do wiht cache instantiation and
> freeing, so it has less impact on the running system.
> 
> If everything is clean and don't block on locks anywhere, the
> s_inodes list walk needs a cond_resched() in it. Again, tens
> (hundreds) of millions of inodes can be on that list mean it can
> hold the CPU for a long time.

Yeah, I had wondered how good an idea it was to replace batch lookups
with a list walk...

> Next, igrab() takes a reference to the inode which will mark them
> referenced. THis walk grabs every inode in the filesysetm cache,
> so marks them all referenced and makes it harder to reclaim them
> under memory pressure. This perturbs working set behaviour.
> 
> inode list walks and igrab/iput don't come for free - they perturb
> the working set, LRU orders, cause lock contention, long tail
> latencies, etc. The XFS inode cache walk might not be the prettiest
> thing, but it doesn't have any of these nasty side effects.
> 
> So, in general, I don't think we should be adding new inode list
> walks anywhere, not even deep in XFS where nobody else might care...

...but the current quotaoff behavior has /all/ of these problems too.

I think you and I hashed out on IRC that quotaoff could simply take the
ILOCK and the i_flags lock of every inode that isn't INEW, RECLAIMING,
or INACTIVATING; drop the dquots, and drop the locks, and then dqpurge
would only have to wait for the inodes that are actively being reclaimed
or inactivated.

I'll give that a try ... eventually, but I wouldn't be too confident
that I'll get all this turned around before I have to shut the door next
next Thursday.

> > > Hence I think that xfs_qm_dqrele_all_inodes() is broken if all it is
> > > doing is walking vfs referenced inodes, because it doesn't actually
> > > release the dquots attached to reclaimable inodes. If this did
> > > actually release all dquots, then there wouldn't be a need for the
> > > xfs_qm_dqdetach() call in xfs_reclaim_inode() just before it's
> > > handed to RCU to be freed....
> > 
> > Why does it work now, then?  The current code /also/ leaves the dquots
> > attached to reclaimable inodes, and the quotaoff scan ignores
> > IRECLAIMABLE inodes.
> 
> Luck, I think.
> 
> > Has it simply been the case that the dqpurge spins
> > until reclaim runs, and reclaim gets run quickly enough (or quotaoff runs
> > infrequently enough) that nobody's complained?
> 
> Yup, that's my assumption - quotaoff is rare, inode reclaim runs
> every 5s - and so we haven't noticed it because nobody has looked
> closely at how dquots vs inode reclaim works recently...

Yes, that does explain some of the weird test duration quirks I see in
xfs/305....

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes
  2021-03-30  4:06         ` Darrick J. Wong
@ 2021-03-31  1:34           ` Dave Chinner
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2021-03-31  1:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch

On Mon, Mar 29, 2021 at 09:06:25PM -0700, Darrick J. Wong wrote:
> On Tue, Mar 30, 2021 at 02:07:47PM +1100, Dave Chinner wrote:
> > Next, igrab() takes a reference to the inode which will mark them
> > referenced. THis walk grabs every inode in the filesysetm cache,
> > so marks them all referenced and makes it harder to reclaim them
> > under memory pressure. This perturbs working set behaviour.
> > 
> > inode list walks and igrab/iput don't come for free - they perturb
> > the working set, LRU orders, cause lock contention, long tail
> > latencies, etc. The XFS inode cache walk might not be the prettiest
> > thing, but it doesn't have any of these nasty side effects.
> > 
> > So, in general, I don't think we should be adding new inode list
> > walks anywhere, not even deep in XFS where nobody else might care...
> 
> ...but the current quotaoff behavior has /all/ of these problems too.

Yup, ISTR in the past it didn't even take inode references, but I
may be misremembering...

> I think you and I hashed out on IRC that quotaoff could simply take the
> ILOCK and the i_flags lock of every inode that isn't INEW, RECLAIMING,
> or INACTIVATING; drop the dquots, and drop the locks, and then dqpurge
> would only have to wait for the inodes that are actively being reclaimed
> or inactivated.

I'm pretty sure that we only need to avoid INEW and IRECLAIM.
Anything else that has dquots attached needs to be locked and have
them removed. Those that are marked INEW will not have dquots
attached, and those marked IRECLAIM are in the process of being torn
down so their dquots will be released in short order.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, back to index

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-26  0:21 [PATCHSET v3 0/6] xfs: clean up incore inode walk functions Darrick J. Wong
2021-03-26  0:21 ` [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes Darrick J. Wong
2021-03-30  0:44   ` Dave Chinner
2021-03-30  2:36     ` Darrick J. Wong
2021-03-30  3:07       ` Dave Chinner
2021-03-30  4:06         ` Darrick J. Wong
2021-03-31  1:34           ` Dave Chinner
2021-03-26  0:21 ` [PATCH 2/6] xfs: remove iter_flags parameter from xfs_inode_walk_* Darrick J. Wong
2021-03-26  6:04   ` Christoph Hellwig
2021-03-26  0:21 ` [PATCH 3/6] xfs: remove indirect calls from xfs_inode_walk{,_ag} Darrick J. Wong
2021-03-26  6:08   ` Christoph Hellwig
2021-03-26  0:21 ` [PATCH 4/6] xfs: pass struct xfs_eofblocks to the inode scan callback Darrick J. Wong
2021-03-26  6:09   ` Christoph Hellwig
2021-03-26  0:21 ` [PATCH 5/6] xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag Darrick J. Wong
2021-03-26  6:30   ` Christoph Hellwig
2021-03-26 16:07     ` Darrick J. Wong
2021-03-26  0:21 ` [PATCH 6/6] xfs: refactor per-AG inode tagging functions Darrick J. Wong
2021-03-26  6:48   ` Christoph Hellwig
2021-03-26 16:34     ` Darrick J. Wong

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git