* [PATCH 00/11 v3] xfs: inode reclaim vs the world
@ 2016-04-13  5:31 Dave Chinner
  2016-04-13  5:31 ` [PATCH 01/11] xfs: we don't need no steekin ->evict_inode Dave Chinner
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

Hi folks,

The inode reclaim patchset has grown somewhat larger with all the
fixes I've accrued over the past couple of days for issues that have
been reported. I've just re-ordered the patchset to have all the bug
fixes that need to go into 4.6-rc up front, followed by other bug
fixes, followed by all the cleanups I wrote as I went along fixing
things. I want to get this out before LSFMM, so I'm posting this
before I've done a full verification that I've got the patches
correct, and they are still running through xfstests right now.

Patch 1 is new - it fixes a regression introduced in 4.6-rc1 and is
caused by clearing the vfs inode i_mode in ->evict_inode, when there
is code in evict() that requires the mode to be intact until
->destroy_inode.

Patch 2 is new - it fixes the inode log item leak that left a dirty
log item on the AIL after the inode had been reclaimed, resulting in
unmountable filesystems and a couple of use-after-free vectors.

Patch 3 is the original fix for the xfs_iflush_cluster lookup
validity checks that were incorrect.

Patch 4 is the original fix for avoiding flushing stale inodes.

Patch 5 is new, and comes from Alex @ Zadara to avoid having to
reallocate memory when tearing down the inode extent lists. This is
a necessary pre-requisite for patch 6.

Patch 6 is the original patch that pushed all the inode memory
freeing into the RCU callback, so that xfs_iflush_cluster didn't
reference freed memory when racing with reclaim.

Patch 7 is the original patch that made reclaim races easier to
detect.

Patch 8 is the original patch that drops out of xfs_iflush_cluster
on the first inode beyond the end of the current cluster.

Patch 9 is the original patch that renames the variables in
xfs_iflush_cluster().

Patch 10 and 11 are new patches that simplify the inode reclaim
tagging interfaces to remove dependencies on the struct xfs_inode
and the inode number where they are not actually required.

-Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/11] xfs: we don't need no steekin ->evict_inode
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-13 16:41   ` Christoph Hellwig
  2016-04-14 12:10   ` Brian Foster
  2016-04-13  5:31 ` [PATCH 02/11] xfs: xfs_iflush_cluster fails to abort on error Dave Chinner
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Joe Lawrence reported a list_add corruption with 4.6-rc1 when
testing some custom md administration code that made its own
block device nodes for the md array. The simple test loop of:

for i in {0..100}; do
	mknod --mode=0600 $tmp/tmp_node b $MAJOR $MINOR
	mdadm --detail --export $tmp/tmp_node > /dev/null
	rm -f $tmp/tmp_node
done


would produce this warning in bd_acquire() when mdadm opened the
device node:

list_add double add: new=ffff88043831c7b8, prev=ffff8804380287d8, next=ffff88043831c7b8.

And then produce this warning from bd_forget() when kdevtmpfs
evicted the block device inode:

list_del corruption. prev->next should be ffff8800bb83eb10, but was ffff88043831c7b8

This is a regression caused by commit c19b3b05 ("xfs: mode di_mode
to vfs inode"). The issue is that xfs_inactive() frees the
unlinked inode, and the above commit meant that this freeing zeroed
the mode in the struct inode. The problem is that after evict() has
called ->evict_inode, it expects the i_mode to be intact so that it
can call bd_forget() or cd_forget() to drop the reference to the
block device inode attached to the XFS inode.

In reality, the only thing we do in xfs_fs_evict_inode() that is not
generic is call xfs_inactive(). We can move the xfs_inactive() call
to xfs_fs_destroy_inode() without any problems at all, and this
will leave the VFS inode intact until it is completely done with it.

So, remove xfs_fs_evict_inode(), and do the work it used to do in
->destroy_inode instead.

Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_super.c | 28 +++++++---------------------
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e2f0f52..416421d 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -928,7 +928,7 @@ xfs_fs_alloc_inode(
 
 /*
  * Now that the generic code is guaranteed not to be accessing
- * the linux inode, we can reclaim the inode.
+ * the linux inode, we can inactivate and reclaim the inode.
  */
 STATIC void
 xfs_fs_destroy_inode(
@@ -938,9 +938,14 @@ xfs_fs_destroy_inode(
 
 	trace_xfs_destroy_inode(ip);
 
-	XFS_STATS_INC(ip->i_mount, vn_reclaim);
+	ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
+	XFS_STATS_INC(ip->i_mount, vn_rele);
+	XFS_STATS_INC(ip->i_mount, vn_remove);
+
+	xfs_inactive(ip);
 
 	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);
+	XFS_STATS_INC(ip->i_mount, vn_reclaim);
 
 	/*
 	 * We should never get here with one of the reclaim flags already set.
@@ -987,24 +992,6 @@ xfs_fs_inode_init_once(
 		     "xfsino", ip->i_ino);
 }
 
-STATIC void
-xfs_fs_evict_inode(
-	struct inode		*inode)
-{
-	xfs_inode_t		*ip = XFS_I(inode);
-
-	ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
-
-	trace_xfs_evict_inode(ip);
-
-	truncate_inode_pages_final(&inode->i_data);
-	clear_inode(inode);
-	XFS_STATS_INC(ip->i_mount, vn_rele);
-	XFS_STATS_INC(ip->i_mount, vn_remove);
-
-	xfs_inactive(ip);
-}
-
 /*
  * We do an unlocked check for XFS_IDONTCACHE here because we are already
  * serialised against cache hits here via the inode->i_lock and igrab() in
@@ -1673,7 +1660,6 @@ xfs_fs_free_cached_objects(
 static const struct super_operations xfs_super_operations = {
 	.alloc_inode		= xfs_fs_alloc_inode,
 	.destroy_inode		= xfs_fs_destroy_inode,
-	.evict_inode		= xfs_fs_evict_inode,
 	.drop_inode		= xfs_fs_drop_inode,
 	.put_super		= xfs_fs_put_super,
 	.sync_fs		= xfs_fs_sync_fs,
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/11] xfs: xfs_iflush_cluster fails to abort on error
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
  2016-04-13  5:31 ` [PATCH 01/11] xfs: we don't need no steekin ->evict_inode Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-13 16:41   ` Christoph Hellwig
  2016-04-13  5:31 ` [PATCH 03/11] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When a failure due to an inode buffer occurs, the error handling
fails to abort the inode writeback correctly. This can result in the
inode being reclaimed whilst still in the AIL, leading to
use-after-free situations as well as filesystems that cannot be
unmounted as the inode log items left in the AIL never get removed.

Fix this by ensuring fatal errors from xfs_imap_to_bp() result in
the inode flush being aborted correctly.

cc: <stable@vger.kernel.org> # 3.10.x-
Reported-by: Shyam Kaushik <shyam@zadarastorage.com>
Diagnosed-by: Shyam Kaushik <shyam@zadarastorage.com>
Tested-by: Shyam Kaushik <shyam@zadarastorage.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f79ea59..e97ae5d 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3314,7 +3314,7 @@ xfs_iflush(
 	struct xfs_buf		**bpp)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_buf		*bp;
+	struct xfs_buf		*bp = NULL;
 	struct xfs_dinode	*dip;
 	int			error;
 
@@ -3356,14 +3356,22 @@ xfs_iflush(
 	}
 
 	/*
-	 * Get the buffer containing the on-disk inode.
+	 * Get the buffer containing the on-disk inode. We are doing a try-lock
+	 * operation here, so we may get an EAGAIN error. In that case, we
+	 * simply want to return with the inode still dirty.
+	 *
+	 * If we get any other error, we effectively have a corruption situation
+	 * and we cannot flush the inode, so we treat it the same as failing
+	 * xfs_iflush_int().
 	 */
 	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
 			       0);
-	if (error || !bp) {
+	if (error == -EAGAIN) {
 		xfs_ifunlock(ip);
 		return error;
 	}
+	if (error)
+		goto corrupt_out;
 
 	/*
 	 * First flush out the inode that xfs_iflush was called with.
@@ -3391,7 +3399,8 @@ xfs_iflush(
 	return 0;
 
 corrupt_out:
-	xfs_buf_relse(bp);
+	if (bp)
+		xfs_buf_relse(bp);
 	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 cluster_corrupt_out:
 	error = -EFSCORRUPTED;
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/11] xfs: fix inode validity check in xfs_iflush_cluster
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
  2016-04-13  5:31 ` [PATCH 01/11] xfs: we don't need no steekin ->evict_inode Dave Chinner
  2016-04-13  5:31 ` [PATCH 02/11] xfs: xfs_iflush_cluster fails to abort on error Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-13  5:31 ` [PATCH 04/11] xfs: skip stale inodes " Dave Chinner
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Some careless idiot(*) wrote crap code in commit 1a3e8f3 ("xfs:
convert inode cache lookups to use RCU locking") back in late 2010,
and so xfs_iflush_cluster checks the wrong inode for whether it is
still valid under RCU protection. Fix it to lock and check the
correct inode.

(*) Careless-idiot: Dave Chinner <dchinner@redhat.com>

cc: <stable@vger.kernel.org> # 3.10.x-
Discovered-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_inode.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e97ae5d..e302f11 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3192,13 +3192,13 @@ xfs_iflush_cluster(
 		 * We need to check under the i_flags_lock for a valid inode
 		 * here. Skip it if it is not valid or the wrong inode.
 		 */
-		spin_lock(&ip->i_flags_lock);
-		if (!ip->i_ino ||
+		spin_lock(&iq->i_flags_lock);
+		if (!iq->i_ino ||
 		    (XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) {
-			spin_unlock(&ip->i_flags_lock);
+			spin_unlock(&iq->i_flags_lock);
 			continue;
 		}
-		spin_unlock(&ip->i_flags_lock);
+		spin_unlock(&iq->i_flags_lock);
 
 		/*
 		 * Do an un-protected check to see if the inode is dirty and
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/11] xfs: skip stale inodes in xfs_iflush_cluster
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
                   ` (2 preceding siblings ...)
  2016-04-13  5:31 ` [PATCH 03/11] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-13  5:31 ` [PATCH 05/11] xfs: optimise xfs_iext_destroy Dave Chinner
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

We don't write back stale inodes so we should skip them in
xfs_iflush_cluster, too.

cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e302f11..ad0ad8d 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3194,6 +3194,7 @@ xfs_iflush_cluster(
 		 */
 		spin_lock(&iq->i_flags_lock);
 		if (!iq->i_ino ||
+		    __xfs_iflags_test(iq, XFS_ISTALE) ||
 		    (XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) {
 			spin_unlock(&iq->i_flags_lock);
 			continue;
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/11] xfs: optimise xfs_iext_destroy
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
                   ` (3 preceding siblings ...)
  2016-04-13  5:31 ` [PATCH 04/11] xfs: skip stale inodes " Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-13 16:45   ` Christoph Hellwig
  2016-04-13  5:31 ` [PATCH 06/11] xfs: xfs_inode_free() isn't RCU safe Dave Chinner
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Alex Lyakas <alex@zadarastorage.com>

When unmounting XFS, we call:

xfs_inode_free => xfs_idestroy_fork => xfs_iext_destroy

This goes over the whole indirection array and calls
xfs_iext_irec_remove for each one of the erps (from the last one to
the first one). As a result, we keep shrinking (reallocating
actually) the indirection array until we shrink out all of its
elements. When we have files with huge numbers of extents, umount
takes 30-80 sec, depending on the amount of files that XFS loaded
and the amount of indirection entries of each file. The unmount
stack looks like:

[<ffffffffc0b6d200>] xfs_iext_realloc_indirect+0x40/0x60 [xfs]
[<ffffffffc0b6cd8e>] xfs_iext_irec_remove+0xee/0xf0 [xfs]
[<ffffffffc0b6cdcd>] xfs_iext_destroy+0x3d/0xb0 [xfs]
[<ffffffffc0b6cef6>] xfs_idestroy_fork+0xb6/0xf0 [xfs]
[<ffffffffc0b87002>] xfs_inode_free+0xb2/0xc0 [xfs]
[<ffffffffc0b87260>] xfs_reclaim_inode+0x250/0x340 [xfs]
[<ffffffffc0b87583>] xfs_reclaim_inodes_ag+0x233/0x370 [xfs]
[<ffffffffc0b8823d>] xfs_reclaim_inodes+0x1d/0x20 [xfs]
[<ffffffffc0b96feb>] xfs_unmountfs+0x7b/0x1a0 [xfs]
[<ffffffffc0b98e4d>] xfs_fs_put_super+0x2d/0x70 [xfs]
[<ffffffff811e9e36>] generic_shutdown_super+0x76/0x100
[<ffffffff811ea207>] kill_block_super+0x27/0x70
[<ffffffff811ea519>] deactivate_locked_super+0x49/0x60
[<ffffffff811eaaee>] deactivate_super+0x4e/0x70
[<ffffffff81207593>] cleanup_mnt+0x43/0x90
[<ffffffff81207632>] __cleanup_mnt+0x12/0x20
[<ffffffff8108f8e7>] task_work_run+0xa7/0xe0
[<ffffffff81014ff7>] do_notify_resume+0x97/0xb0
[<ffffffff81717c6f>] int_signal+0x12/0x17

Further, this reallocation prevents us from freeing the extent list
from a RCU callback as allocation can block. Hence if the extent
list is in indirect format, optimise the freeing of the extent list
to only use kmem_free calls by freeing entire extent buffer pages at
a time, rather than extent by extent.

Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/libxfs/xfs_inode_fork.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index d3d1477..27497e1 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -1519,6 +1519,26 @@ xfs_iext_indirect_to_direct(
 }
 
 /*
+ * Remove all records from the indirection array.
+ */
+STATIC void
+xfs_iext_irec_remove_all(
+	struct xfs_ifork *ifp)
+{
+	int		nlists;
+	int		i;
+
+	ASSERT(ifp->if_flags & XFS_IFEXTIREC);
+	nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
+	for (i = 0; i < nlists; i++) {
+		xfs_ext_irec_t *erp = &ifp->if_u1.if_ext_irec[i];
+		if (erp->er_extbuf)
+			kmem_free(erp->er_extbuf);
+	}
+	kmem_free(ifp->if_u1.if_ext_irec);
+}
+
+/*
  * Free incore file extents.
  */
 void
@@ -1526,13 +1546,7 @@ xfs_iext_destroy(
 	xfs_ifork_t	*ifp)		/* inode fork pointer */
 {
 	if (ifp->if_flags & XFS_IFEXTIREC) {
-		int	erp_idx;
-		int	nlists;
-
-		nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
-		for (erp_idx = nlists - 1; erp_idx >= 0 ; erp_idx--) {
-			xfs_iext_irec_remove(ifp, erp_idx);
-		}
+		xfs_iext_irec_remove_all(ifp);
 		ifp->if_flags &= ~XFS_IFEXTIREC;
 	} else if (ifp->if_real_bytes) {
 		kmem_free(ifp->if_u1.if_extents);
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/11] xfs: xfs_inode_free() isn't RCU safe
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
                   ` (4 preceding siblings ...)
  2016-04-13  5:31 ` [PATCH 05/11] xfs: optimise xfs_iext_destroy Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-13  5:31 ` [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier Dave Chinner
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The xfs_inode freed in xfs_inode_free() has multiple allocated
structures attached to it. We free these in xfs_inode_free() before
we mark the inode as invalid, and before we run call_rcu() to queue
the structure for freeing.

Unfortunately, this freeing can race with other accesses that are in
the RCU current grace period that have found the inode in the radix
tree with a valid state.  This includes xfs_iflush_cluster(), which
calls xfs_inode_clean(), and that accesses the inode log item on the
xfs_inode.

The log item structure is freed in xfs_inode_free(), so there is the
possibility we can be accessing freed memory in xfs_iflush_cluster()
after validating the xfs_inode structure as being valid for this RCU
context. Hence we can get spuriously incorrect clean state returned
from such checks. This can lead to us thinking the inode is dirty
when it is, in fact, clean, and so incorrectly attaching it to the
buffer for IO and completion processing.

This then leads to use-after-free situations on the xfs_inode itself
if the IO completes after the current RCU grace period expires. The
buffer callbacks will access the xfs_inode and try to do all sorts
of things it shouldn't with freed memory.

IOWs, xfs_iflush_cluster() only works correctly when racing with
inode reclaim if the inode log item is present and correctly stating
the inode is clean. If the inode is being freed, then reclaim has
already made sure the inode is clean, and hence xfs_iflush_cluster
can skip it. However, we are accessing the inode under RCU
read lock protection and so also must ensure that all dynamically
allocated memory we reference in this context is not freed until the
RCU grace period expires.

To fix this, move all the potential memory freeing into
xfs_inode_free_callback() so that we guarantee RCU protected
lookup code will always have the memory structures it needs
available during the RCU grace period that lookup races can occur
in.

Discovered-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_icache.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index bf2d607..0c94cde 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -94,13 +94,6 @@ xfs_inode_free_callback(
 	struct inode		*inode = container_of(head, struct inode, i_rcu);
 	struct xfs_inode	*ip = XFS_I(inode);
 
-	kmem_zone_free(xfs_inode_zone, ip);
-}
-
-void
-xfs_inode_free(
-	struct xfs_inode	*ip)
-{
 	switch (VFS_I(ip)->i_mode & S_IFMT) {
 	case S_IFREG:
 	case S_IFDIR:
@@ -118,6 +111,13 @@ xfs_inode_free(
 		ip->i_itemp = NULL;
 	}
 
+	kmem_zone_free(xfs_inode_zone, ip);
+}
+
+void
+xfs_inode_free(
+	struct xfs_inode	*ip)
+{
 	/*
 	 * Because we use RCU freeing we need to ensure the inode always
 	 * appears to be reclaimed with an invalid inode number when in the
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
                   ` (5 preceding siblings ...)
  2016-04-13  5:31 ` [PATCH 06/11] xfs: xfs_inode_free() isn't RCU safe Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-13  6:49   ` Dave Chinner
  2016-04-13  5:31 ` [PATCH 08/11] xfs: xfs_iflush_cluster has range issues Dave Chinner
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The last thing we do before using call_rcu() on an xfs_inode to be
freed is mark it as invalid. This means there is a window between
when we know for certain that the inode is going to be freed and
when we do actually mark it as "freed".

This is important in the context of RCU lookups - we can look up the
inode, find that it is valid, and then use it as such not realising
that it is in the final stages of being freed.

As such, mark the inode as being invalid the moment we know it is
going to be reclaimed. This can be done while we still hold the
XFS_ILOCK_EXCL and the flush lock in xfs_reclaim_inode(), meaning that
it occurs well before we remove it from the radix tree, and that
the i_flags_lock, the XFS_ILOCK and the inode flush lock all act as
synchronisation points for detecting that an inode is about to go
away.

For defensive purposes, this allows us to add a further check to
xfs_iflush_cluster to ensure we skip inodes that are being freed
after we grab the XFS_ILOCK_SHARED and the flush lock - if the inode
number is still valid while we hold these locks, we know the inode
has not progressed through reclaim to the point where it is clean
and about to be freed.

[bfoster: fixed __xfs_inode_clear_reclaim() using ip->i_ino after it
	  had already been zeroed.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c | 40 ++++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_inode.c  | 13 +++++++++++++
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0c94cde..a60db43 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -114,6 +114,18 @@ xfs_inode_free_callback(
 	kmem_zone_free(xfs_inode_zone, ip);
 }
 
+static void
+__xfs_inode_free(
+	struct xfs_inode	*ip)
+{
+	/* asserts to verify all state is correct here */
+	ASSERT(atomic_read(&ip->i_pincount) == 0);
+	ASSERT(!xfs_isiflocked(ip));
+	XFS_STATS_DEC(ip->i_mount, vn_active);
+
+	call_rcu(&VFS_I(ip)->i_rcu, xfs_inode_free_callback);
+}
+
 void
 xfs_inode_free(
 	struct xfs_inode	*ip)
@@ -129,12 +141,7 @@ xfs_inode_free(
 	ip->i_ino = 0;
 	spin_unlock(&ip->i_flags_lock);
 
-	/* asserts to verify all state is correct here */
-	ASSERT(atomic_read(&ip->i_pincount) == 0);
-	ASSERT(!xfs_isiflocked(ip));
-	XFS_STATS_DEC(ip->i_mount, vn_active);
-
-	call_rcu(&VFS_I(ip)->i_rcu, xfs_inode_free_callback);
+	__xfs_inode_free(ip);
 }
 
 /*
@@ -929,6 +936,7 @@ xfs_reclaim_inode(
 	int			sync_mode)
 {
 	struct xfs_buf		*bp = NULL;
+	xfs_ino_t		ino = ip->i_ino; /* for radix_tree_delete */
 	int			error;
 
 restart:
@@ -993,6 +1001,22 @@ restart:
 
 	xfs_iflock(ip);
 reclaim:
+	/*
+	 * Because we use RCU freeing we need to ensure the inode always appears
+	 * to be reclaimed with an invalid inode number when in the free state.
+	 * We do this as early as possible under the ILOCK and flush lock so
+	 * that xfs_iflush_cluster() can be guaranteed to detect races with us
+	 * here. By doing this, we guarantee that once xfs_iflush_cluster has
+	 * locked both the XFS_ILOCK and the flush lock that it will see either
+	 * a valid, flushable inode that will serialise correctly against the
+	 * locks below, or it will see a clean (and invalid) inode that it can
+	 * skip.
+	 */
+	spin_lock(&ip->i_flags_lock);
+	ip->i_flags = XFS_IRECLAIM;
+	ip->i_ino = 0;
+	spin_unlock(&ip->i_flags_lock);
+
 	xfs_ifunlock(ip);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
@@ -1006,7 +1030,7 @@ reclaim:
 	 */
 	spin_lock(&pag->pag_ici_lock);
 	if (!radix_tree_delete(&pag->pag_ici_root,
-				XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino)))
+				XFS_INO_TO_AGINO(ip->i_mount, ino)))
 		ASSERT(0);
 	__xfs_inode_clear_reclaim(pag, ip);
 	spin_unlock(&pag->pag_ici_lock);
@@ -1023,7 +1047,7 @@ reclaim:
 	xfs_qm_dqdetach(ip);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
-	xfs_inode_free(ip);
+	__xfs_inode_free(ip);
 	return error;
 
 out_ifunlock:
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ad0ad8d..8931375 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3226,6 +3226,19 @@ xfs_iflush_cluster(
 			continue;
 		}
 
+
+		/*
+		 * Check the inode number again, just to be certain we are not
+		 * racing with freeing in xfs_reclaim_inode(). See the comments
+		 * in that function for more information as to why the initial
+		 * check is not sufficient.
+		 */
+		if (!iq->i_ino) {
+			xfs_ifunlock(iq);
+			xfs_iunlock(iq, XFS_ILOCK_SHARED);
+			continue;
+		}
+
 		/*
 		 * arriving here means that this inode can be flushed.  First
 		 * re-check that it's dirty before flushing.
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/11] xfs: xfs_iflush_cluster has range issues
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
                   ` (6 preceding siblings ...)
  2016-04-13  5:31 ` [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-13  5:31 ` [PATCH 09/11] xfs: rename variables in xfs_iflush_cluster for clarity Dave Chinner
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_iflush_cluster() does a gang lookup on the radix tree, meaning
it can find inodes beyond the current cluster if there is sparse
cache population. gang lookups return results in ascending index
order, so stop trying to cluster inodes once the first inode outside
the cluster mask is detected.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_inode.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 8931375..26f22cb 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3194,11 +3194,20 @@ xfs_iflush_cluster(
 		 */
 		spin_lock(&iq->i_flags_lock);
 		if (!iq->i_ino ||
-		    __xfs_iflags_test(iq, XFS_ISTALE) ||
-		    (XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) {
+		    __xfs_iflags_test(iq, XFS_ISTALE)) {
 			spin_unlock(&iq->i_flags_lock);
 			continue;
 		}
+
+		/*
+		 * Once we fall off the end of the cluster, no point checking
+		 * any more inodes in the list because they will also all be
+		 * outside the cluster.
+		 */
+		if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) {
+			spin_unlock(&iq->i_flags_lock);
+			break;
+		}
 		spin_unlock(&iq->i_flags_lock);
 
 		/*
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/11] xfs: rename variables in xfs_iflush_cluster for clarity
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
                   ` (7 preceding siblings ...)
  2016-04-13  5:31 ` [PATCH 08/11] xfs: xfs_iflush_cluster has range issues Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-13  5:31 ` [PATCH 10/11] xfs: simplify inode reclaim tagging interfaces Dave Chinner
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The cluster inode variable uses unconventional naming - iq - which
makes it hard to distinguish from the inode passed into the
function - ip - and that is a vector for mistakes. Rename all the
cluster inode variables to use a more conventional prefix to reduce
potential future confusion (cilist, cilist_size, cip).

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_inode.c | 74 +++++++++++++++++++++++++++---------------------------
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 26f22cb..ee6799e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3149,16 +3149,16 @@ out_release_wip:
 
 STATIC int
 xfs_iflush_cluster(
-	xfs_inode_t	*ip,
-	xfs_buf_t	*bp)
+	struct xfs_inode	*ip,
+	struct xfs_buf		*bp)
 {
-	xfs_mount_t		*mp = ip->i_mount;
+	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_perag	*pag;
 	unsigned long		first_index, mask;
 	unsigned long		inodes_per_cluster;
-	int			ilist_size;
-	xfs_inode_t		**ilist;
-	xfs_inode_t		*iq;
+	int			cilist_size;
+	struct xfs_inode	**cilist;
+	struct xfs_inode	*cip;
 	int			nr_found;
 	int			clcount = 0;
 	int			bufwasdelwri;
@@ -3167,23 +3167,23 @@ xfs_iflush_cluster(
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
 
 	inodes_per_cluster = mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog;
-	ilist_size = inodes_per_cluster * sizeof(xfs_inode_t *);
-	ilist = kmem_alloc(ilist_size, KM_MAYFAIL|KM_NOFS);
-	if (!ilist)
+	cilist_size = inodes_per_cluster * sizeof(xfs_inode_t *);
+	cilist = kmem_alloc(cilist_size, KM_MAYFAIL|KM_NOFS);
+	if (!cilist)
 		goto out_put;
 
 	mask = ~(((mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog)) - 1);
 	first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask;
 	rcu_read_lock();
 	/* really need a gang lookup range call here */
-	nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)ilist,
+	nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)cilist,
 					first_index, inodes_per_cluster);
 	if (nr_found == 0)
 		goto out_free;
 
 	for (i = 0; i < nr_found; i++) {
-		iq = ilist[i];
-		if (iq == ip)
+		cip = cilist[i];
+		if (cip == ip)
 			continue;
 
 		/*
@@ -3192,10 +3192,10 @@ xfs_iflush_cluster(
 		 * We need to check under the i_flags_lock for a valid inode
 		 * here. Skip it if it is not valid or the wrong inode.
 		 */
-		spin_lock(&iq->i_flags_lock);
-		if (!iq->i_ino ||
-		    __xfs_iflags_test(iq, XFS_ISTALE)) {
-			spin_unlock(&iq->i_flags_lock);
+		spin_lock(&cip->i_flags_lock);
+		if (!cip->i_ino ||
+		    __xfs_iflags_test(cip, XFS_ISTALE)) {
+			spin_unlock(&cip->i_flags_lock);
 			continue;
 		}
 
@@ -3204,18 +3204,18 @@ xfs_iflush_cluster(
 		 * any more inodes in the list because they will also all be
 		 * outside the cluster.
 		 */
-		if ((XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) {
-			spin_unlock(&iq->i_flags_lock);
+		if ((XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
+			spin_unlock(&cip->i_flags_lock);
 			break;
 		}
-		spin_unlock(&iq->i_flags_lock);
+		spin_unlock(&cip->i_flags_lock);
 
 		/*
 		 * Do an un-protected check to see if the inode is dirty and
 		 * is a candidate for flushing.  These checks will be repeated
 		 * later after the appropriate locks are acquired.
 		 */
-		if (xfs_inode_clean(iq) && xfs_ipincount(iq) == 0)
+		if (xfs_inode_clean(cip) && xfs_ipincount(cip) == 0)
 			continue;
 
 		/*
@@ -3223,15 +3223,15 @@ xfs_iflush_cluster(
 		 * then this inode cannot be flushed and is skipped.
 		 */
 
-		if (!xfs_ilock_nowait(iq, XFS_ILOCK_SHARED))
+		if (!xfs_ilock_nowait(cip, XFS_ILOCK_SHARED))
 			continue;
-		if (!xfs_iflock_nowait(iq)) {
-			xfs_iunlock(iq, XFS_ILOCK_SHARED);
+		if (!xfs_iflock_nowait(cip)) {
+			xfs_iunlock(cip, XFS_ILOCK_SHARED);
 			continue;
 		}
-		if (xfs_ipincount(iq)) {
-			xfs_ifunlock(iq);
-			xfs_iunlock(iq, XFS_ILOCK_SHARED);
+		if (xfs_ipincount(cip)) {
+			xfs_ifunlock(cip);
+			xfs_iunlock(cip, XFS_ILOCK_SHARED);
 			continue;
 		}
 
@@ -3242,9 +3242,9 @@ xfs_iflush_cluster(
 		 * in that function for more information as to why the initial
 		 * check is not sufficient.
 		 */
-		if (!iq->i_ino) {
-			xfs_ifunlock(iq);
-			xfs_iunlock(iq, XFS_ILOCK_SHARED);
+		if (!cip->i_ino) {
+			xfs_ifunlock(cip);
+			xfs_iunlock(cip, XFS_ILOCK_SHARED);
 			continue;
 		}
 
@@ -3252,18 +3252,18 @@ xfs_iflush_cluster(
 		 * arriving here means that this inode can be flushed.  First
 		 * re-check that it's dirty before flushing.
 		 */
-		if (!xfs_inode_clean(iq)) {
+		if (!xfs_inode_clean(cip)) {
 			int	error;
-			error = xfs_iflush_int(iq, bp);
+			error = xfs_iflush_int(cip, bp);
 			if (error) {
-				xfs_iunlock(iq, XFS_ILOCK_SHARED);
+				xfs_iunlock(cip, XFS_ILOCK_SHARED);
 				goto cluster_corrupt_out;
 			}
 			clcount++;
 		} else {
-			xfs_ifunlock(iq);
+			xfs_ifunlock(cip);
 		}
-		xfs_iunlock(iq, XFS_ILOCK_SHARED);
+		xfs_iunlock(cip, XFS_ILOCK_SHARED);
 	}
 
 	if (clcount) {
@@ -3273,7 +3273,7 @@ xfs_iflush_cluster(
 
 out_free:
 	rcu_read_unlock();
-	kmem_free(ilist);
+	kmem_free(cilist);
 out_put:
 	xfs_perag_put(pag);
 	return 0;
@@ -3316,8 +3316,8 @@ cluster_corrupt_out:
 	/*
 	 * Unlocks the flush lock
 	 */
-	xfs_iflush_abort(iq, false);
-	kmem_free(ilist);
+	xfs_iflush_abort(cip, false);
+	kmem_free(cilist);
 	xfs_perag_put(pag);
 	return -EFSCORRUPTED;
 }
-- 
2.7.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 10/11] xfs: simplify inode reclaim tagging interfaces
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
                   ` (8 preceding siblings ...)
  2016-04-13  5:31 ` [PATCH 09/11] xfs: rename variables in xfs_iflush_cluster for clarity Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-14 12:10   ` Brian Foster
  2016-06-29  4:21   ` Darrick J. Wong
  2016-04-13  5:31 ` [PATCH 11/11] xfs: move reclaim tagging functions Dave Chinner
  2016-04-13 15:38 ` [PATCH 00/11 v3] xfs: inode reclaim vs the world Darrick J. Wong
  11 siblings, 2 replies; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Inode radix tree tagging for reclaim passes a lot of unnecessary
variables around. Over time the struct xfs_perag has grown an xfs_mount
backpointer and an internal agno, so we don't need to pass other
variables into the tagging functions to supply this information.

Rework the functions to pass the minimal variable set required
and simplify the internal logic and flow.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c | 97 +++++++++++++++++++++++++++--------------------------
 1 file changed, 49 insertions(+), 48 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index a60db43..927e7b0 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -37,8 +37,7 @@
 #include <linux/kthread.h>
 #include <linux/freezer.h>
 
-STATIC void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp,
-				struct xfs_perag *pag, struct xfs_inode *ip);
+STATIC void xfs_inode_clear_reclaim_tag(struct xfs_perag *pag, xfs_ino_t ino);
 
 /*
  * Allocate and initialise an xfs_inode.
@@ -271,7 +270,7 @@ xfs_iget_cache_hit(
 		 */
 		ip->i_flags &= ~XFS_IRECLAIM_RESET_FLAGS;
 		ip->i_flags |= XFS_INEW;
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
+		xfs_inode_clear_reclaim_tag(pag, ip->i_ino);
 		inode->i_state = I_NEW;
 
 		ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
@@ -768,29 +767,44 @@ xfs_reclaim_worker(
 }
 
 static void
-__xfs_inode_set_reclaim_tag(
+xfs_perag_set_reclaim_tag(
 	struct xfs_perag	*pag,
-	struct xfs_inode	*ip)
+	xfs_ino_t		ino)
 {
-	radix_tree_tag_set(&pag->pag_ici_root,
-			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+	struct xfs_mount	*mp = pag->pag_mount;
+
+	ASSERT(spin_is_locked(&pag->pag_ici_lock));
+	if (pag->pag_ici_reclaimable++)
+		return;
+
+	/* propagate the reclaim tag up into the perag radix tree */
+	spin_lock(&mp->m_perag_lock);
+	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
 			   XFS_ICI_RECLAIM_TAG);
+	spin_unlock(&mp->m_perag_lock);
 
-	if (!pag->pag_ici_reclaimable) {
-		/* propagate the reclaim tag up into the perag radix tree */
-		spin_lock(&ip->i_mount->m_perag_lock);
-		radix_tree_tag_set(&ip->i_mount->m_perag_tree,
-				XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
-				XFS_ICI_RECLAIM_TAG);
-		spin_unlock(&ip->i_mount->m_perag_lock);
+	/* schedule periodic background inode reclaim */
+	xfs_reclaim_work_queue(mp);
 
-		/* schedule periodic background inode reclaim */
-		xfs_reclaim_work_queue(ip->i_mount);
+	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
+}
 
-		trace_xfs_perag_set_reclaim(ip->i_mount, pag->pag_agno,
-							-1, _RET_IP_);
-	}
-	pag->pag_ici_reclaimable++;
+static void
+xfs_perag_clear_reclaim_tag(
+	struct xfs_perag	*pag)
+{
+	struct xfs_mount	*mp = pag->pag_mount;
+
+	ASSERT(spin_is_locked(&pag->pag_ici_lock));
+	if (--pag->pag_ici_reclaimable)
+		return;
+
+	/* clear the reclaim tag from the perag radix tree */
+	spin_lock(&mp->m_perag_lock);
+	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
+			     XFS_ICI_RECLAIM_TAG);
+	spin_unlock(&mp->m_perag_lock);
+	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
 }
 
 /*
@@ -800,48 +814,35 @@ __xfs_inode_set_reclaim_tag(
  */
 void
 xfs_inode_set_reclaim_tag(
-	xfs_inode_t	*ip)
+	struct xfs_inode	*ip)
 {
-	struct xfs_mount *mp = ip->i_mount;
-	struct xfs_perag *pag;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_perag	*pag;
 
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
 	spin_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
-	__xfs_inode_set_reclaim_tag(pag, ip);
+
+	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
+			   XFS_ICI_RECLAIM_TAG);
+	xfs_perag_set_reclaim_tag(pag, ip->i_ino);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
+
 	spin_unlock(&ip->i_flags_lock);
 	spin_unlock(&pag->pag_ici_lock);
 	xfs_perag_put(pag);
 }
 
-STATIC void
-__xfs_inode_clear_reclaim(
-	xfs_perag_t	*pag,
-	xfs_inode_t	*ip)
-{
-	pag->pag_ici_reclaimable--;
-	if (!pag->pag_ici_reclaimable) {
-		/* clear the reclaim tag from the perag radix tree */
-		spin_lock(&ip->i_mount->m_perag_lock);
-		radix_tree_tag_clear(&ip->i_mount->m_perag_tree,
-				XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
-				XFS_ICI_RECLAIM_TAG);
-		spin_unlock(&ip->i_mount->m_perag_lock);
-		trace_xfs_perag_clear_reclaim(ip->i_mount, pag->pag_agno,
-							-1, _RET_IP_);
-	}
-}
 
 STATIC void
-__xfs_inode_clear_reclaim_tag(
-	xfs_mount_t	*mp,
-	xfs_perag_t	*pag,
-	xfs_inode_t	*ip)
+xfs_inode_clear_reclaim_tag(
+	struct xfs_perag	*pag,
+	xfs_ino_t		ino)
 {
 	radix_tree_tag_clear(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
-	__xfs_inode_clear_reclaim(pag, ip);
+			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
+			     XFS_ICI_RECLAIM_TAG);
+	xfs_perag_clear_reclaim_tag(pag);
 }
 
 /*
@@ -1032,7 +1033,7 @@ reclaim:
 	if (!radix_tree_delete(&pag->pag_ici_root,
 				XFS_INO_TO_AGINO(ip->i_mount, ino)))
 		ASSERT(0);
-	__xfs_inode_clear_reclaim(pag, ip);
+	xfs_perag_clear_reclaim_tag(pag);
 	spin_unlock(&pag->pag_ici_lock);
 
 	/*
-- 
2.7.0


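The first-increment/last-decrement pattern used by xfs_perag_set_reclaim_tag() and xfs_perag_clear_reclaim_tag() above can be modelled in a few lines of userspace C. This is only an illustrative sketch (the struct and function names are invented and all locking is omitted), not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the per-AG reclaim tag: the tag mirrors whether the
 * reclaimable count is non-zero. Only the 0 -> 1 transition sets the
 * tag and only the 1 -> 0 transition clears it, so the (relatively
 * expensive) radix tree tag update is skipped on all other transitions. */
struct toy_perag {
	int	reclaimable;	/* models pag->pag_ici_reclaimable */
	bool	tagged;		/* models XFS_ICI_RECLAIM_TAG on the perag tree */
};

void toy_set_reclaim_tag(struct toy_perag *pag)
{
	if (pag->reclaimable++)
		return;		/* count was already non-zero: tag already set */
	pag->tagged = true;
}

void toy_clear_reclaim_tag(struct toy_perag *pag)
{
	if (--pag->reclaimable)
		return;		/* count still non-zero: keep the tag */
	pag->tagged = false;
}
```

Note the post-increment in the set path and the pre-decrement in the clear path: each test reads the count on the side of the transition that says whether the zero boundary is being crossed.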

* [PATCH 11/11] xfs: move reclaim tagging functions
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
                   ` (9 preceding siblings ...)
  2016-04-13  5:31 ` [PATCH 10/11] xfs: simplify inode reclaim tagging interfaces Dave Chinner
@ 2016-04-13  5:31 ` Dave Chinner
  2016-04-14 12:11   ` Brian Foster
  2016-04-13 15:38 ` [PATCH 00/11 v3] xfs: inode reclaim vs the world Darrick J. Wong
  11 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  5:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Rearrange the inode tagging functions so that they are higher up in
xfs_icache.c and so there is no need for forward prototypes to be
defined. This is purely code movement, no other change.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c | 235 ++++++++++++++++++++++++++--------------------------
 1 file changed, 116 insertions(+), 119 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 927e7b0..dab58e1 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -37,8 +37,6 @@
 #include <linux/kthread.h>
 #include <linux/freezer.h>
 
-STATIC void xfs_inode_clear_reclaim_tag(struct xfs_perag *pag, xfs_ino_t ino);
-
 /*
  * Allocate and initialise an xfs_inode.
  */
@@ -172,6 +170,122 @@ xfs_reinit_inode(
 }
 
 /*
+ * Queue a new inode reclaim pass if there are reclaimable inodes and there
+ * isn't a reclaim pass already in progress. By default it runs every 5s based
+ * on the xfs periodic sync default of 30s. Perhaps this should have it's own
+ * tunable, but that can be done if this method proves to be ineffective or too
+ * aggressive.
+ */
+static void
+xfs_reclaim_work_queue(
+	struct xfs_mount        *mp)
+{
+
+	rcu_read_lock();
+	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
+		queue_delayed_work(mp->m_reclaim_workqueue, &mp->m_reclaim_work,
+			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
+	}
+	rcu_read_unlock();
+}
+
+/*
+ * This is a fast pass over the inode cache to try to get reclaim moving on as
+ * many inodes as possible in a short period of time. It kicks itself every few
+ * seconds, as well as being kicked by the inode cache shrinker when memory
+ * goes low. It scans as quickly as possible avoiding locked inodes or those
+ * already being flushed, and once done schedules a future pass.
+ */
+void
+xfs_reclaim_worker(
+	struct work_struct *work)
+{
+	struct xfs_mount *mp = container_of(to_delayed_work(work),
+					struct xfs_mount, m_reclaim_work);
+
+	xfs_reclaim_inodes(mp, SYNC_TRYLOCK);
+	xfs_reclaim_work_queue(mp);
+}
+
+static void
+xfs_perag_set_reclaim_tag(
+	struct xfs_perag	*pag,
+	xfs_ino_t		ino)
+{
+	struct xfs_mount	*mp = pag->pag_mount;
+
+	ASSERT(spin_is_locked(&pag->pag_ici_lock));
+	if (pag->pag_ici_reclaimable++)
+		return;
+
+	/* propagate the reclaim tag up into the perag radix tree */
+	spin_lock(&mp->m_perag_lock);
+	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
+			   XFS_ICI_RECLAIM_TAG);
+	spin_unlock(&mp->m_perag_lock);
+
+	/* schedule periodic background inode reclaim */
+	xfs_reclaim_work_queue(mp);
+
+	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
+}
+
+static void
+xfs_perag_clear_reclaim_tag(
+	struct xfs_perag	*pag)
+{
+	struct xfs_mount	*mp = pag->pag_mount;
+
+	ASSERT(spin_is_locked(&pag->pag_ici_lock));
+	if (--pag->pag_ici_reclaimable)
+		return;
+
+	/* clear the reclaim tag from the perag radix tree */
+	spin_lock(&mp->m_perag_lock);
+	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
+			     XFS_ICI_RECLAIM_TAG);
+	spin_unlock(&mp->m_perag_lock);
+	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
+}
+
+/*
+ * We set the inode flag atomically with the radix tree tag.
+ * Once we get tag lookups on the radix tree, this inode flag
+ * can go away.
+ */
+void
+xfs_inode_set_reclaim_tag(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_perag	*pag;
+
+	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
+	spin_lock(&pag->pag_ici_lock);
+	spin_lock(&ip->i_flags_lock);
+
+	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
+			   XFS_ICI_RECLAIM_TAG);
+	xfs_perag_set_reclaim_tag(pag, ip->i_ino);
+	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
+
+	spin_unlock(&ip->i_flags_lock);
+	spin_unlock(&pag->pag_ici_lock);
+	xfs_perag_put(pag);
+}
+
+static void
+xfs_inode_clear_reclaim_tag(
+	struct xfs_perag	*pag,
+	xfs_ino_t		ino)
+{
+	radix_tree_tag_clear(&pag->pag_ici_root,
+			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
+			     XFS_ICI_RECLAIM_TAG);
+	xfs_perag_clear_reclaim_tag(pag);
+}
+
+/*
  * Check the validity of the inode we just found it the cache
  */
 static int
@@ -729,123 +843,6 @@ xfs_inode_ag_iterator_tag(
 }
 
 /*
- * Queue a new inode reclaim pass if there are reclaimable inodes and there
- * isn't a reclaim pass already in progress. By default it runs every 5s based
- * on the xfs periodic sync default of 30s. Perhaps this should have it's own
- * tunable, but that can be done if this method proves to be ineffective or too
- * aggressive.
- */
-static void
-xfs_reclaim_work_queue(
-	struct xfs_mount        *mp)
-{
-
-	rcu_read_lock();
-	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
-		queue_delayed_work(mp->m_reclaim_workqueue, &mp->m_reclaim_work,
-			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
-	}
-	rcu_read_unlock();
-}
-
-/*
- * This is a fast pass over the inode cache to try to get reclaim moving on as
- * many inodes as possible in a short period of time. It kicks itself every few
- * seconds, as well as being kicked by the inode cache shrinker when memory
- * goes low. It scans as quickly as possible avoiding locked inodes or those
- * already being flushed, and once done schedules a future pass.
- */
-void
-xfs_reclaim_worker(
-	struct work_struct *work)
-{
-	struct xfs_mount *mp = container_of(to_delayed_work(work),
-					struct xfs_mount, m_reclaim_work);
-
-	xfs_reclaim_inodes(mp, SYNC_TRYLOCK);
-	xfs_reclaim_work_queue(mp);
-}
-
-static void
-xfs_perag_set_reclaim_tag(
-	struct xfs_perag	*pag,
-	xfs_ino_t		ino)
-{
-	struct xfs_mount	*mp = pag->pag_mount;
-
-	ASSERT(spin_is_locked(&pag->pag_ici_lock));
-	if (pag->pag_ici_reclaimable++)
-		return;
-
-	/* propagate the reclaim tag up into the perag radix tree */
-	spin_lock(&mp->m_perag_lock);
-	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
-			   XFS_ICI_RECLAIM_TAG);
-	spin_unlock(&mp->m_perag_lock);
-
-	/* schedule periodic background inode reclaim */
-	xfs_reclaim_work_queue(mp);
-
-	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
-}
-
-static void
-xfs_perag_clear_reclaim_tag(
-	struct xfs_perag	*pag)
-{
-	struct xfs_mount	*mp = pag->pag_mount;
-
-	ASSERT(spin_is_locked(&pag->pag_ici_lock));
-	if (--pag->pag_ici_reclaimable)
-		return;
-
-	/* clear the reclaim tag from the perag radix tree */
-	spin_lock(&mp->m_perag_lock);
-	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
-			     XFS_ICI_RECLAIM_TAG);
-	spin_unlock(&mp->m_perag_lock);
-	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
-}
-
-/*
- * We set the inode flag atomically with the radix tree tag.
- * Once we get tag lookups on the radix tree, this inode flag
- * can go away.
- */
-void
-xfs_inode_set_reclaim_tag(
-	struct xfs_inode	*ip)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_perag	*pag;
-
-	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
-	spin_lock(&pag->pag_ici_lock);
-	spin_lock(&ip->i_flags_lock);
-
-	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
-			   XFS_ICI_RECLAIM_TAG);
-	xfs_perag_set_reclaim_tag(pag, ip->i_ino);
-	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
-
-	spin_unlock(&ip->i_flags_lock);
-	spin_unlock(&pag->pag_ici_lock);
-	xfs_perag_put(pag);
-}
-
-
-STATIC void
-xfs_inode_clear_reclaim_tag(
-	struct xfs_perag	*pag,
-	xfs_ino_t		ino)
-{
-	radix_tree_tag_clear(&pag->pag_ici_root,
-			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
-			     XFS_ICI_RECLAIM_TAG);
-	xfs_perag_clear_reclaim_tag(pag);
-}
-
-/*
  * Grab the inode for reclaim exclusively.
  * Return 0 if we grabbed it, non-zero otherwise.
  */
-- 
2.7.0


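The queueing interval in xfs_reclaim_work_queue() above is derived from the periodic sync tunable: xfs_syncd_centisecs defaults to 3000 (30 seconds), and the `/ 6 * 10` expression converts one sixth of that value from centiseconds to milliseconds, giving the 5s default mentioned in the comment. A quick check of the arithmetic (the default value is the standard XFS tunable default; the function name here is just for illustration):

```c
#include <assert.h>

/* xfs_syncd_centisecs default: 30 seconds expressed in centiseconds */
#define SYNCD_CENTISECS_DEFAULT	3000

/* The interval expression used by xfs_reclaim_work_queue(), before it
 * is handed to msecs_to_jiffies(): one sixth of the sync period,
 * converted from centiseconds to milliseconds. */
unsigned int reclaim_interval_ms(unsigned int syncd_centisecs)
{
	return syncd_centisecs / 6 * 10;
}
```

With the default tunable this yields 5000ms, i.e. a reclaim pass every 5 seconds.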

* Re: [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier
  2016-04-13  5:31 ` [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier Dave Chinner
@ 2016-04-13  6:49   ` Dave Chinner
  2016-04-14 12:10     ` Brian Foster
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2016-04-13  6:49 UTC (permalink / raw)
  To: xfs

On Wed, Apr 13, 2016 at 03:31:28PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The last thing we do before using call_rcu() on an xfs_inode to be
> freed is mark it as invalid. This means there is a window between
> when we know for certain that the inode is going to be freed and
> when we do actually mark it as "freed".
> 
> This is important in the context of RCU lookups - we can look up the
> inode, find that it is valid, and then use it as such not realising
> that it is in the final stages of being freed.
> 
> As such, mark the inode as being invalid the moment we know it is
> going to be reclaimed. This can be done while we still hold the
> XFS_ILOCK_EXCL and the flush lock in xfs_inode_reclaim, meaning that
> it occurs well before we remove it from the radix tree, and that
> the i_flags_lock, the XFS_ILOCK and the inode flush lock all act as
> synchronisation points for detecting that an inode is about to go
> away.
> 
> For defensive purposes, this allows us to add a further check to
> xfs_iflush_cluster to ensure we skip inodes that are being freed
> after we grab the XFS_ILOCK_SHARED and the flush lock - we know that
> if the inode number is valid while we have these locks held we know
> that it has not progressed through reclaim to the point where it is
> clean and is about to be freed.
> 
> [bfoster: fixed __xfs_inode_clear_reclaim() using ip->i_ino after it
> 	  had already been zeroed.]

And, of course, in reordering this I dropped this fix because it was
handled by the reworking of tagging code to use pag->pag_agno.

So I've brought that small change forward to this patch (using
pag->pag_agno instead of deriving it from the ip->i_ino in
__xfs_inode_clear_reclaim()).

That means I have to rebase the later cleanup patch too, but the end
result of the patch set is identical...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


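The ordering change patch 7 describes — invalidate the inode while its locks are still held, before it is removed from the lookup structure — can be sketched as a tiny single-threaded model. Everything below is illustrative (the struct, field, and function names are invented, and the real synchronisation via i_flags_lock/ILOCK/flush lock and RCU is not modelled); it only shows why a racing lookup that rechecks the inode number will skip an inode that reclaim has already claimed:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: reclaim zeroes the inode number as its very first
 * step, so any lookup that later revalidates under the lock sees the
 * "being freed" marker even though the inode is still reachable. */
struct model_inode {
	unsigned long	ino;		/* 0 means "being freed" */
	bool		reclaiming;
};

void model_reclaim_begin(struct model_inode *ip)
{
	/* done while the (unmodelled) locks are still held */
	ip->ino = 0;
	ip->reclaiming = true;
}

bool model_lookup_validate(const struct model_inode *ip, unsigned long want)
{
	/* a racing lookup must recheck the inode number under the lock;
	 * ino == 0 means the inode is on its way out and must be skipped */
	return ip->ino != 0 && ip->ino == want;
}
```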

* Re: [PATCH 00/11 v3] xfs: inode reclaim vs the world
  2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
                   ` (10 preceding siblings ...)
  2016-04-13  5:31 ` [PATCH 11/11] xfs: move reclaim tagging functions Dave Chinner
@ 2016-04-13 15:38 ` Darrick J. Wong
  11 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2016-04-13 15:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> Subject: Re: [PATCH 00/11 v3] xfs: inode reclaim vs the world

Does it also have to defeat seven evil X's? ;)

--D

On Wed, Apr 13, 2016 at 03:31:21PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> The inode reclaim patchset has grown somewhat larger with all the
> fixes I've accrued over the past couple of days for issues that have
> been reported. I've just re-ordered the patchset to have all the bug
> fixes that need to go into 4.6-rc up front, followed by other bug
> fixes, followed by all the cleanups I wrote as I went along fixing
> things. I want to get this out before LSFMM, so I'm posting this
> before I've done a full verification that I've got the patches
> correct, and they are still running through xfstests right now.
> 
> Patch 1 is new - it fixes a regression introduced in 4.6-rc1 that is
> caused by clearing the VFS inode i_mode in ->evict_inode, when there
> is code in evict() that requires the mode to be intact until
> ->destroy_inode.
> 
> Patch 2 is new - it fixes the inode log item leak that left a dirty
> log item on the AIL after the inode had been reclaimed, resulting in
> unmountable filesystems and a couple of use-after-free vectors.
> 
> Patch 3 is the original fix for the xfs_iflush_cluster lookup
> validity checks that were incorrect.
> 
> Patch 4 is the original fix for avoiding flushing stale inodes.
> 
> Patch 5 is new, and comes from Alex @ Zadara to avoid having to
> reallocate memory when tearing down the inode extent lists. This is
> a necessary pre-requisite for patch 6.
> 
> Patch 6 is the original patch that pushed all the inode memory
> freeing into the RCU callback, so that xfs_iflush_cluster didn't
> reference freed memory when racing with reclaim.
> 
> Patch 7 is the original patch that made reclaim races easier to
> detect.
> 
> Patch 8 is the original patch that drops out of xfs_iflush_cluster
> on the first inode beyond the end of the current cluster.
> 
> Patch 9 is the original patch that renames the variables in
> xfs_iflush_cluster().
> 
> Patch 10 and 11 are new patches that simplify the inode reclaim
> tagging interfaces to remove dependencies on the struct xfs_inode
> and the inode number where they are not actually required.
> 
> -Dave.
> 



* Re: [PATCH 01/11] xfs: we don't need no steekin ->evict_inode
  2016-04-13  5:31 ` [PATCH 01/11] xfs: we don't need no steekin ->evict_inode Dave Chinner
@ 2016-04-13 16:41   ` Christoph Hellwig
  2016-04-13 21:20     ` Dave Chinner
  2016-04-14 12:10   ` Brian Foster
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2016-04-13 16:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: viro, xfs

Al has been very unhappy about our destroy_inode abuse, and I'd be
reluctant to make it worse.

Why do we need to play games with i_mode when freeing?



* Re: [PATCH 02/11] xfs: xfs_iflush_cluster fails to abort on error
  2016-04-13  5:31 ` [PATCH 02/11] xfs: xfs_iflush_cluster fails to abort on error Dave Chinner
@ 2016-04-13 16:41   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2016-04-13 16:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 05/11] xfs: optimise xfs_iext_destroy
  2016-04-13  5:31 ` [PATCH 05/11] xfs: optimise xfs_iext_destroy Dave Chinner
@ 2016-04-13 16:45   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2016-04-13 16:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

>  /*
> + * Remove all records from the indirection array.
> + */
> +STATIC void
> +xfs_iext_irec_remove_all(
> +	struct xfs_ifork *ifp)
> +{
> +	int		nlists;
> +	int		i;
> +
> +	ASSERT(ifp->if_flags & XFS_IFEXTIREC);
> +	nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
> +	for (i = 0; i < nlists; i++) {
> +		xfs_ext_irec_t *erp = &ifp->if_u1.if_ext_irec[i];

Can we avoid the typedef for new code?

> +		if (erp->er_extbuf)
> +			kmem_free(erp->er_extbuf);

no need to check for NULL before a kmem_free, e.g. this whole loop could
just be:

	for (i = 0; i < nlists; i++)
		kmem_free(ifp->if_u1.if_ext_irec[i].er_extbuf);

>  {
>  	if (ifp->if_flags & XFS_IFEXTIREC) {
> -		int	erp_idx;
> -		int	nlists;
> -
> -		nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
> -		for (erp_idx = nlists - 1; erp_idx >= 0 ; erp_idx--) {
> -			xfs_iext_irec_remove(ifp, erp_idx);
> -		}
> +		xfs_iext_irec_remove_all(ifp);
>  		ifp->if_flags &= ~XFS_IFEXTIREC;

I'd be tempted to just move clearing of the flag into
xfs_iext_irec_remove_all if we change the patch anyway.

Otherwise this looks fine to me.


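Christoph's point about the redundant NULL check holds because freeing a NULL pointer is defined to be a no-op — this is true of C's free() and, by the same convention, of the kernel's kfree()/kmem_free(). A minimal userspace illustration using plain malloc/free; the partially-populated array mirrors the indirection-array case, where some er_extbuf pointers may be NULL:

```c
#include <assert.h>
#include <stdlib.h>

/* Free every buffer in a partially-populated pointer array without
 * guarding against NULL: free(NULL) does nothing by definition, so the
 * "if (erp->er_extbuf)" style check adds no safety, only noise. */
void free_all(void **bufs, int n)
{
	for (int i = 0; i < n; i++) {
		free(bufs[i]);	/* safe even when bufs[i] == NULL */
		bufs[i] = NULL;
	}
}
```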

* Re: [PATCH 01/11] xfs: we don't need no steekin ->evict_inode
  2016-04-13 16:41   ` Christoph Hellwig
@ 2016-04-13 21:20     ` Dave Chinner
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2016-04-13 21:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: viro, xfs

On Wed, Apr 13, 2016 at 09:41:10AM -0700, Christoph Hellwig wrote:
> Al has been very unhappy about our destroy_inode abuse, and I'd
> reluctant to make it worse.

I don't have any problems with it at all. The VFS doesn't care
how we manage inode allocation or destruction, so I don't see
any problem with what we do outside the visibility of the VFS inode
life cycle.

> Why do we need to play games with i_mode when freeing?

Because the XFS inode cache lookup code uses i_mode == 0 to detect a
freed inode. i.e. in xfs_iget_cache_miss/xfs_iget_cache_hit this is
used to allow XFS_IGET_CREATE to return a freed inode still in the
cache during inode allocation. This is the only case where we are
allowed to find a freed inode in the cache lookup, so we have to be
able to detect it somehow.

Similarly, I'm pretty sure there are assumptions all through the XFS
code (both kernel and userspace) that i_mode = 0 means the inode is
free/unallocated. xfs_repair, for example, makes this assumption,
and so we have to zero the mode when freeing the inode...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


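The rule Dave describes — i_mode == 0 marks a freed/unallocated inode, and only a lookup performing allocation (XFS_IGET_CREATE) may proceed when it finds one — can be sketched as a small predicate. This is a simplified model for illustration only; the flag value, error convention, and function name are invented stand-ins, not the kernel API:

```c
#include <assert.h>
#include <errno.h>

#define MODEL_IGET_CREATE	0x1	/* stands in for XFS_IGET_CREATE */

/* Model of the cache-hit mode check: a zero i_mode means the on-disk
 * inode has been freed, so the lookup must fail with -ENOENT unless
 * the caller is allocating, which is allowed to reuse the cached
 * (freed) inode. */
int model_cache_hit_check(unsigned int i_mode, unsigned int flags)
{
	if (i_mode == 0 && !(flags & MODEL_IGET_CREATE))
		return -ENOENT;
	return 0;
}
```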

* Re: [PATCH 01/11] xfs: we don't need no steekin ->evict_inode
  2016-04-13  5:31 ` [PATCH 01/11] xfs: we don't need no steekin ->evict_inode Dave Chinner
  2016-04-13 16:41   ` Christoph Hellwig
@ 2016-04-14 12:10   ` Brian Foster
  1 sibling, 0 replies; 25+ messages in thread
From: Brian Foster @ 2016-04-14 12:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Apr 13, 2016 at 03:31:22PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Joe Lawrence reported a list_add corruption with 4.6-rc1 when
> testing some custom md administration code that made its own
> block device nodes for the md array. The simple test loop of:
> 
> for i in {0..100}; do
> 	mknod --mode=0600 $tmp/tmp_node b $MAJOR $MINOR
> 	mdadm --detail --export $tmp/tmp_node > /dev/null
> 	rm -f $tmp/tmp_node
> done
> 
> 
> Would produce this warning in bd_acquire() when mdadm opened the
> device node:
> 
> list_add double add: new=ffff88043831c7b8, prev=ffff8804380287d8, next=ffff88043831c7b8.
> 
> And then produce this from bd_forget from kdevtmpfs evicting a block
> dev inode:
> 
> list_del corruption. prev->next should be ffff8800bb83eb10, but was ffff88043831c7b8
> 
> This is a regression caused by commit c19b3b05 ("xfs: mode di_mode
> to vfs inode"). The issue is that xfs_inactive() frees the
> unlinked inode, and the above commit meant that this freeing zeroed
> the mode in the struct inode. The problem is that after evict() has
> called ->evict_inode, it expects the i_mode to be intact so that it
> can call bd_forget() or cd_forget() to drop the reference to the
> block device inode attached to the XFS inode.
> 
> In reality, the only thing we do in xfs_fs_evict_inode() that is not
> generic is call xfs_inactive(). We can move the xfs_inactive() call
> to xfs_fs_destroy_inode() without any problems at all, and this
> will leave the VFS inode intact until it is completely done with it.
> 
> So, remove xfs_fs_evict_inode(), and do the work it used to do in
> ->destroy_inode instead.
> 
> Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_super.c | 28 +++++++---------------------
>  1 file changed, 7 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index e2f0f52..416421d 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -928,7 +928,7 @@ xfs_fs_alloc_inode(
>  
>  /*
>   * Now that the generic code is guaranteed not to be accessing
> - * the linux inode, we can reclaim the inode.
> + * the linux inode, we can inactivate and reclaim the inode.
>   */
>  STATIC void
>  xfs_fs_destroy_inode(
> @@ -938,9 +938,14 @@ xfs_fs_destroy_inode(
>  
>  	trace_xfs_destroy_inode(ip);
>  
> -	XFS_STATS_INC(ip->i_mount, vn_reclaim);
> +	ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
> +	XFS_STATS_INC(ip->i_mount, vn_rele);
> +	XFS_STATS_INC(ip->i_mount, vn_remove);
> +
> +	xfs_inactive(ip);
>  
>  	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);
> +	XFS_STATS_INC(ip->i_mount, vn_reclaim);
>  
>  	/*
>  	 * We should never get here with one of the reclaim flags already set.
> @@ -987,24 +992,6 @@ xfs_fs_inode_init_once(
>  		     "xfsino", ip->i_ino);
>  }
>  
> -STATIC void
> -xfs_fs_evict_inode(
> -	struct inode		*inode)
> -{
> -	xfs_inode_t		*ip = XFS_I(inode);
> -
> -	ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
> -
> -	trace_xfs_evict_inode(ip);
> -
> -	truncate_inode_pages_final(&inode->i_data);
> -	clear_inode(inode);
> -	XFS_STATS_INC(ip->i_mount, vn_rele);
> -	XFS_STATS_INC(ip->i_mount, vn_remove);
> -
> -	xfs_inactive(ip);
> -}
> -
>  /*
>   * We do an unlocked check for XFS_IDONTCACHE here because we are already
>   * serialised against cache hits here via the inode->i_lock and igrab() in
> @@ -1673,7 +1660,6 @@ xfs_fs_free_cached_objects(
>  static const struct super_operations xfs_super_operations = {
>  	.alloc_inode		= xfs_fs_alloc_inode,
>  	.destroy_inode		= xfs_fs_destroy_inode,
> -	.evict_inode		= xfs_fs_evict_inode,
>  	.drop_inode		= xfs_fs_drop_inode,
>  	.put_super		= xfs_fs_put_super,
>  	.sync_fs		= xfs_fs_sync_fs,
> -- 
> 2.7.0
> 



* Re: [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier
  2016-04-13  6:49   ` Dave Chinner
@ 2016-04-14 12:10     ` Brian Foster
  2016-04-14 23:31       ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Brian Foster @ 2016-04-14 12:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Apr 13, 2016 at 04:49:00PM +1000, Dave Chinner wrote:
> On Wed, Apr 13, 2016 at 03:31:28PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The last thing we do before using call_rcu() on an xfs_inode to be
> > freed is mark it as invalid. This means there is a window between
> > when we know for certain that the inode is going to be freed and
> > when we do actually mark it as "freed".
> > 
> > This is important in the context of RCU lookups - we can look up the
> > inode, find that it is valid, and then use it as such not realising
> > that it is in the final stages of being freed.
> > 
> > As such, mark the inode as being invalid the moment we know it is
> > going to be reclaimed. This can be done while we still hold the
> > XFS_ILOCK_EXCL and the flush lock in xfs_reclaim_inode, meaning that
> > it occurs well before we remove it from the radix tree, and that
> > the i_flags_lock, the XFS_ILOCK and the inode flush lock all act as
> > synchronisation points for detecting that an inode is about to go
> > away.
> > 
> > For defensive purposes, this allows us to add a further check to
> > xfs_iflush_cluster to ensure we skip inodes that are being freed
> > after we grab the XFS_ILOCK_SHARED and the flush lock - if the inode
> > number is valid while we have these locks held, we know that it has
> > not progressed through reclaim to the point where it is clean and is
> > about to be freed.
> > 
> > [bfoster: fixed __xfs_inode_clear_reclaim() using ip->i_ino after it
> > 	  had already been zeroed.]
> 
> And, of course, in reordering this I dropped this fix because it was
> handled by the reworking of tagging code to use pag->pag_agno.
> 
> So I've brought that small change forward to this patch (using
> pag->pag_agno instead of deriving it from the ip->i_ino in
> __xfs_inode_clear_reclaim()).
> 

I don't see any such change in this patch..?
__xfs_inode_clear_reclaim() still uses ip->i_ino.

Brian

> That means I have to rebase the later cleanup patch too, but the end
> result of the patch set is identical...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
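The window the commit message describes - between deciding an inode will be freed and actually marking it "freed" - can be sketched with a simplified model where `ip->i_ino == 0` is the "being freed" marker checked under a lock. This is a hypothetical, single-threaded illustration; the real code relies on the i_flags_lock, the XFS_ILOCK and the inode flush lock as the synchronisation points:

```c
#include <assert.h>

struct sketch_inode {
	unsigned long long i_ino;	/* 0 means "being freed", per the patch */
};

/* Reclaim side: invalidate while the locks are still held, i.e. before
 * the inode is removed from the radix tree or handed to call_rcu(). */
static void sketch_reclaim(struct sketch_inode *ip)
{
	/* ...XFS_ILOCK_EXCL + flush lock held here in the real code... */
	ip->i_ino = 0;		/* mark invalid as soon as reclaim is certain */
	/* ...radix tree delete and call_rcu() happen after this point... */
}

/* Lookup side (e.g. xfs_iflush_cluster): re-check validity under the lock. */
static int sketch_inode_usable(const struct sketch_inode *ip,
			       unsigned long long expected_ino)
{
	/* ...i_flags_lock (or ILOCK/flush lock) held here in the real code... */
	return ip->i_ino == expected_ino;	/* mismatch => skip, being freed */
}
```

Because the invalidation happens before the radix tree removal, any lookup that re-checks the inode number after taking the locks cannot race past the point where the inode is clean and about to be freed.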

* Re: [PATCH 10/11] xfs: simplify inode reclaim tagging interfaces
  2016-04-13  5:31 ` [PATCH 10/11] xfs: simplify inode reclaim tagging interfaces Dave Chinner
@ 2016-04-14 12:10   ` Brian Foster
  2016-06-29  4:21   ` Darrick J. Wong
  1 sibling, 0 replies; 25+ messages in thread
From: Brian Foster @ 2016-04-14 12:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Apr 13, 2016 at 03:31:31PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Inode radix tree tagging for reclaim passes a lot of unnecessary
> > variables around. Over time the xfs_perag has grown an xfs_mount
> > backpointer and an internal agno, so we don't need to pass other
> variables into the tagging functions to supply this information.
> 
> Rework the functions to pass the minimal variable set required
> and simplify the internal logic and flow.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_icache.c | 97 +++++++++++++++++++++++++++--------------------------
>  1 file changed, 49 insertions(+), 48 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index a60db43..927e7b0 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -37,8 +37,7 @@
>  #include <linux/kthread.h>
>  #include <linux/freezer.h>
>  
> -STATIC void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp,
> -				struct xfs_perag *pag, struct xfs_inode *ip);
> +STATIC void xfs_inode_clear_reclaim_tag(struct xfs_perag *pag, xfs_ino_t ino);
>  
>  /*
>   * Allocate and initialise an xfs_inode.
> @@ -271,7 +270,7 @@ xfs_iget_cache_hit(
>  		 */
>  		ip->i_flags &= ~XFS_IRECLAIM_RESET_FLAGS;
>  		ip->i_flags |= XFS_INEW;
> -		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> +		xfs_inode_clear_reclaim_tag(pag, ip->i_ino);
>  		inode->i_state = I_NEW;
>  
>  		ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
> @@ -768,29 +767,44 @@ xfs_reclaim_worker(
>  }
>  
>  static void
> -__xfs_inode_set_reclaim_tag(
> +xfs_perag_set_reclaim_tag(
>  	struct xfs_perag	*pag,
> -	struct xfs_inode	*ip)
> +	xfs_ino_t		ino)

No need to pass the inode number here. Otherwise looks good:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  {
> -	radix_tree_tag_set(&pag->pag_ici_root,
> -			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
> +	struct xfs_mount	*mp = pag->pag_mount;
> +
> +	ASSERT(spin_is_locked(&pag->pag_ici_lock));
> +	if (pag->pag_ici_reclaimable++)
> +		return;
> +
> +	/* propagate the reclaim tag up into the perag radix tree */
> +	spin_lock(&mp->m_perag_lock);
> +	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
>  			   XFS_ICI_RECLAIM_TAG);
> +	spin_unlock(&mp->m_perag_lock);
>  
> -	if (!pag->pag_ici_reclaimable) {
> -		/* propagate the reclaim tag up into the perag radix tree */
> -		spin_lock(&ip->i_mount->m_perag_lock);
> -		radix_tree_tag_set(&ip->i_mount->m_perag_tree,
> -				XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
> -				XFS_ICI_RECLAIM_TAG);
> -		spin_unlock(&ip->i_mount->m_perag_lock);
> +	/* schedule periodic background inode reclaim */
> +	xfs_reclaim_work_queue(mp);
>  
> -		/* schedule periodic background inode reclaim */
> -		xfs_reclaim_work_queue(ip->i_mount);
> +	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
> +}
>  
> -		trace_xfs_perag_set_reclaim(ip->i_mount, pag->pag_agno,
> -							-1, _RET_IP_);
> -	}
> -	pag->pag_ici_reclaimable++;
> +static void
> +xfs_perag_clear_reclaim_tag(
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = pag->pag_mount;
> +
> +	ASSERT(spin_is_locked(&pag->pag_ici_lock));
> +	if (--pag->pag_ici_reclaimable)
> +		return;
> +
> +	/* clear the reclaim tag from the perag radix tree */
> +	spin_lock(&mp->m_perag_lock);
> +	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
> +			     XFS_ICI_RECLAIM_TAG);
> +	spin_unlock(&mp->m_perag_lock);
> +	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
>  }
>  
>  /*
> @@ -800,48 +814,35 @@ __xfs_inode_set_reclaim_tag(
>   */
>  void
>  xfs_inode_set_reclaim_tag(
> -	xfs_inode_t	*ip)
> +	struct xfs_inode	*ip)
>  {
> -	struct xfs_mount *mp = ip->i_mount;
> -	struct xfs_perag *pag;
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_perag	*pag;
>  
>  	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
>  	spin_lock(&pag->pag_ici_lock);
>  	spin_lock(&ip->i_flags_lock);
> -	__xfs_inode_set_reclaim_tag(pag, ip);
> +
> +	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
> +			   XFS_ICI_RECLAIM_TAG);
> +	xfs_perag_set_reclaim_tag(pag, ip->i_ino);
>  	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
> +
>  	spin_unlock(&ip->i_flags_lock);
>  	spin_unlock(&pag->pag_ici_lock);
>  	xfs_perag_put(pag);
>  }
>  
> -STATIC void
> -__xfs_inode_clear_reclaim(
> -	xfs_perag_t	*pag,
> -	xfs_inode_t	*ip)
> -{
> -	pag->pag_ici_reclaimable--;
> -	if (!pag->pag_ici_reclaimable) {
> -		/* clear the reclaim tag from the perag radix tree */
> -		spin_lock(&ip->i_mount->m_perag_lock);
> -		radix_tree_tag_clear(&ip->i_mount->m_perag_tree,
> -				XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
> -				XFS_ICI_RECLAIM_TAG);
> -		spin_unlock(&ip->i_mount->m_perag_lock);
> -		trace_xfs_perag_clear_reclaim(ip->i_mount, pag->pag_agno,
> -							-1, _RET_IP_);
> -	}
> -}
>  
>  STATIC void
> -__xfs_inode_clear_reclaim_tag(
> -	xfs_mount_t	*mp,
> -	xfs_perag_t	*pag,
> -	xfs_inode_t	*ip)
> +xfs_inode_clear_reclaim_tag(
> +	struct xfs_perag	*pag,
> +	xfs_ino_t		ino)
>  {
>  	radix_tree_tag_clear(&pag->pag_ici_root,
> -			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
> -	__xfs_inode_clear_reclaim(pag, ip);
> +			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
> +			     XFS_ICI_RECLAIM_TAG);
> +	xfs_perag_clear_reclaim_tag(pag);
>  }
>  
>  /*
> @@ -1032,7 +1033,7 @@ reclaim:
>  	if (!radix_tree_delete(&pag->pag_ici_root,
>  				XFS_INO_TO_AGINO(ip->i_mount, ino)))
>  		ASSERT(0);
> -	__xfs_inode_clear_reclaim(pag, ip);
> +	xfs_perag_clear_reclaim_tag(pag);
>  	spin_unlock(&pag->pag_ici_lock);
>  
>  	/*
> -- 
> 2.7.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
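The counter logic in the xfs_perag_set/clear_reclaim_tag() pair above only touches the mount-level radix tree tag on the 0-to-1 and 1-to-0 transitions of pag_ici_reclaimable. A standalone sketch of that transition logic (hypothetical names; locking elided - the real code asserts pag_ici_lock is held):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of per-AG reclaim accounting: the mount-level tag is set
 * only when the first reclaimable inode appears in the AG, and cleared
 * only when the last one goes away. */
struct sketch_perag {
	unsigned int reclaimable;	/* stands in for pag->pag_ici_reclaimable */
	bool mount_tagged;		/* stands in for the m_perag_tree tag */
};

static void sketch_set_reclaim_tag(struct sketch_perag *pag)
{
	if (pag->reclaimable++)
		return;			/* not the first: tag already set */
	pag->mount_tagged = true;	/* propagate + kick background reclaim */
}

static void sketch_clear_reclaim_tag(struct sketch_perag *pag)
{
	if (--pag->reclaimable)
		return;			/* not the last: leave the tag alone */
	pag->mount_tagged = false;	/* last reclaimable inode is gone */
}
```

The early-return style is what replaces the nested `if (!pag->pag_ici_reclaimable)` blocks in the old __xfs_inode_set_reclaim_tag()/__xfs_inode_clear_reclaim() code.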

* Re: [PATCH 11/11] xfs: move reclaim tagging functions
  2016-04-13  5:31 ` [PATCH 11/11] xfs: move reclaim tagging functions Dave Chinner
@ 2016-04-14 12:11   ` Brian Foster
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Foster @ 2016-04-14 12:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Apr 13, 2016 at 03:31:32PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Rearrange the inode tagging functions so that they are higher up in
> > xfs_icache.c and so there is no need for forward prototypes to be
> defined. This is purely code movement, no other change.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_icache.c | 235 ++++++++++++++++++++++++++--------------------------
>  1 file changed, 116 insertions(+), 119 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 927e7b0..dab58e1 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -37,8 +37,6 @@
>  #include <linux/kthread.h>
>  #include <linux/freezer.h>
>  
> -STATIC void xfs_inode_clear_reclaim_tag(struct xfs_perag *pag, xfs_ino_t ino);
> -
>  /*
>   * Allocate and initialise an xfs_inode.
>   */
> @@ -172,6 +170,122 @@ xfs_reinit_inode(
>  }
>  
>  /*
> + * Queue a new inode reclaim pass if there are reclaimable inodes and there
> + * isn't a reclaim pass already in progress. By default it runs every 5s based
> > + * on the xfs periodic sync default of 30s. Perhaps this should have its own
> + * tunable, but that can be done if this method proves to be ineffective or too
> + * aggressive.
> + */
> +static void
> +xfs_reclaim_work_queue(
> +	struct xfs_mount        *mp)
> +{
> +
> +	rcu_read_lock();
> +	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
> +		queue_delayed_work(mp->m_reclaim_workqueue, &mp->m_reclaim_work,
> +			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
> +	}
> +	rcu_read_unlock();
> +}
> +
> +/*
> + * This is a fast pass over the inode cache to try to get reclaim moving on as
> + * many inodes as possible in a short period of time. It kicks itself every few
> + * seconds, as well as being kicked by the inode cache shrinker when memory
> + * goes low. It scans as quickly as possible avoiding locked inodes or those
> + * already being flushed, and once done schedules a future pass.
> + */
> +void
> +xfs_reclaim_worker(
> +	struct work_struct *work)
> +{
> +	struct xfs_mount *mp = container_of(to_delayed_work(work),
> +					struct xfs_mount, m_reclaim_work);
> +
> +	xfs_reclaim_inodes(mp, SYNC_TRYLOCK);
> +	xfs_reclaim_work_queue(mp);
> +}
> +
> +static void
> +xfs_perag_set_reclaim_tag(
> +	struct xfs_perag	*pag,
> +	xfs_ino_t		ino)
> +{
> +	struct xfs_mount	*mp = pag->pag_mount;
> +
> +	ASSERT(spin_is_locked(&pag->pag_ici_lock));
> +	if (pag->pag_ici_reclaimable++)
> +		return;
> +
> +	/* propagate the reclaim tag up into the perag radix tree */
> +	spin_lock(&mp->m_perag_lock);
> +	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
> +			   XFS_ICI_RECLAIM_TAG);
> +	spin_unlock(&mp->m_perag_lock);
> +
> +	/* schedule periodic background inode reclaim */
> +	xfs_reclaim_work_queue(mp);
> +
> +	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
> +}
> +
> +static void
> +xfs_perag_clear_reclaim_tag(
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = pag->pag_mount;
> +
> +	ASSERT(spin_is_locked(&pag->pag_ici_lock));
> +	if (--pag->pag_ici_reclaimable)
> +		return;
> +
> +	/* clear the reclaim tag from the perag radix tree */
> +	spin_lock(&mp->m_perag_lock);
> +	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
> +			     XFS_ICI_RECLAIM_TAG);
> +	spin_unlock(&mp->m_perag_lock);
> +	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
> +}
> +
> +/*
> + * We set the inode flag atomically with the radix tree tag.
> + * Once we get tag lookups on the radix tree, this inode flag
> + * can go away.
> + */
> +void
> +xfs_inode_set_reclaim_tag(
> +	struct xfs_inode	*ip)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_perag	*pag;
> +
> +	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
> +	spin_lock(&pag->pag_ici_lock);
> +	spin_lock(&ip->i_flags_lock);
> +
> +	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
> +			   XFS_ICI_RECLAIM_TAG);
> +	xfs_perag_set_reclaim_tag(pag, ip->i_ino);
> +	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
> +
> +	spin_unlock(&ip->i_flags_lock);
> +	spin_unlock(&pag->pag_ici_lock);
> +	xfs_perag_put(pag);
> +}
> +
> +static void
> +xfs_inode_clear_reclaim_tag(
> +	struct xfs_perag	*pag,
> +	xfs_ino_t		ino)
> +{
> +	radix_tree_tag_clear(&pag->pag_ici_root,
> +			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
> +			     XFS_ICI_RECLAIM_TAG);
> +	xfs_perag_clear_reclaim_tag(pag);
> +}
> +
> +/*
>   * Check the validity of the inode we just found in the cache
>   */
>  static int
> @@ -729,123 +843,6 @@ xfs_inode_ag_iterator_tag(
>  }
>  
>  /*
> - * Queue a new inode reclaim pass if there are reclaimable inodes and there
> - * isn't a reclaim pass already in progress. By default it runs every 5s based
> - * on the xfs periodic sync default of 30s. Perhaps this should have its own
> - * tunable, but that can be done if this method proves to be ineffective or too
> - * aggressive.
> - */
> -static void
> -xfs_reclaim_work_queue(
> -	struct xfs_mount        *mp)
> -{
> -
> -	rcu_read_lock();
> -	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
> -		queue_delayed_work(mp->m_reclaim_workqueue, &mp->m_reclaim_work,
> -			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
> -	}
> -	rcu_read_unlock();
> -}
> -
> -/*
> - * This is a fast pass over the inode cache to try to get reclaim moving on as
> - * many inodes as possible in a short period of time. It kicks itself every few
> - * seconds, as well as being kicked by the inode cache shrinker when memory
> - * goes low. It scans as quickly as possible avoiding locked inodes or those
> - * already being flushed, and once done schedules a future pass.
> - */
> -void
> -xfs_reclaim_worker(
> -	struct work_struct *work)
> -{
> -	struct xfs_mount *mp = container_of(to_delayed_work(work),
> -					struct xfs_mount, m_reclaim_work);
> -
> -	xfs_reclaim_inodes(mp, SYNC_TRYLOCK);
> -	xfs_reclaim_work_queue(mp);
> -}
> -
> -static void
> -xfs_perag_set_reclaim_tag(
> -	struct xfs_perag	*pag,
> -	xfs_ino_t		ino)
> -{
> -	struct xfs_mount	*mp = pag->pag_mount;
> -
> -	ASSERT(spin_is_locked(&pag->pag_ici_lock));
> -	if (pag->pag_ici_reclaimable++)
> -		return;
> -
> -	/* propagate the reclaim tag up into the perag radix tree */
> -	spin_lock(&mp->m_perag_lock);
> -	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
> -			   XFS_ICI_RECLAIM_TAG);
> -	spin_unlock(&mp->m_perag_lock);
> -
> -	/* schedule periodic background inode reclaim */
> -	xfs_reclaim_work_queue(mp);
> -
> -	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
> -}
> -
> -static void
> -xfs_perag_clear_reclaim_tag(
> -	struct xfs_perag	*pag)
> -{
> -	struct xfs_mount	*mp = pag->pag_mount;
> -
> -	ASSERT(spin_is_locked(&pag->pag_ici_lock));
> -	if (--pag->pag_ici_reclaimable)
> -		return;
> -
> -	/* clear the reclaim tag from the perag radix tree */
> -	spin_lock(&mp->m_perag_lock);
> -	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
> -			     XFS_ICI_RECLAIM_TAG);
> -	spin_unlock(&mp->m_perag_lock);
> -	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
> -}
> -
> -/*
> - * We set the inode flag atomically with the radix tree tag.
> - * Once we get tag lookups on the radix tree, this inode flag
> - * can go away.
> - */
> -void
> -xfs_inode_set_reclaim_tag(
> -	struct xfs_inode	*ip)
> -{
> -	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_perag	*pag;
> -
> -	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
> -	spin_lock(&pag->pag_ici_lock);
> -	spin_lock(&ip->i_flags_lock);
> -
> -	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
> -			   XFS_ICI_RECLAIM_TAG);
> -	xfs_perag_set_reclaim_tag(pag, ip->i_ino);
> -	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
> -
> -	spin_unlock(&ip->i_flags_lock);
> -	spin_unlock(&pag->pag_ici_lock);
> -	xfs_perag_put(pag);
> -}
> -
> -
> -STATIC void
> -xfs_inode_clear_reclaim_tag(
> -	struct xfs_perag	*pag,
> -	xfs_ino_t		ino)
> -{
> -	radix_tree_tag_clear(&pag->pag_ici_root,
> -			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
> -			     XFS_ICI_RECLAIM_TAG);
> -	xfs_perag_clear_reclaim_tag(pag);
> -}
> -
> -/*
>   * Grab the inode for reclaim exclusively.
>   * Return 0 if we grabbed it, non-zero otherwise.
>   */
> -- 
> 2.7.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
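The requeue interval in the moved xfs_reclaim_work_queue() comes from `xfs_syncd_centisecs / 6 * 10`: one sixth of the periodic sync interval, converted from centiseconds to milliseconds. With the default sync interval of 30s (3000 centiseconds) that is the 5s cadence the comment describes; a sketch of just the arithmetic:

```c
#include <assert.h>

/* Reclaim requeue interval derivation: one sixth of the periodic sync
 * interval, converted from centiseconds to milliseconds (1 cs = 10 ms). */
static unsigned int reclaim_interval_ms(unsigned int syncd_centisecs)
{
	return syncd_centisecs / 6 * 10;
}
```

With the default of 3000 centiseconds this yields 5000 ms, matching the "runs every 5s" comment; the real code then feeds the result through msecs_to_jiffies() for queue_delayed_work().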

* Re: [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier
  2016-04-14 12:10     ` Brian Foster
@ 2016-04-14 23:31       ` Dave Chinner
  2016-04-15 12:46         ` Brian Foster
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2016-04-14 23:31 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Apr 14, 2016 at 08:10:49AM -0400, Brian Foster wrote:
> On Wed, Apr 13, 2016 at 04:49:00PM +1000, Dave Chinner wrote:
> > On Wed, Apr 13, 2016 at 03:31:28PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > The last thing we do before using call_rcu() on an xfs_inode to be
> > > freed is mark it as invalid. This means there is a window between
> > > when we know for certain that the inode is going to be freed and
> > > when we do actually mark it as "freed".
> > > 
> > > This is important in the context of RCU lookups - we can look up the
> > > inode, find that it is valid, and then use it as such not realising
> > > that it is in the final stages of being freed.
> > > 
> > > As such, mark the inode as being invalid the moment we know it is
> > > going to be reclaimed. This can be done while we still hold the
> > > XFS_ILOCK_EXCL and the flush lock in xfs_reclaim_inode, meaning that
> > > it occurs well before we remove it from the radix tree, and that
> > > the i_flags_lock, the XFS_ILOCK and the inode flush lock all act as
> > > synchronisation points for detecting that an inode is about to go
> > > away.
> > > 
> > > For defensive purposes, this allows us to add a further check to
> > > xfs_iflush_cluster to ensure we skip inodes that are being freed
> > > after we grab the XFS_ILOCK_SHARED and the flush lock - if the inode
> > > number is valid while we have these locks held, we know that it has
> > > not progressed through reclaim to the point where it is clean and is
> > > about to be freed.
> > > 
> > > [bfoster: fixed __xfs_inode_clear_reclaim() using ip->i_ino after it
> > > 	  had already been zeroed.]
> > 
> > And, of course, in reordering this I dropped this fix because it was
> > handled by the reworking of tagging code to use pag->pag_agno.
> > 
> > So I've brought that small change forward to this patch (using
> > pag->pag_agno instead of deriving it from the ip->i_ino in
> > __xfs_inode_clear_reclaim()).
> > 
> 
> I don't see any such change in this patch..?
> __xfs_inode_clear_reclaim() still uses ip->i_ino.

I meant that I realised that I'd screwed it up and so I'd changed my
local copy after I sent this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier
  2016-04-14 23:31       ` Dave Chinner
@ 2016-04-15 12:46         ` Brian Foster
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Foster @ 2016-04-15 12:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Apr 15, 2016 at 09:31:01AM +1000, Dave Chinner wrote:
> On Thu, Apr 14, 2016 at 08:10:49AM -0400, Brian Foster wrote:
> > On Wed, Apr 13, 2016 at 04:49:00PM +1000, Dave Chinner wrote:
> > > On Wed, Apr 13, 2016 at 03:31:28PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > The last thing we do before using call_rcu() on an xfs_inode to be
> > > > freed is mark it as invalid. This means there is a window between
> > > > when we know for certain that the inode is going to be freed and
> > > > when we do actually mark it as "freed".
> > > > 
> > > > This is important in the context of RCU lookups - we can look up the
> > > > inode, find that it is valid, and then use it as such not realising
> > > > that it is in the final stages of being freed.
> > > > 
> > > > As such, mark the inode as being invalid the moment we know it is
> > > > going to be reclaimed. This can be done while we still hold the
> > > > XFS_ILOCK_EXCL and the flush lock in xfs_reclaim_inode, meaning that
> > > > it occurs well before we remove it from the radix tree, and that
> > > > the i_flags_lock, the XFS_ILOCK and the inode flush lock all act as
> > > > synchronisation points for detecting that an inode is about to go
> > > > away.
> > > > 
> > > > For defensive purposes, this allows us to add a further check to
> > > > xfs_iflush_cluster to ensure we skip inodes that are being freed
> > > > after we grab the XFS_ILOCK_SHARED and the flush lock - if the inode
> > > > number is valid while we have these locks held, we know that it has
> > > > not progressed through reclaim to the point where it is clean and is
> > > > about to be freed.
> > > > 
> > > > [bfoster: fixed __xfs_inode_clear_reclaim() using ip->i_ino after it
> > > > 	  had already been zeroed.]
> > > 
> > > And, of course, in reordering this I dropped this fix because it was
> > > handled by the reworking of tagging code to use pag->pag_agno.
> > > 
> > > So I've brought that small change forward to this patch (using
> > > pag->pag_agno instead of deriving it from the ip->i_ino in
> > > __xfs_inode_clear_reclaim()).
> > > 
> > 
> > I don't see any such change in this patch..?
> > __xfs_inode_clear_reclaim() still uses ip->i_ino.
> 
> I meant that I realised that I'd screwed it up and so I'd changed my
> local copy after I sent this.
> 

Oh, OK. Anyway, the long-running reproducer I had going (against v2 +
the reclaim tag hunk that I had posted) completed without an error.
That's the first time I've seen that, so I think it's safe to confirm
this series fixes the original problem:

Tested-by: Brian Foster <bfoster@redhat.com>

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 10/11] xfs: simplify inode reclaim tagging interfaces
  2016-04-13  5:31 ` [PATCH 10/11] xfs: simplify inode reclaim tagging interfaces Dave Chinner
  2016-04-14 12:10   ` Brian Foster
@ 2016-06-29  4:21   ` Darrick J. Wong
  1 sibling, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2016-06-29  4:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Apr 13, 2016 at 03:31:31PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Inode radix tree tagging for reclaim passes a lot of unnecessary
> > variables around. Over time the xfs_perag has grown an xfs_mount
> > backpointer and an internal agno, so we don't need to pass other
> variables into the tagging functions to supply this information.
> 
> Rework the functions to pass the minimal variable set required
> and simplify the internal logic and flow.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_icache.c | 97 +++++++++++++++++++++++++++--------------------------
>  1 file changed, 49 insertions(+), 48 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index a60db43..927e7b0 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -37,8 +37,7 @@
>  #include <linux/kthread.h>
>  #include <linux/freezer.h>
>  
> -STATIC void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp,
> -				struct xfs_perag *pag, struct xfs_inode *ip);
> +STATIC void xfs_inode_clear_reclaim_tag(struct xfs_perag *pag, xfs_ino_t ino);
>  
>  /*
>   * Allocate and initialise an xfs_inode.
> @@ -271,7 +270,7 @@ xfs_iget_cache_hit(
>  		 */
>  		ip->i_flags &= ~XFS_IRECLAIM_RESET_FLAGS;
>  		ip->i_flags |= XFS_INEW;
> -		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> +		xfs_inode_clear_reclaim_tag(pag, ip->i_ino);
>  		inode->i_state = I_NEW;
>  
>  		ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
> @@ -768,29 +767,44 @@ xfs_reclaim_worker(
>  }
>  
>  static void
> -__xfs_inode_set_reclaim_tag(
> +xfs_perag_set_reclaim_tag(
>  	struct xfs_perag	*pag,
> -	struct xfs_inode	*ip)
> +	xfs_ino_t		ino)
>  {
> -	radix_tree_tag_set(&pag->pag_ici_root,
> -			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
> +	struct xfs_mount	*mp = pag->pag_mount;
> +
> +	ASSERT(spin_is_locked(&pag->pag_ici_lock));

Hmm.  On (only) ppc64, I see the following splat on umount:

[ 1598.584616] XFS: Assertion failed: spin_is_locked(&pag->pag_ici_lock), file: /raid/home/djwong/cdev/work/linux-mcsum/fs/xfs/xfs_icache.c, line: 216
[ 1598.584799] ------------[ cut here ]------------
[ 1598.585262] WARNING: CPU: 0 PID: 9144 at /raid/home/djwong/cdev/work/linux-mcsum/fs/xfs/xfs_message.c:105 .asswarn+0x48/0x70 [xfs]
[ 1598.585318] Modules linked in: xfs libcrc32c sg sch_fq_codel autofs4 hid_generic usbhid hid xhci_pci xhci_hcd sd_mod
[ 1598.585518] CPU: 0 PID: 9144 Comm: umount Tainted: G        W       4.7.0-rc5-pcsum #1
[ 1598.585566] task: c000000069621bd0 ti: c000000069668000 task.ti: c000000069668000
[ 1598.585608] NIP: d000000001942658 LR: d000000001942658 CTR: c0000000003e4e30
[ 1598.585651] REGS: c00000006966b300 TRAP: 0700   Tainted: G        W        (4.7.0-rc5-pcsum)
[ 1598.585691] MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 28242448  XER: 00000000
[ 1598.585854] CFAR: d000000001942374 SOFTE: 1 
GPR00: d000000001942658 c00000006966b580 d0000000019ff908 ffffffffffffffea 
GPR04: 000000000000000a c00000006966b390 00000000000000d1 ffffffffffffffc0 
GPR08: 0000000000000000 0000000000000021 00000000ffffffd1 d0000000019b01d0 
GPR12: c0000000003e4e30 c00000000fff0000 0000000010110000 c00000007922eb80 
GPR16: 0000000040000000 0000000020000000 0000000000000000 c00000007922ebe8 
GPR20: 0000000000000004 c00000007922ebd0 c00000006966b9d0 0000000000000001 
GPR24: 0000000000600081 c00000007922eb80 c000000069668000 c000000069668000 
GPR28: c00000006966b710 d0000000019aef20 d0000000019aee90 00000000000000d8 
[ 1598.586844] NIP [d000000001942658] .asswarn+0x48/0x70 [xfs]
[ 1598.587242] LR [d000000001942658] .asswarn+0x48/0x70 [xfs]
[ 1598.587281] Call Trace:
[ 1598.587669] [c00000006966b580] [d000000001942658] .asswarn+0x48/0x70 [xfs] (unreliable)
[ 1598.588098] [c00000006966b610] [d00000000192d308] .xfs_perag_clear_reclaim_tag+0x48/0x1a0 [xfs]
[ 1598.588597] [c00000006966b6a0] [d00000000192d7a8] .xfs_reclaim_inode+0x2d8/0x600 [xfs]
[ 1598.589027] [c00000006966b760] [d00000000192dc8c] .xfs_reclaim_inodes_ag+0x1bc/0x390 [xfs]
[ 1598.589451] [c00000006966b960] [d00000000192ff28] .xfs_reclaim_inodes+0x38/0x50 [xfs]
[ 1598.589875] [c00000006966b9f0] [d000000001944754] .xfs_unmountfs+0xb4/0x280 [xfs]
[ 1598.590297] [c00000006966ba90] [d00000000194cd48] .xfs_fs_put_super+0x48/0xa0 [xfs]
[ 1598.590382] [c00000006966bb10] [c0000000002527f8] .generic_shutdown_super+0xb8/0x180
[ 1598.590455] [c00000006966bb90] [c000000000252e18] .kill_block_super+0x38/0xa0
[ 1598.590521] [c00000006966bc20] [c000000000253104] .deactivate_locked_super+0xa4/0xf0
[ 1598.590594] [c00000006966bca0] [c00000000027c73c] .cleanup_mnt+0x5c/0xa0
[ 1598.590671] [c00000006966bd20] [c0000000000a44bc] .task_work_run+0xbc/0x100
[ 1598.590748] [c00000006966bdb0] [c000000000015598] .do_notify_resume+0xf8/0x100
[ 1598.590826] [c00000006966be30] [c0000000000094b0] .ret_from_except_lite+0x5c/0x60
[ 1598.590884] Instruction dump:
[ 1598.590942] 7c7d1b78 7c9e2378 7cbf2b78 48000008 e8410028 3d220000 38600000 e889c190 
[ 1598.591109] 7fa5eb78 7fc6f378 7fe7fb78 4bfffc5d <0fe00000> 38210090 e8010010 eba1ffe8 
[ 1598.591281] ---[ end trace dea443eeb71bd63f ]---

Not sure if it's the rmap/reflink patchset that's to blame, or what.

It's kinda late, so I'm going to set this aside for debugging tomorrow.

(x64/i386/armhf seem fine for whatever reason)

--D

> +	if (pag->pag_ici_reclaimable++)
> +		return;
> +
> +	/* propagate the reclaim tag up into the perag radix tree */
> +	spin_lock(&mp->m_perag_lock);
> +	radix_tree_tag_set(&mp->m_perag_tree, pag->pag_agno,
>  			   XFS_ICI_RECLAIM_TAG);
> +	spin_unlock(&mp->m_perag_lock);
>  
> -	if (!pag->pag_ici_reclaimable) {
> -		/* propagate the reclaim tag up into the perag radix tree */
> -		spin_lock(&ip->i_mount->m_perag_lock);
> -		radix_tree_tag_set(&ip->i_mount->m_perag_tree,
> -				XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
> -				XFS_ICI_RECLAIM_TAG);
> -		spin_unlock(&ip->i_mount->m_perag_lock);
> +	/* schedule periodic background inode reclaim */
> +	xfs_reclaim_work_queue(mp);
>  
> -		/* schedule periodic background inode reclaim */
> -		xfs_reclaim_work_queue(ip->i_mount);
> +	trace_xfs_perag_set_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
> +}
>  
> -		trace_xfs_perag_set_reclaim(ip->i_mount, pag->pag_agno,
> -							-1, _RET_IP_);
> -	}
> -	pag->pag_ici_reclaimable++;
> +static void
> +xfs_perag_clear_reclaim_tag(
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = pag->pag_mount;
> +
> +	ASSERT(spin_is_locked(&pag->pag_ici_lock));
> +	if (--pag->pag_ici_reclaimable)
> +		return;
> +
> +	/* clear the reclaim tag from the perag radix tree */
> +	spin_lock(&mp->m_perag_lock);
> +	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno,
> +			     XFS_ICI_RECLAIM_TAG);
> +	spin_unlock(&mp->m_perag_lock);
> +	trace_xfs_perag_clear_reclaim(mp, pag->pag_agno, -1, _RET_IP_);
>  }
>  
>  /*
> @@ -800,48 +814,35 @@ __xfs_inode_set_reclaim_tag(
>   */
>  void
>  xfs_inode_set_reclaim_tag(
> -	xfs_inode_t	*ip)
> +	struct xfs_inode	*ip)
>  {
> -	struct xfs_mount *mp = ip->i_mount;
> -	struct xfs_perag *pag;
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_perag	*pag;
>  
>  	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
>  	spin_lock(&pag->pag_ici_lock);
>  	spin_lock(&ip->i_flags_lock);
> -	__xfs_inode_set_reclaim_tag(pag, ip);
> +
> +	radix_tree_tag_set(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino),
> +			   XFS_ICI_RECLAIM_TAG);
> +	xfs_perag_set_reclaim_tag(pag, ip->i_ino);
>  	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
> +
>  	spin_unlock(&ip->i_flags_lock);
>  	spin_unlock(&pag->pag_ici_lock);
>  	xfs_perag_put(pag);
>  }
>  
> -STATIC void
> -__xfs_inode_clear_reclaim(
> -	xfs_perag_t	*pag,
> -	xfs_inode_t	*ip)
> -{
> -	pag->pag_ici_reclaimable--;
> -	if (!pag->pag_ici_reclaimable) {
> -		/* clear the reclaim tag from the perag radix tree */
> -		spin_lock(&ip->i_mount->m_perag_lock);
> -		radix_tree_tag_clear(&ip->i_mount->m_perag_tree,
> -				XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
> -				XFS_ICI_RECLAIM_TAG);
> -		spin_unlock(&ip->i_mount->m_perag_lock);
> -		trace_xfs_perag_clear_reclaim(ip->i_mount, pag->pag_agno,
> -							-1, _RET_IP_);
> -	}
> -}
>  
>  STATIC void
> -__xfs_inode_clear_reclaim_tag(
> -	xfs_mount_t	*mp,
> -	xfs_perag_t	*pag,
> -	xfs_inode_t	*ip)
> +xfs_inode_clear_reclaim_tag(
> +	struct xfs_perag	*pag,
> +	xfs_ino_t		ino)
>  {
>  	radix_tree_tag_clear(&pag->pag_ici_root,
> -			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
> -	__xfs_inode_clear_reclaim(pag, ip);
> +			     XFS_INO_TO_AGINO(pag->pag_mount, ino),
> +			     XFS_ICI_RECLAIM_TAG);
> +	xfs_perag_clear_reclaim_tag(pag);
>  }
>  
>  /*
> @@ -1032,7 +1033,7 @@ reclaim:
>  	if (!radix_tree_delete(&pag->pag_ici_root,
>  				XFS_INO_TO_AGINO(ip->i_mount, ino)))
>  		ASSERT(0);
> -	__xfs_inode_clear_reclaim(pag, ip);
> +	xfs_perag_clear_reclaim_tag(pag);
>  	spin_unlock(&pag->pag_ici_lock);
>  
>  	/*
> -- 
> 2.7.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs


Thread overview: 25+ messages
2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
2016-04-13  5:31 ` [PATCH 01/11] xfs: we don't need no steekin ->evict_inode Dave Chinner
2016-04-13 16:41   ` Christoph Hellwig
2016-04-13 21:20     ` Dave Chinner
2016-04-14 12:10   ` Brian Foster
2016-04-13  5:31 ` [PATCH 02/11] xfs: xfs_iflush_cluster fails to abort on error Dave Chinner
2016-04-13 16:41   ` Christoph Hellwig
2016-04-13  5:31 ` [PATCH 03/11] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
2016-04-13  5:31 ` [PATCH 04/11] xfs: skip stale inodes " Dave Chinner
2016-04-13  5:31 ` [PATCH 05/11] xfs: optimise xfs_iext_destroy Dave Chinner
2016-04-13 16:45   ` Christoph Hellwig
2016-04-13  5:31 ` [PATCH 06/11] xfs: xfs_inode_free() isn't RCU safe Dave Chinner
2016-04-13  5:31 ` [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier Dave Chinner
2016-04-13  6:49   ` Dave Chinner
2016-04-14 12:10     ` Brian Foster
2016-04-14 23:31       ` Dave Chinner
2016-04-15 12:46         ` Brian Foster
2016-04-13  5:31 ` [PATCH 08/11] xfs: xfs_iflush_cluster has range issues Dave Chinner
2016-04-13  5:31 ` [PATCH 09/11] xfs: rename variables in xfs_iflush_cluster for clarity Dave Chinner
2016-04-13  5:31 ` [PATCH 10/11] xfs: simplify inode reclaim tagging interfaces Dave Chinner
2016-04-14 12:10   ` Brian Foster
2016-06-29  4:21   ` Darrick J. Wong
2016-04-13  5:31 ` [PATCH 11/11] xfs: move reclaim tagging functions Dave Chinner
2016-04-14 12:11   ` Brian Foster
2016-04-13 15:38 ` [PATCH 00/11 v3] xfs: inode reclaim vs the world Darrick J. Wong
