* [PATCH 0/4] XFS iget fixes
@ 2009-08-04 14:15 ` Christoph Hellwig
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-04 14:15 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, david

These should fix the fallout from the xfs_inode/VFS inode merge.  The first
patch contains just locking changes to the iget hit path and is conceptually
unrelated to the rest, but required for them to apply.

The other three patches deal basically with the problem that we really
don't want to delete the inode that won the race and got added to the inode
cache radix tree when we free the inode that lost it.  To do so we need
to change the hooks into inode.c that were exported only for this kind of
use in XFS.

* [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
  2009-08-04 14:15 ` Christoph Hellwig
@ 2009-08-04 14:15   ` Christoph Hellwig
  -1 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-04 14:15 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, david

[-- Attachment #1: xfs-fix-xfs_iget_cache_hit-locking --]
[-- Type: text/plain, Size: 8123 bytes --]

The locking in xfs_iget_cache_hit currently has numerous problems:

 - we clear the reclaim tag without i_flags_lock which protects modifications
   to it
 - we call inode_init_always which can sleep with pag_ici_lock held
 - we acquire and drop i_flags_lock a lot and thus provide no consistency
   between the various flags we set/clear under it

This patch fixes all that with a major revamp of the locking in the function.
The new version acquires i_flags_lock early and only drops it once we need to
call into inode_init_always or before calling xfs_ilock.

This patch fixes a bug seen in the wild where we race modifying the reclaim tag.
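
As a rough userspace illustration of the locking pattern described above (not
the kernel code itself), the sketch below takes one lock early, tests and
updates all flags inside a single critical section, and drops the lock before
any step that might sleep.  All demo_* names and the DEMO_* flag values are
hypothetical stand-ins, a pthread mutex models i_flags_lock, and the
wait-then-retry handling of XFS_INEW is collapsed into a plain retry result.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the XFS inode flags; values are illustrative. */
#define DEMO_IRECLAIM		0x1
#define DEMO_INEW		0x2
#define DEMO_IRECLAIMABLE	0x4

struct demo_inode {
	pthread_mutex_t	flags_lock;	/* models ip->i_flags_lock */
	unsigned int	flags;
};

enum demo_result { DEMO_RETRY, DEMO_RECYCLE, DEMO_LIVE };

static enum demo_result demo_cache_hit(struct demo_inode *ip)
{
	enum demo_result res;

	/* Take the lock once, before looking at any flag. */
	pthread_mutex_lock(&ip->flags_lock);

	if (ip->flags & (DEMO_IRECLAIM | DEMO_INEW)) {
		/* Being torn down or set up by someone else: try again. */
		res = DEMO_RETRY;
	} else if (ip->flags & DEMO_IRECLAIMABLE) {
		/*
		 * Set the "new" flag and clear the reclaimable flag in the
		 * same critical section, so no other thread can observe the
		 * state in between.
		 */
		ip->flags |= DEMO_INEW;
		ip->flags &= ~DEMO_IRECLAIMABLE;
		res = DEMO_RECYCLE;
	} else {
		res = DEMO_LIVE;
	}

	/* Drop the lock before any re-initialisation that may sleep. */
	pthread_mutex_unlock(&ip->flags_lock);
	return res;
}
```

A caller would treat DEMO_RETRY like the EAGAIN return in the patch and only
do the potentially sleeping re-initialisation after DEMO_RECYCLE, with the
lock already released.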

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-01 23:20:54.775080254 +0200
+++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-01 23:20:54.807080483 +0200
@@ -133,80 +133,90 @@ xfs_iget_cache_hit(
 	int			flags,
 	int			lock_flags) __releases(pag->pag_ici_lock)
 {
+	struct inode		*inode = VFS_I(ip);
 	struct xfs_mount	*mp = ip->i_mount;
-	int			error = EAGAIN;
+	int			error;
+
+	spin_lock(&ip->i_flags_lock);
 
 	/*
-	 * If INEW is set this inode is being set up
-	 * If IRECLAIM is set this inode is being torn down
-	 * Pause and try again.
+	 * This inode is being torn down, pause and try again.
 	 */
-	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
+	if (ip->i_flags & XFS_IRECLAIM) {
 		XFS_STATS_INC(xs_ig_frecycle);
+		error = EAGAIN;
 		goto out_error;
 	}
 
-	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
-	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
+	/*
+	 * If we are racing with another cache hit that is currently recycling
+	 * this inode out of the XFS_IRECLAIMABLE state, wait for the
+	 * initialisation to complete before continuing.
+	 */
+	if (ip->i_flags & XFS_INEW) {
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 
-		/*
-		 * If lookup is racing with unlink, then we should return an
-		 * error immediately so we don't remove it from the reclaim
-		 * list and potentially leak the inode.
-		 */
-		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
-			error = ENOENT;
-			goto out_error;
-		}
+		XFS_STATS_INC(xs_ig_frecycle);
+		wait_on_inode(inode);
+		return EAGAIN;
+	}
 
+	/*
+	 * If lookup is racing with unlink, then we should return an
+	 * error immediately so we don't remove it from the reclaim
+	 * list and potentially leak the inode.
+	 */
+	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
+		error = ENOENT;
+		goto out_error;
+	}
+
+	/*
+	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
+	 * Need to carefully get it back into useable state.
+	 */
+	if (ip->i_flags & XFS_IRECLAIMABLE) {
 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
 
 		/*
-		 * We need to re-initialise the VFS inode as it has been
-		 * 'freed' by the VFS. Do this here so we can deal with
-		 * errors cleanly, then tag it so it can be set up correctly
-		 * later.
+		 * We need to set XFS_INEW atomically with clearing the
+		 * reclaimable tag so that we do have an indicator of the
+		 * inode still being initialized.
 		 */
-		if (!inode_init_always(mp->m_super, VFS_I(ip))) {
+		ip->i_flags |= XFS_INEW;
+		__xfs_inode_clear_reclaim_tag(pag, ip);
+
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
+
+		if (unlikely(!inode_init_always(mp->m_super, inode))) {
+			/*
+			 * Re-initializing the inode failed, and we are in deep
+			 * trouble.  Try to re-add it to the reclaim list.
+			 */
+			read_lock(&pag->pag_ici_lock);
+			spin_lock(&ip->i_flags_lock);
+
+			ip->i_flags &= ~XFS_INEW;
+			__xfs_inode_set_reclaim_tag(pag, ip);
+
 			error = ENOMEM;
 			goto out_error;
 		}
-
-		/*
-		 * We must set the XFS_INEW flag before clearing the
-		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
-		 * not find the XFS_IRECLAIMABLE above but has the igrab()
-		 * below succeed we can safely check XFS_INEW to detect
-		 * that this inode is still being initialised.
-		 */
-		xfs_iflags_set(ip, XFS_INEW);
-		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
-
-		/* clear the radix tree reclaim flag as well. */
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
-	} else if (!igrab(VFS_I(ip))) {
+		inode->i_state = I_LOCK|I_NEW;
+	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
-		XFS_STATS_INC(xs_ig_frecycle);
-		goto out_error;
-	} else if (xfs_iflags_test(ip, XFS_INEW)) {
-		/*
-		 * We are racing with another cache hit that is
-		 * currently recycling this inode out of the XFS_IRECLAIMABLE
-		 * state. Wait for the initialisation to complete before
-		 * continuing.
-		 */
-		wait_on_inode(VFS_I(ip));
-	}
+		if (!igrab(inode)) {
+			error = EAGAIN;
+			goto out_error;
+		}
 
-	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
-		error = ENOENT;
-		iput(VFS_I(ip));
-		goto out_error;
+		/* We've got a live one. */
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 	}
 
-	/* We've got a live one. */
-	read_unlock(&pag->pag_ici_lock);
-
 	if (lock_flags != 0)
 		xfs_ilock(ip, lock_flags);
 
@@ -216,6 +226,7 @@ xfs_iget_cache_hit(
 	return 0;
 
 out_error:
+	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
 	return error;
 }
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-01 23:20:31.441330970 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-01 23:20:54.807080483 +0200
@@ -708,6 +708,17 @@ xfs_reclaim_inode(
 	return 0;
 }
 
+void
+__xfs_inode_set_reclaim_tag(
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
+{
+	radix_tree_tag_set(&pag->pag_ici_root,
+			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			   XFS_ICI_RECLAIM_TAG);
+	ip->i_flags |= XFS_IRECLAIMABLE;
+}
+
 /*
  * We set the inode flag atomically with the radix tree tag.
  * Once we get tag lookups on the radix tree, this inode flag
@@ -722,9 +733,7 @@ xfs_inode_set_reclaim_tag(
 
 	read_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
-	radix_tree_tag_set(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
-	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
+	__xfs_inode_set_reclaim_tag(pag, ip);
 	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
 	xfs_put_perag(mp, pag);
@@ -732,27 +741,13 @@ xfs_inode_set_reclaim_tag(
 
 void
 __xfs_inode_clear_reclaim_tag(
-	xfs_mount_t	*mp,
-	xfs_perag_t	*pag,
-	xfs_inode_t	*ip)
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
 {
+	ip->i_flags &= ~XFS_IRECLAIMABLE;
 	radix_tree_tag_clear(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
-}
-
-void
-xfs_inode_clear_reclaim_tag(
-	xfs_inode_t	*ip)
-{
-	xfs_mount_t	*mp = ip->i_mount;
-	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
-
-	read_lock(&pag->pag_ici_lock);
-	spin_lock(&ip->i_flags_lock);
-	__xfs_inode_clear_reclaim_tag(mp, pag, ip);
-	spin_unlock(&ip->i_flags_lock);
-	read_unlock(&pag->pag_ici_lock);
-	xfs_put_perag(mp, pag);
+			     XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			     XFS_ICI_RECLAIM_TAG);
 }
 
 STATIC int
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-01 23:20:31.449329683 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-01 23:20:54.808079772 +0200
@@ -48,9 +48,8 @@ int xfs_reclaim_inode(struct xfs_inode *
 int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
-void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
-void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
-				struct xfs_inode *ip);
+void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
+void __xfs_inode_clear_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
 
 int xfs_sync_inode_valid(struct xfs_inode *ip, struct xfs_perag *pag);
 int xfs_inode_ag_iterator(struct xfs_mount *mp,


* [PATCH 2/4] fix inode_init_always calling convention
  2009-08-04 14:15 ` Christoph Hellwig
@ 2009-08-04 14:15   ` Christoph Hellwig
  -1 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-04 14:15 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, david

[-- Attachment #1: change-inode_init_always --]
[-- Type: text/plain, Size: 4896 bytes --]

Currently inode_init_always calls into ->destroy_inode if the additional
initialization fails.  That's not only counter-intuitive because
inode_init_always did not allocate the inode structure, but in case of
XFS it's actively harmful as ->destroy_inode might delete the inode from
a radix tree it was never added to.  This in turn might end up
deleting the inode for the same inum that has been instantiated by
another process and cause lots of subtle problems.

Also in the case of re-initializing a reclaimable inode in XFS it would
free an inode we still want to keep alive.
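
The ownership rule argued for above can be sketched in plain userspace C: the
init function only initialises and reports an errno-style code, and whoever
allocated the object frees it on failure.  This is an illustration under
assumed names (demo_obj, demo_init_always, demo_alloc and the
simulate_failure parameter are all hypothetical), not the kernel
implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_obj {
	int initialised;
};

/*
 * Initialise only.  On failure return a negative errno and leave the
 * object alone; freeing is the job of whoever allocated it.
 */
static int demo_init_always(struct demo_obj *obj, int simulate_failure)
{
	if (simulate_failure)
		return -ENOMEM;
	obj->initialised = 1;
	return 0;
}

/* The allocator owns the cleanup when initialisation fails. */
static struct demo_obj *demo_alloc(int simulate_failure)
{
	struct demo_obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;

	if (demo_init_always(obj, simulate_failure)) {
		free(obj);
		return NULL;	/* never hand back a freed object */
	}
	return obj;
}
```

A caller that re-initialises an object it wants to keep, as XFS does for a
reclaimable inode, can call demo_init_always() directly and decide for itself
what to do with the object when it fails.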

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c	2009-08-03 01:16:04.254556370 +0200
+++ linux-2.6/fs/inode.c	2009-08-03 01:23:11.135532251 +0200
@@ -120,12 +120,11 @@ static void wake_up_inode(struct inode *
  * These are initializations that need to be done on every inode
  * allocation as the fields are not initialised by slab allocation.
  */
-struct inode *inode_init_always(struct super_block *sb, struct inode *inode)
+int inode_init_always(struct super_block *sb, struct inode *inode)
 {
 	static const struct address_space_operations empty_aops;
 	static struct inode_operations empty_iops;
 	static const struct file_operations empty_fops;
-
 	struct address_space *const mapping = &inode->i_data;
 
 	inode->i_sb = sb;
@@ -152,7 +151,7 @@ struct inode *inode_init_always(struct s
 	inode->dirtied_when = 0;
 
 	if (security_inode_alloc(inode))
-		goto out_free_inode;
+		goto out;
 
 	/* allocate and initialize an i_integrity */
 	if (ima_inode_alloc(inode))
@@ -198,16 +197,12 @@ struct inode *inode_init_always(struct s
 	inode->i_fsnotify_mask = 0;
 #endif
 
-	return inode;
+	return 0;
 
 out_free_security:
 	security_inode_free(inode);
-out_free_inode:
-	if (inode->i_sb->s_op->destroy_inode)
-		inode->i_sb->s_op->destroy_inode(inode);
-	else
-		kmem_cache_free(inode_cachep, (inode));
-	return NULL;
+out:
+	return -ENOMEM;
 }
 EXPORT_SYMBOL(inode_init_always);
 
@@ -220,9 +215,18 @@ static struct inode *alloc_inode(struct 
 	else
 		inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
 
-	if (inode)
-		return inode_init_always(sb, inode);
-	return NULL;
+	if (!inode)
+		return NULL;
+
+	if (unlikely(inode_init_always(sb, inode))) {
+		if (inode->i_sb->s_op->destroy_inode)
+			inode->i_sb->s_op->destroy_inode(inode);
+		else
+			kmem_cache_free(inode_cachep, inode);
+		return NULL;
+	}
+
+	return inode;
 }
 
 void destroy_inode(struct inode *inode)
Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-03 01:16:22.510806794 +0200
+++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-03 01:23:29.878784477 +0200
@@ -64,6 +64,10 @@ xfs_inode_alloc(
 	ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
 	if (!ip)
 		return NULL;
+	if (inode_init_always(mp->m_super, VFS_I(ip))) {
+		kmem_zone_free(xfs_inode_zone, ip);
+		return NULL;
+	}
 
 	ASSERT(atomic_read(&ip->i_iocount) == 0);
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
@@ -105,17 +109,6 @@ xfs_inode_alloc(
 #ifdef XFS_DIR2_TRACE
 	ip->i_dir_trace = ktrace_alloc(XFS_DIR2_KTRACE_SIZE, KM_NOFS);
 #endif
-	/*
-	* Now initialise the VFS inode. We do this after the xfs_inode
-	* initialisation as internal failures will result in ->destroy_inode
-	* being called and that will pass down through the reclaim path and
-	* free the XFS inode. This path requires the XFS inode to already be
-	* initialised. Hence if this call fails, the xfs_inode has already
-	* been freed and we should not reference it at all in the error
-	* handling.
-	*/
-	if (!inode_init_always(mp->m_super, VFS_I(ip)))
-		return NULL;
 
 	/* prevent anyone from using this yet */
 	VFS_I(ip)->i_state = I_NEW|I_LOCK;
@@ -190,7 +183,7 @@ xfs_iget_cache_hit(
 		spin_unlock(&ip->i_flags_lock);
 		read_unlock(&pag->pag_ici_lock);
 
-		if (unlikely(!inode_init_always(mp->m_super, inode))) {
+		if (unlikely(inode_init_always(mp->m_super, inode))) {
 			/*
 			 * Re-initializing the inode failed, and we are in deep
 			 * trouble.  Try to re-add it to the reclaim list.
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2009-08-03 01:16:21.186539128 +0200
+++ linux-2.6/include/linux/fs.h	2009-08-03 01:23:11.131532230 +0200
@@ -2136,7 +2136,7 @@ extern loff_t default_llseek(struct file
 
 extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin);
 
-extern struct inode * inode_init_always(struct super_block *, struct inode *);
+extern int inode_init_always(struct super_block *, struct inode *);
 extern void inode_init_once(struct inode *);
 extern void inode_add_to_lists(struct super_block *, struct inode *);
 extern void iput(struct inode *);


* [PATCH 3/4] add __destroy_inode
  2009-08-04 14:15 ` Christoph Hellwig
@ 2009-08-04 14:15   ` Christoph Hellwig
  -1 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-04 14:15 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, david

[-- Attachment #1: add-__destroy_inode --]
[-- Type: text/plain, Size: 2552 bytes --]

When we want to tear down an inode that lost the add to the cache race
in XFS we must not call into ->destroy_inode because that would delete
the inode that won the race from the inode cache radix tree.

This patch provides the __destroy_inode helper needed to fix this;
the actual fix will be in the next patch.
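
To make the split concrete, here is a small userspace sketch, assuming a toy
cache keyed by inode number in place of the radix tree.  All demo_* names are
hypothetical.  The generic teardown releases attached resources only; the full
destroy additionally drops the cache slot for that inode number, which is
exactly what the loser of the race must not do.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_SLOTS 8

struct demo_inode {
	unsigned long	ino;
	char		*acl;	/* stands in for state the generic layer attached */
};

/* Toy cache keyed by inode number; models the inode cache radix tree. */
static struct demo_inode *demo_cache[DEMO_SLOTS];

static struct demo_inode *demo_new(unsigned long ino)
{
	struct demo_inode *inode = calloc(1, sizeof(*inode));

	if (inode) {
		inode->ino = ino;
		inode->acl = malloc(16);
	}
	return inode;
}

/* Generic teardown: release attached state, but do not unhash or free. */
static void demo_teardown(struct demo_inode *inode)
{
	free(inode->acl);
	inode->acl = NULL;
}

/* Full destroy: teardown, then drop the cache slot for this ino and free. */
static void demo_destroy(struct demo_inode *inode)
{
	demo_teardown(inode);
	demo_cache[inode->ino % DEMO_SLOTS] = NULL;
	free(inode);
}

/* Race loser: never entered the cache, so it must leave the cache alone. */
static void demo_free_race_loser(struct demo_inode *inode)
{
	demo_teardown(inode);
	free(inode);
}
```

Calling demo_destroy() on the loser would clear the slot that the winner
occupies, because both share the same inode number; demo_free_race_loser()
avoids that by using only the generic teardown.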

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c	2009-08-04 16:01:18.766801320 +0200
+++ linux-2.6/fs/inode.c	2009-08-04 16:01:37.281556243 +0200
@@ -228,7 +228,7 @@ static struct inode *alloc_inode(struct 
 	return inode;
 }
 
-void destroy_inode(struct inode *inode)
+void __destroy_inode(struct inode *inode)
 {
 	BUG_ON(inode_has_buffers(inode));
 	ima_inode_free(inode);
@@ -240,13 +240,17 @@ void destroy_inode(struct inode *inode)
 	if (inode->i_default_acl && inode->i_default_acl != ACL_NOT_CACHED)
 		posix_acl_release(inode->i_default_acl);
 #endif
+}
+EXPORT_SYMBOL(__destroy_inode);
+
+void destroy_inode(struct inode *inode)
+{
+	__destroy_inode(inode);
 	if (inode->i_sb->s_op->destroy_inode)
 		inode->i_sb->s_op->destroy_inode(inode);
 	else
 		kmem_cache_free(inode_cachep, (inode));
 }
-EXPORT_SYMBOL(destroy_inode);
-
 
 /*
  * These are initializations that only need to be done
Index: linux-2.6/fs/xfs/xfs_inode.h
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_inode.h	2009-08-04 15:59:38.705782128 +0200
+++ linux-2.6/fs/xfs/xfs_inode.h	2009-08-04 16:01:20.157556334 +0200
@@ -322,8 +322,11 @@ static inline struct inode *VFS_I(struct
  */
 static inline void xfs_destroy_inode(struct xfs_inode *ip)
 {
-	make_bad_inode(VFS_I(ip));
-	return destroy_inode(VFS_I(ip));
+	struct inode *inode = VFS_I(ip);
+
+	make_bad_inode(inode);
+	__destroy_inode(inode);
+	inode->i_sb->s_op->destroy_inode(inode);
 }
 
 /*
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2009-08-04 16:01:18.770804693 +0200
+++ linux-2.6/include/linux/fs.h	2009-08-04 16:01:20.159539128 +0200
@@ -2163,6 +2163,7 @@ extern void __iget(struct inode * inode)
 extern void iget_failed(struct inode *);
 extern void clear_inode(struct inode *);
 extern void destroy_inode(struct inode *);
+extern void __destroy_inode(struct inode *);
 extern struct inode *new_inode(struct super_block *);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_suid(struct file *);

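The split described above can be modelled outside the kernel. The following is only a minimal userspace sketch, assuming toy types: struct toy_inode, struct toy_super_ops, toy_teardown_common and the other toy_* names are hypothetical stand-ins and not kernel interfaces. It shows a stage-one teardown that a caller can run on its own, and a full destroy that runs stage one and then hands the inode to the filesystem hook.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for illustration; these are not the kernel's types. */
struct toy_inode {
	void	*security;	/* models state set up by common init code */
};

struct toy_super_ops {
	/* Optional filesystem hook that disposes of the inode itself. */
	void	(*destroy_inode)(struct toy_inode *inode);
};

int toy_hook_calls;		/* how often the filesystem hook ran */

/* Allocate an inode and attach the common state. */
struct toy_inode *toy_alloc_inode(void)
{
	struct toy_inode *inode = calloc(1, sizeof(*inode));

	if (!inode)
		return NULL;
	inode->security = malloc(16);
	if (!inode->security) {
		free(inode);
		return NULL;
	}
	return inode;
}

/* Stage one (models __destroy_inode): undo common init, keep the memory. */
void toy_teardown_common(struct toy_inode *inode)
{
	free(inode->security);
	inode->security = NULL;
}

/* Stage one plus stage two (models destroy_inode). */
void toy_destroy_inode(struct toy_inode *inode,
		       const struct toy_super_ops *ops)
{
	assert(inode && ops);
	toy_teardown_common(inode);
	if (ops->destroy_inode)
		ops->destroy_inode(inode);
	else
		free(inode);
}

/* A sample filesystem hook: counts the call, then frees the inode. */
void toy_fs_destroy_inode(struct toy_inode *inode)
{
	toy_hook_calls++;
	free(inode);
}
```

A caller that must not trigger the filesystem hook calls toy_teardown_common() and then releases the memory itself; everyone else keeps using toy_destroy_inode().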

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 4/4] xfs: add xfs_inode_free
  2009-08-04 14:15 ` Christoph Hellwig
@ 2009-08-04 14:15   ` Christoph Hellwig
  -1 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-04 14:15 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel, david

[-- Attachment #1: xfs-add-xfs_inode_free --]
[-- Type: text/plain, Size: 5194 bytes --]

When we want to tear down an inode that lost the race to be added to the
cache in XFS, we must not call into ->destroy_inode because that would
delete the inode that won the race from the inode cache radix tree.

This patch splits a new xfs_inode_free helper out of xfs_ireclaim
and uses that plus __destroy_inode to make sure we really only free
the memory allocated for the inode that lost the race, and do not mess
with the inode cache state.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-03 01:23:29.878784477 +0200
+++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-03 01:25:01.601784988 +0200
@@ -116,6 +116,71 @@ xfs_inode_alloc(
 	return ip;
 }
 
+STATIC void
+xfs_inode_free(
+	struct xfs_inode	*ip)
+{
+	switch (ip->i_d.di_mode & S_IFMT) {
+	case S_IFREG:
+	case S_IFDIR:
+	case S_IFLNK:
+		xfs_idestroy_fork(ip, XFS_DATA_FORK);
+		break;
+	}
+
+	if (ip->i_afp)
+		xfs_idestroy_fork(ip, XFS_ATTR_FORK);
+
+#ifdef XFS_INODE_TRACE
+	ktrace_free(ip->i_trace);
+#endif
+#ifdef XFS_BMAP_TRACE
+	ktrace_free(ip->i_xtrace);
+#endif
+#ifdef XFS_BTREE_TRACE
+	ktrace_free(ip->i_btrace);
+#endif
+#ifdef XFS_RW_TRACE
+	ktrace_free(ip->i_rwtrace);
+#endif
+#ifdef XFS_ILOCK_TRACE
+	ktrace_free(ip->i_lock_trace);
+#endif
+#ifdef XFS_DIR2_TRACE
+	ktrace_free(ip->i_dir_trace);
+#endif
+
+	if (ip->i_itemp) {
+		/*
+		 * Only if we are shutting down the fs will we see an
+		 * inode still in the AIL. If it is there, we should remove
+		 * it to prevent a use-after-free from occurring.
+		 */
+		xfs_log_item_t	*lip = &ip->i_itemp->ili_item;
+		struct xfs_ail	*ailp = lip->li_ailp;
+
+		ASSERT(((lip->li_flags & XFS_LI_IN_AIL) == 0) ||
+				       XFS_FORCED_SHUTDOWN(ip->i_mount));
+		if (lip->li_flags & XFS_LI_IN_AIL) {
+			spin_lock(&ailp->xa_lock);
+			if (lip->li_flags & XFS_LI_IN_AIL)
+				xfs_trans_ail_delete(ailp, lip);
+			else
+				spin_unlock(&ailp->xa_lock);
+		}
+		xfs_inode_item_destroy(ip);
+		ip->i_itemp = NULL;
+	}
+
+	/* asserts to verify all state is correct here */
+	ASSERT(atomic_read(&ip->i_iocount) == 0);
+	ASSERT(atomic_read(&ip->i_pincount) == 0);
+	ASSERT(!spin_is_locked(&ip->i_flags_lock));
+	ASSERT(completion_done(&ip->i_flush));
+
+	kmem_zone_free(xfs_inode_zone, ip);
+}
+
 /*
  * Check the validity of the inode we just found it the cache
  */
@@ -303,7 +368,8 @@ out_preload_end:
 	if (lock_flags)
 		xfs_iunlock(ip, lock_flags);
 out_destroy:
-	xfs_destroy_inode(ip);
+	__destroy_inode(VFS_I(ip));
+	xfs_inode_free(ip);
 	return error;
 }
 
@@ -506,62 +572,7 @@ xfs_ireclaim(
 	xfs_qm_dqdetach(ip);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
-	switch (ip->i_d.di_mode & S_IFMT) {
-	case S_IFREG:
-	case S_IFDIR:
-	case S_IFLNK:
-		xfs_idestroy_fork(ip, XFS_DATA_FORK);
-		break;
-	}
-
-	if (ip->i_afp)
-		xfs_idestroy_fork(ip, XFS_ATTR_FORK);
-
-#ifdef XFS_INODE_TRACE
-	ktrace_free(ip->i_trace);
-#endif
-#ifdef XFS_BMAP_TRACE
-	ktrace_free(ip->i_xtrace);
-#endif
-#ifdef XFS_BTREE_TRACE
-	ktrace_free(ip->i_btrace);
-#endif
-#ifdef XFS_RW_TRACE
-	ktrace_free(ip->i_rwtrace);
-#endif
-#ifdef XFS_ILOCK_TRACE
-	ktrace_free(ip->i_lock_trace);
-#endif
-#ifdef XFS_DIR2_TRACE
-	ktrace_free(ip->i_dir_trace);
-#endif
-	if (ip->i_itemp) {
-		/*
-		 * Only if we are shutting down the fs will we see an
-		 * inode still in the AIL. If it is there, we should remove
-		 * it to prevent a use-after-free from occurring.
-		 */
-		xfs_log_item_t	*lip = &ip->i_itemp->ili_item;
-		struct xfs_ail	*ailp = lip->li_ailp;
-
-		ASSERT(((lip->li_flags & XFS_LI_IN_AIL) == 0) ||
-				       XFS_FORCED_SHUTDOWN(ip->i_mount));
-		if (lip->li_flags & XFS_LI_IN_AIL) {
-			spin_lock(&ailp->xa_lock);
-			if (lip->li_flags & XFS_LI_IN_AIL)
-				xfs_trans_ail_delete(ailp, lip);
-			else
-				spin_unlock(&ailp->xa_lock);
-		}
-		xfs_inode_item_destroy(ip);
-		ip->i_itemp = NULL;
-	}
-	/* asserts to verify all state is correct here */
-	ASSERT(atomic_read(&ip->i_iocount) == 0);
-	ASSERT(atomic_read(&ip->i_pincount) == 0);
-	ASSERT(!spin_is_locked(&ip->i_flags_lock));
-	ASSERT(completion_done(&ip->i_flush));
-	kmem_zone_free(xfs_inode_zone, ip);
+	xfs_inode_free(ip);
 }
 
 /*
Index: linux-2.6/fs/xfs/xfs_inode.h
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_inode.h	2009-08-03 01:23:39.876532108 +0200
+++ linux-2.6/fs/xfs/xfs_inode.h	2009-08-03 01:23:47.411789594 +0200
@@ -310,26 +310,6 @@ static inline struct inode *VFS_I(struct
 }
 
 /*
- * Get rid of a partially initialized inode.
- *
- * We have to go through destroy_inode to make sure allocations
- * from init_inode_always like the security data are undone.
- *
- * We mark the inode bad so that it takes the short cut in
- * the reclaim path instead of going through the flush path
- * which doesn't make sense for an inode that has never seen the
- * light of day.
- */
-static inline void xfs_destroy_inode(struct xfs_inode *ip)
-{
-	struct inode *inode = VFS_I(ip);
-
-	make_bad_inode(inode);
-	__destroy_inode(inode);
-	inode->i_sb->s_op->destroy_inode(inode);
-}
-
-/*
  * i_flags helper functions
  */
 static inline void

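The race described in the changelog can be illustrated with a small single-threaded model. This is a hedged sketch only: the fixed-size toy_cache array stands in for the inode cache radix tree, and toy_cache_insert, toy_reclaim and toy_inode_free are hypothetical names, not XFS functions. The point is that a teardown path which removes the cache entry by inode number also removes the winner's entry, while a plain free of the loser leaves the cache alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SLOTS	16

struct toy_inode {
	unsigned long	ino;
};

/* Models the inode cache: one slot per inode number. */
struct toy_inode *toy_cache[TOY_SLOTS];

struct toy_inode *toy_inode_alloc(unsigned long ino)
{
	struct toy_inode *ip = calloc(1, sizeof(*ip));

	if (ip)
		ip->ino = ino;
	return ip;
}

/* Returns 0 on insert, -1 if an inode with this number is already cached. */
int toy_cache_insert(struct toy_inode *ip)
{
	assert(ip->ino < TOY_SLOTS);
	if (toy_cache[ip->ino])
		return -1;
	toy_cache[ip->ino] = ip;
	return 0;
}

/*
 * Models a reclaim-style teardown: drops whatever is cached under this
 * inode number, then frees the inode.  Wrong for an inode that was never
 * inserted, because the slot belongs to somebody else.
 */
void toy_reclaim(struct toy_inode *ip)
{
	toy_cache[ip->ino] = NULL;
	free(ip);
}

/* Models a memory-only free: the cache is not touched at all. */
void toy_inode_free(struct toy_inode *ip)
{
	free(ip);
}
```

In this model the loser of the insert race is handed to toy_inode_free(), so the slot keeps pointing at the winner.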

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
  2009-08-04 14:15   ` Christoph Hellwig
  (?)
@ 2009-08-06 21:50   ` Eric Sandeen
  2009-08-06 22:29     ` Christoph Hellwig
  -1 siblings, 1 reply; 38+ messages in thread
From: Eric Sandeen @ 2009-08-06 21:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Any chance this could be broken into 2 patches starting
with the set/clear cleanup something like:


diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index b619d6b..49c7b6f 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -708,6 +708,17 @@ xfs_reclaim_inode(
 	return 0;
 }
 
+void
+__xfs_inode_set_reclaim_tag(
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
+{
+	radix_tree_tag_set(&pag->pag_ici_root,
+			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			   XFS_ICI_RECLAIM_TAG);
+	ip->i_flags |= XFS_IRECLAIMABLE;
+}
+
 /*
  * We set the inode flag atomically with the radix tree tag.
  * Once we get tag lookups on the radix tree, this inode flag
@@ -722,9 +733,7 @@ xfs_inode_set_reclaim_tag(
 
 	read_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
-	radix_tree_tag_set(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
-	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
+	__xfs_inode_set_reclaim_tag(pag, ip);
 	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
 	xfs_put_perag(mp, pag);
@@ -732,27 +741,12 @@ xfs_inode_set_reclaim_tag(
 
 void
 __xfs_inode_clear_reclaim_tag(
-	xfs_mount_t	*mp,
-	xfs_perag_t	*pag,
-	xfs_inode_t	*ip)
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
 {
 	radix_tree_tag_clear(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
-}
-
-void
-xfs_inode_clear_reclaim_tag(
-	xfs_inode_t	*ip)
-{
-	xfs_mount_t	*mp = ip->i_mount;
-	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
-
-	read_lock(&pag->pag_ici_lock);
-	spin_lock(&ip->i_flags_lock);
-	__xfs_inode_clear_reclaim_tag(mp, pag, ip);
-	spin_unlock(&ip->i_flags_lock);
-	read_unlock(&pag->pag_ici_lock);
-	xfs_put_perag(mp, pag);
+			     XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			     XFS_ICI_RECLAIM_TAG);
 }
 
 STATIC int
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 2a10301..cb64d20 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -48,9 +48,8 @@ int xfs_reclaim_inode(struct xfs_inode *ip, int locked, int sync_mode);
 int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
-void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
-void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
-				struct xfs_inode *ip);
+void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
+void __xfs_inode_clear_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
 
 int xfs_sync_inode_valid(struct xfs_inode *ip, struct xfs_perag *pag);
 int xfs_inode_ag_iterator(struct xfs_mount *mp,
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index 5fcec6f..94b72d3 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -183,7 +183,7 @@ xfs_iget_cache_hit(
 		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
 
 		/* clear the radix tree reclaim flag as well. */
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
+		__xfs_inode_clear_reclaim_tag(pag, ip);
 	} else if (!igrab(VFS_I(ip))) {
 		/* If the VFS inode is being torn down, pause and try again. */
 		XFS_STATS_INC(xs_ig_frecycle);



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
  2009-08-06 21:50   ` Eric Sandeen
@ 2009-08-06 22:29     ` Christoph Hellwig
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-06 22:29 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Thu, Aug 06, 2009 at 04:50:23PM -0500, Eric Sandeen wrote:
> Any chance this could be broken into 2 patches starting
> with the set/clear cleanup something like:

Let's just drop those for now given that we're late enough in the cycle.

New version below:

-- 

Subject: xfs: fix locking in xfs_iget_cache_hit
From: Christoph Hellwig <hch@lst.de>

The locking in xfs_iget_cache_hit currently has numerous problems:

 - we clear the reclaim tag without i_flags_lock which protects modifications
   to it
 - we call inode_init_always, which can sleep, with pag_ici_lock held
 - we acquire and drop i_flags_lock a lot and thus provide no consistency
   between the various flags we set/clear under it

This patch fixes all that with a major revamp of the locking in the function.
The new version acquires i_flags_lock early and only drops it once we need to
call into inode_init_always or before calling xfs_ilock.

This patch fixes a bug seen in the wild where we race modifying the reclaim tag.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-06 19:25:05.522592017 -0300
+++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-06 19:25:31.760342654 -0300
@@ -133,80 +133,92 @@ xfs_iget_cache_hit(
 	int			flags,
 	int			lock_flags) __releases(pag->pag_ici_lock)
 {
+	struct inode		*inode = VFS_I(ip);
 	struct xfs_mount	*mp = ip->i_mount;
-	int			error = EAGAIN;
+	int			error;
+
+	spin_lock(&ip->i_flags_lock);
 
 	/*
-	 * If INEW is set this inode is being set up
-	 * If IRECLAIM is set this inode is being torn down
-	 * Pause and try again.
+	 * This inode is being torn down, pause and try again.
 	 */
-	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
+	if (ip->i_flags & XFS_IRECLAIM) {
 		XFS_STATS_INC(xs_ig_frecycle);
+		error = EAGAIN;
 		goto out_error;
 	}
 
-	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
-	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
+	/*
+	 * If we are racing with another cache hit that is currently recycling
+	 * this inode out of the XFS_IRECLAIMABLE state, wait for the
+	 * initialisation to complete before continuing.
+	 */
+	if (ip->i_flags & XFS_INEW) {
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 
-		/*
-		 * If lookup is racing with unlink, then we should return an
-		 * error immediately so we don't remove it from the reclaim
-		 * list and potentially leak the inode.
-		 */
-		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
-			error = ENOENT;
-			goto out_error;
-		}
+		XFS_STATS_INC(xs_ig_frecycle);
+		wait_on_inode(inode);
+		return EAGAIN;
+	}
 
+	/*
+	 * If lookup is racing with unlink, then we should return an
+	 * error immediately so we don't remove it from the reclaim
+	 * list and potentially leak the inode.
+	 */
+	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
+		error = ENOENT;
+		goto out_error;
+	}
+
+	/*
+	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
+	 * Need to carefully get it back into useable state.
+	 */
+	if (ip->i_flags & XFS_IRECLAIMABLE) {
 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
 
 		/*
-		 * We need to re-initialise the VFS inode as it has been
-		 * 'freed' by the VFS. Do this here so we can deal with
-		 * errors cleanly, then tag it so it can be set up correctly
-		 * later.
+		 * We need to set XFS_INEW atomically with clearing the
+		 * reclaimable tag so that we do have an indicator of the
+		 * inode still being initialized.
 		 */
-		if (!inode_init_always(mp->m_super, VFS_I(ip))) {
+		ip->i_flags |= XFS_INEW;
+		ip->i_flags &= ~XFS_IRECLAIMABLE;
+		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
+
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
+
+		if (unlikely(!inode_init_always(mp->m_super, inode))) {
+			/*
+			 * Re-initializing the inode failed, and we are in deep
+			 * trouble.  Try to re-add it to the reclaim list.
+			 */
+			read_lock(&pag->pag_ici_lock);
+			spin_lock(&ip->i_flags_lock);
+
+			ip->i_flags &= ~XFS_INEW;
+			ip->i_flags |= XFS_IRECLAIMABLE;
+			__xfs_inode_set_reclaim_tag(pag, ip);
+
 			error = ENOMEM;
 			goto out_error;
 		}
-
-		/*
-		 * We must set the XFS_INEW flag before clearing the
-		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
-		 * not find the XFS_IRECLAIMABLE above but has the igrab()
-		 * below succeed we can safely check XFS_INEW to detect
-		 * that this inode is still being initialised.
-		 */
-		xfs_iflags_set(ip, XFS_INEW);
-		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
-
-		/* clear the radix tree reclaim flag as well. */
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
-	} else if (!igrab(VFS_I(ip))) {
+		inode->i_state = I_LOCK|I_NEW;
+	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
-		XFS_STATS_INC(xs_ig_frecycle);
-		goto out_error;
-	} else if (xfs_iflags_test(ip, XFS_INEW)) {
-		/*
-		 * We are racing with another cache hit that is
-		 * currently recycling this inode out of the XFS_IRECLAIMABLE
-		 * state. Wait for the initialisation to complete before
-		 * continuing.
-		 */
-		wait_on_inode(VFS_I(ip));
-	}
+		if (!igrab(inode)) {
+			error = EAGAIN;
+			goto out_error;
+		}
 
-	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
-		error = ENOENT;
-		iput(VFS_I(ip));
-		goto out_error;
+		/* We've got a live one. */
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 	}
 
-	/* We've got a live one. */
-	read_unlock(&pag->pag_ici_lock);
-
 	if (lock_flags != 0)
 		xfs_ilock(ip, lock_flags);
 
@@ -216,6 +228,7 @@ xfs_iget_cache_hit(
 	return 0;
 
 out_error:
+	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
 	return error;
 }
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-06 19:25:05.530591777 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-06 19:25:43.727344574 -0300
@@ -708,6 +708,16 @@ xfs_reclaim_inode(
 	return 0;
 }
 
+void
+__xfs_inode_set_reclaim_tag(
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
+{
+	radix_tree_tag_set(&pag->pag_ici_root,
+			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			   XFS_ICI_RECLAIM_TAG);
+}
+
 /*
  * We set the inode flag atomically with the radix tree tag.
  * Once we get tag lookups on the radix tree, this inode flag
@@ -722,8 +732,7 @@ xfs_inode_set_reclaim_tag(
 
 	read_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
-	radix_tree_tag_set(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+	__xfs_inode_set_reclaim_tag(pag, ip);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-06 19:25:05.587593862 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-06 19:25:31.764365652 -0300
@@ -48,6 +48,7 @@ int xfs_reclaim_inode(struct xfs_inode *
 int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
+void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
 void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
 void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
 				struct xfs_inode *ip);

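The locking rule from the changelog (change the flags and the reclaim tag in one critical section, drop the lock before anything that can sleep, retake it to roll back on failure) can be sketched in a single-threaded model. Everything here is an assumption made for illustration: struct toy_lock only tracks whether the lock is held, and toy_reinit, toy_recycle and the TOY_* flags are hypothetical names, not XFS code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_INEW		0x1u
#define TOY_IRECLAIMABLE	0x2u

/* Tracks lock state only, so misuse trips an assert. */
struct toy_lock {
	bool	held;
};

void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_inode {
	struct toy_lock	flags_lock;
	unsigned int	flags;
	bool		reclaim_tag;	/* models the radix tree tag */
};

/* Models a re-initialisation that may sleep: the lock must not be held. */
int toy_reinit(struct toy_inode *ip, bool fail)
{
	assert(!ip->flags_lock.held);
	return fail ? -1 : 0;
}

/* Recycle a reclaimable inode; returns 0 on success, -1 on failure. */
int toy_recycle(struct toy_inode *ip, bool fail_reinit)
{
	toy_lock_acquire(&ip->flags_lock);
	if (!(ip->flags & TOY_IRECLAIMABLE)) {
		toy_lock_release(&ip->flags_lock);
		return 0;
	}

	/* Flip both flags and the tag inside one critical section. */
	ip->flags |= TOY_INEW;
	ip->flags &= ~TOY_IRECLAIMABLE;
	ip->reclaim_tag = false;
	toy_lock_release(&ip->flags_lock);

	if (toy_reinit(ip, fail_reinit)) {
		/* Roll back under the lock so the inode is reclaimable again. */
		toy_lock_acquire(&ip->flags_lock);
		ip->flags &= ~TOY_INEW;
		ip->flags |= TOY_IRECLAIMABLE;
		ip->reclaim_tag = true;
		toy_lock_release(&ip->flags_lock);
		return -1;
	}
	return 0;
}
```

Because the flags and the tag only ever change together while the lock is held, an observer that takes the lock never sees one updated without the other.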

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] fix inode_init_always calling convention
  2009-08-04 14:15   ` Christoph Hellwig
@ 2009-08-06 22:30     ` Eric Sandeen
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric Sandeen @ 2009-08-06 22:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, linux-fsdevel

Christoph Hellwig wrote:

> Currently inode_init_always calls into ->destroy_inode if the additional
> initialization fails.  That's not only counter-intuitive because
> inode_init_always did not allocate the inode structure, but in case of
> XFS it's actively harmful as ->destroy_inode might delete the inode from
> a radix-tree that it has never been added to.  This in turn might end up
> deleting the inode for the same inum that has been instantiated by
> another process and cause lots of subtle problems.
>
> Also in the case of re-initializing a reclaimable inode in XFS it would
> free an inode we still want to keep alive.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me, though it depends on 1/4 which I haven't yet wrapped
my head around...

Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c	2009-08-03 01:16:04.254556370 +0200
> +++ linux-2.6/fs/inode.c	2009-08-03 01:23:11.135532251 +0200
> @@ -120,12 +120,11 @@ static void wake_up_inode(struct inode *
>   * These are initializations that need to be done on every inode
>   * allocation as the fields are not initialised by slab allocation.
>   */
> -struct inode *inode_init_always(struct super_block *sb, struct inode *inode)
> +int inode_init_always(struct super_block *sb, struct inode *inode)
>  {
>  	static const struct address_space_operations empty_aops;
>  	static struct inode_operations empty_iops;
>  	static const struct file_operations empty_fops;
> -
>  	struct address_space *const mapping = &inode->i_data;
>  
>  	inode->i_sb = sb;
> @@ -152,7 +151,7 @@ struct inode *inode_init_always(struct s
>  	inode->dirtied_when = 0;
>  
>  	if (security_inode_alloc(inode))
> -		goto out_free_inode;
> +		goto out;
>  
>  	/* allocate and initialize an i_integrity */
>  	if (ima_inode_alloc(inode))
> @@ -198,16 +197,12 @@ struct inode *inode_init_always(struct s
>  	inode->i_fsnotify_mask = 0;
>  #endif
>  
> -	return inode;
> +	return 0;
>  
>  out_free_security:
>  	security_inode_free(inode);
> -out_free_inode:
> -	if (inode->i_sb->s_op->destroy_inode)
> -		inode->i_sb->s_op->destroy_inode(inode);
> -	else
> -		kmem_cache_free(inode_cachep, (inode));
> -	return NULL;
> +out:
> +	return -ENOMEM;
>  }
>  EXPORT_SYMBOL(inode_init_always);
>  
> @@ -220,9 +215,17 @@ static struct inode *alloc_inode(struct 
>  	else
>  		inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
>  
> -	if (inode)
> -		return inode_init_always(sb, inode);
> -	return NULL;
> +	if (!inode)
> +		return NULL;
> +
> +	if (unlikely(inode_init_always(sb, inode))) {
> +		if (inode->i_sb->s_op->destroy_inode)
> +			inode->i_sb->s_op->destroy_inode(inode);
> +		else
> +			kmem_cache_free(inode_cachep, inode);
> +	}
> +
> +	return inode;
>  }
>  
>  void destroy_inode(struct inode *inode)
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-03 01:16:22.510806794 +0200
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-03 01:23:29.878784477 +0200
> @@ -64,6 +64,10 @@ xfs_inode_alloc(
>  	ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
>  	if (!ip)
>  		return NULL;
> +	if (inode_init_always(mp->m_super, VFS_I(ip))) {
> +		kmem_zone_free(xfs_inode_zone, ip);
> +		return NULL;
> +	}
>  
>  	ASSERT(atomic_read(&ip->i_iocount) == 0);
>  	ASSERT(atomic_read(&ip->i_pincount) == 0);
> @@ -105,17 +109,6 @@ xfs_inode_alloc(
>  #ifdef XFS_DIR2_TRACE
>  	ip->i_dir_trace = ktrace_alloc(XFS_DIR2_KTRACE_SIZE, KM_NOFS);
>  #endif
> -	/*
> -	* Now initialise the VFS inode. We do this after the xfs_inode
> -	* initialisation as internal failures will result in ->destroy_inode
> -	* being called and that will pass down through the reclaim path and
> -	* free the XFS inode. This path requires the XFS inode to already be
> -	* initialised. Hence if this call fails, the xfs_inode has already
> -	* been freed and we should not reference it at all in the error
> -	* handling.
> -	*/
> -	if (!inode_init_always(mp->m_super, VFS_I(ip)))
> -		return NULL;
>  
>  	/* prevent anyone from using this yet */
>  	VFS_I(ip)->i_state = I_NEW|I_LOCK;
> @@ -190,7 +183,7 @@ xfs_iget_cache_hit(
>  		spin_unlock(&ip->i_flags_lock);
>  		read_unlock(&pag->pag_ici_lock);
>  
> -		if (unlikely(!inode_init_always(mp->m_super, inode))) {
> +		if (unlikely(inode_init_always(mp->m_super, inode))) {
>  			/*
>  			 * Re-initializing the inode failed, and we are in deep
>  			 * trouble.  Try to re-add it to the reclaim list.
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2009-08-03 01:16:21.186539128 +0200
> +++ linux-2.6/include/linux/fs.h	2009-08-03 01:23:11.131532230 +0200
> @@ -2136,7 +2136,7 @@ extern loff_t default_llseek(struct file
>  
>  extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin);
>  
> -extern struct inode * inode_init_always(struct super_block *, struct inode *);
> +extern int inode_init_always(struct super_block *, struct inode *);
>  extern void inode_init_once(struct inode *);
>  extern void inode_add_to_lists(struct super_block *, struct inode *);
>  extern void iput(struct inode *);


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] fix inode_init_always calling convention
@ 2009-08-06 22:30     ` Eric Sandeen
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Sandeen @ 2009-08-06 22:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs

Christoph Hellwig wrote:

> Currently inode_init_always calls into ->destroy_inode if the additional
> initialization fails.  That's not only counter-intuitive because
> inode_init_always did not allocate the inode structure, but in case of
> XFS it's actively harmful as ->destroy_inode might delete the inode from
> a radix tree it was never added to.  This in turn might end up
> deleting the inode for the same inum that has been instantiated by
> another process and cause lots of subtle problems.
>
> Also in the case of re-initializing a reclaimable inode in XFS it would
> free an inode we still want to keep alive.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me, though it depends on 1/4 which I haven't yet wrapped
my head around...

Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
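
For readers following the calling-convention change, here is a minimal
user-space sketch of the idea, not the kernel code: the init helper returns
0 or a negative errno and never frees an object it did not allocate, so the
caller that owns the allocation also owns the cleanup. All toy_* names and
the toy_fail_security switch are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct inode. */
struct toy_inode {
	void *security;		/* models the blob set up during init */
};

/* Hypothetical switch that forces the init step to fail. */
static int toy_fail_security;

/*
 * Initialise a caller-allocated object.  Returns 0 on success or -ENOMEM
 * on failure.  It never frees 'inode': the caller allocated it, so the
 * caller decides how to release it.
 */
static int toy_inode_init_always(struct toy_inode *inode)
{
	assert(inode != NULL);
	inode->security = toy_fail_security ? NULL : malloc(16);
	if (!inode->security)
		return -ENOMEM;
	return 0;
}

/* The allocator owns the memory, so it also handles the failure path. */
static struct toy_inode *toy_alloc_inode(void)
{
	struct toy_inode *inode = malloc(sizeof(*inode));

	if (!inode)
		return NULL;
	if (toy_inode_init_always(inode)) {
		free(inode);
		return NULL;	/* never hand back a freed object */
	}
	return inode;
}

static void toy_free_inode(struct toy_inode *inode)
{
	if (!inode)
		return;
	free(inode->security);
	free(inode);
}
```

With this shape a second caller that embeds the object in a larger structure
can free its own container on failure instead of relying on a generic
destroy hook.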
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c	2009-08-03 01:16:04.254556370 +0200
> +++ linux-2.6/fs/inode.c	2009-08-03 01:23:11.135532251 +0200
> @@ -120,12 +120,11 @@ static void wake_up_inode(struct inode *
>   * These are initializations that need to be done on every inode
>   * allocation as the fields are not initialised by slab allocation.
>   */
> -struct inode *inode_init_always(struct super_block *sb, struct inode *inode)
> +int inode_init_always(struct super_block *sb, struct inode *inode)
>  {
>  	static const struct address_space_operations empty_aops;
>  	static struct inode_operations empty_iops;
>  	static const struct file_operations empty_fops;
> -
>  	struct address_space *const mapping = &inode->i_data;
>  
>  	inode->i_sb = sb;
> @@ -152,7 +151,7 @@ struct inode *inode_init_always(struct s
>  	inode->dirtied_when = 0;
>  
>  	if (security_inode_alloc(inode))
> -		goto out_free_inode;
> +		goto out;
>  
>  	/* allocate and initialize an i_integrity */
>  	if (ima_inode_alloc(inode))
> @@ -198,16 +197,12 @@ struct inode *inode_init_always(struct s
>  	inode->i_fsnotify_mask = 0;
>  #endif
>  
> -	return inode;
> +	return 0;
>  
>  out_free_security:
>  	security_inode_free(inode);
> -out_free_inode:
> -	if (inode->i_sb->s_op->destroy_inode)
> -		inode->i_sb->s_op->destroy_inode(inode);
> -	else
> -		kmem_cache_free(inode_cachep, (inode));
> -	return NULL;
> +out:
> +	return -ENOMEM;
>  }
>  EXPORT_SYMBOL(inode_init_always);
>  
> @@ -220,9 +215,18 @@ static struct inode *alloc_inode(struct 
>  	else
>  		inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
>  
> -	if (inode)
> -		return inode_init_always(sb, inode);
> -	return NULL;
> +	if (!inode)
> +		return NULL;
> +
> +	if (unlikely(inode_init_always(sb, inode))) {
> +		if (inode->i_sb->s_op->destroy_inode)
> +			inode->i_sb->s_op->destroy_inode(inode);
> +		else
> +			kmem_cache_free(inode_cachep, inode);
> +		return NULL;
> +	}
> +
> +	return inode;
>  }
>  
>  void destroy_inode(struct inode *inode)
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-03 01:16:22.510806794 +0200
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-03 01:23:29.878784477 +0200
> @@ -64,6 +64,10 @@ xfs_inode_alloc(
>  	ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
>  	if (!ip)
>  		return NULL;
> +	if (inode_init_always(mp->m_super, VFS_I(ip))) {
> +		kmem_zone_free(xfs_inode_zone, ip);
> +		return NULL;
> +	}
>  
>  	ASSERT(atomic_read(&ip->i_iocount) == 0);
>  	ASSERT(atomic_read(&ip->i_pincount) == 0);
> @@ -105,17 +109,6 @@ xfs_inode_alloc(
>  #ifdef XFS_DIR2_TRACE
>  	ip->i_dir_trace = ktrace_alloc(XFS_DIR2_KTRACE_SIZE, KM_NOFS);
>  #endif
> -	/*
> -	* Now initialise the VFS inode. We do this after the xfs_inode
> -	* initialisation as internal failures will result in ->destroy_inode
> -	* being called and that will pass down through the reclaim path and
> -	* free the XFS inode. This path requires the XFS inode to already be
> -	* initialised. Hence if this call fails, the xfs_inode has already
> -	* been freed and we should not reference it at all in the error
> -	* handling.
> -	*/
> -	if (!inode_init_always(mp->m_super, VFS_I(ip)))
> -		return NULL;
>  
>  	/* prevent anyone from using this yet */
>  	VFS_I(ip)->i_state = I_NEW|I_LOCK;
> @@ -190,7 +183,7 @@ xfs_iget_cache_hit(
>  		spin_unlock(&ip->i_flags_lock);
>  		read_unlock(&pag->pag_ici_lock);
>  
> -		if (unlikely(!inode_init_always(mp->m_super, inode))) {
> +		if (unlikely(inode_init_always(mp->m_super, inode))) {
>  			/*
>  			 * Re-initializing the inode failed, and we are in deep
>  			 * trouble.  Try to re-add it to the reclaim list.
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2009-08-03 01:16:21.186539128 +0200
> +++ linux-2.6/include/linux/fs.h	2009-08-03 01:23:11.131532230 +0200
> @@ -2136,7 +2136,7 @@ extern loff_t default_llseek(struct file
>  
>  extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin);
>  
> -extern struct inode * inode_init_always(struct super_block *, struct inode *);
> +extern int inode_init_always(struct super_block *, struct inode *);
>  extern void inode_init_once(struct inode *);
>  extern void inode_add_to_lists(struct super_block *, struct inode *);
>  extern void iput(struct inode *);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/4] add __destroy_inode
  2009-08-04 14:15   ` Christoph Hellwig
@ 2009-08-06 22:56     ` Eric Sandeen
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric Sandeen @ 2009-08-06 22:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, linux-fsdevel

Christoph Hellwig wrote:

> When we want to tear down an inode that lost the add to the cache race
> in XFS we must not call into ->destroy_inode because that would delete
> the inode that won the race from the inode cache radix tree.
>
> This patch provides the __destroy_inode helper needed to fix this;
> the actual fix will be in the next patch.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
Looks fine to me logically, though having __destroy_inode do everything
-but- ->destroy_inode is a little funky semantically ... maybe
__free_inode?  Not a huge deal.

Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
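
To make the split concrete, here is a small user-space sketch under stated
assumptions: a teardown helper releases only the generic per-object state,
and the full destroy routine calls it before handing the object to a
filesystem hook. The toy_* names, the acl field, and the
toy_cache_removals counter are hypothetical and only model the idea that
the filesystem hook has side effects on a cache.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_inode;

/* Hypothetical per-filesystem operations table. */
struct toy_sb_ops {
	void (*destroy_inode)(struct toy_inode *inode);
};

struct toy_inode {
	const struct toy_sb_ops *ops;
	void *acl;		/* models generic state released on teardown */
};

/* Counts how often the filesystem hook touched its cache. */
static int toy_cache_removals;

/* Generic teardown only: releases attached state, does not free the object. */
static void toy_teardown_inode(struct toy_inode *inode)
{
	assert(inode != NULL);
	free(inode->acl);
	inode->acl = NULL;
}

/* Full destroy: generic teardown, then the filesystem hook or a plain free. */
static void toy_destroy_inode(struct toy_inode *inode)
{
	toy_teardown_inode(inode);
	if (inode->ops && inode->ops->destroy_inode)
		inode->ops->destroy_inode(inode);
	else
		free(inode);
}

/* Filesystem hook: in this model it also removes a cache entry. */
static void toy_fs_destroy_inode(struct toy_inode *inode)
{
	toy_cache_removals++;
	free(inode);
}

static const struct toy_sb_ops toy_ops = {
	.destroy_inode = toy_fs_destroy_inode,
};

static struct toy_inode *toy_new_inode(void)
{
	struct toy_inode *inode = malloc(sizeof(*inode));

	if (!inode)
		return NULL;
	inode->ops = &toy_ops;
	inode->acl = malloc(8);
	return inode;
}
```

A caller that must not disturb the cache can run toy_teardown_inode() and
free the memory itself; every other caller keeps using toy_destroy_inode().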

> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c	2009-08-04 16:01:18.766801320 +0200
> +++ linux-2.6/fs/inode.c	2009-08-04 16:01:37.281556243 +0200
> @@ -228,7 +228,7 @@ static struct inode *alloc_inode(struct 
>  	return inode;
>  }
>  
> -void destroy_inode(struct inode *inode)
> +void __destroy_inode(struct inode *inode)
>  {
>  	BUG_ON(inode_has_buffers(inode));
>  	ima_inode_free(inode);
> @@ -240,13 +240,17 @@ void destroy_inode(struct inode *inode)
>  	if (inode->i_default_acl && inode->i_default_acl != ACL_NOT_CACHED)
>  		posix_acl_release(inode->i_default_acl);
>  #endif
> +}
> +EXPORT_SYMBOL(__destroy_inode);
> +
> +void destroy_inode(struct inode *inode)
> +{
> +	__destroy_inode(inode);
>  	if (inode->i_sb->s_op->destroy_inode)
>  		inode->i_sb->s_op->destroy_inode(inode);
>  	else
>  		kmem_cache_free(inode_cachep, (inode));
>  }
> -EXPORT_SYMBOL(destroy_inode);
> -
>  
>  /*
>   * These are initializations that only need to be done
> Index: linux-2.6/fs/xfs/xfs_inode.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_inode.h	2009-08-04 15:59:38.705782128 +0200
> +++ linux-2.6/fs/xfs/xfs_inode.h	2009-08-04 16:01:20.157556334 +0200
> @@ -322,8 +322,11 @@ static inline struct inode *VFS_I(struct
>   */
>  static inline void xfs_destroy_inode(struct xfs_inode *ip)
>  {
> -	make_bad_inode(VFS_I(ip));
> -	return destroy_inode(VFS_I(ip));
> +	struct inode *inode = VFS_I(ip);
> +
> +	make_bad_inode(inode);
> +	__destroy_inode(inode);
> +	inode->i_sb->s_op->destroy_inode(inode);
>  }
>  
>  /*
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2009-08-04 16:01:18.770804693 +0200
> +++ linux-2.6/include/linux/fs.h	2009-08-04 16:01:20.159539128 +0200
> @@ -2163,6 +2163,7 @@ extern void __iget(struct inode * inode)
>  extern void iget_failed(struct inode *);
>  extern void clear_inode(struct inode *);
>  extern void destroy_inode(struct inode *);
> +extern void __destroy_inode(struct inode *);
>  extern struct inode *new_inode(struct super_block *);
>  extern int should_remove_suid(struct dentry *);
>  extern int file_remove_suid(struct file *);


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] xfs: add xfs_inode_free
  2009-08-04 14:15   ` Christoph Hellwig
@ 2009-08-06 23:54     ` Eric Sandeen
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric Sandeen @ 2009-08-06 23:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, linux-fsdevel

Christoph Hellwig wrote:

> When we want to tear down an inode that lost the add to the cache race
> in XFS we must not call into ->destroy_inode because that would delete
> the inode that won the race from the inode cache radix tree.
>
> This patch splits a new xfs_inode_free helper out of xfs_ireclaim
> and uses that plus __destroy_inode to make sure we really only free
> the memory allocated for the inode that lost the race, and do not mess
> with the inode cache state.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>

Looks right to me.

Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
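
The race described above can be modelled in a few lines of user-space C.
This is only a sketch under simplifying assumptions: the cache is a small
array indexed by inode number modulo its size (so unrelated numbers can
collide, which a real radix tree would not do), and all toy_* names are
hypothetical. The point it illustrates is that a reclaim-style path removes
the cache entry by number, so running it on the loser would evict the
winner, while a free-only helper leaves the cache alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CACHE_SLOTS 8

struct toy_inode {
	unsigned long ino;
};

/* Hypothetical cache, keyed by inode number rather than by pointer. */
static struct toy_inode *toy_cache[TOY_CACHE_SLOTS];

static struct toy_inode *toy_inode_new(unsigned long ino)
{
	struct toy_inode *ip = malloc(sizeof(*ip));

	if (ip)
		ip->ino = ino;
	return ip;
}

/* Returns 0 if 'ip' was inserted, -1 if another object already holds the slot. */
static int toy_cache_insert(struct toy_inode *ip)
{
	size_t slot;

	assert(ip != NULL);
	slot = ip->ino % TOY_CACHE_SLOTS;
	if (toy_cache[slot])
		return -1;	/* lost the race */
	toy_cache[slot] = ip;
	return 0;
}

/* Reclaim-style teardown: drops the cache entry for this number, then frees. */
static void toy_reclaim(struct toy_inode *ip)
{
	toy_cache[ip->ino % TOY_CACHE_SLOTS] = NULL;
	free(ip);
}

/* Free-only helper: releases the memory and never touches the cache. */
static void toy_inode_free(struct toy_inode *ip)
{
	free(ip);
}
```

The loser of the insert goes through toy_inode_free(); only the object that
actually sits in the cache is ever passed to toy_reclaim().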
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-03 01:23:29.878784477 +0200
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-03 01:25:01.601784988 +0200
> @@ -116,6 +116,71 @@ xfs_inode_alloc(
>  	return ip;
>  }
>  
> +STATIC void
> +xfs_inode_free(
> +	struct xfs_inode	*ip)
> +{
> +	switch (ip->i_d.di_mode & S_IFMT) {
> +	case S_IFREG:
> +	case S_IFDIR:
> +	case S_IFLNK:
> +		xfs_idestroy_fork(ip, XFS_DATA_FORK);
> +		break;
> +	}
> +
> +	if (ip->i_afp)
> +		xfs_idestroy_fork(ip, XFS_ATTR_FORK);
> +
> +#ifdef XFS_INODE_TRACE
> +	ktrace_free(ip->i_trace);
> +#endif
> +#ifdef XFS_BMAP_TRACE
> +	ktrace_free(ip->i_xtrace);
> +#endif
> +#ifdef XFS_BTREE_TRACE
> +	ktrace_free(ip->i_btrace);
> +#endif
> +#ifdef XFS_RW_TRACE
> +	ktrace_free(ip->i_rwtrace);
> +#endif
> +#ifdef XFS_ILOCK_TRACE
> +	ktrace_free(ip->i_lock_trace);
> +#endif
> +#ifdef XFS_DIR2_TRACE
> +	ktrace_free(ip->i_dir_trace);
> +#endif
> +
> +	if (ip->i_itemp) {
> +		/*
> +		 * Only if we are shutting down the fs will we see an
> +		 * inode still in the AIL. If it is there, we should remove
> +		 * it to prevent a use-after-free from occurring.
> +		 */
> +		xfs_log_item_t	*lip = &ip->i_itemp->ili_item;
> +		struct xfs_ail	*ailp = lip->li_ailp;
> +
> +		ASSERT(((lip->li_flags & XFS_LI_IN_AIL) == 0) ||
> +				       XFS_FORCED_SHUTDOWN(ip->i_mount));
> +		if (lip->li_flags & XFS_LI_IN_AIL) {
> +			spin_lock(&ailp->xa_lock);
> +			if (lip->li_flags & XFS_LI_IN_AIL)
> +				xfs_trans_ail_delete(ailp, lip);
> +			else
> +				spin_unlock(&ailp->xa_lock);
> +		}
> +		xfs_inode_item_destroy(ip);
> +		ip->i_itemp = NULL;
> +	}
> +
> +	/* asserts to verify all state is correct here */
> +	ASSERT(atomic_read(&ip->i_iocount) == 0);
> +	ASSERT(atomic_read(&ip->i_pincount) == 0);
> +	ASSERT(!spin_is_locked(&ip->i_flags_lock));
> +	ASSERT(completion_done(&ip->i_flush));
> +
> +	kmem_zone_free(xfs_inode_zone, ip);
> +}
> +
>  /*
>   * Check the validity of the inode we just found it the cache
>   */
> @@ -303,7 +368,8 @@ out_preload_end:
>  	if (lock_flags)
>  		xfs_iunlock(ip, lock_flags);
>  out_destroy:
> -	xfs_destroy_inode(ip);
> +	__destroy_inode(VFS_I(ip));
> +	xfs_inode_free(ip);
>  	return error;
>  }
>  
> @@ -506,62 +572,7 @@ xfs_ireclaim(
>  	xfs_qm_dqdetach(ip);
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  
> -	switch (ip->i_d.di_mode & S_IFMT) {
> -	case S_IFREG:
> -	case S_IFDIR:
> -	case S_IFLNK:
> -		xfs_idestroy_fork(ip, XFS_DATA_FORK);
> -		break;
> -	}
> -
> -	if (ip->i_afp)
> -		xfs_idestroy_fork(ip, XFS_ATTR_FORK);
> -
> -#ifdef XFS_INODE_TRACE
> -	ktrace_free(ip->i_trace);
> -#endif
> -#ifdef XFS_BMAP_TRACE
> -	ktrace_free(ip->i_xtrace);
> -#endif
> -#ifdef XFS_BTREE_TRACE
> -	ktrace_free(ip->i_btrace);
> -#endif
> -#ifdef XFS_RW_TRACE
> -	ktrace_free(ip->i_rwtrace);
> -#endif
> -#ifdef XFS_ILOCK_TRACE
> -	ktrace_free(ip->i_lock_trace);
> -#endif
> -#ifdef XFS_DIR2_TRACE
> -	ktrace_free(ip->i_dir_trace);
> -#endif
> -	if (ip->i_itemp) {
> -		/*
> -		 * Only if we are shutting down the fs will we see an
> -		 * inode still in the AIL. If it is there, we should remove
> -		 * it to prevent a use-after-free from occurring.
> -		 */
> -		xfs_log_item_t	*lip = &ip->i_itemp->ili_item;
> -		struct xfs_ail	*ailp = lip->li_ailp;
> -
> -		ASSERT(((lip->li_flags & XFS_LI_IN_AIL) == 0) ||
> -				       XFS_FORCED_SHUTDOWN(ip->i_mount));
> -		if (lip->li_flags & XFS_LI_IN_AIL) {
> -			spin_lock(&ailp->xa_lock);
> -			if (lip->li_flags & XFS_LI_IN_AIL)
> -				xfs_trans_ail_delete(ailp, lip);
> -			else
> -				spin_unlock(&ailp->xa_lock);
> -		}
> -		xfs_inode_item_destroy(ip);
> -		ip->i_itemp = NULL;
> -	}
> -	/* asserts to verify all state is correct here */
> -	ASSERT(atomic_read(&ip->i_iocount) == 0);
> -	ASSERT(atomic_read(&ip->i_pincount) == 0);
> -	ASSERT(!spin_is_locked(&ip->i_flags_lock));
> -	ASSERT(completion_done(&ip->i_flush));
> -	kmem_zone_free(xfs_inode_zone, ip);
> +	xfs_inode_free(ip);
>  }
>  
>  /*
> Index: linux-2.6/fs/xfs/xfs_inode.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_inode.h	2009-08-03 01:23:39.876532108 +0200
> +++ linux-2.6/fs/xfs/xfs_inode.h	2009-08-03 01:23:47.411789594 +0200
> @@ -310,26 +310,6 @@ static inline struct inode *VFS_I(struct
>  }
>  
>  /*
> - * Get rid of a partially initialized inode.
> - *
> - * We have to go through destroy_inode to make sure allocations
> - * from init_inode_always like the security data are undone.
> - *
> - * We mark the inode bad so that it takes the short cut in
> - * the reclaim path instead of going through the flush path
> - * which doesn't make sense for an inode that has never seen the
> - * light of day.
> - */
> -static inline void xfs_destroy_inode(struct xfs_inode *ip)
> -{
> -	struct inode *inode = VFS_I(ip);
> -
> -	make_bad_inode(inode);
> -	__destroy_inode(inode);
> -	inode->i_sb->s_op->destroy_inode(inode);
> -}
> -
> -/*
>   * i_flags helper functions
>   */
>  static inline void


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
  2009-08-04 14:15   ` Christoph Hellwig
@ 2009-08-07 17:25     ` Felix Blyakher
  -1 siblings, 0 replies; 38+ messages in thread
From: Felix Blyakher @ 2009-08-07 17:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, linux-fsdevel

On Aug 4, 2009, at 9:15 AM, Christoph Hellwig wrote:

> The locking in xfs_iget_cache_hit currently has numerous problems:
>
> - we clear the reclaim tag without i_flags_lock which protects
>   modifications to it
> - we call inode_init_always which can sleep with pag_ici_lock held
> - we acquire and drop i_flags_lock a lot and thus provide no consistency
>   between the various flags we set/clear under it
>
> This patch fixes all that with a major revamp of the locking in the
> function.  The new version acquires i_flags_lock early and only drops it
> once we need to call into inode_init_always or before calling xfs_ilock.
>
> This patch fixes a bug seen in the wild where we race modifying the
> reclaim tag.

Some comments below, as well as a couple of questions based on the
differences from the previous code, not necessarily pointing to a
problem. Just trying to figure it out for myself.

>
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-01 23:20:54.775080254 +0200
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-01 23:20:54.807080483 +0200
> @@ -133,80 +133,90 @@ xfs_iget_cache_hit(
> 	int			flags,
> 	int			lock_flags) __releases(pag->pag_ici_lock)
> {
> +	struct inode		*inode = VFS_I(ip);
> 	struct xfs_mount	*mp = ip->i_mount;
> -	int			error = EAGAIN;
> +	int			error;
> +
> +	spin_lock(&ip->i_flags_lock);
>
> 	/*
> -	 * If INEW is set this inode is being set up
> -	 * If IRECLAIM is set this inode is being torn down
> -	 * Pause and try again.
> +	 * This inode is being torn down, pause and try again.
> 	 */
> -	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
> +	if (ip->i_flags & XFS_IRECLAIM) {
> 		XFS_STATS_INC(xs_ig_frecycle);
> +		error = EAGAIN;
> 		goto out_error;
> 	}
>
> -	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
> -	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
> +	/*
> +	 * If we are racing with another cache hit that is currently recycling
> +	 * this inode out of the XFS_IRECLAIMABLE state, wait for the
> +	 * initialisation to complete before continuing.
> +	 */
> +	if (ip->i_flags & XFS_INEW) {

Another case where we find XFS_INEW set is the race with a
cache miss that has just set up a new inode. Would the proposed
code still be sensible in that case? If yes, at least the comments
should be updated.

>
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
>
> -		/*
> -		 * If lookup is racing with unlink, then we should return an
> -		 * error immediately so we don't remove it from the reclaim
> -		 * list and potentially leak the inode.
> -		 */
> -		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> -			error = ENOENT;
> -			goto out_error;
> -		}
> +		XFS_STATS_INC(xs_ig_frecycle);
> +		wait_on_inode(inode);

It's possible to have XFS_INEW set, but no I_LOCK|I_NEW yet.
Then the wait_on_inode() would return quickly even before the
linux inode is reinitialized. Though, that was the case with
the old code as well.

>
> +		return EAGAIN;

This return seems inconsistent with the usual goto-end-of-function
convention. I understand that in this case goto out_error would
be incorrect. Should we create a new label, out_error_unlocked?
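
A minimal sketch of what such a two-label exit could look like, written as
a user-space model rather than as a proposed change to the patch. The lock
state is modelled with two integers, and the toy_* names and flag values
are hypothetical. The error value is a positive errno to match the
convention in the quoted XFS code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the two locks held on entry. */
struct toy_locks {
	int flags_lock_held;
	int cache_lock_held;
};

enum {
	TOY_RECLAIM = 1,	/* object is being torn down */
	TOY_NEW     = 2,	/* object is still being set up */
};

static void toy_unlock_all(struct toy_locks *l)
{
	l->flags_lock_held = 0;
	l->cache_lock_held = 0;
}

/*
 * Entered with both locks held, returns with both released.
 * Paths that fail while still holding the locks jump to out_error;
 * paths that already dropped them jump to out_error_unlocked.
 */
static int toy_cache_hit(struct toy_locks *l, unsigned int flags)
{
	int error;

	assert(l->flags_lock_held && l->cache_lock_held);

	if (flags & TOY_RECLAIM) {
		error = EAGAIN;
		goto out_error;
	}

	if (flags & TOY_NEW) {
		/* Locks must be dropped before waiting for setup to finish. */
		toy_unlock_all(l);
		error = EAGAIN;
		goto out_error_unlocked;
	}

	toy_unlock_all(l);
	return 0;

out_error:
	toy_unlock_all(l);
out_error_unlocked:
	return error;
}
```

Falling through from out_error into out_error_unlocked keeps a single
return statement for all failure paths.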

>
> +	}
>
> +	/*
> +	 * If lookup is racing with unlink, then we should return an
> +	 * error immediately so we don't remove it from the reclaim
> +	 * list and potentially leak the inode.
> +	 */
> +	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {

Previously the conclusion that we are racing with unlink was based on
the XFS_IRECLAIMABLE i_flag being set in addition to the test above.
Is that no longer the case, or not really necessary?

>
> +		error = ENOENT;
> +		goto out_error;
> +	}
> +
> +	/*
> +	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
> +	 * Need to carefully get it back into useable state.
> +	 */
> +	if (ip->i_flags & XFS_IRECLAIMABLE) {
> 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
>
> 		/*
> -		 * We need to re-initialise the VFS inode as it has been
> -		 * 'freed' by the VFS. Do this here so we can deal with
> -		 * errors cleanly, then tag it so it can be set up correctly
> -		 * later.
> +		 * We need to set XFS_INEW atomically with clearing the
> +		 * reclaimable tag so that we do have an indicator of the
> +		 * inode still being initialized.
> 		 */
> -		if (!inode_init_always(mp->m_super, VFS_I(ip))) {
> +		ip->i_flags |= XFS_INEW;
> +		__xfs_inode_clear_reclaim_tag(pag, ip);
> +
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
> +
> +		if (unlikely(!inode_init_always(mp->m_super, inode))) {
> +			/*
> +			 * Re-initializing the inode failed, and we are in deep
> +			 * trouble.  Try to re-add it to the reclaim list.
> +			 */
> +			read_lock(&pag->pag_ici_lock);
> +			spin_lock(&ip->i_flags_lock);
> +
> +			ip->i_flags &= ~XFS_INEW;
> +			__xfs_inode_set_reclaim_tag(pag, ip);
> +
> 			error = ENOMEM;
> 			goto out_error;
> 		}
> -
> -		/*
> -		 * We must set the XFS_INEW flag before clearing the
> -		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
> -		 * not find the XFS_IRECLAIMABLE above but has the igrab()
> -		 * below succeed we can safely check XFS_INEW to detect
> -		 * that this inode is still being initialised.
> -		 */
> -		xfs_iflags_set(ip, XFS_INEW);
> -		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
> -
> -		/* clear the radix tree reclaim flag as well. */
> -		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> -	} else if (!igrab(VFS_I(ip))) {
> +		inode->i_state = I_LOCK|I_NEW;
> +	} else {
> 		/* If the VFS inode is being torn down, pause and try again. */
> -		XFS_STATS_INC(xs_ig_frecycle);
> -		goto out_error;
> -	} else if (xfs_iflags_test(ip, XFS_INEW)) {
> -		/*
> -		 * We are racing with another cache hit that is
> -		 * currently recycling this inode out of the XFS_IRECLAIMABLE
> -		 * state. Wait for the initialisation to complete before
> -		 * continuing.
> -		 */
> -		wait_on_inode(VFS_I(ip));
> -	}
> +		if (!igrab(inode)) {
> +			error = EAGAIN;
> +			goto out_error;
> +		}
>
> -	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
> -		error = ENOENT;
> -		iput(VFS_I(ip));
> -		goto out_error;
> +		/* We've got a live one. */
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
> 	}
>
> -	/* We've got a live one. */
> -	read_unlock(&pag->pag_ici_lock);
> -
> 	if (lock_flags != 0)
> 		xfs_ilock(ip, lock_flags);
>
> @@ -216,6 +226,7 @@ xfs_iget_cache_hit(
> 	return 0;
>
> out_error:
> +	spin_unlock(&ip->i_flags_lock);
> 	read_unlock(&pag->pag_ici_lock);
> 	return error;
> }
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-01 23:20:31.441330970 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-01 23:20:54.807080483 +0200
> @@ -708,6 +708,17 @@ xfs_reclaim_inode(
> 	return 0;
> }
>
> +void
> +__xfs_inode_set_reclaim_tag(
> +	struct xfs_perag	*pag,
> +	struct xfs_inode	*ip)
> +{
> +	radix_tree_tag_set(&pag->pag_ici_root,
> +			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
> +			   XFS_ICI_RECLAIM_TAG);
> +	ip->i_flags |= XFS_IRECLAIMABLE;
> +}
> +
> /*
>  * We set the inode flag atomically with the radix tree tag.
>  * Once we get tag lookups on the radix tree, this inode flag
> @@ -722,9 +733,7 @@ xfs_inode_set_reclaim_tag(
>
> 	read_lock(&pag->pag_ici_lock);
> 	spin_lock(&ip->i_flags_lock);
> -	radix_tree_tag_set(&pag->pag_ici_root,
> -			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
> -	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
> +	__xfs_inode_set_reclaim_tag(pag, ip);
> 	spin_unlock(&ip->i_flags_lock);
> 	read_unlock(&pag->pag_ici_lock);
> 	xfs_put_perag(mp, pag);
> @@ -732,27 +741,13 @@ xfs_inode_set_reclaim_tag(
>
> void
> __xfs_inode_clear_reclaim_tag(
> -	xfs_mount_t	*mp,
> -	xfs_perag_t	*pag,
> -	xfs_inode_t	*ip)
> +	struct xfs_perag	*pag,
> +	struct xfs_inode	*ip)
> {
> +	ip->i_flags &= ~XFS_IRECLAIMABLE;
> 	radix_tree_tag_clear(&pag->pag_ici_root,
> -			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
> -}
> -
> -void
> -xfs_inode_clear_reclaim_tag(
> -	xfs_inode_t	*ip)
> -{
> -	xfs_mount_t	*mp = ip->i_mount;
> -	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
> -
> -	read_lock(&pag->pag_ici_lock);
> -	spin_lock(&ip->i_flags_lock);
> -	__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> -	spin_unlock(&ip->i_flags_lock);
> -	read_unlock(&pag->pag_ici_lock);
> -	xfs_put_perag(mp, pag);
> +			     XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
> +			     XFS_ICI_RECLAIM_TAG);
> }
>
> STATIC int
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-01 23:20:31.449329683 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-01 23:20:54.808079772 +0200
> @@ -48,9 +48,8 @@ int xfs_reclaim_inode(struct xfs_inode *
> int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
>
> void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
> -void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
> -void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
> -				struct xfs_inode *ip);
> +void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
> +void __xfs_inode_clear_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
>
> int xfs_sync_inode_valid(struct xfs_inode *ip, struct xfs_perag *pag);
> int xfs_inode_ag_iterator(struct xfs_mount *mp,
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
@ 2009-08-07 17:25     ` Felix Blyakher
  0 siblings, 0 replies; 38+ messages in thread
From: Felix Blyakher @ 2009-08-07 17:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs

On Aug 4, 2009, at 9:15 AM, Christoph Hellwig wrote:

> The locking in xfs_iget_cache_hit currently has numerous problems:
>
> - we clear the reclaim tag without i_flags_lock which protects modifications
>   to it
> - we call inode_init_always which can sleep with pag_ici_lock held
> - we acquire and drop i_flags_lock a lot and thus provide no consistency
>   between the various flags we set/clear under it
>
> This patch fixes all that with a major revamp of the locking in the function.
> The new version acquires i_flags_lock early and only drops it once we need to
> call into inode_init_always or before calling xfs_ilock.
>
> This patch fixes a bug seen in the wild where we race modifying the reclaim tag.

Some comments below, as well as a couple of questions based on the
differences from the previous code. They do not necessarily point to
a problem; I am just trying to figure it out for myself.

>
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-01 23:20:54.775080254 +0200
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-01 23:20:54.807080483 +0200
> @@ -133,80 +133,90 @@ xfs_iget_cache_hit(
> 	int			flags,
> 	int			lock_flags) __releases(pag->pag_ici_lock)
> {
> +	struct inode		*inode = VFS_I(ip);
> 	struct xfs_mount	*mp = ip->i_mount;
> -	int			error = EAGAIN;
> +	int			error;
> +
> +	spin_lock(&ip->i_flags_lock);
>
> 	/*
> -	 * If INEW is set this inode is being set up
> -	 * If IRECLAIM is set this inode is being torn down
> -	 * Pause and try again.
> +	 * This inode is being torn down, pause and try again.
> 	 */
> -	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
> +	if (ip->i_flags & XFS_IRECLAIM) {
> 		XFS_STATS_INC(xs_ig_frecycle);
> +		error = EAGAIN;
> 		goto out_error;
> 	}
>
> -	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
> -	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
> +	/*
> +	 * If we are racing with another cache hit that is currently recycling
> +	 * this inode out of the XFS_IRECLAIMABLE state, wait for the
> +	 * initialisation to complete before continuing.
> +	 */
> +	if (ip->i_flags & XFS_INEW) {

Another case where we find XFS_INEW set is a race with a cache
miss that has just set up a new inode. Would the proposed code
still be sensible in that case? If so, the comment should at
least be updated.

>
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
>
> -		/*
> -		 * If lookup is racing with unlink, then we should return an
> -		 * error immediately so we don't remove it from the reclaim
> -		 * list and potentially leak the inode.
> -		 */
> -		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> -			error = ENOENT;
> -			goto out_error;
> -		}
> +		XFS_STATS_INC(xs_ig_frecycle);
> +		wait_on_inode(inode);

It's possible to have XFS_INEW set but I_LOCK|I_NEW not yet set.
In that case wait_on_inode() would return immediately, before the
Linux inode has been reinitialized. That was the case with the old
code as well, though.
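
A minimal sketch of that window, as a userspace model rather than
kernel code. The demo_* names and the flag values are made up for
illustration; the only behaviour taken from the discussion above is
that the wait sleeps only while I_LOCK is set in i_state.

```c
#include <assert.h>

#define DEMO_XFS_INEW	0x1u	/* stands in for XFS_INEW in ip->i_flags */
#define DEMO_I_LOCK	0x1ul	/* stands in for I_LOCK in inode->i_state */
#define DEMO_I_NEW	0x2ul	/* stands in for I_NEW in inode->i_state */

/* Hypothetical, stripped-down pair of XFS flags and VFS state. */
struct demo_inode {
	unsigned int	xfs_flags;
	unsigned long	i_state;
};

/* Models wait_on_inode(): it only sleeps while the lock bit is set. */
static int demo_wait_would_sleep(const struct demo_inode *ip)
{
	return (ip->i_state & DEMO_I_LOCK) != 0;
}

/* Step 1 of the recycle path, done under i_flags_lock in the patch. */
static void demo_recycle_mark_new(struct demo_inode *ip)
{
	ip->xfs_flags |= DEMO_XFS_INEW;
}

/* Step 2, done after the locks were dropped and re-init succeeded. */
static void demo_recycle_lock_vfs(struct demo_inode *ip)
{
	assert(ip->xfs_flags & DEMO_XFS_INEW);
	ip->i_state = DEMO_I_LOCK | DEMO_I_NEW;
}
```

Between the two steps a second lookup sees XFS_INEW but would not
sleep in the wait. In the patch that lookup still returns EAGAIN and
retries, so the early return only costs an extra pass.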

>
> +		return EAGAIN;

This return seems inconsistent with the usual convention of jumping
to a label at the end of the function. I understand that goto out_error
would be incorrect here, since both locks have already been dropped.
Should we create a new label, out_error_unlocked?
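
A rough sketch of what such a second label could look like, written
as a userspace model and not as the actual function. The demo_* names
and the integer "lock held" fields are assumptions for illustration;
only the label name out_error_unlocked comes from the question above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two locks used in xfs_iget_cache_hit(). */
struct demo_locks {
	int flags_lock_held;	/* models ip->i_flags_lock */
	int ici_lock_held;	/* models pag->pag_ici_lock */
};

#define DEMO_INEW	0x1	/* models XFS_INEW */
#define DEMO_IRECLAIM	0x2	/* models XFS_IRECLAIM */

/* Entered with the ici lock held; returns with both locks dropped. */
static int demo_cache_hit(struct demo_locks *l, unsigned int flags)
{
	int error;

	assert(l->ici_lock_held);
	l->flags_lock_held = 1;

	if (flags & DEMO_IRECLAIM) {
		error = EAGAIN;
		goto out_error;
	}

	if (flags & DEMO_INEW) {
		/* Both locks are dropped before the sleeping wait. */
		l->flags_lock_held = 0;
		l->ici_lock_held = 0;
		/* wait_on_inode() would be called here. */
		error = EAGAIN;
		goto out_error_unlocked;
	}

	/* Success path: drop the locks and return. */
	l->flags_lock_held = 0;
	l->ici_lock_held = 0;
	return 0;

out_error:
	l->flags_lock_held = 0;
	l->ici_lock_held = 0;
out_error_unlocked:
	return error;
}
```

The locked exit falls through into the unlocked one, so every error
return leaves through the bottom of the function and the unlock
sequence appears only once.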

>
> +	}
>
> +	/*
> +	 * If lookup is racing with unlink, then we should return an
> +	 * error immediately so we don't remove it from the reclaim
> +	 * list and potentially leak the inode.
> +	 */
> +	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {

Previously the conclusion that we were racing with unlink was based
on the XFS_IRECLAIMABLE i_flag being set in addition to the test above.
Is that no longer the case, or is it not really necessary?

>
> +		error = ENOENT;
> +		goto out_error;
> +	}
> +
> +	/*
> +	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
> +	 * Need to carefully get it back into useable state.
> +	 */
> +	if (ip->i_flags & XFS_IRECLAIMABLE) {
> 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
>
> 		/*
> -		 * We need to re-initialise the VFS inode as it has been
> -		 * 'freed' by the VFS. Do this here so we can deal with
> -		 * errors cleanly, then tag it so it can be set up correctly
> -		 * later.
> +		 * We need to set XFS_INEW atomically with clearing the
> +		 * reclaimable tag so that we do have an indicator of the
> +		 * inode still being initialized.
> 		 */
> -		if (!inode_init_always(mp->m_super, VFS_I(ip))) {
> +		ip->i_flags |= XFS_INEW;
> +		__xfs_inode_clear_reclaim_tag(pag, ip);
> +
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
> +
> +		if (unlikely(!inode_init_always(mp->m_super, inode))) {
> +			/*
> +			 * Re-initializing the inode failed, and we are in deep
> +			 * trouble.  Try to re-add it to the reclaim list.
> +			 */
> +			read_lock(&pag->pag_ici_lock);
> +			spin_lock(&ip->i_flags_lock);
> +
> +			ip->i_flags &= ~XFS_INEW;
> +			__xfs_inode_set_reclaim_tag(pag, ip);
> +
> 			error = ENOMEM;
> 			goto out_error;
> 		}
> -
> -		/*
> -		 * We must set the XFS_INEW flag before clearing the
> -		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
> -		 * not find the XFS_IRECLAIMABLE above but has the igrab()
> -		 * below succeed we can safely check XFS_INEW to detect
> -		 * that this inode is still being initialised.
> -		 */
> -		xfs_iflags_set(ip, XFS_INEW);
> -		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
> -
> -		/* clear the radix tree reclaim flag as well. */
> -		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> -	} else if (!igrab(VFS_I(ip))) {
> +		inode->i_state = I_LOCK|I_NEW;
> +	} else {
> 		/* If the VFS inode is being torn down, pause and try again. */
> -		XFS_STATS_INC(xs_ig_frecycle);
> -		goto out_error;
> -	} else if (xfs_iflags_test(ip, XFS_INEW)) {
> -		/*
> -		 * We are racing with another cache hit that is
> -		 * currently recycling this inode out of the XFS_IRECLAIMABLE
> -		 * state. Wait for the initialisation to complete before
> -		 * continuing.
> -		 */
> -		wait_on_inode(VFS_I(ip));
> -	}
> +		if (!igrab(inode)) {
> +			error = EAGAIN;
> +			goto out_error;
> +		}
>
> -	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
> -		error = ENOENT;
> -		iput(VFS_I(ip));
> -		goto out_error;
> +		/* We've got a live one. */
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
> 	}
>
> -	/* We've got a live one. */
> -	read_unlock(&pag->pag_ici_lock);
> -
> 	if (lock_flags != 0)
> 		xfs_ilock(ip, lock_flags);
>
> @@ -216,6 +226,7 @@ xfs_iget_cache_hit(
> 	return 0;
>
> out_error:
> +	spin_unlock(&ip->i_flags_lock);
> 	read_unlock(&pag->pag_ici_lock);
> 	return error;
> }
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-01 23:20:31.441330970 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-01 23:20:54.807080483 +0200
> @@ -708,6 +708,17 @@ xfs_reclaim_inode(
> 	return 0;
> }
>
> +void
> +__xfs_inode_set_reclaim_tag(
> +	struct xfs_perag	*pag,
> +	struct xfs_inode	*ip)
> +{
> +	radix_tree_tag_set(&pag->pag_ici_root,
> +			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
> +			   XFS_ICI_RECLAIM_TAG);
> +	ip->i_flags |= XFS_IRECLAIMABLE;
> +}
> +
> /*
>  * We set the inode flag atomically with the radix tree tag.
>  * Once we get tag lookups on the radix tree, this inode flag
> @@ -722,9 +733,7 @@ xfs_inode_set_reclaim_tag(
>
> 	read_lock(&pag->pag_ici_lock);
> 	spin_lock(&ip->i_flags_lock);
> -	radix_tree_tag_set(&pag->pag_ici_root,
> -			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
> -	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
> +	__xfs_inode_set_reclaim_tag(pag, ip);
> 	spin_unlock(&ip->i_flags_lock);
> 	read_unlock(&pag->pag_ici_lock);
> 	xfs_put_perag(mp, pag);
> @@ -732,27 +741,13 @@ xfs_inode_set_reclaim_tag(
>
> void
> __xfs_inode_clear_reclaim_tag(
> -	xfs_mount_t	*mp,
> -	xfs_perag_t	*pag,
> -	xfs_inode_t	*ip)
> +	struct xfs_perag	*pag,
> +	struct xfs_inode	*ip)
> {
> +	ip->i_flags &= ~XFS_IRECLAIMABLE;
> 	radix_tree_tag_clear(&pag->pag_ici_root,
> -			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
> -}
> -
> -void
> -xfs_inode_clear_reclaim_tag(
> -	xfs_inode_t	*ip)
> -{
> -	xfs_mount_t	*mp = ip->i_mount;
> -	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
> -
> -	read_lock(&pag->pag_ici_lock);
> -	spin_lock(&ip->i_flags_lock);
> -	__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> -	spin_unlock(&ip->i_flags_lock);
> -	read_unlock(&pag->pag_ici_lock);
> -	xfs_put_perag(mp, pag);
> +			     XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
> +			     XFS_ICI_RECLAIM_TAG);
> }
>
> STATIC int
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-01 23:20:31.449329683 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-01 23:20:54.808079772 +0200
> @@ -48,9 +48,8 @@ int xfs_reclaim_inode(struct xfs_inode *
> int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
>
> void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
> -void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
> -void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
> -				struct xfs_inode *ip);
> +void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
> +void __xfs_inode_clear_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
>
> int xfs_sync_inode_valid(struct xfs_inode *ip, struct xfs_perag *pag);
> int xfs_inode_ag_iterator(struct xfs_mount *mp,
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] fix inode_init_always calling convention
  2009-08-04 14:15   ` Christoph Hellwig
@ 2009-08-07 17:39     ` Felix Blyakher
  -1 siblings, 0 replies; 38+ messages in thread
From: Felix Blyakher @ 2009-08-07 17:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, linux-fsdevel


On Aug 4, 2009, at 9:15 AM, Christoph Hellwig wrote:

> Currently inode_init_always calls into ->destroy_inode if the additional
> initialization fails.  That's not only counter-intuitive because
> inode_init_always did not allocate the inode structure, but in case of
> XFS it's actively harmful as ->destroy_inode might delete the inode from
> a radix-tree that has never been added.  This in turn might end up
> deleting the inode for the same inum that has been instanciated by
> another process and cause lots of cause subtile problems.
>
> Also in the case of re-initializing a reclaimable inode in XFS it would
> free an inode we still want to keep alive.

It is definitely a sensible approach for inode_init_always to be
symmetric and not free what it didn't allocate.
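
A small userspace sketch of that ownership rule, under the assumption
that malloc/free stand in for the slab allocator. All demo_* names,
including the demo_fail_init switch, are hypothetical and exist only
to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical miniature inode with one separately allocated member. */
struct demo_inode {
	void *security;	/* models what security_inode_alloc() sets up */
};

/* Test switch (an assumption) that forces the init step to fail. */
static int demo_fail_init;

/* Initialises only; it never frees @inode. Returns 0 or -ENOMEM. */
static int demo_inode_init_always(struct demo_inode *inode)
{
	if (demo_fail_init)
		return -ENOMEM;
	inode->security = malloc(1);
	if (!inode->security)
		return -ENOMEM;
	return 0;
}

/* The allocator owns the memory, so it is the one to free it. */
static struct demo_inode *demo_alloc_inode(void)
{
	struct demo_inode *inode = calloc(1, sizeof(*inode));

	if (!inode)
		return NULL;

	if (demo_inode_init_always(inode)) {
		free(inode);
		return NULL;	/* never hand back the freed pointer */
	}
	return inode;
}

static void demo_destroy_inode(struct demo_inode *inode)
{
	assert(inode);
	free(inode->security);
	free(inode);
}
```

Note the return NULL in the failure branch of the allocator. The
alloc_inode() hunk quoted below, as posted, appears to fall through to
"return inode" after destroying the inode, so it may need the same
early return.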

Reviewed-by: Felix Blyakher <felixb@sgi.com>

with a minor comment below.

>
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c	2009-08-03 01:16:04.254556370 +0200
> +++ linux-2.6/fs/inode.c	2009-08-03 01:23:11.135532251 +0200
> @@ -120,12 +120,11 @@ static void wake_up_inode(struct inode *
>  * These are initializations that need to be done on every inode
>  * allocation as the fields are not initialised by slab allocation.
>  */
> -struct inode *inode_init_always(struct super_block *sb, struct inode *inode)
> +int inode_init_always(struct super_block *sb, struct inode *inode)
> {
> 	static const struct address_space_operations empty_aops;
> 	static struct inode_operations empty_iops;
> 	static const struct file_operations empty_fops;
> -
> 	struct address_space *const mapping = &inode->i_data;
>
> 	inode->i_sb = sb;
> @@ -152,7 +151,7 @@ struct inode *inode_init_always(struct s
> 	inode->dirtied_when = 0;
>
> 	if (security_inode_alloc(inode))
> -		goto out_free_inode;
> +		goto out;
>
> 	/* allocate and initialize an i_integrity */
> 	if (ima_inode_alloc(inode))
> @@ -198,16 +197,12 @@ struct inode *inode_init_always(struct s
> 	inode->i_fsnotify_mask = 0;
> #endif
>
> -	return inode;
> +	return 0;
>
> out_free_security:
> 	security_inode_free(inode);
> -out_free_inode:
> -	if (inode->i_sb->s_op->destroy_inode)
> -		inode->i_sb->s_op->destroy_inode(inode);
> -	else
> -		kmem_cache_free(inode_cachep, (inode));
> -	return NULL;
> +out:
> +	return -ENOMEM;
> }
> EXPORT_SYMBOL(inode_init_always);
>
> @@ -220,9 +215,17 @@ static struct inode *alloc_inode(struct
> 	else
> 		inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
>
> -	if (inode)
> -		return inode_init_always(sb, inode);
> -	return NULL;
> +	if (!inode)
> +		return NULL;
> +
> +	if (unlikely(inode_init_always(sb, inode))) {
> +		if (inode->i_sb->s_op->destroy_inode)
> +			inode->i_sb->s_op->destroy_inode(inode);
> +		else
> +			kmem_cache_free(inode_cachep, inode);
> +	}
> +
> +	return inode;
> }
>
> void destroy_inode(struct inode *inode)
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-03 01:16:22.510806794 +0200
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-03 01:23:29.878784477 +0200
> @@ -64,6 +64,10 @@ xfs_inode_alloc(
> 	ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
> 	if (!ip)
> 		return NULL;
> +	if (inode_init_always(mp->m_super, VFS_I(ip))) {

Should this be marked as an 'unlikely' event?
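
For reference, a sketch of what the annotation amounts to, assuming a
GCC or Clang compiler. demo_unlikely mirrors the shape of the kernel's
unlikely() helper; demo_init and demo_alloc are made-up names that
only illustrate the int-returning convention from this patch.

```c
#include <assert.h>
#include <errno.h>

/* Hint that the condition is expected to be false most of the time. */
#define demo_unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical init step: 0 on success, -ENOMEM on failure. */
static int demo_init(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Returns 1 if the object was set up, 0 if the rare failure hit. */
static int demo_alloc(int fail)
{
	if (demo_unlikely(demo_init(fail)))
		return 0;	/* cold path: clean up and bail out */
	return 1;
}
```

The hint does not change the result of the test; it only lets the
compiler lay out the failure branch away from the hot path.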

>
> +		kmem_zone_free(xfs_inode_zone, ip);
> +		return NULL;
> +	}
>
> 	ASSERT(atomic_read(&ip->i_iocount) == 0);
> 	ASSERT(atomic_read(&ip->i_pincount) == 0);
> @@ -105,17 +109,6 @@ xfs_inode_alloc(
> #ifdef XFS_DIR2_TRACE
> 	ip->i_dir_trace = ktrace_alloc(XFS_DIR2_KTRACE_SIZE, KM_NOFS);
> #endif
> -	/*
> -	* Now initialise the VFS inode. We do this after the xfs_inode
> -	* initialisation as internal failures will result in ->destroy_inode
> -	* being called and that will pass down through the reclaim path and
> -	* free the XFS inode. This path requires the XFS inode to already be
> -	* initialised. Hence if this call fails, the xfs_inode has already
> -	* been freed and we should not reference it at all in the error
> -	* handling.
> -	*/
> -	if (!inode_init_always(mp->m_super, VFS_I(ip)))
> -		return NULL;
>
> 	/* prevent anyone from using this yet */
> 	VFS_I(ip)->i_state = I_NEW|I_LOCK;
> @@ -190,7 +183,7 @@ xfs_iget_cache_hit(
> 		spin_unlock(&ip->i_flags_lock);
> 		read_unlock(&pag->pag_ici_lock);
>
> -		if (unlikely(!inode_init_always(mp->m_super, inode))) {
> +		if (unlikely(inode_init_always(mp->m_super, inode))) {
> 			/*
> 			 * Re-initializing the inode failed, and we are in deep
> 			 * trouble.  Try to re-add it to the reclaim list.
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2009-08-03 01:16:21.186539128 +0200
> +++ linux-2.6/include/linux/fs.h	2009-08-03 01:23:11.131532230 +0200
> @@ -2136,7 +2136,7 @@ extern loff_t default_llseek(struct file
>
> extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin);
>
> -extern struct inode * inode_init_always(struct super_block *, struct inode *);
> +extern int inode_init_always(struct super_block *, struct inode *);
> extern void inode_init_once(struct inode *);
> extern void inode_add_to_lists(struct super_block *, struct inode *);
> extern void iput(struct inode *);
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] fix inode_init_always calling convention
  2009-08-07 17:39     ` Felix Blyakher
@ 2009-08-07 18:09       ` Felix Blyakher
  -1 siblings, 0 replies; 38+ messages in thread
From: Felix Blyakher @ 2009-08-07 18:09 UTC (permalink / raw)
  To: Felix Blyakher; +Cc: Christoph Hellwig, xfs, linux-fsdevel


On Aug 7, 2009, at 12:39 PM, Felix Blyakher wrote:

>
> On Aug 4, 2009, at 9:15 AM, Christoph Hellwig wrote:
>
>> Currently inode_init_always calls into ->destroy_inode if the additional
>> initialization fails.  That's not only counter-intuitive because
>> inode_init_always did not allocate the inode structure, but in case of
>> XFS it's actively harmful as ->destroy_inode might delete the inode from
>> a radix-tree that has never been added.  This in turn might end up
>> deleting the inode for the same inum that has been instanciated by
>> another process and cause lots of cause subtile problems.

Also, for a clean git log, the last line should read:

    another process and cause lots of subtle problems.

Not trying to be picky :)
Felix


>>
>>
>> Also in the case of re-initializing a reclaimable inode in XFS it would
>> free an inode we still want to keep alive.
>
> Definitely sensible approach for inode_init_always to be
> symmetric, and to not free what it didn't allocate.
>
> Reviewed-by: Felix Blyakher <felixb@sgi.com>
>
> with minor comment below.
>
>>
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>
>> Index: linux-2.6/fs/inode.c
>> ===================================================================
>> --- linux-2.6.orig/fs/inode.c	2009-08-03 01:16:04.254556370 +0200
>> +++ linux-2.6/fs/inode.c	2009-08-03 01:23:11.135532251 +0200
>> @@ -120,12 +120,11 @@ static void wake_up_inode(struct inode *
>> * These are initializations that need to be done on every inode
>> * allocation as the fields are not initialised by slab allocation.
>> */
>> -struct inode *inode_init_always(struct super_block *sb, struct inode *inode)
>> +int inode_init_always(struct super_block *sb, struct inode *inode)
>> {
>> 	static const struct address_space_operations empty_aops;
>> 	static struct inode_operations empty_iops;
>> 	static const struct file_operations empty_fops;
>> -
>> 	struct address_space *const mapping = &inode->i_data;
>>
>> 	inode->i_sb = sb;
>> @@ -152,7 +151,7 @@ struct inode *inode_init_always(struct s
>> 	inode->dirtied_when = 0;
>>
>> 	if (security_inode_alloc(inode))
>> -		goto out_free_inode;
>> +		goto out;
>>
>> 	/* allocate and initialize an i_integrity */
>> 	if (ima_inode_alloc(inode))
>> @@ -198,16 +197,12 @@ struct inode *inode_init_always(struct s
>> 	inode->i_fsnotify_mask = 0;
>> #endif
>>
>> -	return inode;
>> +	return 0;
>>
>> out_free_security:
>> 	security_inode_free(inode);
>> -out_free_inode:
>> -	if (inode->i_sb->s_op->destroy_inode)
>> -		inode->i_sb->s_op->destroy_inode(inode);
>> -	else
>> -		kmem_cache_free(inode_cachep, (inode));
>> -	return NULL;
>> +out:
>> +	return -ENOMEM;
>> }
>> EXPORT_SYMBOL(inode_init_always);
>>
>> @@ -220,9 +215,17 @@ static struct inode *alloc_inode(struct
>> 	else
>> 		inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
>>
>> -	if (inode)
>> -		return inode_init_always(sb, inode);
>> -	return NULL;
>> +	if (!inode)
>> +		return NULL;
>> +
>> +	if (unlikely(inode_init_always(sb, inode))) {
>> +		if (inode->i_sb->s_op->destroy_inode)
>> +			inode->i_sb->s_op->destroy_inode(inode);
>> +		else
>> +			kmem_cache_free(inode_cachep, inode);
>> +	}
>> +
>> +	return inode;
>> }
>>
>> void destroy_inode(struct inode *inode)
>> Index: linux-2.6/fs/xfs/xfs_iget.c
>> ===================================================================
>> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-03 01:16:22.510806794 +0200
>> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-03 01:23:29.878784477 +0200
>> @@ -64,6 +64,10 @@ xfs_inode_alloc(
>> 	ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
>> 	if (!ip)
>> 		return NULL;
>> +	if (inode_init_always(mp->m_super, VFS_I(ip))) {
>
> Should this be 'unlikely' event?
>
>>
>> +		kmem_zone_free(xfs_inode_zone, ip);
>> +		return NULL;
>> +	}
>>
>> 	ASSERT(atomic_read(&ip->i_iocount) == 0);
>> 	ASSERT(atomic_read(&ip->i_pincount) == 0);
>> @@ -105,17 +109,6 @@ xfs_inode_alloc(
>> #ifdef XFS_DIR2_TRACE
>> 	ip->i_dir_trace = ktrace_alloc(XFS_DIR2_KTRACE_SIZE, KM_NOFS);
>> #endif
>> -	/*
>> -	* Now initialise the VFS inode. We do this after the xfs_inode
>> -	* initialisation as internal failures will result in ->destroy_inode
>> -	* being called and that will pass down through the reclaim path and
>> -	* free the XFS inode. This path requires the XFS inode to already be
>> -	* initialised. Hence if this call fails, the xfs_inode has already
>> -	* been freed and we should not reference it at all in the error
>> -	* handling.
>> -	*/
>> -	if (!inode_init_always(mp->m_super, VFS_I(ip)))
>> -		return NULL;
>>
>> 	/* prevent anyone from using this yet */
>> 	VFS_I(ip)->i_state = I_NEW|I_LOCK;
>> @@ -190,7 +183,7 @@ xfs_iget_cache_hit(
>> 		spin_unlock(&ip->i_flags_lock);
>> 		read_unlock(&pag->pag_ici_lock);
>>
>> -		if (unlikely(!inode_init_always(mp->m_super, inode))) {
>> +		if (unlikely(inode_init_always(mp->m_super, inode))) {
>> 			/*
>> 			 * Re-initializing the inode failed, and we are in deep
>> 			 * trouble.  Try to re-add it to the reclaim list.
>> Index: linux-2.6/include/linux/fs.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/fs.h	2009-08-03 01:16:21.186539128 +0200
>> +++ linux-2.6/include/linux/fs.h	2009-08-03 01:23:11.131532230 +0200
>> @@ -2136,7 +2136,7 @@ extern loff_t default_llseek(struct file
>>
>> extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin);
>>
>> -extern struct inode * inode_init_always(struct super_block *, struct inode *);
>> +extern int inode_init_always(struct super_block *, struct inode *);
>> extern void inode_init_once(struct inode *);
>> extern void inode_add_to_lists(struct super_block *, struct inode *);
>> extern void iput(struct inode *);
>>
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/4] add __destroy_inode
  2009-08-04 14:15   ` Christoph Hellwig
@ 2009-08-07 18:20     ` Felix Blyakher
  -1 siblings, 0 replies; 38+ messages in thread
From: Felix Blyakher @ 2009-08-07 18:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, linux-fsdevel


On Aug 4, 2009, at 9:15 AM, Christoph Hellwig wrote:

> When we want to tear down an inode that lost the add to the cache race
> in XFS we must not call into ->destroy_inode because that would delete
> the inode that won the race from the inode cache radix tree.
>
> This patch provides the __destroy_inode helper needed to fix this,
> the actual fix will be in the next patch.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Felix Blyakher <felixb@sgi.com>
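
The split described in the patch text (generic teardown in one helper, with the owner-specific free kept separate) can be illustrated with a small userspace sketch. It is not the kernel code: the toy_* names, the acl field and the hook counter are hypothetical, and malloc/free stand in for the slab cache.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_inode;

/* Hypothetical superblock with an optional per-filesystem free hook. */
struct toy_sb {
	void	(*destroy_inode)(struct toy_inode *);
	int	hook_calls;	/* how often the hook ran, for the demo */
};

struct toy_inode {
	struct toy_sb	*sb;
	int		*acl;	/* generic state owned by the common layer */
};

/* Like __destroy_inode: release generic state, keep the memory. */
static void toy_destroy_common(struct toy_inode *inode)
{
	free(inode->acl);
	inode->acl = NULL;
}

/* Like destroy_inode: common teardown, then let the owner free it. */
static void toy_destroy_inode(struct toy_inode *inode)
{
	toy_destroy_common(inode);
	if (inode->sb->destroy_inode)
		inode->sb->destroy_inode(inode);
	else
		free(inode);
}

/* Example filesystem hook; a real one might also touch a cache. */
static void toy_fs_destroy(struct toy_inode *inode)
{
	inode->sb->hook_calls++;
	free(inode);
}

static struct toy_inode *toy_new_inode(struct toy_sb *sb)
{
	struct toy_inode *inode = malloc(sizeof(*inode));

	if (!inode)
		return NULL;
	inode->sb = sb;
	inode->acl = malloc(sizeof(int));
	return inode;
}
```

A caller that must not run the filesystem hook can call only the common helper and then free the memory itself, which is the option the next patch relies on.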

>
>
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c	2009-08-04 16:01:18.766801320 +0200
> +++ linux-2.6/fs/inode.c	2009-08-04 16:01:37.281556243 +0200
> @@ -228,7 +228,7 @@ static struct inode *alloc_inode(struct
> 	return inode;
> }
>
> -void destroy_inode(struct inode *inode)
> +void __destroy_inode(struct inode *inode)
> {
> 	BUG_ON(inode_has_buffers(inode));
> 	ima_inode_free(inode);
> @@ -240,13 +240,17 @@ void destroy_inode(struct inode *inode)
> 	if (inode->i_default_acl && inode->i_default_acl != ACL_NOT_CACHED)
> 		posix_acl_release(inode->i_default_acl);
> #endif
> +}
> +EXPORT_SYMBOL(__destroy_inode);
> +
> +void destroy_inode(struct inode *inode)
> +{
> +	__destroy_inode(inode);
> 	if (inode->i_sb->s_op->destroy_inode)
> 		inode->i_sb->s_op->destroy_inode(inode);
> 	else
> 		kmem_cache_free(inode_cachep, (inode));
> }
> -EXPORT_SYMBOL(destroy_inode);
> -
>
> /*
>  * These are initializations that only need to be done
> Index: linux-2.6/fs/xfs/xfs_inode.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_inode.h	2009-08-04 15:59:38.705782128 +0200
> +++ linux-2.6/fs/xfs/xfs_inode.h	2009-08-04 16:01:20.157556334 +0200
> @@ -322,8 +322,11 @@ static inline struct inode *VFS_I(struct
>  */
> static inline void xfs_destroy_inode(struct xfs_inode *ip)
> {
> -	make_bad_inode(VFS_I(ip));
> -	return destroy_inode(VFS_I(ip));
> +	struct inode *inode = VFS_I(ip);
> +
> +	make_bad_inode(inode);
> +	__destroy_inode(inode);
> +	inode->i_sb->s_op->destroy_inode(inode);
> }
>
> /*
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2009-08-04 16:01:18.770804693 +0200
> +++ linux-2.6/include/linux/fs.h	2009-08-04 16:01:20.159539128 +0200
> @@ -2163,6 +2163,7 @@ extern void __iget(struct inode * inode)
> extern void iget_failed(struct inode *);
> extern void clear_inode(struct inode *);
> extern void destroy_inode(struct inode *);
> +extern void __destroy_inode(struct inode *);
> extern struct inode *new_inode(struct super_block *);
> extern int should_remove_suid(struct dentry *);
> extern int file_remove_suid(struct file *);
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] xfs: add xfs_inode_free
  2009-08-04 14:15   ` Christoph Hellwig
@ 2009-08-07 18:22     ` Felix Blyakher
  -1 siblings, 0 replies; 38+ messages in thread
From: Felix Blyakher @ 2009-08-07 18:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, linux-fsdevel


On Aug 4, 2009, at 9:15 AM, Christoph Hellwig wrote:

> When we want to tear down an inode that lost the add to the cache race
> in XFS we must not call into ->destroy_inode because that would delete
> the inode that won the race from the inode cache radix tree.
>
> This patch uses splits a new xfs_inode_free helper out of xfs_ireclaim
              ^^^^

take it out

>
> and uses that plus __destroy_inode to make sure we really only free
> the memory allocted for the inode that lost the race, and not mess with
                  ^
>
> the inode cache state.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Felix Blyakher <felixb@sgi.com>
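
The race this patch addresses can be pictured with a toy cache keyed by inode number. In the sketch below, a full teardown removes the cache entry for that number, so running it on the loser of an insert race would evict the winner; the loser path therefore frees only its own memory. All names are hypothetical, a fixed array stands in for the per-AG radix tree, and the lookup simply returns the winner rather than retrying as XFS does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SLOTS 8

struct toy_inode {
	unsigned int	ino;
};

/* Toy index keyed by inode number; stands in for the radix tree. */
static struct toy_inode *toy_cache[TOY_SLOTS];

/* Insert fails with -EEXIST if another inode already owns the slot. */
static int toy_cache_insert(struct toy_inode *ip)
{
	if (toy_cache[ip->ino])
		return -EEXIST;
	toy_cache[ip->ino] = ip;
	return 0;
}

/* Full teardown: drops the entry for this inode NUMBER, then frees. */
static void toy_reclaim(struct toy_inode *ip)
{
	toy_cache[ip->ino] = NULL;
	free(ip);
}

/* Loser path: free only our own memory, leave the index untouched. */
static void toy_inode_free(struct toy_inode *ip)
{
	free(ip);
}

static struct toy_inode *toy_iget(unsigned int ino)
{
	struct toy_inode *ip;

	if (ino >= TOY_SLOTS)
		return NULL;
	ip = malloc(sizeof(*ip));
	if (!ip)
		return NULL;
	ip->ino = ino;
	if (toy_cache_insert(ip)) {
		/* Lost the race: toy_reclaim() here would evict the winner. */
		toy_inode_free(ip);
		return toy_cache[ino];
	}
	return ip;
}
```

Calling toy_iget(5) twice yields the same object both times, and the cache entry survives the second, losing allocation.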

>
>
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-03 01:23:29.878784477 +0200
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-03 01:25:01.601784988 +0200
> @@ -116,6 +116,71 @@ xfs_inode_alloc(
> 	return ip;
> }
>
> +STATIC void
> +xfs_inode_free(
> +	struct xfs_inode	*ip)
> +{
> +	switch (ip->i_d.di_mode & S_IFMT) {
> +	case S_IFREG:
> +	case S_IFDIR:
> +	case S_IFLNK:
> +		xfs_idestroy_fork(ip, XFS_DATA_FORK);
> +		break;
> +	}
> +
> +	if (ip->i_afp)
> +		xfs_idestroy_fork(ip, XFS_ATTR_FORK);
> +
> +#ifdef XFS_INODE_TRACE
> +	ktrace_free(ip->i_trace);
> +#endif
> +#ifdef XFS_BMAP_TRACE
> +	ktrace_free(ip->i_xtrace);
> +#endif
> +#ifdef XFS_BTREE_TRACE
> +	ktrace_free(ip->i_btrace);
> +#endif
> +#ifdef XFS_RW_TRACE
> +	ktrace_free(ip->i_rwtrace);
> +#endif
> +#ifdef XFS_ILOCK_TRACE
> +	ktrace_free(ip->i_lock_trace);
> +#endif
> +#ifdef XFS_DIR2_TRACE
> +	ktrace_free(ip->i_dir_trace);
> +#endif
> +
> +	if (ip->i_itemp) {
> +		/*
> +		 * Only if we are shutting down the fs will we see an
> +		 * inode still in the AIL. If it is there, we should remove
> +		 * it to prevent a use-after-free from occurring.
> +		 */
> +		xfs_log_item_t	*lip = &ip->i_itemp->ili_item;
> +		struct xfs_ail	*ailp = lip->li_ailp;
> +
> +		ASSERT(((lip->li_flags & XFS_LI_IN_AIL) == 0) ||
> +				       XFS_FORCED_SHUTDOWN(ip->i_mount));
> +		if (lip->li_flags & XFS_LI_IN_AIL) {
> +			spin_lock(&ailp->xa_lock);
> +			if (lip->li_flags & XFS_LI_IN_AIL)
> +				xfs_trans_ail_delete(ailp, lip);
> +			else
> +				spin_unlock(&ailp->xa_lock);
> +		}
> +		xfs_inode_item_destroy(ip);
> +		ip->i_itemp = NULL;
> +	}
> +
> +	/* asserts to verify all state is correct here */
> +	ASSERT(atomic_read(&ip->i_iocount) == 0);
> +	ASSERT(atomic_read(&ip->i_pincount) == 0);
> +	ASSERT(!spin_is_locked(&ip->i_flags_lock));
> +	ASSERT(completion_done(&ip->i_flush));
> +
> +	kmem_zone_free(xfs_inode_zone, ip);
> +}
> +
> /*
>  * Check the validity of the inode we just found it the cache
>  */
> @@ -303,7 +368,8 @@ out_preload_end:
> 	if (lock_flags)
> 		xfs_iunlock(ip, lock_flags);
> out_destroy:
> -	xfs_destroy_inode(ip);
> +	__destroy_inode(VFS_I(ip));
> +	xfs_inode_free(ip);
> 	return error;
> }
>
> @@ -506,62 +572,7 @@ xfs_ireclaim(
> 	xfs_qm_dqdetach(ip);
> 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>
> -	switch (ip->i_d.di_mode & S_IFMT) {
> -	case S_IFREG:
> -	case S_IFDIR:
> -	case S_IFLNK:
> -		xfs_idestroy_fork(ip, XFS_DATA_FORK);
> -		break;
> -	}
> -
> -	if (ip->i_afp)
> -		xfs_idestroy_fork(ip, XFS_ATTR_FORK);
> -
> -#ifdef XFS_INODE_TRACE
> -	ktrace_free(ip->i_trace);
> -#endif
> -#ifdef XFS_BMAP_TRACE
> -	ktrace_free(ip->i_xtrace);
> -#endif
> -#ifdef XFS_BTREE_TRACE
> -	ktrace_free(ip->i_btrace);
> -#endif
> -#ifdef XFS_RW_TRACE
> -	ktrace_free(ip->i_rwtrace);
> -#endif
> -#ifdef XFS_ILOCK_TRACE
> -	ktrace_free(ip->i_lock_trace);
> -#endif
> -#ifdef XFS_DIR2_TRACE
> -	ktrace_free(ip->i_dir_trace);
> -#endif
> -	if (ip->i_itemp) {
> -		/*
> -		 * Only if we are shutting down the fs will we see an
> -		 * inode still in the AIL. If it is there, we should remove
> -		 * it to prevent a use-after-free from occurring.
> -		 */
> -		xfs_log_item_t	*lip = &ip->i_itemp->ili_item;
> -		struct xfs_ail	*ailp = lip->li_ailp;
> -
> -		ASSERT(((lip->li_flags & XFS_LI_IN_AIL) == 0) ||
> -				       XFS_FORCED_SHUTDOWN(ip->i_mount));
> -		if (lip->li_flags & XFS_LI_IN_AIL) {
> -			spin_lock(&ailp->xa_lock);
> -			if (lip->li_flags & XFS_LI_IN_AIL)
> -				xfs_trans_ail_delete(ailp, lip);
> -			else
> -				spin_unlock(&ailp->xa_lock);
> -		}
> -		xfs_inode_item_destroy(ip);
> -		ip->i_itemp = NULL;
> -	}
> -	/* asserts to verify all state is correct here */
> -	ASSERT(atomic_read(&ip->i_iocount) == 0);
> -	ASSERT(atomic_read(&ip->i_pincount) == 0);
> -	ASSERT(!spin_is_locked(&ip->i_flags_lock));
> -	ASSERT(completion_done(&ip->i_flush));
> -	kmem_zone_free(xfs_inode_zone, ip);
> +	xfs_inode_free(ip);
> }
>
> /*
> Index: linux-2.6/fs/xfs/xfs_inode.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_inode.h	2009-08-03 01:23:39.876532108 +0200
> +++ linux-2.6/fs/xfs/xfs_inode.h	2009-08-03 01:23:47.411789594 +0200
> @@ -310,26 +310,6 @@ static inline struct inode *VFS_I(struct
> }
>
> /*
> - * Get rid of a partially initialized inode.
> - *
> - * We have to go through destroy_inode to make sure allocations
> - * from init_inode_always like the security data are undone.
> - *
> - * We mark the inode bad so that it takes the short cut in
> - * the reclaim path instead of going through the flush path
> - * which doesn't make sense for an inode that has never seen the
> - * light of day.
> - */
> -static inline void xfs_destroy_inode(struct xfs_inode *ip)
> -{
> -	struct inode *inode = VFS_I(ip);
> -
> -	make_bad_inode(inode);
> -	__destroy_inode(inode);
> -	inode->i_sb->s_op->destroy_inode(inode);
> -}
> -
> -/*
>  * i_flags helper functions
>  */
> static inline void
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
  2009-08-07 17:25     ` Felix Blyakher
@ 2009-08-10 17:09       ` Christoph Hellwig
  -1 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-10 17:09 UTC (permalink / raw)
  To: Felix Blyakher; +Cc: Christoph Hellwig, linux-fsdevel, xfs

On Fri, Aug 07, 2009 at 12:25:47PM -0500, Felix Blyakher wrote:
>> +	if (ip->i_flags & XFS_INEW) {
>
> Another case when we find XFS_INEW set is the race with the
> cache miss, which just set up a new inode. Would the proposed
> code be still sensible in that case? If yes, at least comments
> should be updated.

>> +		wait_on_inode(inode);
>
> It's possible to have XFS_INEW set, but no I_LOCK|I_NEW yet.
> Then the wait_on_inode() would return quickly even before the
> linux inode is reinitialized. Though, that was the case with
> the old code as well.

The wait_on_inode is only sensible for the non-recycle case.  But it's
not actually very useful with our flags scheme, so for now I've reverted
to the old-style polling and just added an XXX comment that we'll
eventually look into a better scheme.
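
For readers unfamiliar with the polling approach, here is a minimal userspace sketch: the lookup reports EAGAIN while the inode is still being set up, and the caller backs off and tries again. The retry loop in the caller is not part of this thread, so its shape here is an assumption, and every toy_* name is invented for illustration. The "delay" simply simulates the other thread making progress.

```c
#include <assert.h>
#include <errno.h>

#define TOY_INEW	0x1	/* hypothetical "still being set up" flag */

struct toy_inode {
	unsigned int	flags;
	int		polls_left;	/* simulated remaining setup time */
};

/* One lookup attempt: busy inodes report EAGAIN (positive, XFS style). */
static int toy_cache_hit(struct toy_inode *ip)
{
	if (ip->flags & TOY_INEW)
		return EAGAIN;
	return 0;
}

/*
 * Stand-in for a short sleep.  Here it simulates the thread that is
 * initialising the inode finishing after polls_left delays.
 */
static void toy_delay(struct toy_inode *ip)
{
	if (ip->polls_left > 0 && --ip->polls_left == 0)
		ip->flags &= ~TOY_INEW;
}

/*
 * Poll until the lookup stops returning EAGAIN.  Counts attempts so
 * the behaviour can be observed.  Assumes the flag is eventually
 * cleared; otherwise this loops forever.
 */
static int toy_iget(struct toy_inode *ip, int *attempts)
{
	int error;

	*attempts = 0;
	for (;;) {
		(*attempts)++;
		error = toy_cache_hit(ip);
		if (error != EAGAIN)
			return error;
		toy_delay(ip);
	}
}
```

The cost of polling is the wasted wakeups; a wait queue keyed on the flags would avoid them, which is what the XXX comment in the patch below alludes to.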

>> +	/*
>> +	 * If lookup is racing with unlink, then we should return an
>> +	 * error immediately so we don't remove it from the reclaim
>> +	 * list and potentially leak the inode.
>> +	 */
>> +	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
>
> Previously the conclusion of the race with unlink was based on
> the XFS_IRECLAIMABLE i_flag being set in addition to the test above.
> Is it no longer the case, or not really necessary?

We actually had the test twice in the old code, once for the
reclaim case, and once after the igrab succeeded.  I just moved it into
a single common place.  I've updated the comment to match that.
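
The resulting order of checks can be summarised in a small standalone sketch: take the flags lock once, bail out with EAGAIN if the inode is in transition, do the unlink check exactly once, and only then flip the reclaimable inode to "new" while still holding the lock. This is a simplified model, not the patch itself: the toy_* names are hypothetical, and a plain integer with assertions stands in for the spinlock.

```c
#include <assert.h>
#include <errno.h>

#define TOY_INEW		0x1
#define TOY_IRECLAIM		0x2
#define TOY_IRECLAIMABLE	0x4

struct toy_inode {
	unsigned int	flags;
	unsigned int	mode;	/* 0 means the inode has been unlinked */
	int		locked;	/* stand-in for i_flags_lock */
};

static void toy_lock(struct toy_inode *ip)
{
	assert(!ip->locked);
	ip->locked = 1;
}

static void toy_unlock(struct toy_inode *ip)
{
	assert(ip->locked);
	ip->locked = 0;
}

/* All flag tests and updates happen under one lock acquisition. */
static int toy_cache_hit(struct toy_inode *ip, int create)
{
	int error = 0;

	toy_lock(ip);

	/* Being set up or torn down: the caller should retry. */
	if (ip->flags & (TOY_INEW | TOY_IRECLAIM)) {
		error = EAGAIN;
		goto out;
	}

	/* Single unlink check, covering both reclaimable and live inodes. */
	if (ip->mode == 0 && !create) {
		error = ENOENT;
		goto out;
	}

	/* Swap the two flags together so no observer sees neither set. */
	if (ip->flags & TOY_IRECLAIMABLE) {
		ip->flags |= TOY_INEW;
		ip->flags &= ~TOY_IRECLAIMABLE;
	}
out:
	toy_unlock(ip);
	return error;
}
```

Because the ENOENT return happens before any flag is changed, an unlinked reclaimable inode stays on the reclaim side untouched.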


New version below:

-- 

Subject: xfs: fix locking in xfs_iget_cache_hit
From: Christoph Hellwig <hch@lst.de>

The locking in xfs_iget_cache_hit currently has numerous problems:

 - we clear the reclaim tag without i_flags_lock which protects modifications
   to it
 - we call inode_init_always which can sleep with pag_ici_lock held
   (this is oss.sgi.com BZ #819)
 - we acquire and drop i_flags_lock a lot and thus provide no consistency
   between the various flags we set/clear under it

This patch fixes all that with a major revamp of the locking in the function.
The new version acquires i_flags_lock early and only drops it once we need to
call into inode_init_always or before calling xfs_ilock.

This patch fixes a bug seen in the wild where we race modifying the reclaim tag.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-10 13:10:19.141974933 -0300
+++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-10 13:19:06.913056731 -0300
@@ -191,80 +191,83 @@ xfs_iget_cache_hit(
 	int			flags,
 	int			lock_flags) __releases(pag->pag_ici_lock)
 {
+	struct inode		*inode = VFS_I(ip);
 	struct xfs_mount	*mp = ip->i_mount;
-	int			error = EAGAIN;
+	int			error;
+
+	spin_lock(&ip->i_flags_lock);
 
 	/*
-	 * If INEW is set this inode is being set up
-	 * If IRECLAIM is set this inode is being torn down
-	 * Pause and try again.
+	 * If we are racing with another cache hit that is currently
+	 * instantiating this inode or currently recycling it out of
+	 * reclaimable state, wait for the initialisation to complete
+	 * before continuing.
+	 *
+	 * XXX(hch): eventually we should do something equivalent to
+	 *	     wait_on_inode to wait for these flags to be cleared
+	 *	     instead of polling for it.
 	 */
-	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
+	if (ip->i_flags & (XFS_INEW|XFS_IRECLAIM)) {
 		XFS_STATS_INC(xs_ig_frecycle);
+		error = EAGAIN;
 		goto out_error;
 	}
 
-	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
-	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
-
-		/*
-		 * If lookup is racing with unlink, then we should return an
-		 * error immediately so we don't remove it from the reclaim
-		 * list and potentially leak the inode.
-		 */
-		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
-			error = ENOENT;
-			goto out_error;
-		}
+	/*
+	 * If lookup is racing with unlink return an error immediately.
+	 */
+	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
+		error = ENOENT;
+		goto out_error;
+	}
 
+	/*
+	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
+	 * Need to carefully get it back into useable state.
+	 */
+	if (ip->i_flags & XFS_IRECLAIMABLE) {
 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
 
 		/*
-		 * We need to re-initialise the VFS inode as it has been
-		 * 'freed' by the VFS. Do this here so we can deal with
-		 * errors cleanly, then tag it so it can be set up correctly
-		 * later.
+		 * We need to set XFS_INEW atomically with clearing the
+		 * reclaimable tag so that we do have an indicator of the
+		 * inode still being initialized.
 		 */
-		if (inode_init_always(mp->m_super, VFS_I(ip))) {
+		ip->i_flags |= XFS_INEW;
+		ip->i_flags &= ~XFS_IRECLAIMABLE;
+		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
+
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
+
+		if (unlikely(inode_init_always(mp->m_super, inode))) {
+			/*
+			 * Re-initializing the inode failed, and we are in deep
+			 * trouble.  Try to re-add it to the reclaim list.
+			 */
+			read_lock(&pag->pag_ici_lock);
+			spin_lock(&ip->i_flags_lock);
+
+			ip->i_flags &= ~XFS_INEW;
+			ip->i_flags |= XFS_IRECLAIMABLE;
+			__xfs_inode_set_reclaim_tag(pag, ip);
+
 			error = ENOMEM;
 			goto out_error;
 		}
-
-		/*
-		 * We must set the XFS_INEW flag before clearing the
-		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
-		 * not find the XFS_IRECLAIMABLE above but has the igrab()
-		 * below succeed we can safely check XFS_INEW to detect
-		 * that this inode is still being initialised.
-		 */
-		xfs_iflags_set(ip, XFS_INEW);
-		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
-
-		/* clear the radix tree reclaim flag as well. */
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
-	} else if (!igrab(VFS_I(ip))) {
+		inode->i_state = I_LOCK|I_NEW;
+	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
-		XFS_STATS_INC(xs_ig_frecycle);
-		goto out_error;
-	} else if (xfs_iflags_test(ip, XFS_INEW)) {
-		/*
-		 * We are racing with another cache hit that is
-		 * currently recycling this inode out of the XFS_IRECLAIMABLE
-		 * state. Wait for the initialisation to complete before
-		 * continuing.
-		 */
-		wait_on_inode(VFS_I(ip));
-	}
+		if (!igrab(inode)) {
+			error = EAGAIN;
+			goto out_error;
+		}
 
-	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
-		error = ENOENT;
-		iput(VFS_I(ip));
-		goto out_error;
+		/* We've got a live one. */
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 	}
 
-	/* We've got a live one. */
-	read_unlock(&pag->pag_ici_lock);
-
 	if (lock_flags != 0)
 		xfs_ilock(ip, lock_flags);
 
@@ -274,6 +277,7 @@ xfs_iget_cache_hit(
 	return 0;
 
 out_error:
+	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
 	return error;
 }
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-10 13:10:19.146974522 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-10 13:10:59.958993938 -0300
@@ -708,6 +708,16 @@ xfs_reclaim_inode(
 	return 0;
 }
 
+void
+__xfs_inode_set_reclaim_tag(
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
+{
+	radix_tree_tag_set(&pag->pag_ici_root,
+			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			   XFS_ICI_RECLAIM_TAG);
+}
+
 /*
  * We set the inode flag atomically with the radix tree tag.
  * Once we get tag lookups on the radix tree, this inode flag
@@ -722,8 +732,7 @@ xfs_inode_set_reclaim_tag(
 
 	read_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
-	radix_tree_tag_set(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+	__xfs_inode_set_reclaim_tag(pag, ip);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-10 13:10:19.153974227 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-10 13:10:59.962994168 -0300
@@ -48,6 +48,7 @@ int xfs_reclaim_inode(struct xfs_inode *
 int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
+void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
 void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
 void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
 				struct xfs_inode *ip);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
@ 2009-08-10 17:09       ` Christoph Hellwig
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-10 17:09 UTC (permalink / raw)
  To: Felix Blyakher; +Cc: Christoph Hellwig, linux-fsdevel, xfs

On Fri, Aug 07, 2009 at 12:25:47PM -0500, Felix Blyakher wrote:
>> +	if (ip->i_flags & XFS_INEW) {
>
> Another case when we find XFS_INEW set is the race with the
> cache miss, which just set up a new inode. Would the proposed
> code be still sensible in that case? If yes, at least comments
> should be updated.

>> +		wait_on_inode(inode);
>
> It's possible to have XFS_INEW set, but no I_LOCK|I_NEW yet.
> Then the wait_on_inode() would return quickly even before the
> linux inode is reinitialized. Though, that was the case with
> the old code as well.

The wait_on_inode is only sensible for the non-recycle case.  But it's
not actually very useful with our flags scheme, so for now I've reverted
to the old-style polling and just added an XXX comment that we'll
eventually look into a better scheme.
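
For readers unfamiliar with the "old-style polling" mentioned here, the following is a minimal userspace sketch, not the XFS code. It assumes the cache-hit path reports EAGAIN while the inode is in a transitional state and that the caller backs off and retries. All `poll_` and `POLL_` names are hypothetical, and the "other thread" is simulated by a plain function call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bits, named after the ones used in the patch. */
#define POLL_INEW     0x1u
#define POLL_IRECLAIM 0x2u

struct poll_inode {
	unsigned int flags;
	int          pending;	/* polls left before setup "finishes" */
};

/* One lookup attempt: refuse with EAGAIN while the object is in flux. */
static int poll_cache_hit(const struct poll_inode *ip)
{
	if (ip->flags & (POLL_INEW | POLL_IRECLAIM))
		return EAGAIN;
	return 0;
}

/* Stand-in for the other thread making progress between two polls. */
static void poll_other_thread_step(struct poll_inode *ip)
{
	if (ip->pending > 0 && --ip->pending == 0)
		ip->flags &= ~POLL_INEW;
}

/* Caller-side polling loop; returns the number of retries, or -1. */
static int poll_lookup(struct poll_inode *ip, int max_tries)
{
	for (int tries = 0; tries < max_tries; tries++) {
		if (poll_cache_hit(ip) == 0)
			return tries;
		/* A real caller would back off briefly here. */
		poll_other_thread_step(ip);
	}
	return -1;
}
```

The cost of this scheme is that the waiter wakes up on a timer rather than on the event itself, which is what the XXX comment in the patch hopes to improve eventually.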

>> +	/*
>> +	 * If lookup is racing with unlink, then we should return an
>> +	 * error immediately so we don't remove it from the reclaim
>> +	 * list and potentially leak the inode.
>> +	 */
>> +	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
>
> Previously the conclusion of the race with unlink was based on
> XFS_IRECLAIMABLE i_flag set in addition to the test above.
> Is that no longer the case, or not really necessary?

We actually had the test twice in the old code, once for the
reclaim case, and once after the igrab succeeded.  I just moved it into
a single common place.  I've updated the comment to match that.


New version below:

-- 

Subject: xfs: fix locking in xfs_iget_cache_hit
From: Christoph Hellwig <hch@lst.de>

The locking in xfs_iget_cache_hit currently has numerous problems:

 - we clear the reclaim tag without i_flags_lock which protects modifications
   to it
 - we call inode_init_always which can sleep with pag_ici_lock held
   (this is oss.sgi.com BZ #819)
 - we acquire and drop i_flags_lock a lot and thus provide no consistency
   between the various flags we set/clear under it

This patch fixes all that with a major revamp of the locking in the function.
The new version acquires i_flags_lock early and only drops it once we need to
call into inode_init_always or before calling xfs_ilock.

This patch fixes a bug seen in the wild where we race modifying the reclaim tag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
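
The locking discipline described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: the "lock" is a boolean that merely checks the rules, `lk_reinit` stands in for a reinitialisation call that may sleep, and every `lk_` and `LK_` name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define LK_INEW         0x1u
#define LK_IRECLAIMABLE 0x4u

/* Toy lock: records whether it is held so that misuse trips an assert. */
struct lk_lock { bool held; };

static void lk_acquire(struct lk_lock *l) { assert(!l->held); l->held = true; }
static void lk_release(struct lk_lock *l) { assert(l->held); l->held = false; }

struct lk_inode {
	struct lk_lock flags_lock;
	unsigned int   flags;
	bool           tagged;		/* stand-in for the radix tree reclaim tag */
	bool           init_fails;	/* test knob: make reinitialisation fail */
};

/* Stand-in for a step that may sleep, so it must run with the lock dropped. */
static int lk_reinit(const struct lk_inode *ip)
{
	assert(!ip->flags_lock.held);
	return ip->init_fails ? -ENOMEM : 0;
}

static int lk_recycle(struct lk_inode *ip)
{
	int error;

	/* Flip both flags and the tag inside one critical section. */
	lk_acquire(&ip->flags_lock);
	ip->flags |= LK_INEW;
	ip->flags &= ~LK_IRECLAIMABLE;
	ip->tagged = false;
	lk_release(&ip->flags_lock);

	error = lk_reinit(ip);
	if (error) {
		/* Roll back under the lock so the object is reclaimable again. */
		lk_acquire(&ip->flags_lock);
		ip->flags &= ~LK_INEW;
		ip->flags |= LK_IRECLAIMABLE;
		ip->tagged = true;
		lk_release(&ip->flags_lock);
	}
	return error;
}
```

The point of the single critical section is that no observer holding the lock can see the object with neither flag set, or with the flag and the tag disagreeing.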

Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-10 13:10:19.141974933 -0300
+++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-10 13:19:06.913056731 -0300
@@ -191,80 +191,83 @@ xfs_iget_cache_hit(
 	int			flags,
 	int			lock_flags) __releases(pag->pag_ici_lock)
 {
+	struct inode		*inode = VFS_I(ip);
 	struct xfs_mount	*mp = ip->i_mount;
-	int			error = EAGAIN;
+	int			error;
+
+	spin_lock(&ip->i_flags_lock);
 
 	/*
-	 * If INEW is set this inode is being set up
-	 * If IRECLAIM is set this inode is being torn down
-	 * Pause and try again.
+	 * If we are racing with another cache hit that is currently
+	 * instantiating this inode or currently recycling it out of
+	 * reclaimabe state, wait for the initialisation to complete
+	 * before continuing.
+	 *
+	 * XXX(hch): eventually we should do something equivalent to
+	 *	     wait_on_inode to wait for these flags to be cleared
+	 *	     instead of polling for it.
 	 */
-	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
+	if (ip->i_flags & (XFS_INEW|XFS_IRECLAIM)) {
 		XFS_STATS_INC(xs_ig_frecycle);
+		error = EAGAIN;
 		goto out_error;
 	}
 
-	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
-	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
-
-		/*
-		 * If lookup is racing with unlink, then we should return an
-		 * error immediately so we don't remove it from the reclaim
-		 * list and potentially leak the inode.
-		 */
-		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
-			error = ENOENT;
-			goto out_error;
-		}
+	/*
+	 * If lookup is racing with unlink return an error immediately.
+	 */
+	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
+		error = ENOENT;
+		goto out_error;
+	}
 
+	/*
+	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
+	 * Need to carefully get it back into useable state.
+	 */
+	if (ip->i_flags & XFS_IRECLAIMABLE) {
 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
 
 		/*
-		 * We need to re-initialise the VFS inode as it has been
-		 * 'freed' by the VFS. Do this here so we can deal with
-		 * errors cleanly, then tag it so it can be set up correctly
-		 * later.
+		 * We need to set XFS_INEW atomically with clearing the
+		 * reclaimable tag so that we do have an indicator of the
+		 * inode still being initialized.
 		 */
-		if (inode_init_always(mp->m_super, VFS_I(ip))) {
+		ip->i_flags |= XFS_INEW;
+		ip->i_flags &= ~XFS_IRECLAIMABLE;
+		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
+
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
+
+		if (unlikely(inode_init_always(mp->m_super, inode))) {
+			/*
+			 * Re-initializing the inode failed, and we are in deep
+			 * trouble.  Try to re-add it to the reclaim list.
+			 */
+			read_lock(&pag->pag_ici_lock);
+			spin_lock(&ip->i_flags_lock);
+
+			ip->i_flags &= ~XFS_INEW;
+			ip->i_flags |= XFS_IRECLAIMABLE;
+			__xfs_inode_set_reclaim_tag(pag, ip);
+
 			error = ENOMEM;
 			goto out_error;
 		}
-
-		/*
-		 * We must set the XFS_INEW flag before clearing the
-		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
-		 * not find the XFS_IRECLAIMABLE above but has the igrab()
-		 * below succeed we can safely check XFS_INEW to detect
-		 * that this inode is still being initialised.
-		 */
-		xfs_iflags_set(ip, XFS_INEW);
-		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
-
-		/* clear the radix tree reclaim flag as well. */
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
-	} else if (!igrab(VFS_I(ip))) {
+		inode->i_state = I_LOCK|I_NEW;
+	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
-		XFS_STATS_INC(xs_ig_frecycle);
-		goto out_error;
-	} else if (xfs_iflags_test(ip, XFS_INEW)) {
-		/*
-		 * We are racing with another cache hit that is
-		 * currently recycling this inode out of the XFS_IRECLAIMABLE
-		 * state. Wait for the initialisation to complete before
-		 * continuing.
-		 */
-		wait_on_inode(VFS_I(ip));
-	}
+		if (!igrab(inode)) {
+			error = EAGAIN;
+			goto out_error;
+		}
 
-	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
-		error = ENOENT;
-		iput(VFS_I(ip));
-		goto out_error;
+		/* We've got a live one. */
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 	}
 
-	/* We've got a live one. */
-	read_unlock(&pag->pag_ici_lock);
-
 	if (lock_flags != 0)
 		xfs_ilock(ip, lock_flags);
 
@@ -274,6 +277,7 @@ xfs_iget_cache_hit(
 	return 0;
 
 out_error:
+	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
 	return error;
 }
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-10 13:10:19.146974522 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-10 13:10:59.958993938 -0300
@@ -708,6 +708,16 @@ xfs_reclaim_inode(
 	return 0;
 }
 
+void
+__xfs_inode_set_reclaim_tag(
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
+{
+	radix_tree_tag_set(&pag->pag_ici_root,
+			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			   XFS_ICI_RECLAIM_TAG);
+}
+
 /*
  * We set the inode flag atomically with the radix tree tag.
  * Once we get tag lookups on the radix tree, this inode flag
@@ -722,8 +732,7 @@ xfs_inode_set_reclaim_tag(
 
 	read_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
-	radix_tree_tag_set(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+	__xfs_inode_set_reclaim_tag(pag, ip);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-10 13:10:19.153974227 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-10 13:10:59.962994168 -0300
@@ -48,6 +48,7 @@ int xfs_reclaim_inode(struct xfs_inode *
 int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
+void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
 void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
 void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
 				struct xfs_inode *ip);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
  2009-08-10 17:09       ` Christoph Hellwig
@ 2009-08-16 21:01         ` Eric Sandeen
  -1 siblings, 0 replies; 38+ messages in thread
From: Eric Sandeen @ 2009-08-16 21:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Felix Blyakher, linux-fsdevel, xfs

Christoph Hellwig wrote:

> New version below:

> The locking in xfs_iget_cache_hit currently has numerous problems:
> 
>  - we clear the reclaim tag without i_flags_lock which protects modifications
>    to it
>  - we call inode_init_always which can sleep with pag_ici_lock held
>    (this is oss.sgi.com BZ #819)
>  - we acquire and drop i_flags_lock a lot and thus provide no consistency
>    between the various flags we set/clear under it
> 
> This patch fixes all that with a major revamp of the locking in the function.
> The new version acquires i_flags_lock early and only drops it once we need to
> call into inode_init_always or before calling xfs_ilock.
> 
> This patch fixes a bug seen in the wild where we race modifying the reclaim tag.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

This seems ok to me but I have to be honest, I'm having a hard time
getting my head back around the inode lifecycle.

One comment: I wonder if it's worth capturing the actual error from
inode_init_always() vs. turning every error into ENOMEM?  True, today
it's the only error we can get, but why re-set it?

-Eric
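
To make the suggestion concrete, here is a hedged sketch of the two variants in plain C. It assumes the callee reports failure as a negative errno while the caller works with positive codes, as the patch's `error = ENOMEM` suggests; that sign convention is an assumption and may not match the tree in question. `err_init`, `err_fixed` and `err_propagated` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callee: returns 0 on success or a negative errno. */
static int err_init(int fail_with)
{
	return fail_with ? -fail_with : 0;
}

/* Variant in the patch: any failure is reported as ENOMEM. */
static int err_fixed(int fail_with)
{
	if (err_init(fail_with))
		return ENOMEM;
	return 0;
}

/* Suggested variant: pass the real code on, flipped to a positive value. */
static int err_propagated(int fail_with)
{
	int error = err_init(fail_with);

	return error ? -error : 0;
}
```

While the callee can only fail with ENOMEM the two behave identically; they diverge only if another failure mode is ever added.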

> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-10 13:10:19.141974933 -0300
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-10 13:19:06.913056731 -0300
> @@ -191,80 +191,83 @@ xfs_iget_cache_hit(
>  	int			flags,
>  	int			lock_flags) __releases(pag->pag_ici_lock)
>  {
> +	struct inode		*inode = VFS_I(ip);
>  	struct xfs_mount	*mp = ip->i_mount;
> -	int			error = EAGAIN;
> +	int			error;
> +
> +	spin_lock(&ip->i_flags_lock);
>  
>  	/*
> -	 * If INEW is set this inode is being set up
> -	 * If IRECLAIM is set this inode is being torn down
> -	 * Pause and try again.
> +	 * If we are racing with another cache hit that is currently
> +	 * instantiating this inode or currently recycling it out of
> +	 * reclaimabe state, wait for the initialisation to complete
> +	 * before continuing.
> +	 *
> +	 * XXX(hch): eventually we should do something equivalent to
> +	 *	     wait_on_inode to wait for these flags to be cleared
> +	 *	     instead of polling for it.
>  	 */
> -	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
> +	if (ip->i_flags & (XFS_INEW|XFS_IRECLAIM)) {
>  		XFS_STATS_INC(xs_ig_frecycle);
> +		error = EAGAIN;
>  		goto out_error;
>  	}
>  
> -	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
> -	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
> -
> -		/*
> -		 * If lookup is racing with unlink, then we should return an
> -		 * error immediately so we don't remove it from the reclaim
> -		 * list and potentially leak the inode.
> -		 */
> -		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> -			error = ENOENT;
> -			goto out_error;
> -		}
> +	/*
> +	 * If lookup is racing with unlink return an error immediately.
> +	 */
> +	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
> +		error = ENOENT;
> +		goto out_error;
> +	}
>  
> +	/*
> +	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
> +	 * Need to carefully get it back into useable state.
> +	 */
> +	if (ip->i_flags & XFS_IRECLAIMABLE) {
>  		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
>  
>  		/*
> -		 * We need to re-initialise the VFS inode as it has been
> -		 * 'freed' by the VFS. Do this here so we can deal with
> -		 * errors cleanly, then tag it so it can be set up correctly
> -		 * later.
> +		 * We need to set XFS_INEW atomically with clearing the
> +		 * reclaimable tag so that we do have an indicator of the
> +		 * inode still being initialized.
>  		 */
> -		if (inode_init_always(mp->m_super, VFS_I(ip))) {
> +		ip->i_flags |= XFS_INEW;
> +		ip->i_flags &= ~XFS_IRECLAIMABLE;
> +		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> +
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
> +
> +		if (unlikely(inode_init_always(mp->m_super, inode))) {
> +			/*
> +			 * Re-initializing the inode failed, and we are in deep
> +			 * trouble.  Try to re-add it to the reclaim list.
> +			 */
> +			read_lock(&pag->pag_ici_lock);
> +			spin_lock(&ip->i_flags_lock);
> +
> +			ip->i_flags &= ~XFS_INEW;
> +			ip->i_flags |= XFS_IRECLAIMABLE;
> +			__xfs_inode_set_reclaim_tag(pag, ip);
> +
>  			error = ENOMEM;
>  			goto out_error;
>  		}
> -
> -		/*
> -		 * We must set the XFS_INEW flag before clearing the
> -		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
> -		 * not find the XFS_IRECLAIMABLE above but has the igrab()
> -		 * below succeed we can safely check XFS_INEW to detect
> -		 * that this inode is still being initialised.
> -		 */
> -		xfs_iflags_set(ip, XFS_INEW);
> -		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
> -
> -		/* clear the radix tree reclaim flag as well. */
> -		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> -	} else if (!igrab(VFS_I(ip))) {
> +		inode->i_state = I_LOCK|I_NEW;
> +	} else {
>  		/* If the VFS inode is being torn down, pause and try again. */
> -		XFS_STATS_INC(xs_ig_frecycle);
> -		goto out_error;
> -	} else if (xfs_iflags_test(ip, XFS_INEW)) {
> -		/*
> -		 * We are racing with another cache hit that is
> -		 * currently recycling this inode out of the XFS_IRECLAIMABLE
> -		 * state. Wait for the initialisation to complete before
> -		 * continuing.
> -		 */
> -		wait_on_inode(VFS_I(ip));
> -	}
> +		if (!igrab(inode)) {
> +			error = EAGAIN;
> +			goto out_error;
> +		}
>  
> -	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
> -		error = ENOENT;
> -		iput(VFS_I(ip));
> -		goto out_error;
> +		/* We've got a live one. */
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
>  	}
>  
> -	/* We've got a live one. */
> -	read_unlock(&pag->pag_ici_lock);
> -
>  	if (lock_flags != 0)
>  		xfs_ilock(ip, lock_flags);
>  
> @@ -274,6 +277,7 @@ xfs_iget_cache_hit(
>  	return 0;
>  
>  out_error:
> +	spin_unlock(&ip->i_flags_lock);
>  	read_unlock(&pag->pag_ici_lock);
>  	return error;
>  }
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-10 13:10:19.146974522 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-10 13:10:59.958993938 -0300
> @@ -708,6 +708,16 @@ xfs_reclaim_inode(
>  	return 0;
>  }
>  
> +void
> +__xfs_inode_set_reclaim_tag(
> +	struct xfs_perag	*pag,
> +	struct xfs_inode	*ip)
> +{
> +	radix_tree_tag_set(&pag->pag_ici_root,
> +			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
> +			   XFS_ICI_RECLAIM_TAG);
> +}
> +
>  /*
>   * We set the inode flag atomically with the radix tree tag.
>   * Once we get tag lookups on the radix tree, this inode flag
> @@ -722,8 +732,7 @@ xfs_inode_set_reclaim_tag(
>  
>  	read_lock(&pag->pag_ici_lock);
>  	spin_lock(&ip->i_flags_lock);
> -	radix_tree_tag_set(&pag->pag_ici_root,
> -			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
> +	__xfs_inode_set_reclaim_tag(pag, ip);
>  	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
>  	spin_unlock(&ip->i_flags_lock);
>  	read_unlock(&pag->pag_ici_lock);
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-10 13:10:19.153974227 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-10 13:10:59.962994168 -0300
> @@ -48,6 +48,7 @@ int xfs_reclaim_inode(struct xfs_inode *
>  int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
>  
>  void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
> +void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
>  void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
>  void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
>  				struct xfs_inode *ip);



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
  2009-08-10 17:09       ` Christoph Hellwig
@ 2009-08-16 22:54         ` Felix Blyakher
  -1 siblings, 0 replies; 38+ messages in thread
From: Felix Blyakher @ 2009-08-16 22:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs


On Aug 10, 2009, at 12:09 PM, Christoph Hellwig wrote:

> On Fri, Aug 07, 2009 at 12:25:47PM -0500, Felix Blyakher wrote:
>>> +	if (ip->i_flags & XFS_INEW) {
>>
>> Another case when we find XFS_INEW set is the race with the
>> cache miss, which just set up a new inode. Would the proposed
>> code be still sensible in that case? If yes, at least comments
>> should be updated.
>
>>> +		wait_on_inode(inode);
>>
>> It's possible to have XFS_INEW set, but no I_LOCK|I_NEW yet.
>> Then the wait_on_inode() would return quickly even before the
>> linux inode is reinitialized. Though, that was the case with
>> the old code as well.
>
> The wait_on_inode is only sensible for the non-recycle case.

The case I was referring to was indeed the reclaimable one, where
the first thread is going through

    xfs_iget
        xfs_iget_cache_hit
            if (ip->i_flags & XFS_IRECLAIMABLE) {
                ip->i_flags |= XFS_INEW;
-->
        xfs_setup_inode
            inode->i_state = I_NEW|I_LOCK;


while another thread runs through the following sequence right where the
arrow above points:

         xfs_iget_cache_hit
             if (ip->i_flags & XFS_INEW) {
                 wait_on_inode

There is nothing to wait on here yet, as I_LOCK is not set yet.
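
The window described here can be replayed deterministically in userspace. The sketch below is not kernel code: it models the XFS flag and the VFS state bit as two separate words and steps the two threads by hand. All `rc_` and `RC_` names are hypothetical, and the wait is reduced to the question of whether it would block at all.

```c
#include <assert.h>
#include <stdbool.h>

#define RC_XFS_INEW 0x1u	/* stand-in for XFS_INEW in ip->i_flags */
#define RC_I_LOCK   0x8u	/* stand-in for I_LOCK in inode->i_state */

struct rc_inode {
	unsigned int i_flags;
	unsigned int i_state;
	bool         initialised;
};

/* A wait keyed on the VFS lock bit only blocks once that bit is set. */
static bool rc_wait_would_block(const struct rc_inode *ip)
{
	return (ip->i_state & RC_I_LOCK) != 0;
}

/* Recycling thread, step 1: only the XFS flag is set so far. */
static void rc_thread_a_step1(struct rc_inode *ip)
{
	ip->i_flags |= RC_XFS_INEW;
}

/* Recycling thread, step 2, some time later: the VFS state is set. */
static void rc_thread_a_step2(struct rc_inode *ip)
{
	ip->i_state |= RC_I_LOCK;
}

/* Second lookup: true if it gets past the wait before initialisation. */
static bool rc_thread_b_slips_through(const struct rc_inode *ip)
{
	return (ip->i_flags & RC_XFS_INEW) &&
	       !rc_wait_would_block(ip) &&
	       !ip->initialised;
}
```

Between the two steps the second lookup sees the XFS flag, finds nothing to wait on, and continues with an inode that is not yet set up. After step 2 the wait would block as intended.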

		
> But it's
> not actually very useful with our flags scheme, so for now I've reverted
> to the old-style polling and just added an XXX comment that we'll
> eventually look into a better scheme.
>
>>> +	/*
>>> +	 * If lookup is racing with unlink, then we should return an
>>> +	 * error immediately so we don't remove it from the reclaim
>>> +	 * list and potentially leak the inode.
>>> +	 */
>>> +	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
>>
>> Previously the conclusion of the race with unlink was based on
>> XFS_IRECLAIMABLE i_flag set in addition to the test above.
>> Is that no longer the case, or not really necessary?
>
> We actually had the test two times in the old code, once for the
> reclaim case, and once after the igrab succeeded. I just moved it into
> a single common place.

which doesn't test for XFS_IRECLAIMABLE.
I just wanted to make sure that was the intention.

>  I've updated the comment to match that.

Thanks, that will make the intention clear.

>
>
>
> New version below:
>
> -- 
>
> Subject: xfs: fix locking in xfs_iget_cache_hit
> From: Christoph Hellwig <hch@lst.de>
>
> The locking in xfs_iget_cache_hit currently has numerous problems:
>
> - we clear the reclaim tag without i_flags_lock which protects modifications
>   to it
> - we call inode_init_always which can sleep with pag_ici_lock held
>   (this is oss.sgi.com BZ #819)
> - we acquire and drop i_flags_lock a lot and thus provide no consistency
>   between the various flags we set/clear under it
>
> This patch fixes all that with a major revamp of the locking in the function.
> The new version acquires i_flags_lock early and only drops it once we need to
> call into inode_init_always or before calling xfs_ilock.
>
> This patch fixes a bug seen in the wild where we race modifying the reclaim tag.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Felix Blyakher <felixb@sgi.com>

>
>
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-10 13:10:19.141974933 -0300
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-10 13:19:06.913056731 -0300
> @@ -191,80 +191,83 @@ xfs_iget_cache_hit(
> 	int			flags,
> 	int			lock_flags) __releases(pag->pag_ici_lock)
> {
> +	struct inode		*inode = VFS_I(ip);
> 	struct xfs_mount	*mp = ip->i_mount;
> -	int			error = EAGAIN;
> +	int			error;
> +
> +	spin_lock(&ip->i_flags_lock);
>
> 	/*
> -	 * If INEW is set this inode is being set up
> -	 * If IRECLAIM is set this inode is being torn down
> -	 * Pause and try again.
> +	 * If we are racing with another cache hit that is currently
> +	 * instantiating this inode or currently recycling it out of
> +	 * reclaimabe state, wait for the initialisation to complete
                      ^ l

>
> +	 * before continuing.
> +	 *
> +	 * XXX(hch): eventually we should do something equivalent to
> +	 *	     wait_on_inode to wait for these flags to be cleared
> +	 *	     instead of polling for it.
> 	 */
> -	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
> +	if (ip->i_flags & (XFS_INEW|XFS_IRECLAIM)) {
> 		XFS_STATS_INC(xs_ig_frecycle);
> +		error = EAGAIN;
> 		goto out_error;
> 	}
>
> -	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
> -	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
> -
> -		/*
> -		 * If lookup is racing with unlink, then we should return an
> -		 * error immediately so we don't remove it from the reclaim
> -		 * list and potentially leak the inode.
> -		 */
> -		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> -			error = ENOENT;
> -			goto out_error;
> -		}
> +	/*
> +	 * If lookup is racing with unlink return an error immediately.
> +	 */
> +	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
> +		error = ENOENT;
> +		goto out_error;
> +	}
>
> +	/*
> +	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
> +	 * Need to carefully get it back into useable state.
> +	 */
> +	if (ip->i_flags & XFS_IRECLAIMABLE) {
> 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
>
> 		/*
> -		 * We need to re-initialise the VFS inode as it has been
> -		 * 'freed' by the VFS. Do this here so we can deal with
> -		 * errors cleanly, then tag it so it can be set up correctly
> -		 * later.
> +		 * We need to set XFS_INEW atomically with clearing the
> +		 * reclaimable tag so that we do have an indicator of the
> +		 * inode still being initialized.
> 		 */
> -		if (inode_init_always(mp->m_super, VFS_I(ip))) {
> +		ip->i_flags |= XFS_INEW;
> +		ip->i_flags &= ~XFS_IRECLAIMABLE;
> +		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> +
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
> +
> +		if (unlikely(inode_init_always(mp->m_super, inode))) {
> +			/*
> +			 * Re-initializing the inode failed, and we are in deep
> +			 * trouble.  Try to re-add it to the reclaim list.
> +			 */
> +			read_lock(&pag->pag_ici_lock);
> +			spin_lock(&ip->i_flags_lock);
> +
> +			ip->i_flags &= ~XFS_INEW;
> +			ip->i_flags |= XFS_IRECLAIMABLE;
> +			__xfs_inode_set_reclaim_tag(pag, ip);
> +
> 			error = ENOMEM;
> 			goto out_error;
> 		}
> -
> -		/*
> -		 * We must set the XFS_INEW flag before clearing the
> -		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
> -		 * not find the XFS_IRECLAIMABLE above but has the igrab()
> -		 * below succeed we can safely check XFS_INEW to detect
> -		 * that this inode is still being initialised.
> -		 */
> -		xfs_iflags_set(ip, XFS_INEW);
> -		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
> -
> -		/* clear the radix tree reclaim flag as well. */
> -		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> -	} else if (!igrab(VFS_I(ip))) {
> +		inode->i_state = I_LOCK|I_NEW;

Seems redundant, as that will be set one step later in xfs_setup_inode().

> +	} else {
> 		/* If the VFS inode is being torn down, pause and try again. */
> -		XFS_STATS_INC(xs_ig_frecycle);
> -		goto out_error;
> -	} else if (xfs_iflags_test(ip, XFS_INEW)) {
> -		/*
> -		 * We are racing with another cache hit that is
> -		 * currently recycling this inode out of the XFS_IRECLAIMABLE
> -		 * state. Wait for the initialisation to complete before
> -		 * continuing.
> -		 */
> -		wait_on_inode(VFS_I(ip));
> -	}
> +		if (!igrab(inode)) {
> +			error = EAGAIN;
> +			goto out_error;
> +		}
>
> -	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
> -		error = ENOENT;
> -		iput(VFS_I(ip));
> -		goto out_error;
> +		/* We've got a live one. */
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
> 	}
>
> -	/* We've got a live one. */
> -	read_unlock(&pag->pag_ici_lock);
> -
> 	if (lock_flags != 0)
> 		xfs_ilock(ip, lock_flags);
>
> @@ -274,6 +277,7 @@ xfs_iget_cache_hit(
> 	return 0;
>
> out_error:
> +	spin_unlock(&ip->i_flags_lock);
> 	read_unlock(&pag->pag_ici_lock);
> 	return error;
> }
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-10 13:10:19.146974522 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-10 13:10:59.958993938 -0300
> @@ -708,6 +708,16 @@ xfs_reclaim_inode(
> 	return 0;
> }
>
> +void
> +__xfs_inode_set_reclaim_tag(
> +	struct xfs_perag	*pag,
> +	struct xfs_inode	*ip)
> +{
> +	radix_tree_tag_set(&pag->pag_ici_root,
> +			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
> +			   XFS_ICI_RECLAIM_TAG);
> +}
> +
> /*
>  * We set the inode flag atomically with the radix tree tag.
>  * Once we get tag lookups on the radix tree, this inode flag
> @@ -722,8 +732,7 @@ xfs_inode_set_reclaim_tag(
>
> 	read_lock(&pag->pag_ici_lock);
> 	spin_lock(&ip->i_flags_lock);
> -	radix_tree_tag_set(&pag->pag_ici_root,
> -			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
> +	__xfs_inode_set_reclaim_tag(pag, ip);
> 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
> 	spin_unlock(&ip->i_flags_lock);
> 	read_unlock(&pag->pag_ici_lock);
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-10 13:10:19.153974227 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-10 13:10:59.962994168 -0300
> @@ -48,6 +48,7 @@ int xfs_reclaim_inode(struct xfs_inode *
> int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
>
> void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
> +void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
> void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
> void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
> 				struct xfs_inode *ip);


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
  2009-08-16 22:54         ` Felix Blyakher
@ 2009-08-17  0:36           ` Christoph Hellwig
  -1 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-17  0:36 UTC (permalink / raw)
  To: Felix Blyakher; +Cc: Christoph Hellwig, linux-fsdevel, xfs

On Sun, Aug 16, 2009 at 05:54:35PM -0500, Felix Blyakher wrote:
>> The wait_on_inode is only sensible for the non-recycle case.
>
> The case, I was referring to, was indeed the reclaimable one when
> the first thread is going through
>
>     xfs_iget
> 	xfs_iget_cache_hit
>             if (ip->i_flags & XFS_IRECLAIMABLE) {
>                 ip->i_flags |= XFS_INEW;
> -->
>         xfs_setup_inode
>             inode->i_state = I_NEW|I_LOCK;
>
>
> while another thread runs through the following sequence right where the
> arrow shows above:
>
>         xfs_iget_cache_hit
>             if (ip->i_flags & XFS_INEW) {
>                 wait_on_inode
>
> There is nothing to wait on here yet, as I_LOCK is not set yet.

Yeah.  The new version should fix it.
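
As a rough userspace illustration of the window described above (this is
not kernel code; the DEMO_* names, their values and struct demo_inode are
hypothetical stand-ins for XFS_INEW, I_LOCK and the inode itself): a waiter
keyed on the VFS lock bit has nothing to block on until that bit is set,
whereas a check of the XFS flags backs off with EAGAIN for as long as the
inode is marked new.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the XFS and VFS state bits (values illustrative). */
#define DEMO_INEW		0x1u
#define DEMO_IRECLAIM		0x2u
#define DEMO_I_LOCK		0x1u
#define DEMO_I_NEW		0x2u

struct demo_inode {
	unsigned int	i_flags;	/* models ip->i_flags */
	unsigned int	i_state;	/* models inode->i_state */
};

/* Old scheme: a waiter only blocks if the VFS lock bit is already set. */
static inline int demo_wait_would_block(const struct demo_inode *ip)
{
	assert(ip != NULL);
	return (ip->i_state & DEMO_I_LOCK) != 0;
}

/* New scheme: a lookup that sees INEW or IRECLAIM backs off and retries. */
static inline int demo_cache_hit_check(const struct demo_inode *ip)
{
	assert(ip != NULL);
	if (ip->i_flags & (DEMO_INEW | DEMO_IRECLAIM))
		return EAGAIN;
	return 0;
}
```

With i_flags holding only DEMO_INEW and i_state still zero, the first
helper reports nothing to wait on while the second returns EAGAIN, which is
the polling behaviour the XXX comment in the patch refers to.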

Here's a version with the small update that Eric suggested.  Any chance
we could still get this into 2.6.31?

-- 

Subject: xfs: fix locking in xfs_iget_cache_hit
From: Christoph Hellwig <hch@lst.de>

The locking in xfs_iget_cache_hit currently has numerous problems:

 - we clear the reclaim tag without i_flags_lock which protects modifications
   to it
 - we call inode_init_always which can sleep with pag_ici_lock held
   (this is oss.sgi.com BZ #819)
 - we acquire and drop i_flags_lock a lot and thus provide no consistency
   between the various flags we set/clear under it

This patch fixes all that with a major revamp of the locking in the function.
The new version acquires i_flags_lock early and only drops it once we need to
call into inode_init_always or before calling xfs_ilock.
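
A minimal userspace sketch of that ordering, assuming a pthread mutex in
place of i_flags_lock, a plain int in place of the radix tree reclaim tag,
and a callback in place of inode_init_always.  All demo_* and DEMO_* names
are hypothetical, and the callbacks return positive errno values only to
mirror the XFS convention of the time:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_INEW		0x1u
#define DEMO_IRECLAIMABLE	0x4u

struct demo_inode {
	pthread_mutex_t	flags_lock;	/* models ip->i_flags_lock */
	unsigned int	flags;		/* models ip->i_flags */
	int		reclaim_tag;	/* models the radix tree reclaim tag */
};

/* Stand-ins for a re-initialisation step that may sleep and may fail. */
static inline int demo_init_ok(struct demo_inode *ip)   { (void)ip; return 0; }
static inline int demo_init_fail(struct demo_inode *ip) { (void)ip; return ENOMEM; }

static inline int demo_recycle(struct demo_inode *ip,
			       int (*init_fn)(struct demo_inode *))
{
	int error;

	assert(ip != NULL && init_fn != NULL);

	pthread_mutex_lock(&ip->flags_lock);
	if (ip->flags & DEMO_INEW) {
		/* Someone else is already setting this inode up. */
		pthread_mutex_unlock(&ip->flags_lock);
		return EAGAIN;
	}
	/* Flip both flags and the tag in one critical section. */
	ip->flags |= DEMO_INEW;
	ip->flags &= ~DEMO_IRECLAIMABLE;
	ip->reclaim_tag = 0;
	pthread_mutex_unlock(&ip->flags_lock);

	/* The lock is dropped before calling anything that may sleep. */
	error = init_fn(ip);
	if (error) {
		/* Roll back so the inode can be found for reclaim again. */
		pthread_mutex_lock(&ip->flags_lock);
		ip->flags &= ~DEMO_INEW;
		ip->flags |= DEMO_IRECLAIMABLE;
		ip->reclaim_tag = 1;
		pthread_mutex_unlock(&ip->flags_lock);
	}
	return error;
}
```

A concurrent caller therefore sees the inode either as reclaimable or as
new, never as neither, and a failed re-initialisation leaves it exactly as
it was found.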

This patch fixes a bug seen in the wild where we race modifying the reclaim tag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>

Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-16 20:10:25.200960533 -0300
+++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-16 20:11:30.580432781 -0300
@@ -191,80 +191,82 @@ xfs_iget_cache_hit(
 	int			flags,
 	int			lock_flags) __releases(pag->pag_ici_lock)
 {
+	struct inode		*inode = VFS_I(ip);
 	struct xfs_mount	*mp = ip->i_mount;
-	int			error = EAGAIN;
+	int			error;
+
+	spin_lock(&ip->i_flags_lock);
 
 	/*
-	 * If INEW is set this inode is being set up
-	 * If IRECLAIM is set this inode is being torn down
-	 * Pause and try again.
+	 * If we are racing with another cache hit that is currently
+	 * instantiating this inode or currently recycling it out of
+	 * reclaimable state, wait for the initialisation to complete
+	 * before continuing.
+	 *
+	 * XXX(hch): eventually we should do something equivalent to
+	 *	     wait_on_inode to wait for these flags to be cleared
+	 *	     instead of polling for it.
 	 */
-	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
+	if (ip->i_flags & (XFS_INEW|XFS_IRECLAIM)) {
 		XFS_STATS_INC(xs_ig_frecycle);
+		error = EAGAIN;
 		goto out_error;
 	}
 
-	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
-	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
-
-		/*
-		 * If lookup is racing with unlink, then we should return an
-		 * error immediately so we don't remove it from the reclaim
-		 * list and potentially leak the inode.
-		 */
-		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
-			error = ENOENT;
-			goto out_error;
-		}
+	/*
+	 * If lookup is racing with unlink return an error immediately.
+	 */
+	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
+		error = ENOENT;
+		goto out_error;
+	}
 
+	/*
+	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
+	 * Need to carefully get it back into useable state.
+	 */
+	if (ip->i_flags & XFS_IRECLAIMABLE) {
 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
 
 		/*
-		 * We need to re-initialise the VFS inode as it has been
-		 * 'freed' by the VFS. Do this here so we can deal with
-		 * errors cleanly, then tag it so it can be set up correctly
-		 * later.
+		 * We need to set XFS_INEW atomically with clearing the
+		 * reclaimable tag so that we do have an indicator of the
+		 * inode still being initialized.
 		 */
-		if (inode_init_always(mp->m_super, VFS_I(ip))) {
-			error = ENOMEM;
-			goto out_error;
-		}
+		ip->i_flags |= XFS_INEW;
+		ip->i_flags &= ~XFS_IRECLAIMABLE;
+		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
 
-		/*
-		 * We must set the XFS_INEW flag before clearing the
-		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
-		 * not find the XFS_IRECLAIMABLE above but has the igrab()
-		 * below succeed we can safely check XFS_INEW to detect
-		 * that this inode is still being initialised.
-		 */
-		xfs_iflags_set(ip, XFS_INEW);
-		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 
-		/* clear the radix tree reclaim flag as well. */
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
-	} else if (!igrab(VFS_I(ip))) {
+		error = -inode_init_always(mp->m_super, inode);
+		if (error) {
+			/*
+			 * Re-initializing the inode failed, and we are in deep
+			 * trouble.  Try to re-add it to the reclaim list.
+			 */
+			read_lock(&pag->pag_ici_lock);
+			spin_lock(&ip->i_flags_lock);
+
+			ip->i_flags &= ~XFS_INEW;
+			ip->i_flags |= XFS_IRECLAIMABLE;
+			__xfs_inode_set_reclaim_tag(pag, ip);
+			goto out_error;
+		}
+		inode->i_state = I_LOCK|I_NEW;
+	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
-		XFS_STATS_INC(xs_ig_frecycle);
-		goto out_error;
-	} else if (xfs_iflags_test(ip, XFS_INEW)) {
-		/*
-		 * We are racing with another cache hit that is
-		 * currently recycling this inode out of the XFS_IRECLAIMABLE
-		 * state. Wait for the initialisation to complete before
-		 * continuing.
-		 */
-		wait_on_inode(VFS_I(ip));
-	}
+		if (!igrab(inode)) {
+			error = EAGAIN;
+			goto out_error;
+		}
 
-	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
-		error = ENOENT;
-		iput(VFS_I(ip));
-		goto out_error;
+		/* We've got a live one. */
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 	}
 
-	/* We've got a live one. */
-	read_unlock(&pag->pag_ici_lock);
-
 	if (lock_flags != 0)
 		xfs_ilock(ip, lock_flags);
 
@@ -274,6 +276,7 @@ xfs_iget_cache_hit(
 	return 0;
 
 out_error:
+	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
 	return error;
 }
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-16 20:01:14.632430664 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-16 20:10:25.740968342 -0300
@@ -708,6 +708,16 @@ xfs_reclaim_inode(
 	return 0;
 }
 
+void
+__xfs_inode_set_reclaim_tag(
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
+{
+	radix_tree_tag_set(&pag->pag_ici_root,
+			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			   XFS_ICI_RECLAIM_TAG);
+}
+
 /*
  * We set the inode flag atomically with the radix tree tag.
  * Once we get tag lookups on the radix tree, this inode flag
@@ -722,8 +732,7 @@ xfs_inode_set_reclaim_tag(
 
 	read_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
-	radix_tree_tag_set(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+	__xfs_inode_set_reclaim_tag(pag, ip);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-16 20:01:14.640431122 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-16 20:10:25.744967593 -0300
@@ -48,6 +48,7 @@ int xfs_reclaim_inode(struct xfs_inode *
 int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
+void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
 void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
 void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
 				struct xfs_inode *ip);
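
The xfs_sync.c hunk above splits the tag update into a helper that expects
its caller to hold the locks and a wrapper that takes them itself.  A small
userspace sketch of that pattern follows, assuming a pthread mutex in place
of the pag_ici_lock/i_flags_lock pair and a 64-bit bitmap in place of the
radix tree tags.  The demo_* names are hypothetical, and the _locked suffix
is used because identifiers starting with a double underscore are reserved
outside the kernel:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_NR_SLOTS	64u

struct demo_perag {
	pthread_mutex_t		lock;		/* models the locks around the tag */
	unsigned long long	reclaim_tags;	/* one tag bit per inode slot */
};

/* Caller must already hold pag->lock. */
static inline void demo_set_reclaim_tag_locked(struct demo_perag *pag,
					       unsigned int slot)
{
	assert(pag != NULL && slot < DEMO_NR_SLOTS);
	pag->reclaim_tags |= 1ULL << slot;
}

/* Convenience wrapper for callers that do not hold the lock yet. */
static inline void demo_set_reclaim_tag(struct demo_perag *pag,
					unsigned int slot)
{
	pthread_mutex_lock(&pag->lock);
	demo_set_reclaim_tag_locked(pag, slot);
	pthread_mutex_unlock(&pag->lock);
}
```

The error path in xfs_iget_cache_hit is in the position of the first kind
of caller: it has just retaken both locks to restore the flags, so it needs
the variant that does no locking of its own.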

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
@ 2009-08-17  0:36           ` Christoph Hellwig
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2009-08-17  0:36 UTC (permalink / raw)
  To: Felix Blyakher; +Cc: Christoph Hellwig, linux-fsdevel, xfs

On Sun, Aug 16, 2009 at 05:54:35PM -0500, Felix Blyakher wrote:
>> The wait_on_inode is only sensible for the non-recycle case.
>
> The case, I was referring to, was indeed the reclaimable one when
> the first thread is going through
>
>     xfs_iget
> 	xfs_iget_cache_hit
>             if (ip->i_flags & XFS_IRECLAIMABLE) {
>                 ip->i_flags |= XFS_INEW;
> -->
>         xfs_setup_inode
>             inode->i_state = I_NEW|I_LOCK;
>
>
> while another therad run through the following sequence right where the
> arrow shows above:
>
>         xfs_iget_cache_hit
>             if (ip->i_flags & XFS_INEW) {
>                 wait_on_inode
>
> There is nothing to wait on here yet, as I_LOCK is not set yet.

Yeah.  The new version should fix it.

Here's a version with the small update that Eric suggested, any chance
we could get this into 2.6.31 still?

-- 

Subject: xfs: fix locking in xfs_iget_cache_hit
From: Christoph Hellwig <hch@lst.de>

The locking in xfs_iget_cache_hit currently has numerous problems:

 - we clear the reclaim tag without i_flags_lock which protects modifications
   to it
 - we call inode_init_always which can sleep with pag_ici_lock held
   (this is oss.sgi.com BZ #819)
 - we acquire and drop i_flags_lock a lot and thus provide no consistency
   between the various flags we set/clear under it

This patch fixes all that with a major revamp of the locking in the function.
The new version acquires i_flags_lock early and only drops it once we need to
call into inode_init_always or before calling xfs_ilock.

This patch fixes a bug seen in the wild where we race modifying the reclaim tag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>

Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-16 20:10:25.200960533 -0300
+++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-16 20:11:30.580432781 -0300
@@ -191,80 +191,82 @@ xfs_iget_cache_hit(
 	int			flags,
 	int			lock_flags) __releases(pag->pag_ici_lock)
 {
+	struct inode		*inode = VFS_I(ip);
 	struct xfs_mount	*mp = ip->i_mount;
-	int			error = EAGAIN;
+	int			error;
+
+	spin_lock(&ip->i_flags_lock);
 
 	/*
-	 * If INEW is set this inode is being set up
-	 * If IRECLAIM is set this inode is being torn down
-	 * Pause and try again.
+	 * If we are racing with another cache hit that is currently
+	 * instantiating this inode or currently recycling it out of
+	 * reclaimabe state, wait for the initialisation to complete
+	 * before continuing.
+	 *
+	 * XXX(hch): eventually we should do something equivalent to
+	 *	     wait_on_inode to wait for these flags to be cleared
+	 *	     instead of polling for it.
 	 */
-	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
+	if (ip->i_flags & (XFS_INEW|XFS_IRECLAIM)) {
 		XFS_STATS_INC(xs_ig_frecycle);
+		error = EAGAIN;
 		goto out_error;
 	}
 
-	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
-	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
-
-		/*
-		 * If lookup is racing with unlink, then we should return an
-		 * error immediately so we don't remove it from the reclaim
-		 * list and potentially leak the inode.
-		 */
-		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
-			error = ENOENT;
-			goto out_error;
-		}
+	/*
+	 * If lookup is racing with unlink return an error immediately.
+	 */
+	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
+		error = ENOENT;
+		goto out_error;
+	}
 
+	/*
+	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
+	 * Need to carefully get it back into useable state.
+	 */
+	if (ip->i_flags & XFS_IRECLAIMABLE) {
 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
 
 		/*
-		 * We need to re-initialise the VFS inode as it has been
-		 * 'freed' by the VFS. Do this here so we can deal with
-		 * errors cleanly, then tag it so it can be set up correctly
-		 * later.
+		 * We need to set XFS_INEW atomically with clearing the
+		 * reclaimable tag so that we do have an indicator of the
+		 * inode still being initialized.
 		 */
-		if (inode_init_always(mp->m_super, VFS_I(ip))) {
-			error = ENOMEM;
-			goto out_error;
-		}
+		ip->i_flags |= XFS_INEW;
+		ip->i_flags &= ~XFS_IRECLAIMABLE;
+		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
 
-		/*
-		 * We must set the XFS_INEW flag before clearing the
-		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
-		 * not find the XFS_IRECLAIMABLE above but has the igrab()
-		 * below succeed we can safely check XFS_INEW to detect
-		 * that this inode is still being initialised.
-		 */
-		xfs_iflags_set(ip, XFS_INEW);
-		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 
-		/* clear the radix tree reclaim flag as well. */
-		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
-	} else if (!igrab(VFS_I(ip))) {
+		error = -inode_init_always(mp->m_super, inode);
+		if (error) {
+			/*
+			 * Re-initializing the inode failed, and we are in deep
+			 * trouble.  Try to re-add it to the reclaim list.
+			 */
+			read_lock(&pag->pag_ici_lock);
+			spin_lock(&ip->i_flags_lock);
+
+			ip->i_flags &= ~XFS_INEW;
+			ip->i_flags |= XFS_IRECLAIMABLE;
+			__xfs_inode_set_reclaim_tag(pag, ip);
+			goto out_error;
+		}
+		inode->i_state = I_LOCK|I_NEW;
+	} else {
 		/* If the VFS inode is being torn down, pause and try again. */
-		XFS_STATS_INC(xs_ig_frecycle);
-		goto out_error;
-	} else if (xfs_iflags_test(ip, XFS_INEW)) {
-		/*
-		 * We are racing with another cache hit that is
-		 * currently recycling this inode out of the XFS_IRECLAIMABLE
-		 * state. Wait for the initialisation to complete before
-		 * continuing.
-		 */
-		wait_on_inode(VFS_I(ip));
-	}
+		if (!igrab(inode)) {
+			error = EAGAIN;
+			goto out_error;
+		}
 
-	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
-		error = ENOENT;
-		iput(VFS_I(ip));
-		goto out_error;
+		/* We've got a live one. */
+		spin_unlock(&ip->i_flags_lock);
+		read_unlock(&pag->pag_ici_lock);
 	}
 
-	/* We've got a live one. */
-	read_unlock(&pag->pag_ici_lock);
-
 	if (lock_flags != 0)
 		xfs_ilock(ip, lock_flags);
 
@@ -274,6 +276,7 @@ xfs_iget_cache_hit(
 	return 0;
 
 out_error:
+	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
 	return error;
 }
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-16 20:01:14.632430664 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-16 20:10:25.740968342 -0300
@@ -708,6 +708,16 @@ xfs_reclaim_inode(
 	return 0;
 }
 
+void
+__xfs_inode_set_reclaim_tag(
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
+{
+	radix_tree_tag_set(&pag->pag_ici_root,
+			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			   XFS_ICI_RECLAIM_TAG);
+}
+
 /*
  * We set the inode flag atomically with the radix tree tag.
  * Once we get tag lookups on the radix tree, this inode flag
@@ -722,8 +732,7 @@ xfs_inode_set_reclaim_tag(
 
 	read_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
-	radix_tree_tag_set(&pag->pag_ici_root,
-			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+	__xfs_inode_set_reclaim_tag(pag, ip);
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-16 20:01:14.640431122 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-16 20:10:25.744967593 -0300
@@ -48,6 +48,7 @@ int xfs_reclaim_inode(struct xfs_inode *
 int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
+void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
 void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
 void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
 				struct xfs_inode *ip);
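
The recycle path in the xfs_iget.c hunk above can be sketched in userspace
roughly as follows.  This is a minimal model under stated assumptions, not the
kernel code: struct model_inode, model_lock(), model_unlock(),
model_init_always() and model_recycle() are hypothetical names, the lock is a
plain boolean that only records whether it is held, and the radix tree tag is
reduced to a single flag.  It illustrates three steps: the flags and the tag
change in one critical section, the lock is dropped before the call that may
sleep, and a failure re-takes the lock to roll the state back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_INEW		0x1u
#define MODEL_IRECLAIMABLE	0x2u

struct model_inode {
	bool		locked;		/* models holding ip->i_flags_lock */
	unsigned int	flags;		/* models ip->i_flags */
	bool		reclaim_tag;	/* models the radix tree reclaim tag */
};

static void model_lock(struct model_inode *ip)
{
	assert(!ip->locked);
	ip->locked = true;
}

static void model_unlock(struct model_inode *ip)
{
	assert(ip->locked);
	ip->locked = false;
}

/*
 * Stand-in for a call that may sleep: it must not run with the lock held.
 * Returns 0 or a negative errno; the failure is injected by the caller.
 */
static int model_init_always(struct model_inode *ip, bool fail)
{
	assert(!ip->locked);
	return fail ? -ENOMEM : 0;
}

/* Returns 0 or a positive errno, matching the sign flip in the patch. */
static int model_recycle(struct model_inode *ip, bool fail_init)
{
	int error;

	model_lock(ip);
	/* Flags and tag change together in one critical section. */
	ip->flags |= MODEL_INEW;
	ip->flags &= ~MODEL_IRECLAIMABLE;
	ip->reclaim_tag = false;
	model_unlock(ip);

	error = -model_init_always(ip, fail_init);
	if (error) {
		/* Roll back so the inode can be reclaimed later. */
		model_lock(ip);
		ip->flags &= ~MODEL_INEW;
		ip->flags |= MODEL_IRECLAIMABLE;
		ip->reclaim_tag = true;
		model_unlock(ip);
	}
	return error;
}
```

The assert() in model_init_always() plays the role of a might-sleep check: if
model_recycle() called it before model_unlock(), the model would abort, which
is the class of bug the changelog attributes to the old code.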

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
  2009-08-17  0:36           ` Christoph Hellwig
@ 2009-08-17  3:05             ` Felix Blyakher
  -1 siblings, 0 replies; 38+ messages in thread
From: Felix Blyakher @ 2009-08-17  3:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs


On Aug 16, 2009, at 7:36 PM, Christoph Hellwig wrote:

> On Sun, Aug 16, 2009 at 05:54:35PM -0500, Felix Blyakher wrote:
>>> The wait_on_inode is only sensible for the non-recycle case.
>>
>> The case, I was referring to, was indeed the reclaimable one when
>> the first thread is going through
>>
>>    xfs_iget
>> 	xfs_iget_cache_hit
>>            if (ip->i_flags & XFS_IRECLAIMABLE) {
>>                ip->i_flags |= XFS_INEW;
>> -->
>>        xfs_setup_inode
>>            inode->i_state = I_NEW|I_LOCK;
>>
>>
>> while another thread runs through the following sequence right where the
>> arrow shows above:
>>
>>        xfs_iget_cache_hit
>>            if (ip->i_flags & XFS_INEW) {
>>                wait_on_inode
>>
>> There is nothing to wait on here yet, as I_LOCK is not set yet.
>
> Yeah.  The new version should fix it.

Agree.

> Here's a version with the small update that Eric suggested, any chance
> we could get this into 2.6.31 still?

Will send the pull request tonight.

Felix

>
>
> -- 
>
> Subject: xfs: fix locking in xfs_iget_cache_hit
> From: Christoph Hellwig <hch@lst.de>
>
> The locking in xfs_iget_cache_hit currently has numerous problems:
>
> - we clear the reclaim tag without i_flags_lock which protects modifications
>   to it
> - we call inode_init_always which can sleep with pag_ici_lock held
>   (this is oss.sgi.com BZ #819)
> - we acquire and drop i_flags_lock a lot and thus provide no consistency
>   between the various flags we set/clear under it
>
> This patch fixes all that with a major revamp of the locking in the function.
> The new version acquires i_flags_lock early and only drops it once we need to
> call into inode_init_always or before calling xfs_ilock.
>
> This patch fixes a bug seen in the wild where we race modifying the reclaim tag.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Felix Blyakher <felixb@sgi.com>
>
> Index: linux-2.6/fs/xfs/xfs_iget.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-16 20:10:25.200960533 -0300
> +++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-16 20:11:30.580432781 -0300
> @@ -191,80 +191,82 @@ xfs_iget_cache_hit(
> 	int			flags,
> 	int			lock_flags) __releases(pag->pag_ici_lock)
> {
> +	struct inode		*inode = VFS_I(ip);
> 	struct xfs_mount	*mp = ip->i_mount;
> -	int			error = EAGAIN;
> +	int			error;
> +
> +	spin_lock(&ip->i_flags_lock);
>
> 	/*
> -	 * If INEW is set this inode is being set up
> -	 * If IRECLAIM is set this inode is being torn down
> -	 * Pause and try again.
> +	 * If we are racing with another cache hit that is currently
> +	 * instantiating this inode or currently recycling it out of
> +	 * reclaimable state, wait for the initialisation to complete
> +	 * before continuing.
> +	 *
> +	 * XXX(hch): eventually we should do something equivalent to
> +	 *	     wait_on_inode to wait for these flags to be cleared
> +	 *	     instead of polling for it.
> 	 */
> -	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
> +	if (ip->i_flags & (XFS_INEW|XFS_IRECLAIM)) {
> 		XFS_STATS_INC(xs_ig_frecycle);
> +		error = EAGAIN;
> 		goto out_error;
> 	}
>
> -	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
> -	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
> -
> -		/*
> -		 * If lookup is racing with unlink, then we should return an
> -		 * error immediately so we don't remove it from the reclaim
> -		 * list and potentially leak the inode.
> -		 */
> -		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> -			error = ENOENT;
> -			goto out_error;
> -		}
> +	/*
> +	 * If lookup is racing with unlink return an error immediately.
> +	 */
> +	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
> +		error = ENOENT;
> +		goto out_error;
> +	}
>
> +	/*
> +	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
> +	 * Need to carefully get it back into useable state.
> +	 */
> +	if (ip->i_flags & XFS_IRECLAIMABLE) {
> 		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
>
> 		/*
> -		 * We need to re-initialise the VFS inode as it has been
> -		 * 'freed' by the VFS. Do this here so we can deal with
> -		 * errors cleanly, then tag it so it can be set up correctly
> -		 * later.
> +		 * We need to set XFS_INEW atomically with clearing the
> +		 * reclaimable tag so that we do have an indicator of the
> +		 * inode still being initialized.
> 		 */
> -		if (inode_init_always(mp->m_super, VFS_I(ip))) {
> -			error = ENOMEM;
> -			goto out_error;
> -		}
> +		ip->i_flags |= XFS_INEW;
> +		ip->i_flags &= ~XFS_IRECLAIMABLE;
> +		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
>
> -		/*
> -		 * We must set the XFS_INEW flag before clearing the
> -		 * XFS_IRECLAIMABLE flag so that if a racing lookup does
> -		 * not find the XFS_IRECLAIMABLE above but has the igrab()
> -		 * below succeed we can safely check XFS_INEW to detect
> -		 * that this inode is still being initialised.
> -		 */
> -		xfs_iflags_set(ip, XFS_INEW);
> -		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
>
> -		/* clear the radix tree reclaim flag as well. */
> -		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
> -	} else if (!igrab(VFS_I(ip))) {
> +		error = -inode_init_always(mp->m_super, inode);
> +		if (error) {
> +			/*
> +			 * Re-initializing the inode failed, and we are in deep
> +			 * trouble.  Try to re-add it to the reclaim list.
> +			 */
> +			read_lock(&pag->pag_ici_lock);
> +			spin_lock(&ip->i_flags_lock);
> +
> +			ip->i_flags &= ~XFS_INEW;
> +			ip->i_flags |= XFS_IRECLAIMABLE;
> +			__xfs_inode_set_reclaim_tag(pag, ip);
> +			goto out_error;
> +		}
> +		inode->i_state = I_LOCK|I_NEW;
> +	} else {
> 		/* If the VFS inode is being torn down, pause and try again. */
> -		XFS_STATS_INC(xs_ig_frecycle);
> -		goto out_error;
> -	} else if (xfs_iflags_test(ip, XFS_INEW)) {
> -		/*
> -		 * We are racing with another cache hit that is
> -		 * currently recycling this inode out of the XFS_IRECLAIMABLE
> -		 * state. Wait for the initialisation to complete before
> -		 * continuing.
> -		 */
> -		wait_on_inode(VFS_I(ip));
> -	}
> +		if (!igrab(inode)) {
> +			error = EAGAIN;
> +			goto out_error;
> +		}
>
> -	if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
> -		error = ENOENT;
> -		iput(VFS_I(ip));
> -		goto out_error;
> +		/* We've got a live one. */
> +		spin_unlock(&ip->i_flags_lock);
> +		read_unlock(&pag->pag_ici_lock);
> 	}
>
> -	/* We've got a live one. */
> -	read_unlock(&pag->pag_ici_lock);
> -
> 	if (lock_flags != 0)
> 		xfs_ilock(ip, lock_flags);
>
> @@ -274,6 +276,7 @@ xfs_iget_cache_hit(
> 	return 0;
>
> out_error:
> +	spin_unlock(&ip->i_flags_lock);
> 	read_unlock(&pag->pag_ici_lock);
> 	return error;
> }
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-16 20:01:14.632430664 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-16 20:10:25.740968342 -0300
> @@ -708,6 +708,16 @@ xfs_reclaim_inode(
> 	return 0;
> }
>
> +void
> +__xfs_inode_set_reclaim_tag(
> +	struct xfs_perag	*pag,
> +	struct xfs_inode	*ip)
> +{
> +	radix_tree_tag_set(&pag->pag_ici_root,
> +			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
> +			   XFS_ICI_RECLAIM_TAG);
> +}
> +
> /*
>  * We set the inode flag atomically with the radix tree tag.
>  * Once we get tag lookups on the radix tree, this inode flag
> @@ -722,8 +732,7 @@ xfs_inode_set_reclaim_tag(
>
> 	read_lock(&pag->pag_ici_lock);
> 	spin_lock(&ip->i_flags_lock);
> -	radix_tree_tag_set(&pag->pag_ici_root,
> -			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
> +	__xfs_inode_set_reclaim_tag(pag, ip);
> 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
> 	spin_unlock(&ip->i_flags_lock);
> 	read_unlock(&pag->pag_ici_lock);
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h	2009-08-16 20:01:14.640431122 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h	2009-08-16 20:10:25.744967593 -0300
> @@ -48,6 +48,7 @@ int xfs_reclaim_inode(struct xfs_inode *
> int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
>
> void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
> +void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
> void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
> void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
> 				struct xfs_inode *ip);
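
The interleaving discussed at the top of this message can be replayed
step by step with a small single-threaded model.  This is only a sketch under
stated assumptions: struct model_inode, model_cache_hit() and
model_finish_setup() are hypothetical names, and the model assumes that some
later setup step clears the "new" flag once initialisation is done.  Each call
to model_cache_hit() stands for one pass through the flag checks with
i_flags_lock held, so the test and the update cannot be split by a racing
lookup.

```c
#include <errno.h>

#define MODEL_INEW		0x1u
#define MODEL_IRECLAIMABLE	0x2u

struct model_inode {
	unsigned int	flags;	/* models ip->i_flags */
};

/*
 * One pass through the cache-hit flag checks.  Returns 0 if the caller may
 * use the inode, or EAGAIN if another lookup is still initialising it.
 */
static int model_cache_hit(struct model_inode *ip)
{
	if (ip->flags & MODEL_INEW)
		return EAGAIN;	/* nothing to sleep on yet: ask for a retry */

	if (ip->flags & MODEL_IRECLAIMABLE) {
		/* Claim the inode: both bits flip in the same pass. */
		ip->flags |= MODEL_INEW;
		ip->flags &= ~MODEL_IRECLAIMABLE;
	}
	return 0;
}

/* Assumed later step: initialisation is complete, clear the "new" flag. */
static void model_finish_setup(struct model_inode *ip)
{
	ip->flags &= ~MODEL_INEW;
}
```

In this model the second lookup, arriving where the arrow points, gets EAGAIN
instead of calling a wait primitive that has nothing to wait on.  It succeeds
on a retry after model_finish_setup() has run, which corresponds to the
polling that the XXX comment in the patch describes.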


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread

Thread overview: 38+ messages
2009-08-04 14:15 [PATCH 0/4] XFS iget fixes Christoph Hellwig
2009-08-04 14:15 ` Christoph Hellwig
2009-08-04 14:15 ` [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit Christoph Hellwig
2009-08-04 14:15   ` Christoph Hellwig
2009-08-06 21:50   ` Eric Sandeen
2009-08-06 22:29     ` Christoph Hellwig
2009-08-07 17:25   ` Felix Blyakher
2009-08-07 17:25     ` Felix Blyakher
2009-08-10 17:09     ` Christoph Hellwig
2009-08-10 17:09       ` Christoph Hellwig
2009-08-16 21:01       ` Eric Sandeen
2009-08-16 21:01         ` Eric Sandeen
2009-08-16 22:54       ` Felix Blyakher
2009-08-16 22:54         ` Felix Blyakher
2009-08-17  0:36         ` Christoph Hellwig
2009-08-17  0:36           ` Christoph Hellwig
2009-08-17  3:05           ` Felix Blyakher
2009-08-17  3:05             ` Felix Blyakher
2009-08-04 14:15 ` [PATCH 2/4] fix inode_init_always calling convention Christoph Hellwig
2009-08-04 14:15   ` Christoph Hellwig
2009-08-06 22:30   ` Eric Sandeen
2009-08-06 22:30     ` Eric Sandeen
2009-08-07 17:39   ` Felix Blyakher
2009-08-07 17:39     ` Felix Blyakher
2009-08-07 18:09     ` Felix Blyakher
2009-08-07 18:09       ` Felix Blyakher
2009-08-04 14:15 ` [PATCH 3/4] add __destroy_inode Christoph Hellwig
2009-08-04 14:15   ` Christoph Hellwig
2009-08-06 22:56   ` Eric Sandeen
2009-08-06 22:56     ` Eric Sandeen
2009-08-07 18:20   ` Felix Blyakher
2009-08-07 18:20     ` Felix Blyakher
2009-08-04 14:15 ` [PATCH 4/4] xfs: add xfs_inode_free Christoph Hellwig
2009-08-04 14:15   ` Christoph Hellwig
2009-08-06 23:54   ` Eric Sandeen
2009-08-06 23:54     ` Eric Sandeen
2009-08-07 18:22   ` Felix Blyakher
2009-08-07 18:22     ` Felix Blyakher
