All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 06/11] xfs: xfs_inode_free() isn't RCU safe
Date: Wed, 13 Apr 2016 15:31:27 +1000	[thread overview]
Message-ID: <1460525492-1170-7-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1460525492-1170-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

The xfs_inode freed in xfs_inode_free() has multiple allocated
structures attached to it. We free these in xfs_inode_free() before
we mark the inode as invalid, and before we run call_rcu() to queue
the structure for freeing.

Unfortunately, this freeing can race with other accesses that are in
the RCU current grace period that have found the inode in the radix
tree with a valid state.  This includes xfs_iflush_cluster(), which
calls xfs_inode_clean(), and that accesses the inode log item on the
xfs_inode.

The log item structure is freed in xfs_inode_free(), so there is the
possibility we can be accessing freed memory in xfs_iflush_cluster()
after validating the xfs_inode structure as being valid for this RCU
context. Hence we can get spuriously incorrect clean state returned
from such checks. This can lead to use thinking the inode is dirty
when it is, in fact, clean, and so incorrectly attaching it to the
buffer for IO and completion processing.

This then leads to use-after-free situations on the xfs_inode itself
if the IO completes after the current RCU grace period expires. The
buffer callbacks will access the xfs_inode and try to do all sorts
of things it shouldn't with freed memory.

IOWs, xfs_iflush_cluster() only works correctly when racing with
inode reclaim if the inode log item is present and correctly stating
the inode is clean. If the inode is being freed, then reclaim has
already made sure the inode is clean, and hence xfs_iflush_cluster
can skip it. However, we are accessing the inode inode under RCU
read lock protection and so also must ensure that all dynamically
allocated memory we reference in this context is not freed until the
RCU grace period expires.

To fix this, move all the potential memory freeing into
xfs_inode_free_callback() so that we are guarantee RCU protected
lookup code will always have the memory structures it needs
available during the RCU grace period that lookup races can occur
in.

Discovered-by: Brain Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_icache.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index bf2d607..0c94cde 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -94,13 +94,6 @@ xfs_inode_free_callback(
 	struct inode		*inode = container_of(head, struct inode, i_rcu);
 	struct xfs_inode	*ip = XFS_I(inode);
 
-	kmem_zone_free(xfs_inode_zone, ip);
-}
-
-void
-xfs_inode_free(
-	struct xfs_inode	*ip)
-{
 	switch (VFS_I(ip)->i_mode & S_IFMT) {
 	case S_IFREG:
 	case S_IFDIR:
@@ -118,6 +111,13 @@ xfs_inode_free(
 		ip->i_itemp = NULL;
 	}
 
+	kmem_zone_free(xfs_inode_zone, ip);
+}
+
+void
+xfs_inode_free(
+	struct xfs_inode	*ip)
+{
 	/*
 	 * Because we use RCU freeing we need to ensure the inode always
 	 * appears to be reclaimed with an invalid inode number when in the
-- 
2.7.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2016-04-13  5:32 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-13  5:31 [PATCH 00/11 v3] xfs: inode reclaim vs the world Dave Chinner
2016-04-13  5:31 ` [PATCH 01/11] xfs: we don't need no steekin ->evict_inode Dave Chinner
2016-04-13 16:41   ` Christoph Hellwig
2016-04-13 21:20     ` Dave Chinner
2016-04-14 12:10   ` Brian Foster
2016-04-13  5:31 ` [PATCH 02/11] xfs: xfs_iflush_cluster fails to abort on error Dave Chinner
2016-04-13 16:41   ` Christoph Hellwig
2016-04-13  5:31 ` [PATCH 03/11] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
2016-04-13  5:31 ` [PATCH 04/11] xfs: skip stale inodes " Dave Chinner
2016-04-13  5:31 ` [PATCH 05/11] xfs: optimise xfs_iext_destroy Dave Chinner
2016-04-13 16:45   ` Christoph Hellwig
2016-04-13  5:31 ` Dave Chinner [this message]
2016-04-13  5:31 ` [PATCH 07/11] xfs: mark reclaimed inodes invalid earlier Dave Chinner
2016-04-13  6:49   ` Dave Chinner
2016-04-14 12:10     ` Brian Foster
2016-04-14 23:31       ` Dave Chinner
2016-04-15 12:46         ` Brian Foster
2016-04-13  5:31 ` [PATCH 08/11] xfs: xfs_iflush_cluster has range issues Dave Chinner
2016-04-13  5:31 ` [PATCH 09/11] xfs: rename variables in xfs_iflush_cluster for clarity Dave Chinner
2016-04-13  5:31 ` [PATCH 10/11] xfs: simplify inode reclaim tagging interfaces Dave Chinner
2016-04-14 12:10   ` Brian Foster
2016-06-29  4:21   ` Darrick J. Wong
2016-04-13  5:31 ` [PATCH 11/11] xfs: move reclaim tagging functions Dave Chinner
2016-04-14 12:11   ` Brian Foster
2016-04-13 15:38 ` [PATCH 00/11 v3] xfs: inode reclaim vs the world Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1460525492-1170-7-git-send-email-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.