All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Williams, Dan J" <dan.j.williams@intel.com>
To: "jack@suse.cz" <jack@suse.cz>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"willy@linux.intel.com" <willy@linux.intel.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"jack@suse.com" <jack@suse.com>,
	"david@fromorbit.com" <david@fromorbit.com>
Subject: Re: [PATCH 4/8] mm, dax: truncate dax mappings at bdev or fs shutdown
Date: Thu, 19 Nov 2015 00:22:14 +0000	[thread overview]
Message-ID: <1447892533.13153.8.camel@intel.com> (raw)
In-Reply-To: <20151118150945.GE6097@quack.suse.cz>

On Wed, 2015-11-18 at 16:09 +0100, Jan Kara wrote:
> Hum, I don't get this. truncate_inode_pages_final() gets called when inode
> has no more users. So there are no mappings of the inode. So how could
> truncate_pagecache() possibly make a difference?

True.  I confirmed with more focus testing that the change to
truncate_inode_pages_final() is not necessary.  After
invalidate_inodes() does unmap_mapping_range() we are protected by
future calls to get_block() and blk_queue_enter() failing when there
are attempts to re-establish a mapping after the block device has been
torn down.

Here's a revised patch.  Note that the call truncate_pagecache() is
replaced with a call to unmap_mapping_range() since it is fine to
access zero pages that might still be in the page cache.

8<----
Subject: mm, dax: unmap dax mappings at bdev or fs shutdown

From: Dan Williams <dan.j.williams@intel.com>

Currently dax mappings leak past / survive block_device shutdown.  While
page cache pages are permitted to be read/written after the block_device
is torn down this is not acceptable in the dax case as all media access
must end when the device is disabled.  The pfn backing a dax mapping is
permitted to be invalidated after bdev shutdown and this is indeed the
case with brd.

When a dax capable block_device driver calls del_gendisk() in its
shutdown path del_gendisk() needs to ensure that all DAX pfns are
unmapped.  This is different than the pagecache backed case where the
disk is protected by the queue being torn down which ends I/O to the
device.  Since dax bypasses the page cache we need to unconditionally
unmap the inode.

Cc: <stable@vger.kernel.org>
Cc: Jan Kara <jack@suse.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
[honza: drop changes to truncate_inode_pages_final]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/inode.c |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index 1be5f9003eb3..dcb31d2c15e6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -579,6 +579,18 @@ static void dispose_list(struct list_head *head)
 	}
 }
 
+static void unmap_list(struct list_head *head)
+{
+	struct inode *inode, *_i;
+
+	list_for_each_entry_safe(inode, _i, head, i_lru) {
+		list_del_init(&inode->i_lru);
+		unmap_mapping_range(&inode->i_data, 0, 0, 1);
+		iput(inode);
+		cond_resched();
+	}
+}
+
 /**
  * evict_inodes	- evict all evictable inodes for a superblock
  * @sb:		superblock to operate on
@@ -642,6 +654,7 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	int busy = 0;
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
+	LIST_HEAD(unmap);
 
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
@@ -655,6 +668,19 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 			busy = 1;
 			continue;
 		}
+		if (IS_DAX(inode) && atomic_read(&inode->i_count)) {
+			/*
+			 * dax mappings can't live past this invalidation event
+			 * as there is no page cache present to allow the data
+			 * to remain accessible.
+			 */
+			__iget(inode);
+			inode_lru_list_del(inode);
+			spin_unlock(&inode->i_lock);
+			list_add(&inode->i_lru, &unmap);
+			busy = 1;
+			continue;
+		}
 		if (atomic_read(&inode->i_count)) {
 			spin_unlock(&inode->i_lock);
 			busy = 1;
@@ -669,6 +695,7 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	spin_unlock(&sb->s_inode_list_lock);
 
 	dispose_list(&dispose);
+	unmap_list(&unmap);
 
 	return busy;
 }

  reply	other threads:[~2015-11-19  0:22 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-17 20:15 [PATCH 0/8] dax fixes / cleanups: pmd vs thp, lifetime, and locking Dan Williams
2015-11-17 20:15 ` [PATCH 1/8] ext2, ext4: warn when mounting with dax enabled Dan Williams
2015-11-17 20:16 ` [PATCH 2/8] dax: disable pmd mappings Dan Williams
2015-11-17 20:51   ` Ross Zwisler
2015-11-17 20:16 ` [PATCH 3/8] mm, dax: fix DAX deadlocks (COW fault) Dan Williams
2015-11-17 20:16 ` [PATCH 4/8] mm, dax: truncate dax mappings at bdev or fs shutdown Dan Williams
2015-11-18 15:09   ` Jan Kara
2015-11-19  0:22     ` Williams, Dan J [this message]
2015-11-19 12:55       ` Jan Kara
2015-11-19 16:55         ` Dan Williams
2015-11-19 17:12           ` Jan Kara
2015-11-19 23:17           ` Dave Chinner
2015-11-20  0:05             ` Williams, Dan J
2015-11-20  4:06               ` Dave Chinner
2015-11-20  4:25                 ` Dan Williams
2015-11-20 17:08                   ` Dan Williams
2015-11-17 20:16 ` [PATCH 5/8] pmem, dax: clean up clear_pmem() Dan Williams
2015-11-17 20:16 ` [PATCH 6/8] dax: increase granularity of dax_clear_blocks() operations Dan Williams
2015-11-17 20:16 ` [PATCH 7/8] dax: guarantee page aligned results from bdev_direct_access() Dan Williams
2015-11-17 20:16 ` [PATCH 8/8] dax: fix lifetime of in-kernel dax mappings with dax_map_atomic() Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1447892533.13153.8.camel@intel.com \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=jack@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=ross.zwisler@linux.intel.com \
    --cc=stable@vger.kernel.org \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.