NVDIMM Device and Persistent Memory development
 help / color / Atom feed
From: Jan Kara <jack@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>,
	linux-nvdimm@lists.01.org, "[4.10+]" <stable@vger.kernel.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
Subject: [PATCH 1/4] dax: prevent invalidation of mapped DAX entries
Date: Wed, 10 May 2017 10:54:16 +0200
Message-ID: <20170510085419.27601-2-jack@suse.cz> (raw)
In-Reply-To: <20170510085419.27601-1-jack@suse.cz>

From: Ross Zwisler <ross.zwisler@linux.intel.com>

dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked.  This is done via:


However, for page cache pages removed in invalidate_mapping_pages() there
is an additional criteria which is that the page must not be mapped.  This
is noted in the comments above invalidate_mapping_pages() and is checked in

For DAX entries this means that we can can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry. This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.

We aren't able to unmap the DAX exceptional entry because according to its
comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.

Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Reported-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>    [4.10+]
Signed-off-by: Jan Kara <jack@suse.cz>
 fs/dax.c            | 29 -----------------------------
 include/linux/dax.h |  1 -
 mm/truncate.c       |  9 +++------
 3 files changed, 3 insertions(+), 36 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 66d79067eedf..38deebb8c86e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -461,35 +461,6 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
- * Invalidate exceptional DAX entry if easily possible. This handles DAX
- * entries for invalidate_inode_pages() so we evict the entry only if we can
- * do so without blocking.
- */
-int dax_invalidate_mapping_entry(struct address_space *mapping, pgoff_t index)
-	int ret = 0;
-	void *entry, **slot;
-	struct radix_tree_root *page_tree = &mapping->page_tree;
-	spin_lock_irq(&mapping->tree_lock);
-	entry = __radix_tree_lookup(page_tree, index, NULL, &slot);
-	if (!entry || !radix_tree_exceptional_entry(entry) ||
-	    slot_locked(mapping, slot))
-		goto out;
-	if (radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_DIRTY) ||
-	    radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
-		goto out;
-	radix_tree_delete(page_tree, index);
-	mapping->nrexceptional--;
-	ret = 1;
-	spin_unlock_irq(&mapping->tree_lock);
-	if (ret)
-		dax_wake_mapping_entry_waiter(mapping, index, entry, true);
-	return ret;
  * Invalidate exceptional DAX entry if it is clean.
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d3158e74a59e..d1236d16ef00 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -63,7 +63,6 @@ ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
 		    const struct iomap_ops *ops);
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
-int dax_invalidate_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
 void dax_wake_mapping_entry_waiter(struct address_space *mapping,
diff --git a/mm/truncate.c b/mm/truncate.c
index 83a059e8cd1d..706cff171a15 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -67,17 +67,14 @@ static void truncate_exceptional_entry(struct address_space *mapping,
  * Invalidate exceptional entry if easily possible. This handles exceptional
- * entries for invalidate_inode_pages() so for DAX it evicts only unlocked and
- * clean entries.
+ * entries for invalidate_inode_pages().
 static int invalidate_exceptional_entry(struct address_space *mapping,
 					pgoff_t index, void *entry)
-	/* Handled by shmem itself */
-	if (shmem_mapping(mapping))
+	/* Handled by shmem itself, or for DAX we do nothing. */
+	if (shmem_mapping(mapping) || dax_mapping(mapping))
 		return 1;
-	if (dax_mapping(mapping))
-		return dax_invalidate_mapping_entry(mapping, index);
 	clear_shadow_entry(mapping, index, entry);
 	return 1;

Linux-nvdimm mailing list

  reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-10  8:54 [PATCH 0/4 v4] mm,dax: Fix data corruption due to mmap inconsistency Jan Kara
2017-05-10  8:54 ` Jan Kara [this message]
2017-05-10  8:54 ` [PATCH 2/4] mm: Fix data corruption due to stale mmap reads Jan Kara
2017-05-10  8:54 ` [PATCH 3/4] ext4: Return back to starting transaction in ext4_dax_huge_fault() Jan Kara
2017-05-10 17:27 ` [PATCH 0/4 v4] mm,dax: Fix data corruption due to mmap inconsistency Ross Zwisler
  -- strict thread matches above, loose matches on Subject: below --
2017-05-10  8:54 [PATCH 4/4] dax: Fix data corruption when fault races with write Jan Kara
2017-05-10 17:27 ` [PATCH 5/4] dax: Fix PMD " Ross Zwisler
2017-05-11  8:39   ` Jan Kara
2017-05-09 12:18 [PATCH 0/4 v3] mm,dax: Fix data corruption due to mmap inconsistency Jan Kara
2017-05-09 12:18 ` [PATCH 1/4] dax: prevent invalidation of mapped DAX entries Jan Kara
2017-05-05  7:24 [PATCH 0/4 v2] mm,dax: Fix data corruption due to mmap inconsistency Jan Kara
2017-05-05  7:24 ` [PATCH 1/4] dax: prevent invalidation of mapped DAX entries Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170510085419.27601-2-jack@suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=stable@vger.kernel.org \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

NVDIMM Device and Persistent Memory development

Archives are clonable:
	git clone --mirror https://lore.kernel.org/nvdimm/0 nvdimm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 nvdimm nvdimm/ https://lore.kernel.org/nvdimm \
	public-inbox-index nvdimm

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git