From: Dan Williams <dan.j.williams@intel.com>
To: linux-nvdimm@lists.01.org
Cc: hch@lst.de, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	jack@suse.cz
Subject: [PATCH v4 10/12] filesystem-dax: Introduce dax_lock_page()
Date: Fri, 08 Jun 2018 16:51:14 -0700
Message-ID: <152850187437.38390.2257981090761438811.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <152850182079.38390.8280340535691965744.stgit@dwillia2-desk3.amr.corp.intel.com>

In preparation for implementing memory poison (media error) handling
via dax mappings, implement a lock_page() equivalent. Poison error
handling requires rmap and needs guarantees that the page->mapping
association is maintained and valid (i.e. the inode is not freed) for
the duration of the lookup.
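
To illustrate the intended pairing, a memory_failure()-style consumer
(the actual caller arrives later in this series) would bracket its
rmap walk roughly as follows. This is only a sketch; collect_procs()
and MF_ACTION_REQUIRED are borrowed from the memory-failure path for
illustration, not lifted from a hunk in this series:

	struct page *page;
	LIST_HEAD(tokill);

	page = dax_lock_page(pfn);
	if (!page)
		return -EBUSY;	/* mapping gone, or not a dax inode */

	/*
	 * With the entry lock held page->mapping and page->index are
	 * stable, so the rmap walk can find every process mapping pfn.
	 */
	collect_procs(page, &tokill, flags & MF_ACTION_REQUIRED);
	/* ... queue SIGBUS for each task on 'tokill' ... */
	dax_unlock_page(page);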

In the device-dax case it is sufficient to simply hold a dev_pagemap
reference. In the filesystem-dax case we need to use the entry lock.

Export the entry lock via dax_lock_page(), which uses rcu_read_lock()
to protect against the inode being freed and revalidates the
page->mapping association under xa_lock().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c            |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dax.h |   15 ++++++++++
 2 files changed, 91 insertions(+)

diff --git a/fs/dax.c b/fs/dax.c
index cccf6cad1a7a..b7e71b108fcf 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -361,6 +361,82 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
 	}
 }
 
+struct page *dax_lock_page(unsigned long pfn)
+{
+	pgoff_t index;
+	struct inode *inode;
+	wait_queue_head_t *wq;
+	void *entry = NULL, **slot;
+	struct address_space *mapping;
+	struct wait_exceptional_entry_queue ewait;
+	struct page *ret = NULL, *page = pfn_to_page(pfn);
+
+	rcu_read_lock();
+	for (;;) {
+		mapping = READ_ONCE(page->mapping);
+
+		if (!mapping || !IS_DAX(mapping->host))
+			break;
+
+		/*
+		 * In the device-dax case there's no need to lock; a
+		 * struct dev_pagemap pin is sufficient to keep the
+		 * inode alive.
+		 */
+		inode = mapping->host;
+		if (S_ISCHR(inode->i_mode)) {
+			ret = page;
+			break;
+		}
+
+		xa_lock_irq(&mapping->i_pages);
+		if (mapping != page->mapping) {
+			xa_unlock_irq(&mapping->i_pages);
+			continue;
+		}
+		index = page->index;
+
+		init_wait(&ewait.wait);
+		ewait.wait.func = wake_exceptional_entry_func;
+
+		entry = __radix_tree_lookup(&mapping->i_pages, index, NULL,
+				&slot);
+		if (!entry ||
+		    WARN_ON_ONCE(!radix_tree_exceptional_entry(entry))) {
+			xa_unlock_irq(&mapping->i_pages);
+			break;
+		} else if (!slot_locked(mapping, slot)) {
+			lock_slot(mapping, slot);
+			ret = page;
+			xa_unlock_irq(&mapping->i_pages);
+			break;
+		}
+
+		wq = dax_entry_waitqueue(mapping, index, entry, &ewait.key);
+		prepare_to_wait_exclusive(wq, &ewait.wait,
+				TASK_UNINTERRUPTIBLE);
+		xa_unlock_irq(&mapping->i_pages);
+		rcu_read_unlock();
+		schedule();
+		finish_wait(wq, &ewait.wait);
+		rcu_read_lock();
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+void dax_unlock_page(struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
+
+	if (S_ISCHR(inode->i_mode))
+		return;
+
+	dax_unlock_mapping_entry(mapping, page->index);
+}
+
 /*
  * Find radix tree entry at given index. If it points to an exceptional entry,
  * return it with the radix tree entry locked. If the radix tree doesn't
diff --git a/include/linux/dax.h b/include/linux/dax.h
index f9eb22ad341e..641cab7e1fa7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -83,6 +83,8 @@ static inline void fs_put_dax(struct dax_device *dax_dev)
 struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev);
 int dax_writeback_mapping_range(struct address_space *mapping,
 		struct block_device *bdev, struct writeback_control *wbc);
+struct page *dax_lock_page(unsigned long pfn);
+void dax_unlock_page(struct page *page);
 #else
 static inline int bdev_dax_supported(struct super_block *sb, int blocksize)
 {
@@ -108,6 +110,19 @@ static inline int dax_writeback_mapping_range(struct address_space *mapping,
 {
 	return -EOPNOTSUPP;
 }
+
+static inline struct page *dax_lock_page(unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+
+	if (page->mapping && IS_DAX(page->mapping->host))
+		return page;
+	return NULL;
+}
+
+static inline void dax_unlock_page(struct page *page)
+{
+}
 #endif
 
 int dax_read_lock(void);
