Linux-Fsdevel Archive on lore.kernel.org
* [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
@ 2018-11-30  0:13 Dan Williams
  2018-11-30 15:49 ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2018-11-30  0:13 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Matthew Wilcox, Jan Kara, linux-fsdevel, linux-kernel

Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
store a replacement entry in the Xarray at the given xas-index with the
DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
value of the entry relative to the current Xarray state to be specified.

In most contexts dax_unlock_entry() is operating in the same scope as
the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
case the implementation needs to recall the original entry. In the case
where the original entry is a 'pmd' entry, it is possible that the pfn
used to perform the lookup is misaligned relative to the value retrieved
from the Xarray.

When creating the 'unlocked' entry be sure to align it to the expected
size as reflected by the DAX_PMD flag. Otherwise, future lookups become
confused by finding a 'pte' aligned value at an index that should return
a 'pmd' aligned value. This mismatch results in failure signatures like
the following:

 WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
 RIP: 0010:dax_insert_entry+0x2b2/0x2d0
 [..]
 Call Trace:
  dax_iomap_pte_fault.isra.41+0x791/0xde0
  ext4_dax_huge_fault+0x16f/0x1f0
  ? up_read+0x1c/0xa0
  __do_fault+0x1f/0x160
  __handle_mm_fault+0x1033/0x1490
  handle_mm_fault+0x18b/0x3d0

...and potential corruption of nearby page state as housekeeping
routines, like dax_disassociate_entry(), may overshoot their expected
bounds starting at the wrong page.

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 3f592dc18d67..6c5f8f345b1a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -59,6 +59,7 @@ static inline unsigned int pe_order(enum page_entry_size pe_size)
 
 /* The order of a PMD entry */
 #define PMD_ORDER	(PMD_SHIFT - PAGE_SHIFT)
+#define PMD_ORDER_MASK	~((1UL << PMD_ORDER) - 1)
 
 static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
 
@@ -93,9 +94,13 @@ static unsigned long dax_to_pfn(void *entry)
 	return xa_to_value(entry) >> DAX_SHIFT;
 }
 
-static void *dax_make_entry(pfn_t pfn, unsigned long flags)
+static void *dax_make_entry(pfn_t pfn_t, unsigned long flags)
 {
-	return xa_mk_value(flags | (pfn_t_to_pfn(pfn) << DAX_SHIFT));
+	unsigned long pfn = pfn_t_to_pfn(pfn_t);
+
+	if (flags & DAX_PMD)
+		pfn &= PMD_ORDER_MASK;
+	return xa_mk_value(flags | (pfn << DAX_SHIFT));
 }
 
 static bool dax_is_locked(void *entry)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-11-30  0:13 [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry() Dan Williams
@ 2018-11-30 15:49 ` Matthew Wilcox
  2018-11-30 15:54   ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2018-11-30 15:49 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, Jan Kara, linux-fsdevel, linux-kernel

On Thu, Nov 29, 2018 at 04:13:46PM -0800, Dan Williams wrote:
> Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
> store a replacement entry in the Xarray at the given xas-index with the
> DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
> value of the entry relative to the current Xarray state to be specified.
> 
> In most contexts dax_unlock_entry() is operating in the same scope as
> the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
> case the implementation needs to recall the original entry. In the case
> where the original entry is a 'pmd' entry, it is possible that the pfn
> used to perform the lookup is misaligned relative to the value retrieved
> from the Xarray.

So far, dax_unlock_mapping_entry only has the one caller.  I'd rather we
returned the 'entry' to the caller, then had them pass it back to the
unlock function.  That matches the flow in the rest of DAX and doesn't
pose an undue burden to the caller.

I plan to reclaim the DAX_LOCK bit (and the DAX_EMPTY bit for that
matter), instead using a special DAX_LOCK value.  DAX is almost free of
assumptions about the other bits in a locked entry, and this will remove
the assumption that there's a PMD bit in the entry.

How does this look?

diff --git a/fs/dax.c b/fs/dax.c
index 9bcce89ea18e..7681429af42f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -351,20 +351,20 @@ static struct page *dax_busy_page(void *entry)
  * @page: The page whose entry we want to lock
  *
  * Context: Process context.
- * Return: %true if the entry was locked or does not need to be locked.
+ * Return: A cookie to pass to dax_unlock_mapping_entry() or %NULL if the
+ * entry could not be locked.
  */
-bool dax_lock_mapping_entry(struct page *page)
+void *dax_lock_mapping_entry(struct page *page)
 {
 	XA_STATE(xas, NULL, 0);
 	void *entry;
-	bool locked;
 
 	/* Ensure page->mapping isn't freed while we look at it */
 	rcu_read_lock();
 	for (;;) {
 		struct address_space *mapping = READ_ONCE(page->mapping);
 
-		locked = false;
+		entry = NULL;
 		if (!dax_mapping(mapping))
 			break;
 
@@ -375,7 +375,7 @@ bool dax_lock_mapping_entry(struct page *page)
 		 * otherwise we would not have a valid pfn_to_page()
 		 * translation.
 		 */
-		locked = true;
+		entry = (void *)1;
 		if (S_ISCHR(mapping->host->i_mode))
 			break;
 
@@ -400,22 +400,17 @@ bool dax_lock_mapping_entry(struct page *page)
 		break;
 	}
 	rcu_read_unlock();
-	return locked;
+	return entry;
 }
 
-void dax_unlock_mapping_entry(struct page *page)
+void dax_unlock_mapping_entry(struct page *page, void *entry)
 {
 	struct address_space *mapping = page->mapping;
 	XA_STATE(xas, &mapping->i_pages, page->index);
-	void *entry;
 
 	if (S_ISCHR(mapping->host->i_mode))
 		return;
 
-	rcu_read_lock();
-	entry = xas_load(&xas);
-	rcu_read_unlock();
-	entry = dax_make_entry(page_to_pfn_t(page), dax_is_pmd_entry(entry));
 	dax_unlock_entry(&xas, entry);
 }
 
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 450b28db9533..bc143c2d6980 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -88,8 +88,8 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 		struct block_device *bdev, struct writeback_control *wbc);
 
 struct page *dax_layout_busy_page(struct address_space *mapping);
-bool dax_lock_mapping_entry(struct page *page);
-void dax_unlock_mapping_entry(struct page *page);
+void *dax_lock_mapping_entry(struct page *page);
+void dax_unlock_mapping_entry(struct page *page, void *entry);
 #else
 static inline bool bdev_dax_supported(struct block_device *bdev,
 		int blocksize)
@@ -122,14 +122,14 @@ static inline int dax_writeback_mapping_range(struct address_space *mapping,
 	return -EOPNOTSUPP;
 }
 
-static inline bool dax_lock_mapping_entry(struct page *page)
+static inline void *dax_lock_mapping_entry(struct page *page)
 {
 	if (IS_DAX(page->mapping->host))
-		return true;
-	return false;
+		return (void *)1;
+	return NULL;
 }
 
-static inline void dax_unlock_mapping_entry(struct page *page)
+static inline void dax_unlock_mapping_entry(struct page *page, void *entry)
 {
 }
 #endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0cd3de3550f0..3abea1e19902 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1161,6 +1161,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	LIST_HEAD(tokill);
 	int rc = -EBUSY;
 	loff_t start;
+	void *cookie;
 
 	/*
 	 * Prevent the inode from being freed while we are interrogating
@@ -1169,7 +1170,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	 * also prevents changes to the mapping of this pfn until
 	 * poison signaling is complete.
 	 */
-	if (!dax_lock_mapping_entry(page))
+	cookie = dax_lock_mapping_entry(page);
+	if (!cookie)
 		goto out;
 
 	if (hwpoison_filter(page)) {
@@ -1220,7 +1222,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, pfn, flags);
 	rc = 0;
 unlock:
-	dax_unlock_mapping_entry(page);
+	dax_unlock_mapping_entry(page, cookie);
 out:
 	/* drop pgmap ref acquired in caller */
 	put_dev_pagemap(pgmap);


* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-11-30 15:49 ` Matthew Wilcox
@ 2018-11-30 15:54   ` Dan Williams
  2018-11-30 16:24     ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2018-11-30 15:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-nvdimm, Jan Kara, linux-fsdevel, Linux Kernel Mailing List

On Fri, Nov 30, 2018 at 7:49 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Nov 29, 2018 at 04:13:46PM -0800, Dan Williams wrote:
> > Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
> > store a replacement entry in the Xarray at the given xas-index with the
> > DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
> > value of the entry relative to the current Xarray state to be specified.
> >
> > In most contexts dax_unlock_entry() is operating in the same scope as
> > the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
> > case the implementation needs to recall the original entry. In the case
> > where the original entry is a 'pmd' entry, it is possible that the pfn
> > used to perform the lookup is misaligned relative to the value retrieved
> > from the Xarray.
>
> So far, dax_unlock_mapping_entry only has the one caller.  I'd rather we
> returned the 'entry' to the caller, then had them pass it back to the
> unlock function.  That matches the flow in the rest of DAX and doesn't
> pose an undue burden to the caller.
>
> I plan to reclaim the DAX_LOCK bit (and the DAX_EMPTY bit for that
> matter), instead using a special DAX_LOCK value.  DAX is almost free of
> assumptions about the other bits in a locked entry, and this will remove
> the assumption that there's a PMD bit in the entry.
>
> How does this look?
>

Looks good to me, although can we make that cookie an actual type? I
think it's mostly ok to pass around (void *) for 'entry' inside of
fs/dax.c, but once an entry leaves that file I'd like it to have an
explicit type to catch people that might accidentally pass a (struct
page *) to the unlock routine.


* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-11-30 15:54   ` Dan Williams
@ 2018-11-30 16:24     ` Matthew Wilcox
  2018-11-30 16:33       ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2018-11-30 16:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, Jan Kara, linux-fsdevel, Linux Kernel Mailing List

On Fri, Nov 30, 2018 at 07:54:49AM -0800, Dan Williams wrote:
> Looks good to me, although can we make that cookie an actual type? I
> think it's mostly ok to pass around (void *) for 'entry' inside of
> fs/dax.c, but once an entry leaves that file I'd like it to have an
> explicit type to catch people that might accidentally pass a (struct
> page *) to the unlock routine.

That's a really good idea.  Something like this?

typedef struct {
	void *v;
} dax_entry_t;

I could see us making good use of that within dax.c.


* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-11-30 16:24     ` Matthew Wilcox
@ 2018-11-30 16:33       ` Dan Williams
  2018-11-30 17:01         ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2018-11-30 16:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-nvdimm, Jan Kara, linux-fsdevel, Linux Kernel Mailing List

On Fri, Nov 30, 2018 at 8:24 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Nov 30, 2018 at 07:54:49AM -0800, Dan Williams wrote:
> > Looks good to me, although can we make that cookie an actual type? I
> > think it's mostly ok to pass around (void *) for 'entry' inside of
> > fs/dax.c, but once an entry leaves that file I'd like it to have an
> > explicit type to catch people that might accidentally pass a (struct
> > page *) to the unlock routine.
>
> That's a really good idea.  Something like this?
>
> typedef struct {
>         void *v;
> } dax_entry_t;

Yes, please.

> I could see us making good use of that within dax.c.


* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-11-30 16:33       ` Dan Williams
@ 2018-11-30 17:01         ` Dan Williams
  2018-11-30 19:50           ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2018-11-30 17:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-nvdimm, Jan Kara, linux-fsdevel, Linux Kernel Mailing List

On Fri, Nov 30, 2018 at 8:33 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Nov 30, 2018 at 8:24 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Nov 30, 2018 at 07:54:49AM -0800, Dan Williams wrote:
> > > Looks good to me, although can we make that cookie an actual type? I
> > > think it's mostly ok to pass around (void *) for 'entry' inside of
> > > fs/dax.c, but once an entry leaves that file I'd like it to have an
> > > explicit type to catch people that might accidentally pass a (struct
> > > page *) to the unlock routine.
> >
> > That's a really good idea.  Something like this?
> >
> > typedef struct {
> >         void *v;
> > } dax_entry_t;
>
> Yes, please.
>
> > I could see us making good use of that within dax.c.

I'm now thinking that this is a nice improvement for 4.21. For 4.20-rc
let's do the localized fix.


* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-11-30 17:01         ` Dan Williams
@ 2018-11-30 19:50           ` Matthew Wilcox
  2018-11-30 20:05             ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2018-11-30 19:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, Jan Kara, linux-fsdevel, Linux Kernel Mailing List

On Fri, Nov 30, 2018 at 09:01:07AM -0800, Dan Williams wrote:
> On Fri, Nov 30, 2018 at 8:33 AM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Fri, Nov 30, 2018 at 8:24 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Nov 30, 2018 at 07:54:49AM -0800, Dan Williams wrote:
> > > > Looks good to me, although can we make that cookie an actual type? I
> > > > think it's mostly ok to pass around (void *) for 'entry' inside of
> > > > fs/dax.c, but once an entry leaves that file I'd like it to have an
> > > > explicit type to catch people that might accidentally pass a (struct
> > > > page *) to the unlock routine.
> > >
> > > That's a really good idea.  Something like this?
> > >
> > > typedef struct {
> > >         void *v;
> > > } dax_entry_t;
> >
> > Yes, please.

Oh.  The caller needs to interpret it to see if the entry was successfully
locked, so it needs to be an integer type (or we need a comparison
function ... bleh).

> > > I could see us making good use of that within dax.c.
> 
> I'm now thinking that this is a nice improvement for 4.21. For 4.20-rc
> let's do the localized fix.

I think both patches are equally risky.  I admit this patch crosses a
file boundary, but the other patch changes dax_make_entry() which is
used by several functions which aren't part of this path, whereas this
patch only changes functions used in the path which is known to be buggy.

This patch has the advantage of getting us closer to where we want to be
sooner.


From 1135b8d08f23ab4f5b28261535a817f3de9297c9 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <willy@infradead.org>
Date: Fri, 30 Nov 2018 11:05:06 -0500
Subject: [PATCH] dax: Change lock/unlock API

Return the unlock cookie from dax_lock_mapping_entry() and
pass it to dax_unlock_mapping_entry().  This fixes a bug where
dax_unlock_mapping_entry() was assuming that the page was PMD-aligned
if the entry was a PMD entry.

Debugged-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
 fs/dax.c            | 21 ++++++++-------------
 include/linux/dax.h | 15 +++++++++------
 mm/memory-failure.c |  6 ++++--
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 9bcce89ea18e..d2c04e802978 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -351,20 +351,20 @@ static struct page *dax_busy_page(void *entry)
  * @page: The page whose entry we want to lock
  *
  * Context: Process context.
- * Return: %true if the entry was locked or does not need to be locked.
+ * Return: A cookie to pass to dax_unlock_mapping_entry() or 0 if the
+ * entry could not be locked.
  */
-bool dax_lock_mapping_entry(struct page *page)
+dax_entry_t dax_lock_mapping_entry(struct page *page)
 {
 	XA_STATE(xas, NULL, 0);
 	void *entry;
-	bool locked;
 
 	/* Ensure page->mapping isn't freed while we look at it */
 	rcu_read_lock();
 	for (;;) {
 		struct address_space *mapping = READ_ONCE(page->mapping);
 
-		locked = false;
+		entry = NULL;
 		if (!dax_mapping(mapping))
 			break;
 
@@ -375,7 +375,7 @@ bool dax_lock_mapping_entry(struct page *page)
 		 * otherwise we would not have a valid pfn_to_page()
 		 * translation.
 		 */
-		locked = true;
+		entry = (void *)~0UL;
 		if (S_ISCHR(mapping->host->i_mode))
 			break;
 
@@ -400,23 +400,18 @@ bool dax_lock_mapping_entry(struct page *page)
 		break;
 	}
 	rcu_read_unlock();
-	return locked;
+	return (dax_entry_t)entry;
 }
 
-void dax_unlock_mapping_entry(struct page *page)
+void dax_unlock_mapping_entry(struct page *page, dax_entry_t entry)
 {
 	struct address_space *mapping = page->mapping;
 	XA_STATE(xas, &mapping->i_pages, page->index);
-	void *entry;
 
 	if (S_ISCHR(mapping->host->i_mode))
 		return;
 
-	rcu_read_lock();
-	entry = xas_load(&xas);
-	rcu_read_unlock();
-	entry = dax_make_entry(page_to_pfn_t(page), dax_is_pmd_entry(entry));
-	dax_unlock_entry(&xas, entry);
+	dax_unlock_entry(&xas, (void *)entry);
 }
 
 /*
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 450b28db9533..f3fcdf3cb86f 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,8 @@
 #include <linux/radix-tree.h>
 #include <asm/pgtable.h>
 
+typedef unsigned long dax_entry_t;
+
 struct iomap_ops;
 struct dax_device;
 struct dax_operations {
@@ -88,8 +90,8 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 		struct block_device *bdev, struct writeback_control *wbc);
 
 struct page *dax_layout_busy_page(struct address_space *mapping);
-bool dax_lock_mapping_entry(struct page *page);
-void dax_unlock_mapping_entry(struct page *page);
+dax_entry_t dax_lock_mapping_entry(struct page *page);
+void dax_unlock_mapping_entry(struct page *page, dax_entry_t cookie);
 #else
 static inline bool bdev_dax_supported(struct block_device *bdev,
 		int blocksize)
@@ -122,14 +124,15 @@ static inline int dax_writeback_mapping_range(struct address_space *mapping,
 	return -EOPNOTSUPP;
 }
 
-static inline bool dax_lock_mapping_entry(struct page *page)
+static inline dax_entry_t dax_lock_mapping_entry(struct page *page)
 {
 	if (IS_DAX(page->mapping->host))
-		return true;
-	return false;
+		return ~0UL;
+	return 0;
 }
 
-static inline void dax_unlock_mapping_entry(struct page *page)
+static inline
+void dax_unlock_mapping_entry(struct page *page, dax_entry_t cookie)
 {
 }
 #endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0cd3de3550f0..688a406f31b3 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1161,6 +1161,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	LIST_HEAD(tokill);
 	int rc = -EBUSY;
 	loff_t start;
+	dax_entry_t cookie;
 
 	/*
 	 * Prevent the inode from being freed while we are interrogating
@@ -1169,7 +1170,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	 * also prevents changes to the mapping of this pfn until
 	 * poison signaling is complete.
 	 */
-	if (!dax_lock_mapping_entry(page))
+	cookie = dax_lock_mapping_entry(page);
+	if (!cookie)
 		goto out;
 
 	if (hwpoison_filter(page)) {
@@ -1220,7 +1222,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, pfn, flags);
 	rc = 0;
 unlock:
-	dax_unlock_mapping_entry(page);
+	dax_unlock_mapping_entry(page, cookie);
 out:
 	/* drop pgmap ref acquired in caller */
 	put_dev_pagemap(pgmap);
-- 
2.19.1


* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-11-30 19:50           ` Matthew Wilcox
@ 2018-11-30 20:05             ` Dan Williams
  2018-12-04  3:33               ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2018-11-30 20:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-nvdimm, Jan Kara, linux-fsdevel, Linux Kernel Mailing List

On Fri, Nov 30, 2018 at 11:50 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Nov 30, 2018 at 09:01:07AM -0800, Dan Williams wrote:
> > On Fri, Nov 30, 2018 at 8:33 AM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > On Fri, Nov 30, 2018 at 8:24 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Fri, Nov 30, 2018 at 07:54:49AM -0800, Dan Williams wrote:
> > > > > Looks good to me, although can we make that cookie an actual type? I
> > > > > think it's mostly ok to pass around (void *) for 'entry' inside of
> > > > > fs/dax.c, but once an entry leaves that file I'd like it to have an
> > > > > explicit type to catch people that might accidentally pass a (struct
> > > > > page *) to the unlock routine.
> > > >
> > > > That's a really good idea.  Something like this?
> > > >
> > > > typedef struct {
> > > >         void *v;
> > > > } dax_entry_t;
> > >
> > > Yes, please.
>
> Oh.  The caller needs to interpret it to see if the entry was successfully
> locked, so it needs to be an integer type (or we need a comparison
> function ... bleh).
>
> > > > I could see us making good use of that within dax.c.
> >
> > I'm now thinking that this is a nice improvement for 4.21. For 4.20-rc
> > let's do the localized fix.
>
> I think both patches are equally risky.  I admit this patch crosses a
> file boundary, but the other patch changes dax_make_entry() which is
> used by several functions which aren't part of this path, whereas this
> patch only changes functions used in the path which is known to be buggy.

I'm almost buying this argument... but I'd feel better knowing that
all dax_make_entry() usages are safe against this bug pattern. I
didn't audit the other occurrences of dax_make_entry() for this bug;
did you build some confidence here?

>
> This patch has the advantage of getting us closer to where we want to be
> sooner.

One comment below...

>
>
> From 1135b8d08f23ab4f5b28261535a817f3de9297c9 Mon Sep 17 00:00:00 2001
> From: Matthew Wilcox <willy@infradead.org>
> Date: Fri, 30 Nov 2018 11:05:06 -0500
> Subject: [PATCH] dax: Change lock/unlock API
>
> Return the unlock cookie from dax_lock_mapping_entry() and
> pass it to dax_unlock_mapping_entry().  This fixes a bug where
> dax_unlock_mapping_entry() was assuming that the page was PMD-aligned
> if the entry was a PMD entry.
>
> Debugged-by: Dan Williams <dan.j.williams@intel.com>
> Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
> Signed-off-by: Matthew Wilcox <willy@infradead.org>
> ---
>  fs/dax.c            | 21 ++++++++-------------
>  include/linux/dax.h | 15 +++++++++------
>  mm/memory-failure.c |  6 ++++--
>  3 files changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 9bcce89ea18e..d2c04e802978 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -351,20 +351,20 @@ static struct page *dax_busy_page(void *entry)
>   * @page: The page whose entry we want to lock
>   *
>   * Context: Process context.
> - * Return: %true if the entry was locked or does not need to be locked.
> + * Return: A cookie to pass to dax_unlock_mapping_entry() or 0 if the
> + * entry could not be locked.
>   */
> -bool dax_lock_mapping_entry(struct page *page)
> +dax_entry_t dax_lock_mapping_entry(struct page *page)
>  {
>         XA_STATE(xas, NULL, 0);
>         void *entry;
> -       bool locked;
>
>         /* Ensure page->mapping isn't freed while we look at it */
>         rcu_read_lock();
>         for (;;) {
>                 struct address_space *mapping = READ_ONCE(page->mapping);
>
> -               locked = false;
> +               entry = NULL;
>                 if (!dax_mapping(mapping))
>                         break;
>
> @@ -375,7 +375,7 @@ bool dax_lock_mapping_entry(struct page *page)
>                  * otherwise we would not have a valid pfn_to_page()
>                  * translation.
>                  */
> -               locked = true;
> +               entry = (void *)~0UL;
>                 if (S_ISCHR(mapping->host->i_mode))
>                         break;
>
> @@ -400,23 +400,18 @@ bool dax_lock_mapping_entry(struct page *page)
>                 break;
>         }
>         rcu_read_unlock();
> -       return locked;
> +       return (dax_entry_t)entry;
>  }
>
> -void dax_unlock_mapping_entry(struct page *page)
> +void dax_unlock_mapping_entry(struct page *page, dax_entry_t entry)

Let's not require the page to be passed back; it can be derived:

    page = pfn_to_page(dax_to_pfn((void*) entry));

A bit more symmetric that way and canonical with other locking schemes
that return a cookie.


* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-11-30 20:05             ` Dan Williams
@ 2018-12-04  3:33               ` Dan Williams
  2018-12-05  1:34                 ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2018-12-04  3:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-nvdimm, Jan Kara, linux-fsdevel, Linux Kernel Mailing List

On Fri, Nov 30, 2018 at 12:05 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Nov 30, 2018 at 11:50 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Nov 30, 2018 at 09:01:07AM -0800, Dan Williams wrote:
> > > On Fri, Nov 30, 2018 at 8:33 AM Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > On Fri, Nov 30, 2018 at 8:24 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Fri, Nov 30, 2018 at 07:54:49AM -0800, Dan Williams wrote:
> > > > > > Looks good to me, although can we make that cookie an actual type? I
> > > > > > think it's mostly ok to pass around (void *) for 'entry' inside of
> > > > > > fs/dax.c, but once an entry leaves that file I'd like it to have an
> > > > > > explicit type to catch people that might accidentally pass a (struct
> > > > > > page *) to the unlock routine.
> > > > >
> > > > > That's a really good idea.  Something like this?
> > > > >
> > > > > typedef struct {
> > > > >         void *v;
> > > > > } dax_entry_t;
> > > >
> > > > Yes, please.
> >
> > Oh.  The caller needs to interpret it to see if the entry was successfully
> > locked, so it needs to be an integer type (or we need a comparison
> > function ... bleh).
> >
> > > > > I could see us making good use of that within dax.c.
> > >
> > > I'm now thinking that this is a nice improvement for 4.21. For 4.20-rc
> > > let's do the localized fix.
> >
> > I think both patches are equally risky.  I admit this patch crosses a
> > file boundary, but the other patch changes dax_make_entry() which is
> > used by several functions which aren't part of this path, whereas this
> > patch only changes functions used in the path which is known to be buggy.
>
> I'm almost buying this argument... but I'd feel better knowing that
> all dax_make_entry() usages are safe against this bug pattern. I
> didn't audit the other occurrences of dax_make_entry() for this bug;
> did you build some confidence here?
>
> >
> > This patch has the advantage of getting us closer to where we want to be
> > sooner.
>
> One comment below...
>
> >
> >
> > From 1135b8d08f23ab4f5b28261535a817f3de9297c9 Mon Sep 17 00:00:00 2001
> > From: Matthew Wilcox <willy@infradead.org>
> > Date: Fri, 30 Nov 2018 11:05:06 -0500
> > Subject: [PATCH] dax: Change lock/unlock API
> >
> > Return the unlock cookie from dax_lock_mapping_entry() and
> > pass it to dax_unlock_mapping_entry().  This fixes a bug where
> > dax_unlock_mapping_entry() was assuming that the page was PMD-aligned
> > if the entry was a PMD entry.
> >
> > Debugged-by: Dan Williams <dan.j.williams@intel.com>
> > Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
> > Signed-off-by: Matthew Wilcox <willy@infradead.org>
> > ---
> >  fs/dax.c            | 21 ++++++++-------------
> >  include/linux/dax.h | 15 +++++++++------
> >  mm/memory-failure.c |  6 ++++--
> >  3 files changed, 21 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 9bcce89ea18e..d2c04e802978 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -351,20 +351,20 @@ static struct page *dax_busy_page(void *entry)
> >   * @page: The page whose entry we want to lock
> >   *
> >   * Context: Process context.
> > - * Return: %true if the entry was locked or does not need to be locked.
> > + * Return: A cookie to pass to dax_unlock_mapping_entry() or 0 if the
> > + * entry could not be locked.
> >   */
> > -bool dax_lock_mapping_entry(struct page *page)
> > +dax_entry_t dax_lock_mapping_entry(struct page *page)
> >  {
> >         XA_STATE(xas, NULL, 0);
> >         void *entry;
> > -       bool locked;
> >
> >         /* Ensure page->mapping isn't freed while we look at it */
> >         rcu_read_lock();
> >         for (;;) {
> >                 struct address_space *mapping = READ_ONCE(page->mapping);
> >
> > -               locked = false;
> > +               entry = NULL;
> >                 if (!dax_mapping(mapping))
> >                         break;
> >
> > @@ -375,7 +375,7 @@ bool dax_lock_mapping_entry(struct page *page)
> >                  * otherwise we would not have a valid pfn_to_page()
> >                  * translation.
> >                  */
> > -               locked = true;
> > +               entry = (void *)~0UL;
> >                 if (S_ISCHR(mapping->host->i_mode))
> >                         break;
> >
> > @@ -400,23 +400,18 @@ bool dax_lock_mapping_entry(struct page *page)
> >                 break;
> >         }
> >         rcu_read_unlock();
> > -       return locked;
> > +       return (dax_entry_t)entry;
> >  }
> >
> > -void dax_unlock_mapping_entry(struct page *page)
> > +void dax_unlock_mapping_entry(struct page *page, dax_entry_t entry)
>
> Let's not require the page to be passed back, it can be derived:
>
>     page = pfn_to_page(dax_to_pfn((void*) entry));
>
> A bit more symmetric that way and canonical with other locking schemes
> that return a cookie.

The patch does not apply on top of the pending fixes. Could you resend it
on top of the current libnvdimm-fixes branch [1]?

Because we are changing the calling convention to return a locked
dax_entry_t type, I think we should go back to the originally proposed
names of these interfaces, i.e.:

    dax_entry_t dax_lock_page(struct page *page)

    void dax_unlock_page(dax_entry_t entry)

Yes, it introduces an entry-to-pfn and pfn-to-page conversion.
However, if I can't convince you to drop the page argument, let's at
least do the name change, i.e. offload the responsibility of conveying
that this is not the traditional page lock to the fact that a
dax_entry_t is returned and passed back to the unlock routine.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/log/?h=libnvdimm-fixes

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-12-04  3:33               ` Dan Williams
@ 2018-12-05  1:34                 ` Matthew Wilcox
  2018-12-05  6:11                   ` Dan Williams
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2018-12-05  1:34 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, Jan Kara, linux-fsdevel, Linux Kernel Mailing List

On Mon, Dec 03, 2018 at 07:33:43PM -0800, Dan Williams wrote:
> On Fri, Nov 30, 2018 at 12:05 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > -void dax_unlock_mapping_entry(struct page *page)
> > > +void dax_unlock_mapping_entry(struct page *page, dax_entry_t entry)
> >
> > Let's not require the page to be passed back, it can be derived:
> >
> >     page = pfn_to_page(dax_to_pfn((void*) entry));
> >
> > A bit more symmetric that way and canonical with other locking schemes
> > that return a cookie.
> 
> The patch does not apply on top of the pending fixes. Could you resend it
> on top of the current libnvdimm-fixes branch [1]?
> 
> Because we are changing the calling convention to return a locked
> dax_entry_t type, I feel like we should go back to the originally
> proposed names of these interfaces, i.e.
> 
>     dax_entry_t dax_lock_page(struct page *page)
> 
>     void dax_unlock_page(dax_entry_t entry)
> 
> Yes, it introduces an entry-to-pfn and pfn-to-page conversion.
> However, if I can't convince you to drop the page argument, let's at
> least do the name change, i.e. offload the responsibility of conveying
> that this is not the traditional page lock to the fact that a
> dax_entry_t is returned and passed back to the unlock routine.

From: Matthew Wilcox <willy@infradead.org>
Date: Fri, 30 Nov 2018 11:05:06 -0500
Subject: [PATCH] dax: Change lock/unlock API

Return the unlock cookie from dax_lock_page() and pass it to
dax_unlock_page().  This fixes a bug where dax_unlock_page() was
assuming that the page was PMD-aligned if the entry was a PMD entry.

Debugged-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
diff --git a/fs/dax.c b/fs/dax.c
index 3f592dc18d67..48132eca3761 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -379,20 +379,20 @@ static struct page *dax_busy_page(void *entry)
  * @page: The page whose entry we want to lock
  *
  * Context: Process context.
- * Return: %true if the entry was locked or does not need to be locked.
+ * Return: A cookie to pass to dax_unlock_page() or 0 if the entry could
+ * not be locked.
  */
-bool dax_lock_mapping_entry(struct page *page)
+dax_entry_t dax_lock_page(struct page *page)
 {
 	XA_STATE(xas, NULL, 0);
 	void *entry;
-	bool locked;
 
 	/* Ensure page->mapping isn't freed while we look at it */
 	rcu_read_lock();
 	for (;;) {
 		struct address_space *mapping = READ_ONCE(page->mapping);
 
-		locked = false;
+		entry = NULL;
 		if (!mapping || !dax_mapping(mapping))
 			break;
 
@@ -403,7 +403,7 @@ bool dax_lock_mapping_entry(struct page *page)
 		 * otherwise we would not have a valid pfn_to_page()
 		 * translation.
 		 */
-		locked = true;
+		entry = (void *)~0UL;
 		if (S_ISCHR(mapping->host->i_mode))
 			break;
 
@@ -426,23 +426,18 @@ bool dax_lock_mapping_entry(struct page *page)
 		break;
 	}
 	rcu_read_unlock();
-	return locked;
+	return (dax_entry_t)entry;
 }
 
-void dax_unlock_mapping_entry(struct page *page)
+void dax_unlock_page(struct page *page, dax_entry_t cookie)
 {
 	struct address_space *mapping = page->mapping;
 	XA_STATE(xas, &mapping->i_pages, page->index);
-	void *entry;
 
 	if (S_ISCHR(mapping->host->i_mode))
 		return;
 
-	rcu_read_lock();
-	entry = xas_load(&xas);
-	rcu_read_unlock();
-	entry = dax_make_entry(page_to_pfn_t(page), dax_is_pmd_entry(entry));
-	dax_unlock_entry(&xas, entry);
+	dax_unlock_entry(&xas, (void *)cookie);
 }
 
 /*
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 450b28db9533..0dd316a74a29 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,8 @@
 #include <linux/radix-tree.h>
 #include <asm/pgtable.h>
 
+typedef unsigned long dax_entry_t;
+
 struct iomap_ops;
 struct dax_device;
 struct dax_operations {
@@ -88,8 +90,8 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 		struct block_device *bdev, struct writeback_control *wbc);
 
 struct page *dax_layout_busy_page(struct address_space *mapping);
-bool dax_lock_mapping_entry(struct page *page);
-void dax_unlock_mapping_entry(struct page *page);
+dax_entry_t dax_lock_page(struct page *page);
+void dax_unlock_page(struct page *page, dax_entry_t cookie);
 #else
 static inline bool bdev_dax_supported(struct block_device *bdev,
 		int blocksize)
@@ -122,14 +124,14 @@ static inline int dax_writeback_mapping_range(struct address_space *mapping,
 	return -EOPNOTSUPP;
 }
 
-static inline bool dax_lock_mapping_entry(struct page *page)
+static inline dax_entry_t dax_lock_page(struct page *page)
 {
 	if (IS_DAX(page->mapping->host))
-		return true;
-	return false;
+		return ~0UL;
+	return 0;
 }
 
-static inline void dax_unlock_mapping_entry(struct page *page)
+static inline void dax_unlock_page(struct page *page, dax_entry_t cookie)
 {
 }
 #endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0cd3de3550f0..7c72f2a95785 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1161,6 +1161,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	LIST_HEAD(tokill);
 	int rc = -EBUSY;
 	loff_t start;
+	dax_entry_t cookie;
 
 	/*
 	 * Prevent the inode from being freed while we are interrogating
@@ -1169,7 +1170,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	 * also prevents changes to the mapping of this pfn until
 	 * poison signaling is complete.
 	 */
-	if (!dax_lock_mapping_entry(page))
+	cookie = dax_lock_page(page);
+	if (!cookie)
 		goto out;
 
 	if (hwpoison_filter(page)) {
@@ -1220,7 +1222,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, pfn, flags);
 	rc = 0;
 unlock:
-	dax_unlock_mapping_entry(page);
+	dax_unlock_page(page, cookie);
 out:
 	/* drop pgmap ref acquired in caller */
 	put_dev_pagemap(pgmap);

* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-12-05  1:34                 ` Matthew Wilcox
@ 2018-12-05  6:11                   ` Dan Williams
  2018-12-05  9:22                     ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2018-12-05  6:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-nvdimm, Jan Kara, linux-fsdevel, Linux Kernel Mailing List

On Tue, Dec 4, 2018 at 5:35 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Dec 03, 2018 at 07:33:43PM -0800, Dan Williams wrote:
> > On Fri, Nov 30, 2018 at 12:05 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > > -void dax_unlock_mapping_entry(struct page *page)
> > > > +void dax_unlock_mapping_entry(struct page *page, dax_entry_t entry)
> > >
> > > Let's not require the page to be passed back, it can be derived:
> > >
> > >     page = pfn_to_page(dax_to_pfn((void*) entry));
> > >
> > > A bit more symmetric that way and canonical with other locking schemes
> > > that return a cookie.
> >
> > The patch does not apply on top of the pending fixes. Could you resend it
> > on top of the current libnvdimm-fixes branch [1]?
> >
> > Because we are changing the calling convention to return a locked
> > dax_entry_t type, I feel like we should go back to the originally
> > proposed names of these interfaces, i.e.
> >
> >     dax_entry_t dax_lock_page(struct page *page)
> >
> >     void dax_unlock_page(dax_entry_t entry)
> >
> > Yes, it introduces an entry-to-pfn and pfn-to-page conversion.
> > However, if I can't convince you to drop the page argument, let's at
> > least do the name change, i.e. offload the responsibility of conveying
> > that this is not the traditional page lock to the fact that a
> > dax_entry_t is returned and passed back to the unlock routine.
>
> From: Matthew Wilcox <willy@infradead.org>
> Date: Fri, 30 Nov 2018 11:05:06 -0500
> Subject: [PATCH] dax: Change lock/unlock API
>
> Return the unlock cookie from dax_lock_page() and pass it to
> dax_unlock_page().  This fixes a bug where dax_unlock_page() was
> assuming that the page was PMD-aligned if the entry was a PMD entry.
>
> Debugged-by: Dan Williams <dan.j.williams@intel.com>
> Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
> Signed-off-by: Matthew Wilcox <willy@infradead.org>

Looks good, passes testing. I did fix up the changelog a bit to make
it clearer that this is a fix, not new development, and included more
details about the failing signature from my initial fix. No code
changes. I'll push it out to libnvdimm-fixes for -next to pick up.

---

dax: Fix unlock mismatch with updated API

Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
store a replacement entry in the Xarray at the given xas-index with the
DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
value of the entry relative to the current Xarray state to be specified.

In most contexts dax_unlock_entry() is operating in the same scope as
the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
case the implementation needs to recall the original entry. In the case
where the original entry is a 'pmd' entry, it is possible that the pfn
used to perform the lookup is misaligned relative to the value retrieved
from the Xarray.

Change the API so that dax_lock_page() returns an unlock cookie that is
then passed to dax_unlock_page(). This fixes a bug where
dax_unlock_page() assumed the page was PMD-aligned if the entry was a
PMD entry, with failure signatures like:

 WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
 RIP: 0010:dax_insert_entry+0x2b2/0x2d0
 [..]
 Call Trace:
  dax_iomap_pte_fault.isra.41+0x791/0xde0
  ext4_dax_huge_fault+0x16f/0x1f0
  ? up_read+0x1c/0xa0
  __do_fault+0x1f/0x160
  __handle_mm_fault+0x1033/0x1490
  handle_mm_fault+0x18b/0x3d0

Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

---

> ---
> diff --git a/fs/dax.c b/fs/dax.c
> index 3f592dc18d67..48132eca3761 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -379,20 +379,20 @@ static struct page *dax_busy_page(void *entry)
>   * @page: The page whose entry we want to lock
>   *
>   * Context: Process context.
> - * Return: %true if the entry was locked or does not need to be locked.
> + * Return: A cookie to pass to dax_unlock_page() or 0 if the entry could
> + * not be locked.
>   */
> -bool dax_lock_mapping_entry(struct page *page)
> +dax_entry_t dax_lock_page(struct page *page)
>  {
>         XA_STATE(xas, NULL, 0);
>         void *entry;
> -       bool locked;
>
>         /* Ensure page->mapping isn't freed while we look at it */
>         rcu_read_lock();
>         for (;;) {
>                 struct address_space *mapping = READ_ONCE(page->mapping);
>
> -               locked = false;
> +               entry = NULL;
>                 if (!mapping || !dax_mapping(mapping))
>                         break;
>
> @@ -403,7 +403,7 @@ bool dax_lock_mapping_entry(struct page *page)
>                  * otherwise we would not have a valid pfn_to_page()
>                  * translation.
>                  */
> -               locked = true;
> +               entry = (void *)~0UL;
>                 if (S_ISCHR(mapping->host->i_mode))
>                         break;
>
> @@ -426,23 +426,18 @@ bool dax_lock_mapping_entry(struct page *page)
>                 break;
>         }
>         rcu_read_unlock();
> -       return locked;
> +       return (dax_entry_t)entry;
>  }
>
> -void dax_unlock_mapping_entry(struct page *page)
> +void dax_unlock_page(struct page *page, dax_entry_t cookie)
>  {
>         struct address_space *mapping = page->mapping;
>         XA_STATE(xas, &mapping->i_pages, page->index);
> -       void *entry;
>
>         if (S_ISCHR(mapping->host->i_mode))
>                 return;
>
> -       rcu_read_lock();
> -       entry = xas_load(&xas);
> -       rcu_read_unlock();
> -       entry = dax_make_entry(page_to_pfn_t(page), dax_is_pmd_entry(entry));
> -       dax_unlock_entry(&xas, entry);
> +       dax_unlock_entry(&xas, (void *)cookie);
>  }
>
>  /*
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 450b28db9533..0dd316a74a29 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -7,6 +7,8 @@
>  #include <linux/radix-tree.h>
>  #include <asm/pgtable.h>
>
> +typedef unsigned long dax_entry_t;
> +
>  struct iomap_ops;
>  struct dax_device;
>  struct dax_operations {
> @@ -88,8 +90,8 @@ int dax_writeback_mapping_range(struct address_space *mapping,
>                 struct block_device *bdev, struct writeback_control *wbc);
>
>  struct page *dax_layout_busy_page(struct address_space *mapping);
> -bool dax_lock_mapping_entry(struct page *page);
> -void dax_unlock_mapping_entry(struct page *page);
> +dax_entry_t dax_lock_page(struct page *page);
> +void dax_unlock_page(struct page *page, dax_entry_t cookie);
>  #else
>  static inline bool bdev_dax_supported(struct block_device *bdev,
>                 int blocksize)
> @@ -122,14 +124,14 @@ static inline int dax_writeback_mapping_range(struct address_space *mapping,
>         return -EOPNOTSUPP;
>  }
>
> -static inline bool dax_lock_mapping_entry(struct page *page)
> +static inline dax_entry_t dax_lock_page(struct page *page)
>  {
>         if (IS_DAX(page->mapping->host))
> -               return true;
> -       return false;
> +               return ~0UL;
> +       return 0;
>  }
>
> -static inline void dax_unlock_mapping_entry(struct page *page)
> +static inline void dax_unlock_page(struct page *page, dax_entry_t cookie)
>  {
>  }
>  #endif
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 0cd3de3550f0..7c72f2a95785 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1161,6 +1161,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>         LIST_HEAD(tokill);
>         int rc = -EBUSY;
>         loff_t start;
> +       dax_entry_t cookie;
>
>         /*
>          * Prevent the inode from being freed while we are interrogating
> @@ -1169,7 +1170,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>          * also prevents changes to the mapping of this pfn until
>          * poison signaling is complete.
>          */
> -       if (!dax_lock_mapping_entry(page))
> +       cookie = dax_lock_page(page);
> +       if (!cookie)
>                 goto out;
>
>         if (hwpoison_filter(page)) {
> @@ -1220,7 +1222,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>         kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, pfn, flags);
>         rc = 0;
>  unlock:
> -       dax_unlock_mapping_entry(page);
> +       dax_unlock_page(page, cookie);
>  out:
>         /* drop pgmap ref acquired in caller */
>         put_dev_pagemap(pgmap);

* Re: [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry()
  2018-12-05  6:11                   ` Dan Williams
@ 2018-12-05  9:22                     ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2018-12-05  9:22 UTC (permalink / raw)
  To: Dan Williams
  Cc: Matthew Wilcox, linux-nvdimm, Jan Kara, linux-fsdevel,
	Linux Kernel Mailing List

On Tue 04-12-18 22:11:10, Dan Williams wrote:
> On Tue, Dec 4, 2018 at 5:35 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, Dec 03, 2018 at 07:33:43PM -0800, Dan Williams wrote:
> > > On Fri, Nov 30, 2018 at 12:05 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > > > -void dax_unlock_mapping_entry(struct page *page)
> > > > > +void dax_unlock_mapping_entry(struct page *page, dax_entry_t entry)
> > > >
> > > > Let's not require the page to be passed back, it can be derived:
> > > >
> > > >     page = pfn_to_page(dax_to_pfn((void*) entry));
> > > >
> > > > A bit more symmetric that way and canonical with other locking schemes
> > > > that return a cookie.
> > >
> > > The patch does not apply on top of the pending fixes. Could you resend it
> > > on top of the current libnvdimm-fixes branch [1]?
> > >
> > > Because we are changing the calling convention to return a locked
> > > dax_entry_t type, I feel like we should go back to the originally
> > > proposed names of these interfaces, i.e.
> > >
> > >     dax_entry_t dax_lock_page(struct page *page)
> > >
> > >     void dax_unlock_page(dax_entry_t entry)
> > >
> > > Yes, it introduces an entry-to-pfn and pfn-to-page conversion.
> > > However, if I can't convince you to drop the page argument, let's at
> > > least do the name change, i.e. offload the responsibility of conveying
> > > that this is not the traditional page lock to the fact that a
> > > dax_entry_t is returned and passed back to the unlock routine.
> >
> > From: Matthew Wilcox <willy@infradead.org>
> > Date: Fri, 30 Nov 2018 11:05:06 -0500
> > Subject: [PATCH] dax: Change lock/unlock API
> >
> > Return the unlock cookie from dax_lock_page() and pass it to
> > dax_unlock_page().  This fixes a bug where dax_unlock_page() was
> > assuming that the page was PMD-aligned if the entry was a PMD entry.
> >
> > Debugged-by: Dan Williams <dan.j.williams@intel.com>
> > Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
> > Signed-off-by: Matthew Wilcox <willy@infradead.org>
> 
> Looks good, passes testing. I did fix up the changelog a bit to make
> it clearer that this is a fix, not new development, and included more
> details about the failing signature from my initial fix. No code
> changes. I'll push it out to libnvdimm-fixes for -next to pick up.
> 
> ---
> 
> dax: Fix unlock mismatch with updated API
> 
> Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
> store a replacement entry in the Xarray at the given xas-index with the
> DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
> value of the entry relative to the current Xarray state to be specified.
> 
> In most contexts dax_unlock_entry() is operating in the same scope as
> the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
> case the implementation needs to recall the original entry. In the case
> where the original entry is a 'pmd' entry, it is possible that the pfn
> used to perform the lookup is misaligned relative to the value retrieved
> from the Xarray.
> 
> Change the API so that dax_lock_page() returns an unlock cookie that is
> then passed to dax_unlock_page(). This fixes a bug where
> dax_unlock_page() assumed the page was PMD-aligned if the entry was a
> PMD entry, with failure signatures like:
> 
>  WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
>  RIP: 0010:dax_insert_entry+0x2b2/0x2d0
>  [..]
>  Call Trace:
>   dax_iomap_pte_fault.isra.41+0x791/0xde0
>   ext4_dax_huge_fault+0x16f/0x1f0
>   ? up_read+0x1c/0xa0
>   __do_fault+0x1f/0x160
>   __handle_mm_fault+0x1033/0x1490
>   handle_mm_fault+0x18b/0x3d0
> 
> Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
> Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Matthew Wilcox <willy@infradead.org>
> Tested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

The patch looks good to me. Thanks for nailing this down! You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> ---
> 
> > ---
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 3f592dc18d67..48132eca3761 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -379,20 +379,20 @@ static struct page *dax_busy_page(void *entry)
> >   * @page: The page whose entry we want to lock
> >   *
> >   * Context: Process context.
> > - * Return: %true if the entry was locked or does not need to be locked.
> > + * Return: A cookie to pass to dax_unlock_page() or 0 if the entry could
> > + * not be locked.
> >   */
> > -bool dax_lock_mapping_entry(struct page *page)
> > +dax_entry_t dax_lock_page(struct page *page)
> >  {
> >         XA_STATE(xas, NULL, 0);
> >         void *entry;
> > -       bool locked;
> >
> >         /* Ensure page->mapping isn't freed while we look at it */
> >         rcu_read_lock();
> >         for (;;) {
> >                 struct address_space *mapping = READ_ONCE(page->mapping);
> >
> > -               locked = false;
> > +               entry = NULL;
> >                 if (!mapping || !dax_mapping(mapping))
> >                         break;
> >
> > @@ -403,7 +403,7 @@ bool dax_lock_mapping_entry(struct page *page)
> >                  * otherwise we would not have a valid pfn_to_page()
> >                  * translation.
> >                  */
> > -               locked = true;
> > +               entry = (void *)~0UL;
> >                 if (S_ISCHR(mapping->host->i_mode))
> >                         break;
> >
> > @@ -426,23 +426,18 @@ bool dax_lock_mapping_entry(struct page *page)
> >                 break;
> >         }
> >         rcu_read_unlock();
> > -       return locked;
> > +       return (dax_entry_t)entry;
> >  }
> >
> > -void dax_unlock_mapping_entry(struct page *page)
> > +void dax_unlock_page(struct page *page, dax_entry_t cookie)
> >  {
> >         struct address_space *mapping = page->mapping;
> >         XA_STATE(xas, &mapping->i_pages, page->index);
> > -       void *entry;
> >
> >         if (S_ISCHR(mapping->host->i_mode))
> >                 return;
> >
> > -       rcu_read_lock();
> > -       entry = xas_load(&xas);
> > -       rcu_read_unlock();
> > -       entry = dax_make_entry(page_to_pfn_t(page), dax_is_pmd_entry(entry));
> > -       dax_unlock_entry(&xas, entry);
> > +       dax_unlock_entry(&xas, (void *)cookie);
> >  }
> >
> >  /*
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index 450b28db9533..0dd316a74a29 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -7,6 +7,8 @@
> >  #include <linux/radix-tree.h>
> >  #include <asm/pgtable.h>
> >
> > +typedef unsigned long dax_entry_t;
> > +
> >  struct iomap_ops;
> >  struct dax_device;
> >  struct dax_operations {
> > @@ -88,8 +90,8 @@ int dax_writeback_mapping_range(struct address_space *mapping,
> >                 struct block_device *bdev, struct writeback_control *wbc);
> >
> >  struct page *dax_layout_busy_page(struct address_space *mapping);
> > -bool dax_lock_mapping_entry(struct page *page);
> > -void dax_unlock_mapping_entry(struct page *page);
> > +dax_entry_t dax_lock_page(struct page *page);
> > +void dax_unlock_page(struct page *page, dax_entry_t cookie);
> >  #else
> >  static inline bool bdev_dax_supported(struct block_device *bdev,
> >                 int blocksize)
> > @@ -122,14 +124,14 @@ static inline int dax_writeback_mapping_range(struct address_space *mapping,
> >         return -EOPNOTSUPP;
> >  }
> >
> > -static inline bool dax_lock_mapping_entry(struct page *page)
> > +static inline dax_entry_t dax_lock_page(struct page *page)
> >  {
> >         if (IS_DAX(page->mapping->host))
> > -               return true;
> > -       return false;
> > +               return ~0UL;
> > +       return 0;
> >  }
> >
> > -static inline void dax_unlock_mapping_entry(struct page *page)
> > +static inline void dax_unlock_page(struct page *page, dax_entry_t cookie)
> >  {
> >  }
> >  #endif
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 0cd3de3550f0..7c72f2a95785 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1161,6 +1161,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> >         LIST_HEAD(tokill);
> >         int rc = -EBUSY;
> >         loff_t start;
> > +       dax_entry_t cookie;
> >
> >         /*
> >          * Prevent the inode from being freed while we are interrogating
> > @@ -1169,7 +1170,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> >          * also prevents changes to the mapping of this pfn until
> >          * poison signaling is complete.
> >          */
> > -       if (!dax_lock_mapping_entry(page))
> > +       cookie = dax_lock_page(page);
> > +       if (!cookie)
> >                 goto out;
> >
> >         if (hwpoison_filter(page)) {
> > @@ -1220,7 +1222,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> >         kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, pfn, flags);
> >         rc = 0;
> >  unlock:
> > -       dax_unlock_mapping_entry(page);
> > +       dax_unlock_page(page, cookie);
> >  out:
> >         /* drop pgmap ref acquired in caller */
> >         put_dev_pagemap(pgmap);
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Thread overview: 12+ messages
2018-11-30  0:13 [PATCH] dax: Fix Xarray conversion of dax_unlock_mapping_entry() Dan Williams
2018-11-30 15:49 ` Matthew Wilcox
2018-11-30 15:54   ` Dan Williams
2018-11-30 16:24     ` Matthew Wilcox
2018-11-30 16:33       ` Dan Williams
2018-11-30 17:01         ` Dan Williams
2018-11-30 19:50           ` Matthew Wilcox
2018-11-30 20:05             ` Dan Williams
2018-12-04  3:33               ` Dan Williams
2018-12-05  1:34                 ` Matthew Wilcox
2018-12-05  6:11                   ` Dan Williams
2018-12-05  9:22                     ` Jan Kara
