linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE
@ 2021-01-12  9:34 Dan Williams
  2021-01-12  9:34 ` [PATCH v2 1/5] mm: Move pfn_to_online_page() out of line Dan Williams
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Dan Williams @ 2021-01-12  9:34 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, stable, Naoya Horiguchi, Qian Cai,
	Michal Hocko, Oscar Salvador, Michal Hocko, Andrew Morton,
	linux-nvdimm, linux-kernel

Changes since v1 [1]:
- Clarify the failing condition in patch 3 (Michal)
- Clarify how subsection collisions manifest in shipping systems
  (Michal)
- Use zone_idx() (Michal)
- Move section_taint_zone_device() conditions to
  move_pfn_range_to_zone() (Michal)
- Fix pfn_to_online_page() to account for pfn_valid() vs
  pfn_section_valid() confusion (David)

[1]: http://lore.kernel.org/r/160990599013.2430134.11556277600719835946.stgit@dwillia2-desk3.amr.corp.intel.com

---

Michal reminds that the discussion about how to ensure pfn-walkers do
not get confused by ZONE_DEVICE pages never resolved. A pfn-walker that
uses pfn_to_online_page() may inadvertently translate a pfn as online
and in the page allocator, when it is offline managed by a ZONE_DEVICE
mapping (details in Patch 3: ("mm: Teach pfn_to_online_page() about
ZONE_DEVICE section collisions")).

The 2 proposals under consideration are teach pfn_to_online_page() to be
precise in the presence of mixed-zone sections, or teach the memory-add
code to drop the System RAM associated with ZONE_DEVICE collisions. In
order to not regress memory capacity by a few 10s to 100s of MiB the
approach taken in this set is to add precision to pfn_to_online_page().

In the course of validating pfn_to_online_page() a couple other fixes
fell out:

1/ soft_offline_page() fails to drop the reference taken in the
   madvise(..., MADV_SOFT_OFFLINE) case.

2/ The libnvdimm sysfs attribute visibility code was failing to publish
   the resource base for memmap=ss!nn defined namespaces. This is needed
   for the regression test for soft_offline_page().

---

Dan Williams (5):
      mm: Move pfn_to_online_page() out of line
      mm: Teach pfn_to_online_page() to consider subsection validity
      mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
      mm: Fix page reference leak in soft_offline_page()
      libnvdimm/namespace: Fix visibility of namespace resource attribute


 drivers/nvdimm/namespace_devs.c |   10 +++---
 include/linux/memory_hotplug.h  |   17 +----------
 include/linux/mmzone.h          |   22 +++++++++-----
 mm/memory-failure.c             |   20 ++++++++++---
 mm/memory_hotplug.c             |   62 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 99 insertions(+), 32 deletions(-)
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/5] mm: Move pfn_to_online_page() out of line
  2021-01-12  9:34 [PATCH v2 0/5] mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE Dan Williams
@ 2021-01-12  9:34 ` Dan Williams
  2021-01-12  9:46   ` David Hildenbrand
  2021-01-12 10:19   ` Oscar Salvador
  2021-01-12  9:34 ` [PATCH v2 2/5] mm: Teach pfn_to_online_page() to consider subsection validity Dan Williams
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Dan Williams @ 2021-01-12  9:34 UTC (permalink / raw)
  To: linux-mm; +Cc: David Hildenbrand, Michal Hocko, linux-nvdimm, linux-kernel

pfn_to_online_page() is already too large to be a macro or an inline
function. In anticipation of further logic changes / growth, move it out
of line.

No functional change, just code movement.

Cc: David Hildenbrand <david@redhat.com>
Reported-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/memory_hotplug.h |   17 +----------------
 mm/memory_hotplug.c            |   16 ++++++++++++++++
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 15acce5ab106..3d99de0db2dd 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -16,22 +16,7 @@ struct resource;
 struct vmem_altmap;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-/*
- * Return page for the valid pfn only if the page is online. All pfn
- * walkers which rely on the fully initialized page->flags and others
- * should use this rather than pfn_valid && pfn_to_page
- */
-#define pfn_to_online_page(pfn)					   \
-({								   \
-	struct page *___page = NULL;				   \
-	unsigned long ___pfn = pfn;				   \
-	unsigned long ___nr = pfn_to_section_nr(___pfn);	   \
-								   \
-	if (___nr < NR_MEM_SECTIONS && online_section_nr(___nr) && \
-	    pfn_valid_within(___pfn))				   \
-		___page = pfn_to_page(___pfn);			   \
-	___page;						   \
-})
+struct page *pfn_to_online_page(unsigned long pfn);
 
 /*
  * Types for free bootmem stored in page->lru.next. These have to be in
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f9d57b9be8c7..55a69d4396e7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -300,6 +300,22 @@ static int check_hotplug_memory_addressable(unsigned long pfn,
 	return 0;
 }
 
+/*
+ * Return page for the valid pfn only if the page is online. All pfn
+ * walkers which rely on the fully initialized page->flags and others
+ * should use this rather than pfn_valid && pfn_to_page
+ */
+struct page *pfn_to_online_page(unsigned long pfn)
+{
+	unsigned long nr = pfn_to_section_nr(pfn);
+
+	if (nr < NR_MEM_SECTIONS && online_section_nr(nr) &&
+	    pfn_valid_within(pfn))
+		return pfn_to_page(pfn);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pfn_to_online_page);
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/5] mm: Teach pfn_to_online_page() to consider subsection validity
  2021-01-12  9:34 [PATCH v2 0/5] mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE Dan Williams
  2021-01-12  9:34 ` [PATCH v2 1/5] mm: Move pfn_to_online_page() out of line Dan Williams
@ 2021-01-12  9:34 ` Dan Williams
  2021-01-12  9:53   ` David Hildenbrand
  2021-01-12  9:34 ` [PATCH v2 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions Dan Williams
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2021-01-12  9:34 UTC (permalink / raw)
  To: linux-mm
  Cc: Qian Cai, Michal Hocko, David Hildenbrand, linux-nvdimm, linux-kernel

pfn_section_valid() determines pfn validity on subsection granularity.

pfn_valid_within() internally uses pfn_section_valid(), but gates it
with early_section() to preserve the traditional behavior of pfn_valid()
before subsection support was added.

pfn_to_online_page() wants the explicit precision that pfn_valid() does
not offer, so use pfn_section_valid() directly. Since
pfn_to_online_page() already open codes the validity of the section
number vs NR_MEM_SECTIONS, there's not much value to using
pfn_valid_within(), just use pfn_section_valid(). This loses the
valid_section() check that pfn_valid_within() was performing, but that
was already redundant with the online check.

Fixes: b13bc35193d9 ("mm/hotplug: invalid PFNs from pfn_to_online_page()")
Cc: Qian Cai <cai@lca.pw>
Cc: Michal Hocko <mhocko@suse.com>
Reported-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 55a69d4396e7..a845b3979bc0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -308,11 +308,19 @@ static int check_hotplug_memory_addressable(unsigned long pfn,
 struct page *pfn_to_online_page(unsigned long pfn)
 {
 	unsigned long nr = pfn_to_section_nr(pfn);
+	struct mem_section *ms;
+
+	if (nr >= NR_MEM_SECTIONS)
+		return NULL;
+
+	ms = __nr_to_section(nr);
+	if (!online_section(ms))
+		return NULL;
+
+	if (!pfn_section_valid(ms, pfn))
+		return NULL;
 
-	if (nr < NR_MEM_SECTIONS && online_section_nr(nr) &&
-	    pfn_valid_within(pfn))
-		return pfn_to_page(pfn);
-	return NULL;
+	return pfn_to_page(pfn);
 }
 EXPORT_SYMBOL_GPL(pfn_to_online_page);
 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
  2021-01-12  9:34 [PATCH v2 0/5] mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE Dan Williams
  2021-01-12  9:34 ` [PATCH v2 1/5] mm: Move pfn_to_online_page() out of line Dan Williams
  2021-01-12  9:34 ` [PATCH v2 2/5] mm: Teach pfn_to_online_page() to consider subsection validity Dan Williams
@ 2021-01-12  9:34 ` Dan Williams
  2021-01-12 10:01   ` David Hildenbrand
  2021-01-12 11:00   ` Oscar Salvador
  2021-01-12  9:34 ` [PATCH v2 4/5] mm: Fix page reference leak in soft_offline_page() Dan Williams
  2021-01-12  9:35 ` [PATCH v2 5/5] libnvdimm/namespace: Fix visibility of namespace resource attribute Dan Williams
  4 siblings, 2 replies; 16+ messages in thread
From: Dan Williams @ 2021-01-12  9:34 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Michal Hocko, David Hildenbrand, linux-nvdimm,
	linux-kernel

While pfn_to_online_page() is able to determine pfn_valid() at
subsection granularity it is not able to reliably determine if a given
pfn is also online if the section is mixes ZONE_{NORMAL,MOVABLE} with
ZONE_DEVICE. This means that pfn_to_online_page() may return invalid
@page objects. For example with a memory map like:

100000000-1fbffffff : System RAM
  142000000-143002e16 : Kernel code
  143200000-143713fff : Kernel rodata
  143800000-143b15b7f : Kernel data
  144227000-144ffffff : Kernel bss
1fc000000-2fbffffff : Persistent Memory (legacy)
  1fc000000-2fbffffff : namespace0.0

This command:

echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page

...succeeds when it should fail. When it succeeds it touches
an uninitialized page and may crash or cause other damage (see
dissolve_free_huge_page()).

While the memory map above is contrived via the memmap=ss!nn kernel
command line option, the collision happens in practice on shipping
platforms. The memory controller resources that decode spans of
physical address space are a limited resource. One technique
platform-firmware uses to conserve those resources is to share a decoder
across 2 devices to keep the address range contiguous. Unfortunately the
unit of operation of a decoder is 64MiB while the Linux section size is
128MiB. This results in situations where, without subsection hotplug
memory mappings with different lifetimes collide into one object that
can only express one lifetime.

Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a
section that mixes ZONE_DEVICE pfns with other online pfns. With
SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall
back to a slow-path check for ZONE_DEVICE pfns in an online section. In
the fast path online_section() for a full ZONE_DEVICE section returns
false.

Because the collision case is rare, and for simplicity, the
SECTION_TAINT_ZONE_DEVICE flag is never cleared once set.

Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Michal Hocko <mhocko@suse.com>
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mmzone.h |   22 +++++++++++++++-------
 mm/memory_hotplug.c    |   38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b593316bff3d..0b5c44f730b4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1273,13 +1273,14 @@ extern size_t mem_section_usage_size(void);
  *      which results in PFN_SECTION_SHIFT equal 6.
  * To sum it up, at least 6 bits are available.
  */
-#define	SECTION_MARKED_PRESENT	(1UL<<0)
-#define SECTION_HAS_MEM_MAP	(1UL<<1)
-#define SECTION_IS_ONLINE	(1UL<<2)
-#define SECTION_IS_EARLY	(1UL<<3)
-#define SECTION_MAP_LAST_BIT	(1UL<<4)
-#define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
-#define SECTION_NID_SHIFT	3
+#define SECTION_MARKED_PRESENT		(1UL<<0)
+#define SECTION_HAS_MEM_MAP		(1UL<<1)
+#define SECTION_IS_ONLINE		(1UL<<2)
+#define SECTION_IS_EARLY		(1UL<<3)
+#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
+#define SECTION_MAP_LAST_BIT		(1UL<<5)
+#define SECTION_MAP_MASK		(~(SECTION_MAP_LAST_BIT-1))
+#define SECTION_NID_SHIFT		3
 
 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
@@ -1318,6 +1319,13 @@ static inline int online_section(struct mem_section *section)
 	return (section && (section->section_mem_map & SECTION_IS_ONLINE));
 }
 
+static inline int online_device_section(struct mem_section *section)
+{
+	unsigned long flags = SECTION_IS_ONLINE | SECTION_TAINT_ZONE_DEVICE;
+
+	return section && ((section->section_mem_map & flags) == flags);
+}
+
 static inline int online_section_nr(unsigned long nr)
 {
 	return online_section(__nr_to_section(nr));
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a845b3979bc0..b2ccb84c3082 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -308,6 +308,7 @@ static int check_hotplug_memory_addressable(unsigned long pfn,
 struct page *pfn_to_online_page(unsigned long pfn)
 {
 	unsigned long nr = pfn_to_section_nr(pfn);
+	struct dev_pagemap *pgmap;
 	struct mem_section *ms;
 
 	if (nr >= NR_MEM_SECTIONS)
@@ -320,6 +321,22 @@ struct page *pfn_to_online_page(unsigned long pfn)
 	if (!pfn_section_valid(ms, pfn))
 		return NULL;
 
+	if (!online_device_section(ms))
+		return pfn_to_page(pfn);
+
+	/*
+	 * Slowpath: when ZONE_DEVICE collides with
+	 * ZONE_{NORMAL,MOVABLE} within the same section some pfns in
+	 * the section may be 'offline' but 'valid'. Only
+	 * get_dev_pagemap() can determine sub-section online status.
+	 */
+	pgmap = get_dev_pagemap(pfn, NULL);
+	put_dev_pagemap(pgmap);
+
+	/* The presence of a pgmap indicates ZONE_DEVICE offline pfn */
+	if (pgmap)
+		return NULL;
+
 	return pfn_to_page(pfn);
 }
 EXPORT_SYMBOL_GPL(pfn_to_online_page);
@@ -702,6 +719,14 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
 	pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - pgdat->node_start_pfn;
 
 }
+
+static void section_taint_zone_device(unsigned long pfn)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+
+	ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE;
+}
+
 /*
  * Associate the pfn range with the given zone, initializing the memmaps
  * and resizing the pgdat/zone data to span the added pages. After this
@@ -731,6 +756,19 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	resize_pgdat_range(pgdat, start_pfn, nr_pages);
 	pgdat_resize_unlock(pgdat, &flags);
 
+	/*
+	 * Subsection population requires care in pfn_to_online_page().
+	 * Set the taint to enable the slow path detection of
+	 * ZONE_DEVICE pages in an otherwise  ZONE_{NORMAL,MOVABLE}
+	 * section.
+	 */
+	if (zone_idx(zone) == ZONE_DEVICE) {
+		if (!IS_ALIGNED(start_pfn, PAGES_PER_SECTION))
+			section_taint_zone_device(start_pfn);
+		if (!IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION))
+			section_taint_zone_device(start_pfn + nr_pages);
+	}
+
 	/*
 	 * TODO now we have a visible range of pages which are not associated
 	 * with their zone properly. Not nice but set_pfnblock_flags_mask
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 4/5] mm: Fix page reference leak in soft_offline_page()
  2021-01-12  9:34 [PATCH v2 0/5] mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE Dan Williams
                   ` (2 preceding siblings ...)
  2021-01-12  9:34 ` [PATCH v2 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions Dan Williams
@ 2021-01-12  9:34 ` Dan Williams
  2021-01-12  9:53   ` Oscar Salvador
  2021-01-12 10:16   ` David Hildenbrand
  2021-01-12  9:35 ` [PATCH v2 5/5] libnvdimm/namespace: Fix visibility of namespace resource attribute Dan Williams
  4 siblings, 2 replies; 16+ messages in thread
From: Dan Williams @ 2021-01-12  9:34 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Naoya Horiguchi, David Hildenbrand, Michal Hocko,
	Oscar Salvador, stable, linux-nvdimm, linux-kernel

The conversion to move pfn_to_online_page() internal to
soft_offline_page() missed that the get_user_pages() reference needs to
be dropped when pfn_to_online_page() fails.

When soft_offline_page() is handed a pfn_valid() &&
!pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to
a leaked reference.

Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/memory-failure.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5a38e9eade94..78b173c7190c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page)
 	return rc;
 }
 
+static void put_ref_page(struct page *page)
+{
+	if (page)
+		put_page(page);
+}
+
 /**
  * soft_offline_page - Soft offline a page.
  * @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page)
 int soft_offline_page(unsigned long pfn, int flags)
 {
 	int ret;
-	struct page *page;
 	bool try_again = true;
+	struct page *page, *ref_page = NULL;
+
+	WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
 
 	if (!pfn_valid(pfn))
 		return -ENXIO;
+	if (flags & MF_COUNT_INCREASED)
+		ref_page = pfn_to_page(pfn);
+
 	/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
 	page = pfn_to_online_page(pfn);
-	if (!page)
+	if (!page) {
+		put_ref_page(ref_page);
 		return -EIO;
+	}
 
 	if (PageHWPoison(page)) {
 		pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
-		if (flags & MF_COUNT_INCREASED)
-			put_page(page);
+		put_ref_page(ref_page);
 		return 0;
 	}
 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 5/5] libnvdimm/namespace: Fix visibility of namespace resource attribute
  2021-01-12  9:34 [PATCH v2 0/5] mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE Dan Williams
                   ` (3 preceding siblings ...)
  2021-01-12  9:34 ` [PATCH v2 4/5] mm: Fix page reference leak in soft_offline_page() Dan Williams
@ 2021-01-12  9:35 ` Dan Williams
  4 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2021-01-12  9:35 UTC (permalink / raw)
  To: linux-mm; +Cc: stable, linux-nvdimm, linux-kernel

Legacy pmem namespaces lost support for the "resource" attribute when
the code was cleaned up to put the permission visibility in the
declaration. Restore this by listing 'resource' in the default
attributes.

A new ndctl regression test for pfn_to_online_page() corner cases builds
on this fix.

Fixes: bfd2e9140656 ("libnvdimm: Simplify root read-only definition for the 'resource' attribute")
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/namespace_devs.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 6da67f4d641a..2403b71b601e 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1635,11 +1635,11 @@ static umode_t namespace_visible(struct kobject *kobj,
 		return a->mode;
 	}
 
-	if (a == &dev_attr_nstype.attr || a == &dev_attr_size.attr
-			|| a == &dev_attr_holder.attr
-			|| a == &dev_attr_holder_class.attr
-			|| a == &dev_attr_force_raw.attr
-			|| a == &dev_attr_mode.attr)
+	/* base is_namespace_io() attributes */
+	if (a == &dev_attr_nstype.attr || a == &dev_attr_size.attr ||
+	    a == &dev_attr_holder.attr || a == &dev_attr_holder_class.attr ||
+	    a == &dev_attr_force_raw.attr || a == &dev_attr_mode.attr ||
+	    a == &dev_attr_resource.attr)
 		return a->mode;
 
 	return 0;
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/5] mm: Move pfn_to_online_page() out of line
  2021-01-12  9:34 ` [PATCH v2 1/5] mm: Move pfn_to_online_page() out of line Dan Williams
@ 2021-01-12  9:46   ` David Hildenbrand
  2021-01-12 10:19   ` Oscar Salvador
  1 sibling, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2021-01-12  9:46 UTC (permalink / raw)
  To: Dan Williams, linux-mm; +Cc: Michal Hocko, linux-nvdimm, linux-kernel

On 12.01.21 10:34, Dan Williams wrote:
> pfn_to_online_page() is already too large to be a macro or an inline
> function. In anticipation of further logic changes / growth, move it out
> of line.
> 
> No functional change, just code movement.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Reported-by: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/memory_hotplug.h |   17 +----------------
>  mm/memory_hotplug.c            |   16 ++++++++++++++++
>  2 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 15acce5ab106..3d99de0db2dd 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -16,22 +16,7 @@ struct resource;
>  struct vmem_altmap;
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -/*
> - * Return page for the valid pfn only if the page is online. All pfn
> - * walkers which rely on the fully initialized page->flags and others
> - * should use this rather than pfn_valid && pfn_to_page
> - */
> -#define pfn_to_online_page(pfn)					   \
> -({								   \
> -	struct page *___page = NULL;				   \
> -	unsigned long ___pfn = pfn;				   \
> -	unsigned long ___nr = pfn_to_section_nr(___pfn);	   \
> -								   \
> -	if (___nr < NR_MEM_SECTIONS && online_section_nr(___nr) && \
> -	    pfn_valid_within(___pfn))				   \
> -		___page = pfn_to_page(___pfn);			   \
> -	___page;						   \
> -})
> +struct page *pfn_to_online_page(unsigned long pfn);
>  
>  /*
>   * Types for free bootmem stored in page->lru.next. These have to be in
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index f9d57b9be8c7..55a69d4396e7 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -300,6 +300,22 @@ static int check_hotplug_memory_addressable(unsigned long pfn,
>  	return 0;
>  }
>  
> +/*
> + * Return page for the valid pfn only if the page is online. All pfn
> + * walkers which rely on the fully initialized page->flags and others
> + * should use this rather than pfn_valid && pfn_to_page
> + */
> +struct page *pfn_to_online_page(unsigned long pfn)
> +{
> +	unsigned long nr = pfn_to_section_nr(pfn);
> +
> +	if (nr < NR_MEM_SECTIONS && online_section_nr(nr) &&
> +	    pfn_valid_within(pfn))
> +		return pfn_to_page(pfn);
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(pfn_to_online_page);
> +
>  /*
>   * Reasonably generic function for adding memory.  It is
>   * expected that archs that support memory hotplug will
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] mm: Teach pfn_to_online_page() to consider subsection validity
  2021-01-12  9:34 ` [PATCH v2 2/5] mm: Teach pfn_to_online_page() to consider subsection validity Dan Williams
@ 2021-01-12  9:53   ` David Hildenbrand
  2021-01-12 10:48     ` Oscar Salvador
  2021-01-12 22:20     ` Dan Williams
  0 siblings, 2 replies; 16+ messages in thread
From: David Hildenbrand @ 2021-01-12  9:53 UTC (permalink / raw)
  To: Dan Williams, linux-mm
  Cc: Qian Cai, Michal Hocko, linux-nvdimm, linux-kernel, Anshuman Khandual

On 12.01.21 10:34, Dan Williams wrote:
> pfn_section_valid() determines pfn validity on subsection granularity.
> 
> pfn_valid_within() internally uses pfn_section_valid(), but gates it
> with early_section() to preserve the traditional behavior of pfn_valid()
> before subsection support was added.
> 
> pfn_to_online_page() wants the explicit precision that pfn_valid() does
> not offer, so use pfn_section_valid() directly. Since
> pfn_to_online_page() already open codes the validity of the section
> number vs NR_MEM_SECTIONS, there's not much value to using
> pfn_valid_within(), just use pfn_section_valid(). This loses the
> valid_section() check that pfn_valid_within() was performing, but that
> was already redundant with the online check.
> 
> Fixes: b13bc35193d9 ("mm/hotplug: invalid PFNs from pfn_to_online_page()")
> Cc: Qian Cai <cai@lca.pw>
> Cc: Michal Hocko <mhocko@suse.com>
> Reported-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory_hotplug.c |   16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 55a69d4396e7..a845b3979bc0 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -308,11 +308,19 @@ static int check_hotplug_memory_addressable(unsigned long pfn,
>  struct page *pfn_to_online_page(unsigned long pfn)
>  {
>  	unsigned long nr = pfn_to_section_nr(pfn);
> +	struct mem_section *ms;
> +
> +	if (nr >= NR_MEM_SECTIONS)
> +		return NULL;
> +
> +	ms = __nr_to_section(nr);
> +	if (!online_section(ms))
> +		return NULL;
> +
> +	if (!pfn_section_valid(ms, pfn))
> +		return NULL;

That's not sufficient for alternative implementations of pfn_valid().

You still need some kind of pfn_valid(pfn) for alternative versions of
pfn_valid(). Consider arm64 memory holes in the memmap. See their
current (yet to be fixed/reworked) pfn_valid() implementation.
(pfn_valid_within() is implicitly active on arm64)

Actually, I think we should add something like the following, to make
this clearer (pfn_valid_within() is confusing)

#ifdef CONFIG_HAVE_ARCH_PFN_VALID
	/* We might have to check for holes inside the memmap. */
	if (!pfn_valid())
		return NULL;
#endif

>  
> -	if (nr < NR_MEM_SECTIONS && online_section_nr(nr) &&
> -	    pfn_valid_within(pfn))
> -		return pfn_to_page(pfn);
> -	return NULL;
> +	return pfn_to_page(pfn);
>  }
>  EXPORT_SYMBOL_GPL(pfn_to_online_page);
>  
> 


-- 
Thanks,

David / dhildenb
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/5] mm: Fix page reference leak in soft_offline_page()
  2021-01-12  9:34 ` [PATCH v2 4/5] mm: Fix page reference leak in soft_offline_page() Dan Williams
@ 2021-01-12  9:53   ` Oscar Salvador
  2021-01-12 20:03     ` Dan Williams
  2021-01-12 10:16   ` David Hildenbrand
  1 sibling, 1 reply; 16+ messages in thread
From: Oscar Salvador @ 2021-01-12  9:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-mm, Andrew Morton, Naoya Horiguchi, David Hildenbrand,
	Michal Hocko, stable, linux-nvdimm, linux-kernel

On Tue, Jan 12, 2021 at 01:34:58AM -0800, Dan Williams wrote:
> The conversion to move pfn_to_online_page() internal to
> soft_offline_page() missed that the get_user_pages() reference needs to
> be dropped when pfn_to_online_page() fails.

I would be more specific here wrt. get_user_pages (madvise).
soft_offline_page gets called from more places besides madvise_*.

> When soft_offline_page() is handed a pfn_valid() &&
> !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to
> a leaked reference.
> 
> Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

LGTM, thanks for catching this:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

A nit below.

> ---
>  mm/memory-failure.c |   20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5a38e9eade94..78b173c7190c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page)
>  	return rc;
>  }
>  
> +static void put_ref_page(struct page *page)
> +{
> +	if (page)
> +		put_page(page);
> +}

I am not sure this warrants a function.
I would probably go with "if (ref_page).." in the two corresponding places,
but not feeling strong here.

> +
>  /**
>   * soft_offline_page - Soft offline a page.
>   * @pfn: pfn to soft-offline
> @@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page)
>  int soft_offline_page(unsigned long pfn, int flags)
>  {
>  	int ret;
> -	struct page *page;
>  	bool try_again = true;
> +	struct page *page, *ref_page = NULL;
> +
> +	WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));

Did you see any scenario where this could happen? I understand that you are
adding this because we will leak a reference in case pfn is not valid anymore.

-- 
Oscar Salvador
SUSE L3
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
  2021-01-12  9:34 ` [PATCH v2 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions Dan Williams
@ 2021-01-12 10:01   ` David Hildenbrand
  2021-01-12 11:00   ` Oscar Salvador
  1 sibling, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2021-01-12 10:01 UTC (permalink / raw)
  To: Dan Williams, linux-mm
  Cc: Andrew Morton, Michal Hocko, linux-nvdimm, linux-kernel

On 12.01.21 10:34, Dan Williams wrote:
> While pfn_to_online_page() is able to determine pfn_valid() at
> subsection granularity it is not able to reliably determine if a given
> pfn is also online if the section is mixes ZONE_{NORMAL,MOVABLE} with
> ZONE_DEVICE. This means that pfn_to_online_page() may return invalid
> @page objects. For example with a memory map like:
> 
> 100000000-1fbffffff : System RAM
>   142000000-143002e16 : Kernel code
>   143200000-143713fff : Kernel rodata
>   143800000-143b15b7f : Kernel data
>   144227000-144ffffff : Kernel bss
> 1fc000000-2fbffffff : Persistent Memory (legacy)
>   1fc000000-2fbffffff : namespace0.0
> 
> This command:
> 
> echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page
> 
> ...succeeds when it should fail. When it succeeds it touches
> an uninitialized page and may crash or cause other damage (see
> dissolve_free_huge_page()).
> 
> While the memory map above is contrived via the memmap=ss!nn kernel
> command line option, the collision happens in practice on shipping
> platforms. The memory controller resources that decode spans of
> physical address space are a limited resource. One technique
> platform-firmware uses to conserve those resources is to share a decoder
> across 2 devices to keep the address range contiguous. Unfortunately the
> unit of operation of a decoder is 64MiB while the Linux section size is
> 128MiB. This results in situations where, without subsection hotplug
> memory mappings with different lifetimes collide into one object that
> can only express one lifetime.
> 
> Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a
> section that mixes ZONE_DEVICE pfns with other online pfns. With
> SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall
> back to a slow-path check for ZONE_DEVICE pfns in an online section. In
> the fast path online_section() for a full ZONE_DEVICE section returns
> false.
> 
> Because the collision case is rare, and for simplicity, the
> SECTION_TAINT_ZONE_DEVICE flag is never cleared once set.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Reported-by: Michal Hocko <mhocko@suse.com>
> Reported-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/mmzone.h |   22 +++++++++++++++-------
>  mm/memory_hotplug.c    |   38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 53 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b593316bff3d..0b5c44f730b4 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1273,13 +1273,14 @@ extern size_t mem_section_usage_size(void);
>   *      which results in PFN_SECTION_SHIFT equal 6.
>   * To sum it up, at least 6 bits are available.
>   */
> -#define	SECTION_MARKED_PRESENT	(1UL<<0)
> -#define SECTION_HAS_MEM_MAP	(1UL<<1)
> -#define SECTION_IS_ONLINE	(1UL<<2)
> -#define SECTION_IS_EARLY	(1UL<<3)
> -#define SECTION_MAP_LAST_BIT	(1UL<<4)
> -#define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
> -#define SECTION_NID_SHIFT	3
> +#define SECTION_MARKED_PRESENT		(1UL<<0)
> +#define SECTION_HAS_MEM_MAP		(1UL<<1)
> +#define SECTION_IS_ONLINE		(1UL<<2)
> +#define SECTION_IS_EARLY		(1UL<<3)
> +#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
> +#define SECTION_MAP_LAST_BIT		(1UL<<5)
> +#define SECTION_MAP_MASK		(~(SECTION_MAP_LAST_BIT-1))
> +#define SECTION_NID_SHIFT		3
>  
>  static inline struct page *__section_mem_map_addr(struct mem_section *section)
>  {
> @@ -1318,6 +1319,13 @@ static inline int online_section(struct mem_section *section)
>  	return (section && (section->section_mem_map & SECTION_IS_ONLINE));
>  }
>  
> +static inline int online_device_section(struct mem_section *section)
> +{
> +	unsigned long flags = SECTION_IS_ONLINE | SECTION_TAINT_ZONE_DEVICE;
> +
> +	return section && ((section->section_mem_map & flags) == flags);
> +}
> +
>  static inline int online_section_nr(unsigned long nr)
>  {
>  	return online_section(__nr_to_section(nr));
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index a845b3979bc0..b2ccb84c3082 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -308,6 +308,7 @@ static int check_hotplug_memory_addressable(unsigned long pfn,
>  struct page *pfn_to_online_page(unsigned long pfn)
>  {
>  	unsigned long nr = pfn_to_section_nr(pfn);
> +	struct dev_pagemap *pgmap;
>  	struct mem_section *ms;
>  
>  	if (nr >= NR_MEM_SECTIONS)
> @@ -320,6 +321,22 @@ struct page *pfn_to_online_page(unsigned long pfn)
>  	if (!pfn_section_valid(ms, pfn))
>  		return NULL;
>  
> +	if (!online_device_section(ms))
> +		return pfn_to_page(pfn);
> +
> +	/*
> +	 * Slowpath: when ZONE_DEVICE collides with
> +	 * ZONE_{NORMAL,MOVABLE} within the same section some pfns in
> +	 * the section may be 'offline' but 'valid'. Only
> +	 * get_dev_pagemap() can determine sub-section online status.
> +	 */
> +	pgmap = get_dev_pagemap(pfn, NULL);
> +	put_dev_pagemap(pgmap);
> +
> +	/* The presence of a pgmap indicates ZONE_DEVICE offline pfn */
> +	if (pgmap)
> +		return NULL;
> +
>  	return pfn_to_page(pfn);
>  }
>  EXPORT_SYMBOL_GPL(pfn_to_online_page);
> @@ -702,6 +719,14 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
>  	pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - pgdat->node_start_pfn;
>  
>  }
> +
> +static void section_taint_zone_device(unsigned long pfn)
> +{
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +
> +	ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE;
> +}
> +
>  /*
>   * Associate the pfn range with the given zone, initializing the memmaps
>   * and resizing the pgdat/zone data to span the added pages. After this
> @@ -731,6 +756,19 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  	resize_pgdat_range(pgdat, start_pfn, nr_pages);
>  	pgdat_resize_unlock(pgdat, &flags);
>  
> +	/*
> +	 * Subsection population requires care in pfn_to_online_page().
> +	 * Set the taint to enable the slow path detection of
> +	 * ZONE_DEVICE pages in an otherwise  ZONE_{NORMAL,MOVABLE}
> +	 * section.
> +	 */
> +	if (zone_idx(zone) == ZONE_DEVICE) {
> +		if (!IS_ALIGNED(start_pfn, PAGES_PER_SECTION))
> +			section_taint_zone_device(start_pfn);
> +		if (!IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION))
> +			section_taint_zone_device(start_pfn + nr_pages);
> +	}
> +
>  	/*
>  	 * TODO now we have a visible range of pages which are not associated
>  	 * with their zone properly. Not nice but set_pfnblock_flags_mask
> 

LGHTM, thanks

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/5] mm: Fix page reference leak in soft_offline_page()
  2021-01-12  9:34 ` [PATCH v2 4/5] mm: Fix page reference leak in soft_offline_page() Dan Williams
  2021-01-12  9:53   ` Oscar Salvador
@ 2021-01-12 10:16   ` David Hildenbrand
  1 sibling, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2021-01-12 10:16 UTC (permalink / raw)
  To: Dan Williams, linux-mm
  Cc: Andrew Morton, Naoya Horiguchi, Michal Hocko, Oscar Salvador,
	stable, linux-nvdimm, linux-kernel

On 12.01.21 10:34, Dan Williams wrote:
> The conversion to move pfn_to_online_page() internal to
> soft_offline_page() missed that the get_user_pages() reference needs to
> be dropped when pfn_to_online_page() fails.
> 
> When soft_offline_page() is handed a pfn_valid() &&
> !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to
> a leaked reference.
> 
> Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  mm/memory-failure.c |   20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5a38e9eade94..78b173c7190c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page)
>  	return rc;
>  }
>  
> +static void put_ref_page(struct page *page)
> +{
> +	if (page)
> +		put_page(page);
> +}
> +
>  /**
>   * soft_offline_page - Soft offline a page.
>   * @pfn: pfn to soft-offline
> @@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page)
>  int soft_offline_page(unsigned long pfn, int flags)
>  {
>  	int ret;
> -	struct page *page;
>  	bool try_again = true;
> +	struct page *page, *ref_page = NULL;
> +
> +	WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
>  
>  	if (!pfn_valid(pfn))
>  		return -ENXIO;
> +	if (flags & MF_COUNT_INCREASED)
> +		ref_page = pfn_to_page(pfn);
> +
>  	/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
>  	page = pfn_to_online_page(pfn);
> -	if (!page)
> +	if (!page) {
> +		put_ref_page(ref_page);
>  		return -EIO;
> +	}
>  
>  	if (PageHWPoison(page)) {
>  		pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
> -		if (flags & MF_COUNT_INCREASED)
> -			put_page(page);
> +		put_ref_page(ref_page);
>  		return 0;
>  	}

Reviewed-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/5] mm: Move pfn_to_online_page() out of line
  2021-01-12  9:34 ` [PATCH v2 1/5] mm: Move pfn_to_online_page() out of line Dan Williams
  2021-01-12  9:46   ` David Hildenbrand
@ 2021-01-12 10:19   ` Oscar Salvador
  1 sibling, 0 replies; 16+ messages in thread
From: Oscar Salvador @ 2021-01-12 10:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-mm, David Hildenbrand, Michal Hocko, linux-nvdimm, linux-kernel

On Tue, Jan 12, 2021 at 01:34:42AM -0800, Dan Williams wrote:
> pfn_to_online_page() is already too large to be a macro or an inline
> function. In anticipation of further logic changes / growth, move it out
> of line.
> 
> No functional change, just code movement.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Reported-by: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] mm: Teach pfn_to_online_page() to consider subsection validity
  2021-01-12  9:53   ` David Hildenbrand
@ 2021-01-12 10:48     ` Oscar Salvador
  2021-01-12 22:20     ` Dan Williams
  1 sibling, 0 replies; 16+ messages in thread
From: Oscar Salvador @ 2021-01-12 10:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, Qian Cai, Michal Hocko, linux-nvdimm, linux-kernel,
	Anshuman Khandual

On Tue, Jan 12, 2021 at 10:53:17AM +0100, David Hildenbrand wrote:
> That's not sufficient for alternative implementations of pfn_valid().
> 
> You still need some kind of pfn_valid(pfn) for alternative versions of
> pfn_valid(). Consider arm64 memory holes in the memmap. See their
> current (yet to be fixed/reworked) pfn_valid() implementation.
> (pfn_valid_within() is implicitly active on arm64)
> 
> Actually, I think we should add something like the following, to make
> this clearer (pfn_valid_within() is confusing)
> 
> #ifdef CONFIG_HAVE_ARCH_PFN_VALID
> 	/* We might have to check for holes inside the memmap. */
> 	if (!pfn_valid())
> 		return NULL;
> #endif

I have to confess that I was a bit confused by pfn_valid_within + HOLES_IN_ZONES
+ HAVE_ARCH_PFN_VALID.

At first I thought that we should stick with pfn_valid_within, as we also
depend on HOLES_IN_ZONES, so it could be that

 if (IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID))
  ...

would to too much work, as if CONFIG_HOLES_IN_ZONES was not set but an arch
pfn_valid was provided, we would perform unedeed checks.
But on a closer look, CONFIG_HOLES_IN_ZONES is set by default on arm64, and
on ia64 when SPARSEMEM is set, so looks fine.


-- 
Oscar Salvador
SUSE L3
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
  2021-01-12  9:34 ` [PATCH v2 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions Dan Williams
  2021-01-12 10:01   ` David Hildenbrand
@ 2021-01-12 11:00   ` Oscar Salvador
  1 sibling, 0 replies; 16+ messages in thread
From: Oscar Salvador @ 2021-01-12 11:00 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-mm, Andrew Morton, Michal Hocko, David Hildenbrand,
	linux-nvdimm, linux-kernel

On Tue, Jan 12, 2021 at 01:34:53AM -0800, Dan Williams wrote:
> While pfn_to_online_page() is able to determine pfn_valid() at
> subsection granularity it is not able to reliably determine if a given
> pfn is also online if the section is mixes ZONE_{NORMAL,MOVABLE} with
> ZONE_DEVICE. This means that pfn_to_online_page() may return invalid
> @page objects. For example with a memory map like:
> 
> 100000000-1fbffffff : System RAM
>   142000000-143002e16 : Kernel code
>   143200000-143713fff : Kernel rodata
>   143800000-143b15b7f : Kernel data
>   144227000-144ffffff : Kernel bss
> 1fc000000-2fbffffff : Persistent Memory (legacy)
>   1fc000000-2fbffffff : namespace0.0
> 
> This command:
> 
> echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page
> 
> ...succeeds when it should fail. When it succeeds it touches
> an uninitialized page and may crash or cause other damage (see
> dissolve_free_huge_page()).

[...]
 
> Because the collision case is rare, and for simplicity, the
> SECTION_TAINT_ZONE_DEVICE flag is never cleared once set.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Reported-by: Michal Hocko <mhocko@suse.com>
> Reported-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/5] mm: Fix page reference leak in soft_offline_page()
  2021-01-12  9:53   ` Oscar Salvador
@ 2021-01-12 20:03     ` Dan Williams
  0 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2021-01-12 20:03 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Linux MM, Andrew Morton, Naoya Horiguchi, David Hildenbrand,
	Michal Hocko, stable, linux-nvdimm, Linux Kernel Mailing List

On Tue, Jan 12, 2021 at 1:54 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Tue, Jan 12, 2021 at 01:34:58AM -0800, Dan Williams wrote:
> > The conversion to move pfn_to_online_page() internal to
> > soft_offline_page() missed that the get_user_pages() reference needs to
> > be dropped when pfn_to_online_page() fails.
>
> I would be more specific here wrt. get_user_pages (madvise).
> soft_offline_page gets called from more places besides madvise_*.

Sure.

>
> > When soft_offline_page() is handed a pfn_valid() &&
> > !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to
> > a leaked reference.
> >
> > Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Oscar Salvador <osalvador@suse.de>
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> LGTM, thanks for catching this:
>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>
> A nit below.
>
> > ---
> >  mm/memory-failure.c |   20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 5a38e9eade94..78b173c7190c 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page)
> >       return rc;
> >  }
> >
> > +static void put_ref_page(struct page *page)
> > +{
> > +     if (page)
> > +             put_page(page);
> > +}
>
> I am not sure this warrants a function.
> I would probably go with "if (ref_page).." in the two corresponding places,
> but not feeling strong here.

I'll take another look, it felt cluttered...

>
> > +
> >  /**
> >   * soft_offline_page - Soft offline a page.
> >   * @pfn: pfn to soft-offline
> > @@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page)
> >  int soft_offline_page(unsigned long pfn, int flags)
> >  {
> >       int ret;
> > -     struct page *page;
> >       bool try_again = true;
> > +     struct page *page, *ref_page = NULL;
> > +
> > +     WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
>
> Did you see any scenario where this could happen? I understand that you are
> adding this because we will leak a reference in case pfn is not valid anymore.
>

I did not, more future proofing / documenting against refactoring that
fails to consider that case.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] mm: Teach pfn_to_online_page() to consider subsection validity
  2021-01-12  9:53   ` David Hildenbrand
  2021-01-12 10:48     ` Oscar Salvador
@ 2021-01-12 22:20     ` Dan Williams
  1 sibling, 0 replies; 16+ messages in thread
From: Dan Williams @ 2021-01-12 22:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Qian Cai, Michal Hocko, linux-nvdimm,
	Linux Kernel Mailing List, Anshuman Khandual

On Tue, Jan 12, 2021 at 1:53 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.01.21 10:34, Dan Williams wrote:
> > pfn_section_valid() determines pfn validity on subsection granularity.
> >
> > pfn_valid_within() internally uses pfn_section_valid(), but gates it
> > with early_section() to preserve the traditional behavior of pfn_valid()
> > before subsection support was added.
> >
> > pfn_to_online_page() wants the explicit precision that pfn_valid() does
> > not offer, so use pfn_section_valid() directly. Since
> > pfn_to_online_page() already open codes the validity of the section
> > number vs NR_MEM_SECTIONS, there's not much value to using
> > pfn_valid_within(), just use pfn_section_valid(). This loses the
> > valid_section() check that pfn_valid_within() was performing, but that
> > was already redundant with the online check.
> >
> > Fixes: b13bc35193d9 ("mm/hotplug: invalid PFNs from pfn_to_online_page()")
> > Cc: Qian Cai <cai@lca.pw>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Reported-by: David Hildenbrand <david@redhat.com>
> > ---
> >  mm/memory_hotplug.c |   16 ++++++++++++----
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 55a69d4396e7..a845b3979bc0 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -308,11 +308,19 @@ static int check_hotplug_memory_addressable(unsigned long pfn,
> >  struct page *pfn_to_online_page(unsigned long pfn)
> >  {
> >       unsigned long nr = pfn_to_section_nr(pfn);
> > +     struct mem_section *ms;
> > +
> > +     if (nr >= NR_MEM_SECTIONS)
> > +             return NULL;
> > +
> > +     ms = __nr_to_section(nr);
> > +     if (!online_section(ms))
> > +             return NULL;
> > +
> > +     if (!pfn_section_valid(ms, pfn))
> > +             return NULL;
>
> That's not sufficient for alternative implementations of pfn_valid().
>
> You still need some kind of pfn_valid(pfn) for alternative versions of
> pfn_valid(). Consider arm64 memory holes in the memmap. See their
> current (yet to be fixed/reworked) pfn_valid() implementation.
> (pfn_valid_within() is implicitly active on arm64)
>
> Actually, I think we should add something like the following, to make
> this clearer (pfn_valid_within() is confusing)
>
> #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>         /* We might have to check for holes inside the memmap. */
>         if (!pfn_valid())
>                 return NULL;
> #endif

Looks good to me, I'll take Oscar's version that uses IS_ENABLED().

Skipping the call to pfn_valid() saves 16-bytes of code text on x86_64.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-01-12 22:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-12  9:34 [PATCH v2 0/5] mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE Dan Williams
2021-01-12  9:34 ` [PATCH v2 1/5] mm: Move pfn_to_online_page() out of line Dan Williams
2021-01-12  9:46   ` David Hildenbrand
2021-01-12 10:19   ` Oscar Salvador
2021-01-12  9:34 ` [PATCH v2 2/5] mm: Teach pfn_to_online_page() to consider subsection validity Dan Williams
2021-01-12  9:53   ` David Hildenbrand
2021-01-12 10:48     ` Oscar Salvador
2021-01-12 22:20     ` Dan Williams
2021-01-12  9:34 ` [PATCH v2 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions Dan Williams
2021-01-12 10:01   ` David Hildenbrand
2021-01-12 11:00   ` Oscar Salvador
2021-01-12  9:34 ` [PATCH v2 4/5] mm: Fix page reference leak in soft_offline_page() Dan Williams
2021-01-12  9:53   ` Oscar Salvador
2021-01-12 20:03     ` Dan Williams
2021-01-12 10:16   ` David Hildenbrand
2021-01-12  9:35 ` [PATCH v2 5/5] libnvdimm/namespace: Fix visibility of namespace resource attribute Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).