Stable Archive on lore.kernel.org
 help / color / Atom feed
* [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment
@ 2020-05-21 23:37 Dan Williams
  2020-05-21 23:37 ` [5.4-stable PATCH 1/7] mm/memremap_pages: Kill unused __devm_memremap_pages() Dan Williams
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Dan Williams @ 2020-05-21 23:37 UTC (permalink / raw)
  To: stable
  Cc: Oliver O'Halloran, Vishal Verma, Benjamin Herrenschmidt,
	Michael Ellerman, Paul Mackerras, Christoph Hellwig, Jeff Moyer,
	Aneesh Kumar K.V, linux-nvdimm

Hello stable team,

These patches have been shipping in mainline since v5.7-rc1 with no
reported issues. They address long standing problems in libnvdimm's
handling of namespace provisioning relative to alignment constraints
including crashes trying to even load the driver on some PowerPC
configurations.

I did fold one build fix [1] into "libnvdimm/region: Introduce an 'align'
attribute" so as to not convey the bisection breakage to -stable.

Please consider them for v5.4-stable. They do pass the latest
version of the ndctl unit tests.

[1]: 04ff4863e126 libnvdimm/region: Fix build error


---

Original cover letter for the series:

Aneesh reports that PowerPC requires 16MiB alignment for the address
range passed to devm_memremap_pages(), and Jeff reports that it is
possible to create a misaligned namespace which blocks future namespace
creation in that region. Both of these issues require namespace
alignment to be managed at the region level rather than padding at the
namespace level which has been a broken approach to date.

Introduce memremap_compat_align() to indicate the hard requirements of
an arch's memremap_pages() implementation. Use the maximum known
memremap_compat_align() to set the default namespace alignment for
libnvdimm. Consult that alignment when allocating free space. Finally,
allow the default region alignment to be overridden to maintain the same
namespace creation capability as previous kernels (modulo dax operation
not being supported with a non-zero start_pad).

---

Aneesh Kumar K.V (1):
      libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe

Dan Williams (6):
      mm/memremap_pages: Kill unused __devm_memremap_pages()
      mm/memremap_pages: Introduce memremap_compat_align()
      libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
      libnvdimm/namespace: Enforce memremap_compat_align()
      libnvdimm/region: Introduce NDD_LABELING
      libnvdimm/region: Introduce an 'align' attribute


 arch/powerpc/Kconfig                      |    1 
 arch/powerpc/mm/ioremap.c                 |   21 +++++
 arch/powerpc/platforms/pseries/papr_scm.c |    2 
 drivers/acpi/nfit/core.c                  |    4 +
 drivers/nvdimm/dimm.c                     |    2 
 drivers/nvdimm/dimm_devs.c                |   95 +++++++++++++++++----
 drivers/nvdimm/namespace_devs.c           |   28 ++++++
 drivers/nvdimm/nd.h                       |    3 -
 drivers/nvdimm/pfn.h                      |   12 +++
 drivers/nvdimm/pfn_devs.c                 |   48 +++++++++--
 drivers/nvdimm/region_devs.c              |  130 ++++++++++++++++++++++++++---
 include/linux/io.h                        |    2 
 include/linux/libnvdimm.h                 |    2 
 include/linux/memremap.h                  |    8 ++
 include/linux/mmzone.h                    |    1 
 lib/Kconfig                               |    3 +
 mm/memremap.c                             |   23 +++++
 17 files changed, 335 insertions(+), 50 deletions(-)

base-commit: 1cdaf895c99d319c0007d0b62818cf85fc4b087f

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [5.4-stable PATCH 1/7] mm/memremap_pages: Kill unused __devm_memremap_pages()
  2020-05-21 23:37 [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
@ 2020-05-21 23:37 ` Dan Williams
  2020-05-21 23:37 ` [5.4-stable PATCH 2/7] mm/memremap_pages: Introduce memremap_compat_align() Dan Williams
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2020-05-21 23:37 UTC (permalink / raw)
  To: stable
  Cc: Christoph Hellwig, Aneesh Kumar K.V, Christoph Hellwig,
	vishal.l.verma, linux-nvdimm

Commit 1d0827b75ee7df497f611a2ac412a88135fb0ef5 upstream.

Kill this definition that was introduced in commit 41e94a851304 ("add
devm_memremap_pages") add never used.

Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/158041476158.3889308.4221100673554151124.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/io.h |    2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/io.h b/include/linux/io.h
index a59834bc0a11..35e8d84935e0 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -79,8 +79,6 @@ void *devm_memremap(struct device *dev, resource_size_t offset,
 		size_t size, unsigned long flags);
 void devm_memunmap(struct device *dev, void *addr);
 
-void *__devm_memremap_pages(struct device *dev, struct resource *res);
-
 #ifdef CONFIG_PCI
 /*
  * The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [5.4-stable PATCH 2/7] mm/memremap_pages: Introduce memremap_compat_align()
  2020-05-21 23:37 [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
  2020-05-21 23:37 ` [5.4-stable PATCH 1/7] mm/memremap_pages: Kill unused __devm_memremap_pages() Dan Williams
@ 2020-05-21 23:37 ` Dan Williams
  2020-05-21 23:37 ` [5.4-stable PATCH 3/7] libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid Dan Williams
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2020-05-21 23:37 UTC (permalink / raw)
  To: stable
  Cc: Aneesh Kumar K.V, Jeff Moyer, Benjamin Herrenschmidt,
	Paul Mackerras, Aneesh Kumar K.V, Michael Ellerman,
	vishal.l.verma, hch, linux-nvdimm

Commit 9ffc1d19fc4a6dfcfe06c91c2861ad6d44fdd92d upstream.

The "sub-section memory hotplug" facility allows memremap_pages() users
like libnvdimm to compensate for hardware platforms like x86 that have a
section size larger than their hardware memory mapping granularity.  The
compensation that sub-section support affords is being tolerant of
physical memory resources shifting by units smaller (64MiB on x86) than
the memory-hotplug section size (128 MiB). Where the platform
physical-memory mapping granularity is limited by the number and
capability of address-decode-registers in the memory controller.

While the sub-section support allows memremap_pages() to operate on
sub-section (2MiB) granularity, the Power architecture may still
require 16MiB alignment on "!radix_enabled()" platforms.

In order for libnvdimm to be able to detect and manage this per-arch
limitation, introduce memremap_compat_align() as a common minimum
alignment across all driver-facing memory-mapping interfaces, and let
Power override it to 16MiB in the "!radix_enabled()" case.

The assumption / requirement for 16MiB to be a viable
memremap_compat_align() value is that Power does not have platforms
where its equivalent of address-decode-registers never hardware remaps a
persistent memory resource on smaller than 16MiB boundaries. Note that I
tried my best to not add a new Kconfig symbol, but header include
entanglements defeated the #ifndef memremap_compat_align design pattern
and the need to export it defeats the __weak design pattern for arch
overrides.

Based on an initial patch by Aneesh.

Link: http://lore.kernel.org/r/CAPcyv4gBGNP95APYaBcsocEa50tQj9b5h__83vgngjq3ouGX_Q@mail.gmail.com
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/Kconfig      |    1 +
 arch/powerpc/mm/ioremap.c |   21 +++++++++++++++++++++
 drivers/nvdimm/pfn_devs.c |    2 +-
 include/linux/memremap.h  |    8 ++++++++
 include/linux/mmzone.h    |    1 +
 lib/Kconfig               |    3 +++
 mm/memremap.c             |   23 +++++++++++++++++++++++
 7 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2b1033f13210..a6b65bb6be47 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -126,6 +126,7 @@ config PPC
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_HUGEPD			if HUGETLB_PAGE
+	select ARCH_HAS_MEMREMAP_COMPAT_ALIGN
 	select ARCH_HAS_MMIOWB			if PPC64
 	select ARCH_HAS_PHYS_TO_DMA
 	select ARCH_HAS_PMEM_API
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index fc669643ce6a..b1a0aebe8c48 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -2,6 +2,7 @@
 
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <linux/mmzone.h>
 #include <linux/vmalloc.h>
 #include <asm/io-workarounds.h>
 
@@ -97,3 +98,23 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size,
 
 	return NULL;
 }
+
+#ifdef CONFIG_ZONE_DEVICE
+/*
+ * Override the generic version in mm/memremap.c.
+ *
+ * With hash translation, the direct-map range is mapped with just one
+ * page size selected by htab_init_page_sizes(). Consult
+ * mmu_psize_defs[] to determine the minimum page size alignment.
+*/
+unsigned long memremap_compat_align(void)
+{
+	unsigned int shift = mmu_psize_defs[mmu_linear_psize].shift;
+
+	if (radix_enabled())
+		return SUBSECTION_SIZE;
+	return max(SUBSECTION_SIZE, 1UL << shift);
+
+}
+EXPORT_SYMBOL_GPL(memremap_compat_align);
+#endif
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 60d81fae06ee..aa144c8a4ee6 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -752,7 +752,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	start = nsio->res.start;
 	size = resource_size(&nsio->res);
 	npfns = PHYS_PFN(size - SZ_8K);
-	align = max(nd_pfn->align, (1UL << SUBSECTION_SHIFT));
+	align = max(nd_pfn->align, SUBSECTION_SIZE);
 	end_trunc = start + size - ALIGN_DOWN(start + size, align);
 	if (nd_pfn->mode == PFN_MODE_PMEM) {
 		/*
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 6fefb09af7c3..8af1cbd8f293 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -132,6 +132,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
+unsigned long memremap_compat_align(void);
 #else
 static inline void *devm_memremap_pages(struct device *dev,
 		struct dev_pagemap *pgmap)
@@ -165,6 +166,12 @@ static inline void vmem_altmap_free(struct vmem_altmap *altmap,
 		unsigned long nr_pfns)
 {
 }
+
+/* when memremap_pages() is disabled all archs can remap a single page */
+static inline unsigned long memremap_compat_align(void)
+{
+	return PAGE_SIZE;
+}
 #endif /* CONFIG_ZONE_DEVICE */
 
 static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
@@ -172,4 +179,5 @@ static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
 	if (pgmap)
 		percpu_ref_put(pgmap->ref);
 }
+
 #endif /* _LINUX_MEMREMAP_H_ */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8b5f758942a2..c4d17aff7f5f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1157,6 +1157,7 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 #define SECTION_ALIGN_DOWN(pfn)	((pfn) & PAGE_SECTION_MASK)
 
 #define SUBSECTION_SHIFT 21
+#define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT)
 
 #define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
 #define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)
diff --git a/lib/Kconfig b/lib/Kconfig
index 3321d04dfa5a..7e779a868a8b 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -605,6 +605,9 @@ config ARCH_NO_SG_CHAIN
 config ARCH_HAS_PMEM_API
 	bool
 
+config ARCH_HAS_MEMREMAP_COMPAT_ALIGN
+	bool
+
 # use memcpy to implement user copies for nommu architectures
 config UACCESS_MEMCPY
 	bool
diff --git a/mm/memremap.c b/mm/memremap.c
index c51c6bd2fe34..1053fe72e114 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -7,6 +7,7 @@
 #include <linux/mm.h>
 #include <linux/pfn_t.h>
 #include <linux/swap.h>
+#include <linux/mmzone.h>
 #include <linux/swapops.h>
 #include <linux/types.h>
 #include <linux/wait_bit.h>
@@ -14,6 +15,28 @@
 
 static DEFINE_XARRAY(pgmap_array);
 
+/*
+ * The memremap() and memremap_pages() interfaces are alternately used
+ * to map persistent memory namespaces. These interfaces place different
+ * constraints on the alignment and size of the mapping (namespace).
+ * memremap() can map individual PAGE_SIZE pages. memremap_pages() can
+ * only map subsections (2MB), and at least one architecture (PowerPC)
+ * the minimum mapping granularity of memremap_pages() is 16MB.
+ *
+ * The role of memremap_compat_align() is to communicate the minimum
+ * arch supported alignment of a namespace such that it can freely
+ * switch modes without violating the arch constraint. Namely, do not
+ * allow a namespace to be PAGE_SIZE aligned since that namespace may be
+ * reconfigured into a mode that requires SUBSECTION_SIZE alignment.
+ */
+#ifndef CONFIG_ARCH_HAS_MEMREMAP_COMPAT_ALIGN
+unsigned long memremap_compat_align(void)
+{
+	return SUBSECTION_SIZE;
+}
+EXPORT_SYMBOL_GPL(memremap_compat_align);
+#endif
+
 #ifdef CONFIG_DEV_PAGEMAP_OPS
 DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
 EXPORT_SYMBOL(devmap_managed_key);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [5.4-stable PATCH 3/7] libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
  2020-05-21 23:37 [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
  2020-05-21 23:37 ` [5.4-stable PATCH 1/7] mm/memremap_pages: Kill unused __devm_memremap_pages() Dan Williams
  2020-05-21 23:37 ` [5.4-stable PATCH 2/7] mm/memremap_pages: Introduce memremap_compat_align() Dan Williams
@ 2020-05-21 23:37 ` Dan Williams
  2020-05-21 23:38 ` [5.4-stable PATCH 4/7] libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe Dan Williams
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2020-05-21 23:37 UTC (permalink / raw)
  To: stable
  Cc: Aneesh Kumar K.V, Jeff Moyer, Aneesh Kumar K.V, vishal.l.verma,
	hch, linux-nvdimm

Commit b2ba7e91fa81bec9b64c47ab852145559cad2b68 upstream.

The EOPNOTSUPP return code from the pmem driver indicates that the
namespace has a configuration that may be valid, but the current kernel
does not support it. Expand this to all of the nd_pfn_validate() error
conditions after the infoblock has been verified as self consistent.

This prevents exposing the namespace to I/O when the infoblock needs to
be corrected, or the system needs to be put into a different
configuration (like changing the page size on PowerPC).

Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pfn_devs.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index aa144c8a4ee6..8c5b13567f55 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -560,14 +560,14 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 			dev_dbg(&nd_pfn->dev, "align: %lx:%lx mode: %d:%d\n",
 					nd_pfn->align, align, nd_pfn->mode,
 					mode);
-			return -EINVAL;
+			return -EOPNOTSUPP;
 		}
 	}
 
 	if (align > nvdimm_namespace_capacity(ndns)) {
 		dev_err(&nd_pfn->dev, "alignment: %lx exceeds capacity %llx\n",
 				align, nvdimm_namespace_capacity(ndns));
-		return -EINVAL;
+		return -EOPNOTSUPP;
 	}
 
 	/*
@@ -580,7 +580,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 	if (offset >= resource_size(&nsio->res)) {
 		dev_err(&nd_pfn->dev, "pfn array size exceeds capacity of %s\n",
 				dev_name(&ndns->dev));
-		return -EBUSY;
+		return -EOPNOTSUPP;
 	}
 
 	if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align))
@@ -588,7 +588,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 		dev_err(&nd_pfn->dev,
 				"bad offset: %#llx dax disabled align: %#lx\n",
 				offset, align);
-		return -ENXIO;
+		return -EOPNOTSUPP;
 	}
 
 	return nd_pfn_clear_memmap_errors(nd_pfn);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [5.4-stable PATCH 4/7] libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe
  2020-05-21 23:37 [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
                   ` (2 preceding siblings ...)
  2020-05-21 23:37 ` [5.4-stable PATCH 3/7] libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid Dan Williams
@ 2020-05-21 23:38 ` Dan Williams
  2020-05-21 23:38 ` [5.4-stable PATCH 5/7] libnvdimm/namespace: Enforce memremap_compat_align() Dan Williams
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2020-05-21 23:38 UTC (permalink / raw)
  To: stable; +Cc: Aneesh Kumar K.V, vishal.l.verma, hch, linux-nvdimm

From: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Commit c1f45d86a522d568aef541dbbc066ccac262b4c3 upstream.

nvdimm core use nd_pfn_validate when looking for devdax or fsdax namespace. In this
case device resources are allocated against nd_namespace_io dev. In-order to
allow remap of range in nd_pfn_clear_memmap_error(), move the device memmap
area clearing while initializing pfn namespace. With this device
resource are allocated against nd_pfn and we can use nd_pfn->dev for remapping.

This also avoids calling nd_pfn_clear_mmap_errors twice. Once while probing the
namespace and second while initializing a pfn namespace.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20191101032728.113001-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pfn_devs.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 8c5b13567f55..6e5b042f453e 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -591,7 +591,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 		return -EOPNOTSUPP;
 	}
 
-	return nd_pfn_clear_memmap_errors(nd_pfn);
+	return 0;
 }
 EXPORT_SYMBOL(nd_pfn_validate);
 
@@ -729,6 +729,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		sig = PFN_SIG;
 
 	rc = nd_pfn_validate(nd_pfn, sig);
+	if (rc == 0)
+		return nd_pfn_clear_memmap_errors(nd_pfn);
 	if (rc != -ENODEV)
 		return rc;
 
@@ -796,6 +798,10 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
 	pfn_sb->checksum = cpu_to_le64(checksum);
 
+	rc = nd_pfn_clear_memmap_errors(nd_pfn);
+	if (rc)
+		return rc;
+
 	return nvdimm_write_bytes(ndns, SZ_4K, pfn_sb, sizeof(*pfn_sb), 0);
 }
 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [5.4-stable PATCH 5/7] libnvdimm/namespace: Enforce memremap_compat_align()
  2020-05-21 23:37 [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
                   ` (3 preceding siblings ...)
  2020-05-21 23:38 ` [5.4-stable PATCH 4/7] libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe Dan Williams
@ 2020-05-21 23:38 ` Dan Williams
  2020-05-21 23:38 ` [5.4-stable PATCH 6/7] libnvdimm/region: Introduce NDD_LABELING Dan Williams
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2020-05-21 23:38 UTC (permalink / raw)
  To: stable
  Cc: Aneesh Kumar K.V, Jeff Moyer, Aneesh Kumar K.V, vishal.l.verma,
	hch, linux-nvdimm

Commit 6acd7d5ef264d8e9a8988cebf6eeb3567eaf60c6 upstream.

The pmem driver on PowerPC crashes with the following signature when
instantiating misaligned namespaces that map their capacity via
memremap_pages().

    BUG: Unable to handle kernel data access at 0xc001000406000000
    Faulting instruction address: 0xc000000000090790
    NIP [c000000000090790] arch_add_memory+0xc0/0x130
    LR [c000000000090744] arch_add_memory+0x74/0x130
    Call Trace:
     arch_add_memory+0x74/0x130 (unreliable)
     memremap_pages+0x74c/0xa30
     devm_memremap_pages+0x3c/0xa0
     pmem_attach_disk+0x188/0x770
     nvdimm_bus_probe+0xd8/0x470

With the assumption that only memremap_pages() has alignment
constraints, enforce memremap_compat_align() for
pmem_should_map_pages(), nd_pfn, and nd_dax cases. This includes
preventing the creation of namespaces where the base address is
misaligned and cases there infoblock padding parameters are invalid.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Fixes: a3619190d62e ("libnvdimm/pfn: stop padding pmem namespaces to section alignment")
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/namespace_devs.c |   17 +++++++++++++++++
 drivers/nvdimm/pfn.h            |   12 ++++++++++++
 drivers/nvdimm/pfn_devs.c       |   32 +++++++++++++++++++++++++++++---
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index cca0a3ba1d2c..b2db15250e0d 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -10,6 +10,7 @@
 #include <linux/nd.h>
 #include "nd-core.h"
 #include "pmem.h"
+#include "pfn.h"
 #include "nd.h"
 
 static void namespace_io_release(struct device *dev)
@@ -1735,6 +1736,22 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev)
 		return ERR_PTR(-ENODEV);
 	}
 
+	/*
+	 * Note, alignment validation for fsdax and devdax mode
+	 * namespaces happens in nd_pfn_validate() where infoblock
+	 * padding parameters can be applied.
+	 */
+	if (pmem_should_map_pages(dev)) {
+		struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
+		struct resource *res = &nsio->res;
+
+		if (!IS_ALIGNED(res->start | (res->end + 1),
+					memremap_compat_align())) {
+			dev_err(&ndns->dev, "%pr misaligned, unable to map\n", res);
+			return ERR_PTR(-EOPNOTSUPP);
+		}
+	}
+
 	if (is_namespace_pmem(&ndns->dev)) {
 		struct nd_namespace_pmem *nspm;
 
diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index acb19517f678..37cb1b8a2a39 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -24,6 +24,18 @@ struct nd_pfn_sb {
 	__le64 npfns;
 	__le32 mode;
 	/* minor-version-1 additions for section alignment */
+	/**
+	 * @start_pad: Deprecated attribute to pad start-misaligned namespaces
+	 *
+	 * start_pad is deprecated because the original definition did
+	 * not comprehend that dataoff is relative to the base address
+	 * of the namespace not the start_pad adjusted base. The result
+	 * is that the dax path is broken, but the block-I/O path is
+	 * not. The kernel will no longer create namespaces using start
+	 * padding, but it still supports block-I/O for legacy
+	 * configurations mainly to allow a backup, reconfigure the
+	 * namespace, and restore flow to repair dax operation.
+	 */
 	__le32 start_pad;
 	__le32 end_trunc;
 	/* minor-version-2 record the base alignment of the mapping */
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 6e5b042f453e..18a2602d93e7 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -445,6 +445,7 @@ static bool nd_supported_alignment(unsigned long align)
 int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 {
 	u64 checksum, offset;
+	struct resource *res;
 	enum nd_pfn_mode mode;
 	struct nd_namespace_io *nsio;
 	unsigned long align, start_pad;
@@ -577,13 +578,14 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 	 * established.
 	 */
 	nsio = to_nd_namespace_io(&ndns->dev);
-	if (offset >= resource_size(&nsio->res)) {
+	res = &nsio->res;
+	if (offset >= resource_size(res)) {
 		dev_err(&nd_pfn->dev, "pfn array size exceeds capacity of %s\n",
 				dev_name(&ndns->dev));
 		return -EOPNOTSUPP;
 	}
 
-	if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align))
+	if ((align && !IS_ALIGNED(res->start + offset + start_pad, align))
 			|| !IS_ALIGNED(offset, PAGE_SIZE)) {
 		dev_err(&nd_pfn->dev,
 				"bad offset: %#llx dax disabled align: %#lx\n",
@@ -591,6 +593,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 		return -EOPNOTSUPP;
 	}
 
+	if (!IS_ALIGNED(res->start + le32_to_cpu(pfn_sb->start_pad),
+				memremap_compat_align())) {
+		dev_err(&nd_pfn->dev, "resource start misaligned\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (!IS_ALIGNED(res->end + 1 - le32_to_cpu(pfn_sb->end_trunc),
+				memremap_compat_align())) {
+		dev_err(&nd_pfn->dev, "resource end misaligned\n");
+		return -EOPNOTSUPP;
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(nd_pfn_validate);
@@ -754,7 +768,19 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	start = nsio->res.start;
 	size = resource_size(&nsio->res);
 	npfns = PHYS_PFN(size - SZ_8K);
-	align = max(nd_pfn->align, SUBSECTION_SIZE);
+	align = max(nd_pfn->align, memremap_compat_align());
+
+	/*
+	 * When @start is misaligned fail namespace creation. See
+	 * the 'struct nd_pfn_sb' commentary on why ->start_pad is not
+	 * an option.
+	 */
+	if (!IS_ALIGNED(start, memremap_compat_align())) {
+		dev_err(&nd_pfn->dev, "%s: start %pa misaligned to %#lx\n",
+				dev_name(&ndns->dev), &start,
+				memremap_compat_align());
+		return -EINVAL;
+	}
 	end_trunc = start + size - ALIGN_DOWN(start + size, align);
 	if (nd_pfn->mode == PFN_MODE_PMEM) {
 		/*


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [5.4-stable PATCH 6/7] libnvdimm/region: Introduce NDD_LABELING
  2020-05-21 23:37 [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
                   ` (4 preceding siblings ...)
  2020-05-21 23:38 ` [5.4-stable PATCH 5/7] libnvdimm/namespace: Enforce memremap_compat_align() Dan Williams
@ 2020-05-21 23:38 ` Dan Williams
  2020-05-21 23:38 ` [5.4-stable PATCH 7/7] libnvdimm/region: Introduce an 'align' attribute Dan Williams
  2020-05-22 11:58 ` [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Greg KH
  7 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2020-05-21 23:38 UTC (permalink / raw)
  To: stable
  Cc: Vishal Verma, Oliver O'Halloran, Jeff Moyer,
	Aneesh Kumar K.V, hch, linux-nvdimm

Commit a0e374525def2ef18a078523e1faefb5ce2b05e5 upstream.

The NDD_ALIASING flag is used to indicate where pmem capacity might
alias with blk capacity and require labeling. It is also used to
indicate whether the DIMM supports labeling. Separate this latter
capability into its own flag so that the NDD_ALIASING flag is scoped to
true aliased configurations.

To my knowledge aliased configurations only exist in the ACPI spec,
there are no known platforms that ship this support in production.

This clarity allows namespace-capacity alignment constraints around
interleave-ways to be relaxed.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/158041477856.3889308.4212605617834097674.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/platforms/pseries/papr_scm.c |    2 +-
 drivers/acpi/nfit/core.c                  |    4 +++-
 drivers/nvdimm/dimm.c                     |    2 +-
 drivers/nvdimm/dimm_devs.c                |    9 +++++----
 drivers/nvdimm/namespace_devs.c           |    2 +-
 drivers/nvdimm/nd.h                       |    2 +-
 drivers/nvdimm/region_devs.c              |   10 +++++-----
 include/linux/libnvdimm.h                 |    2 ++
 8 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 66fd517c4816..d3cf791bc39f 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -347,7 +347,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 	}
 
 	dimm_flags = 0;
-	set_bit(NDD_ALIASING, &dimm_flags);
+	set_bit(NDD_LABELING, &dimm_flags);
 
 	p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_groups,
 				dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 12d980aafc5f..702105b17dda 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2030,8 +2030,10 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
 			continue;
 		}
 
-		if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+		if (nfit_mem->bdw && nfit_mem->memdev_pmem) {
 			set_bit(NDD_ALIASING, &flags);
+			set_bit(NDD_LABELING, &flags);
+		}
 
 		/* collate flags across all memdevs for this dimm */
 		list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
index 64776ed15bb3..7d4ddc4d9322 100644
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -99,7 +99,7 @@ static int nvdimm_probe(struct device *dev)
 	if (ndd->ns_current >= 0) {
 		rc = nd_label_reserve_dpa(ndd);
 		if (rc == 0)
-			nvdimm_set_aliasing(dev);
+			nvdimm_set_labeling(dev);
 	}
 	nvdimm_bus_unlock(dev);
 
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 196aa44c4936..def5c2846bea 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -32,7 +32,7 @@ int nvdimm_check_config_data(struct device *dev)
 
 	if (!nvdimm->cmd_mask ||
 	    !test_bit(ND_CMD_GET_CONFIG_DATA, &nvdimm->cmd_mask)) {
-		if (test_bit(NDD_ALIASING, &nvdimm->flags))
+		if (test_bit(NDD_LABELING, &nvdimm->flags))
 			return -ENXIO;
 		else
 			return -ENOTTY;
@@ -173,11 +173,11 @@ int nvdimm_set_config_data(struct nvdimm_drvdata *ndd, size_t offset,
 	return rc;
 }
 
-void nvdimm_set_aliasing(struct device *dev)
+void nvdimm_set_labeling(struct device *dev)
 {
 	struct nvdimm *nvdimm = to_nvdimm(dev);
 
-	set_bit(NDD_ALIASING, &nvdimm->flags);
+	set_bit(NDD_LABELING, &nvdimm->flags);
 }
 
 void nvdimm_set_locked(struct device *dev)
@@ -322,8 +322,9 @@ static ssize_t flags_show(struct device *dev,
 {
 	struct nvdimm *nvdimm = to_nvdimm(dev);
 
-	return sprintf(buf, "%s%s\n",
+	return sprintf(buf, "%s%s%s\n",
 			test_bit(NDD_ALIASING, &nvdimm->flags) ? "alias " : "",
+			test_bit(NDD_LABELING, &nvdimm->flags) ? "label " : "",
 			test_bit(NDD_LOCKED, &nvdimm->flags) ? "lock " : "");
 }
 static DEVICE_ATTR_RO(flags);
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index b2db15250e0d..8cf1c8932a3b 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2519,7 +2519,7 @@ static int init_active_labels(struct nd_region *nd_region)
 		if (!ndd) {
 			if (test_bit(NDD_LOCKED, &nvdimm->flags))
 				/* fail, label data may be unreadable */;
-			else if (test_bit(NDD_ALIASING, &nvdimm->flags))
+			else if (test_bit(NDD_LABELING, &nvdimm->flags))
 				/* fail, labels needed to disambiguate dpa */;
 			else
 				return 0;
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index ee5c04070ef9..4a946c019e0f 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -244,7 +244,7 @@ int nvdimm_set_config_data(struct nvdimm_drvdata *ndd, size_t offset,
 		void *buf, size_t len);
 long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
 		unsigned int len);
-void nvdimm_set_aliasing(struct device *dev);
+void nvdimm_set_labeling(struct device *dev);
 void nvdimm_set_locked(struct device *dev);
 void nvdimm_clear_locked(struct device *dev);
 int nvdimm_security_setup_events(struct device *dev);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ef423ba1a711..b1b13debffbc 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -225,16 +225,16 @@ EXPORT_SYMBOL_GPL(nd_blk_region_set_provider_data);
 int nd_region_to_nstype(struct nd_region *nd_region)
 {
 	if (is_memory(&nd_region->dev)) {
-		u16 i, alias;
+		u16 i, label;
 
-		for (i = 0, alias = 0; i < nd_region->ndr_mappings; i++) {
+		for (i = 0, label = 0; i < nd_region->ndr_mappings; i++) {
 			struct nd_mapping *nd_mapping = &nd_region->mapping[i];
 			struct nvdimm *nvdimm = nd_mapping->nvdimm;
 
-			if (test_bit(NDD_ALIASING, &nvdimm->flags))
-				alias++;
+			if (test_bit(NDD_LABELING, &nvdimm->flags))
+				label++;
 		}
-		if (alias)
+		if (label)
 			return ND_DEVICE_NAMESPACE_PMEM;
 		else
 			return ND_DEVICE_NAMESPACE_IO;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index b6eddf912568..aa4b79d68d79 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -37,6 +37,8 @@ enum {
 	NDD_WORK_PENDING = 4,
 	/* ignore / filter NSLABEL_FLAG_LOCAL for this DIMM, i.e. no aliasing */
 	NDD_NOBLK = 5,
+	/* dimm supports namespace labels */
+	NDD_LABELING = 6,
 
 	/* need to set a limit somewhere, but yes, this is likely overkill */
 	ND_IOCTL_MAX_BUFLEN = SZ_4M,


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [5.4-stable PATCH 7/7] libnvdimm/region: Introduce an 'align' attribute
  2020-05-21 23:37 [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
                   ` (5 preceding siblings ...)
  2020-05-21 23:38 ` [5.4-stable PATCH 6/7] libnvdimm/region: Introduce NDD_LABELING Dan Williams
@ 2020-05-21 23:38 ` Dan Williams
  2020-05-22 11:58 ` [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Greg KH
  7 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2020-05-21 23:38 UTC (permalink / raw)
  To: stable
  Cc: Aneesh Kumar K.V, Jeff Moyer, Aneesh Kumar K.V, Jeff Moyer,
	vishal.l.verma, hch, linux-nvdimm

Commit 2522afb86a8cceba0f67dbf05772d21b76d79f06 upstream.

The align attribute applies an alignment constraint for namespace
creation in a region. Whereas the 'align' attribute of a namespace
applied alignment padding via an info block, the 'align' attribute
applies alignment constraints to the free space allocation.

The default for 'align' is the maximum known memremap_compat_align()
across all archs (16MiB from PowerPC at time of writing) multiplied by
the number of interleave ways if there is blk-aliasing. The minimum is
PAGE_SIZE and allows for the creation of cross-arch incompatible
namespaces, just as previous kernels allowed, but the expectation is
cross-arch and mode-independent compatibility by default.

The regression risk with this change is limited to cases that were
dependent on the ability to create unaligned namespaces, *and* for some
reason are unable to opt-out of aligned namespaces by writing to
'regionX/align'. If such a scenario arises the default can be flipped
from opt-out to opt-in of compat-aligned namespace creation, but that is
a last resort. The kernel will otherwise continue to support existing
defined misaligned namespaces.

Unfortunately this change needs to touch several parts of the
implementation at once:

- region/available_size: expand busy extents to current align
- region/max_available_extent: expand busy extents to current align
- namespace/size: trim free space to current align

...to keep the free space accounting conforming to the dynamic align
setting.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/158041478371.3889308.14542630147672668068.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/dimm_devs.c      |   86 +++++++++++++++++++++++-----
 drivers/nvdimm/namespace_devs.c |    9 ++-
 drivers/nvdimm/nd.h             |    1 
 drivers/nvdimm/region_devs.c    |  120 ++++++++++++++++++++++++++++++++++++---
 4 files changed, 190 insertions(+), 26 deletions(-)

diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index def5c2846bea..8c773fb60296 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -557,6 +557,21 @@ int nvdimm_security_freeze(struct nvdimm *nvdimm)
 	return rc;
 }
 
+static unsigned long dpa_align(struct nd_region *nd_region)
+{
+	struct device *dev = &nd_region->dev;
+
+	if (dev_WARN_ONCE(dev, !is_nvdimm_bus_locked(dev),
+				"bus lock required for capacity provision\n"))
+		return 0;
+	if (dev_WARN_ONCE(dev, !nd_region->ndr_mappings || nd_region->align
+				% nd_region->ndr_mappings,
+				"invalid region align %#lx mappings: %d\n",
+				nd_region->align, nd_region->ndr_mappings))
+		return 0;
+	return nd_region->align / nd_region->ndr_mappings;
+}
+
 int alias_dpa_busy(struct device *dev, void *data)
 {
 	resource_size_t map_end, blk_start, new;
@@ -565,6 +580,7 @@ int alias_dpa_busy(struct device *dev, void *data)
 	struct nd_region *nd_region;
 	struct nvdimm_drvdata *ndd;
 	struct resource *res;
+	unsigned long align;
 	int i;
 
 	if (!is_memory(dev))
@@ -602,13 +618,21 @@ int alias_dpa_busy(struct device *dev, void *data)
 	 * Find the free dpa from the end of the last pmem allocation to
 	 * the end of the interleave-set mapping.
 	 */
+	align = dpa_align(nd_region);
+	if (!align)
+		return 0;
+
 	for_each_dpa_resource(ndd, res) {
+		resource_size_t start, end;
+
 		if (strncmp(res->name, "pmem", 4) != 0)
 			continue;
-		if ((res->start >= blk_start && res->start < map_end)
-				|| (res->end >= blk_start
-					&& res->end <= map_end)) {
-			new = max(blk_start, min(map_end + 1, res->end + 1));
+
+		start = ALIGN_DOWN(res->start, align);
+		end = ALIGN(res->end + 1, align) - 1;
+		if ((start >= blk_start && start < map_end)
+				|| (end >= blk_start && end <= map_end)) {
+			new = max(blk_start, min(map_end, end) + 1);
 			if (new != blk_start) {
 				blk_start = new;
 				goto retry;
@@ -648,6 +672,7 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region)
 		.res = NULL,
 	};
 	struct resource *res;
+	unsigned long align;
 
 	if (!ndd)
 		return 0;
@@ -655,10 +680,20 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region)
 	device_for_each_child(&nvdimm_bus->dev, &info, alias_dpa_busy);
 
 	/* now account for busy blk allocations in unaliased dpa */
+	align = dpa_align(nd_region);
+	if (!align)
+		return 0;
 	for_each_dpa_resource(ndd, res) {
+		resource_size_t start, end, size;
+
 		if (strncmp(res->name, "blk", 3) != 0)
 			continue;
-		info.available -= resource_size(res);
+		start = ALIGN_DOWN(res->start, align);
+		end = ALIGN(res->end + 1, align) - 1;
+		size = end - start + 1;
+		if (size >= info.available)
+			return 0;
+		info.available -= size;
 	}
 
 	return info.available;
@@ -677,19 +712,31 @@ resource_size_t nd_pmem_max_contiguous_dpa(struct nd_region *nd_region,
 	struct nvdimm_bus *nvdimm_bus;
 	resource_size_t max = 0;
 	struct resource *res;
+	unsigned long align;
 
 	/* if a dimm is disabled the available capacity is zero */
 	if (!ndd)
 		return 0;
 
+	align = dpa_align(nd_region);
+	if (!align)
+		return 0;
+
 	nvdimm_bus = walk_to_nvdimm_bus(ndd->dev);
 	if (__reserve_free_pmem(&nd_region->dev, nd_mapping->nvdimm))
 		return 0;
 	for_each_dpa_resource(ndd, res) {
+		resource_size_t start, end;
+
 		if (strcmp(res->name, "pmem-reserve") != 0)
 			continue;
-		if (resource_size(res) > max)
-			max = resource_size(res);
+		/* trim free space relative to current alignment setting */
+		start = ALIGN(res->start, align);
+		end = ALIGN_DOWN(res->end + 1, align) - 1;
+		if (end < start)
+			continue;
+		if (end - start + 1 > max)
+			max = end - start + 1;
 	}
 	release_free_pmem(nvdimm_bus, nd_mapping);
 	return max;
@@ -717,24 +764,33 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
 	struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
 	struct resource *res;
 	const char *reason;
+	unsigned long align;
 
 	if (!ndd)
 		return 0;
 
+	align = dpa_align(nd_region);
+	if (!align)
+		return 0;
+
 	map_start = nd_mapping->start;
 	map_end = map_start + nd_mapping->size - 1;
 	blk_start = max(map_start, map_end + 1 - *overlap);
 	for_each_dpa_resource(ndd, res) {
-		if (res->start >= map_start && res->start < map_end) {
+		resource_size_t start, end;
+
+		start = ALIGN_DOWN(res->start, align);
+		end = ALIGN(res->end + 1, align) - 1;
+		if (start >= map_start && start < map_end) {
 			if (strncmp(res->name, "blk", 3) == 0)
 				blk_start = min(blk_start,
-						max(map_start, res->start));
-			else if (res->end > map_end) {
+						max(map_start, start));
+			else if (end > map_end) {
 				reason = "misaligned to iset";
 				goto err;
 			} else
-				busy += resource_size(res);
-		} else if (res->end >= map_start && res->end <= map_end) {
+				busy += end - start + 1;
+		} else if (end >= map_start && end <= map_end) {
 			if (strncmp(res->name, "blk", 3) == 0) {
 				/*
 				 * If a BLK allocation overlaps the start of
@@ -743,8 +799,8 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
 				 */
 				blk_start = map_start;
 			} else
-				busy += resource_size(res);
-		} else if (map_start > res->start && map_start < res->end) {
+				busy += end - start + 1;
+		} else if (map_start > start && map_start < end) {
 			/* total eclipse of the mapping */
 			busy += nd_mapping->size;
 			blk_start = map_start;
@@ -754,7 +810,7 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
 	*overlap = map_end + 1 - blk_start;
 	available = blk_start - map_start;
 	if (busy < available)
-		return available - busy;
+		return ALIGN_DOWN(available - busy, align);
 	return 0;
 
  err:
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 8cf1c8932a3b..e80de9b9ccce 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -568,6 +568,11 @@ static void space_valid(struct nd_region *nd_region, struct nvdimm_drvdata *ndd,
 {
 	bool is_reserve = strcmp(label_id->id, "pmem-reserve") == 0;
 	bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0;
+	unsigned long align;
+
+	align = nd_region->align / nd_region->ndr_mappings;
+	valid->start = ALIGN(valid->start, align);
+	valid->end = ALIGN_DOWN(valid->end + 1, align) - 1;
 
 	if (valid->start >= valid->end)
 		goto invalid;
@@ -1007,10 +1012,10 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
 		return -ENXIO;
 	}
 
-	div_u64_rem(val, PAGE_SIZE * nd_region->ndr_mappings, &remainder);
+	div_u64_rem(val, nd_region->align, &remainder);
 	if (remainder) {
 		dev_dbg(dev, "%llu is not %ldK aligned\n", val,
-				(PAGE_SIZE * nd_region->ndr_mappings) / SZ_1K);
+				nd_region->align / SZ_1K);
 		return -EINVAL;
 	}
 
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 4a946c019e0f..e37c79c340d6 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -146,6 +146,7 @@ struct nd_region {
 	struct device *btt_seed;
 	struct device *pfn_seed;
 	struct device *dax_seed;
+	unsigned long align;
 	u16 ndr_mappings;
 	u64 ndr_size;
 	u64 ndr_start;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b1b13debffbc..7d5ab00c7b45 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -246,21 +246,25 @@ int nd_region_to_nstype(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL(nd_region_to_nstype);
 
-static ssize_t size_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static unsigned long long region_size(struct nd_region *nd_region)
 {
-	struct nd_region *nd_region = to_nd_region(dev);
-	unsigned long long size = 0;
-
-	if (is_memory(dev)) {
-		size = nd_region->ndr_size;
+	if (is_memory(&nd_region->dev)) {
+		return nd_region->ndr_size;
 	} else if (nd_region->ndr_mappings == 1) {
 		struct nd_mapping *nd_mapping = &nd_region->mapping[0];
 
-		size = nd_mapping->size;
+		return nd_mapping->size;
 	}
 
-	return sprintf(buf, "%llu\n", size);
+	return 0;
+}
+
+static ssize_t size_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct nd_region *nd_region = to_nd_region(dev);
+
+	return sprintf(buf, "%llu\n", region_size(nd_region));
 }
 static DEVICE_ATTR_RO(size);
 
@@ -559,6 +563,54 @@ static ssize_t read_only_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(read_only);
 
+static ssize_t align_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct nd_region *nd_region = to_nd_region(dev);
+
+	return sprintf(buf, "%#lx\n", nd_region->align);
+}
+
+static ssize_t align_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	struct nd_region *nd_region = to_nd_region(dev);
+	unsigned long val, dpa;
+	u32 remainder;
+	int rc;
+
+	rc = kstrtoul(buf, 0, &val);
+	if (rc)
+		return rc;
+
+	if (!nd_region->ndr_mappings)
+		return -ENXIO;
+
+	/*
+	 * Ensure space-align is evenly divisible by the region
+	 * interleave-width because the kernel typically has no facility
+	 * to determine which DIMM(s), dimm-physical-addresses, would
+	 * contribute to the tail capacity in system-physical-address
+	 * space for the namespace.
+	 */
+	dpa = div_u64_rem(val, nd_region->ndr_mappings, &remainder);
+	if (!is_power_of_2(dpa) || dpa < PAGE_SIZE
+			|| val > region_size(nd_region) || remainder)
+		return -EINVAL;
+
+	/*
+	 * Given that space allocation consults this value multiple
+	 * times ensure it does not change for the duration of the
+	 * allocation.
+	 */
+	nvdimm_bus_lock(dev);
+	nd_region->align = val;
+	nvdimm_bus_unlock(dev);
+
+	return len;
+}
+static DEVICE_ATTR_RW(align);
+
 static ssize_t region_badblocks_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -601,6 +653,7 @@ static DEVICE_ATTR_RO(persistence_domain);
 
 static struct attribute *nd_region_attributes[] = {
 	&dev_attr_size.attr,
+	&dev_attr_align.attr,
 	&dev_attr_nstype.attr,
 	&dev_attr_mappings.attr,
 	&dev_attr_btt_seed.attr,
@@ -660,6 +713,19 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
 		return a->mode;
 	}
 
+	if (a == &dev_attr_align.attr) {
+		int i;
+
+		for (i = 0; i < nd_region->ndr_mappings; i++) {
+			struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+			struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+			if (test_bit(NDD_LABELING, &nvdimm->flags))
+				return a->mode;
+		}
+		return 0;
+	}
+
 	if (a != &dev_attr_set_cookie.attr
 			&& a != &dev_attr_available_size.attr)
 		return a->mode;
@@ -930,6 +996,41 @@ void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
 }
 EXPORT_SYMBOL(nd_region_release_lane);
 
+/*
+ * PowerPC requires this alignment for memremap_pages(). All other archs
+ * should be ok with SUBSECTION_SIZE (see memremap_compat_align()).
+ */
+#define MEMREMAP_COMPAT_ALIGN_MAX SZ_16M
+
+static unsigned long default_align(struct nd_region *nd_region)
+{
+	unsigned long align;
+	int i, mappings;
+	u32 remainder;
+
+	if (is_nd_blk(&nd_region->dev))
+		align = PAGE_SIZE;
+	else
+		align = MEMREMAP_COMPAT_ALIGN_MAX;
+
+	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+		if (test_bit(NDD_ALIASING, &nvdimm->flags)) {
+			align = MEMREMAP_COMPAT_ALIGN_MAX;
+			break;
+		}
+	}
+
+	mappings = max_t(u16, 1, nd_region->ndr_mappings);
+	div_u64_rem(align, mappings, &remainder);
+	if (remainder)
+		align *= mappings;
+
+	return align;
+}
+
 static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 		struct nd_region_desc *ndr_desc, struct device_type *dev_type,
 		const char *caller)
@@ -1034,6 +1135,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	dev->of_node = ndr_desc->of_node;
 	nd_region->ndr_size = resource_size(ndr_desc->res);
 	nd_region->ndr_start = ndr_desc->res->start;
+	nd_region->align = default_align(nd_region);
 	if (ndr_desc->flush)
 		nd_region->flush = ndr_desc->flush;
 	else


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment
  2020-05-21 23:37 [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
                   ` (6 preceding siblings ...)
  2020-05-21 23:38 ` [5.4-stable PATCH 7/7] libnvdimm/region: Introduce an 'align' attribute Dan Williams
@ 2020-05-22 11:58 ` Greg KH
  2020-05-22 12:00   ` Greg KH
  7 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2020-05-22 11:58 UTC (permalink / raw)
  To: Dan Williams
  Cc: stable, Oliver O'Halloran, Vishal Verma,
	Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Christoph Hellwig, Jeff Moyer, Aneesh Kumar K.V, linux-nvdimm

On Thu, May 21, 2020 at 04:37:43PM -0700, Dan Williams wrote:
> Hello stable team,
> 
> These patches have been shipping in mainline since v5.7-rc1 with no
> reported issues. They address long standing problems in libnvdimm's
> handling of namespace provisioning relative to alignment constraints
> including crashes trying to even load the driver on some PowerPC
> configurations.
> 
> I did fold one build fix [1] into "libnvdimm/region: Introduce an 'align'
> attribute" so as to not convey the bisection breakage to -stable.
> 
> Please consider them for v5.4-stable. They do pass the latest
> version of the ndctl unit tests.

What about 5.6.y?  Any user upgrading from 5.4-stable to 5.6-stable
would hit a regression, right?

So can we get a series backported to 5.6.y as well?  I need that before
I can take this series.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment
  2020-05-22 11:58 ` [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Greg KH
@ 2020-05-22 12:00   ` Greg KH
  2020-05-26 20:42     ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2020-05-22 12:00 UTC (permalink / raw)
  To: Dan Williams
  Cc: stable, Oliver O'Halloran, Vishal Verma,
	Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Christoph Hellwig, Jeff Moyer, Aneesh Kumar K.V, linux-nvdimm

On Fri, May 22, 2020 at 01:58:00PM +0200, Greg KH wrote:
> On Thu, May 21, 2020 at 04:37:43PM -0700, Dan Williams wrote:
> > Hello stable team,
> > 
> > These patches have been shipping in mainline since v5.7-rc1 with no
> > reported issues. They address long standing problems in libnvdimm's
> > handling of namespace provisioning relative to alignment constraints
> > including crashes trying to even load the driver on some PowerPC
> > configurations.
> > 
> > I did fold one build fix [1] into "libnvdimm/region: Introduce an 'align'
> > attribute" so as to not convey the bisection breakage to -stable.
> > 
> > Please consider them for v5.4-stable. They do pass the latest
> > version of the ndctl unit tests.
> 
> What about 5.6.y?  Any user upgrading from 5.4-stable to 5.6-stable
> would hit a regression, right?
> 
> So can we get a series backported to 5.6.y as well?  I need that before
> I can take this series.

Also, I really don't see the "bug" that this is fixing here.  If this
didn't work on PowerPC before, it can continue to just "not work" until
5.7, right?  What problems with 5.4.y and 5.6.y is this series fixing
that used to work before?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment
  2020-05-22 12:00   ` Greg KH
@ 2020-05-26 20:42     ` Dan Williams
  2020-05-26 20:49       ` Jeff Moyer
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2020-05-26 20:42 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, Oliver O'Halloran, Vishal Verma,
	Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Christoph Hellwig, Jeff Moyer, Aneesh Kumar K.V, linux-nvdimm

On Fri, May 22, 2020 at 5:00 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Fri, May 22, 2020 at 01:58:00PM +0200, Greg KH wrote:
> > On Thu, May 21, 2020 at 04:37:43PM -0700, Dan Williams wrote:
> > > Hello stable team,
> > >
> > > These patches have been shipping in mainline since v5.7-rc1 with no
> > > reported issues. They address long standing problems in libnvdimm's
> > > handling of namespace provisioning relative to alignment constraints
> > > including crashes trying to even load the driver on some PowerPC
> > > configurations.
> > >
> > > I did fold one build fix [1] into "libnvdimm/region: Introduce an 'align'
> > > attribute" so as to not convey the bisection breakage to -stable.
> > >
> > > Please consider them for v5.4-stable. They do pass the latest
> > > version of the ndctl unit tests.
> >
> > What about 5.6.y?  Any user upgrading from 5.4-stable to 5.6-stable
> > would hit a regression, right?
> >
> > So can we get a series backported to 5.6.y as well?  I need that before
> > I can take this series.

Yes, should be the exact same set, but I will run the regression suite
to be sure.

> Also, I really don't see the "bug" that this is fixing here.  If this
> didn't work on PowerPC before, it can continue to just "not work" until
> 5.7, right?

There's a mix of "never worked" and "used to work" in this set. The
PowerPC case is indeed a "never worked", but I highlighted it as it
was the simplest to understand.

> What problems with 5.4.y and 5.6.y is this series fixing
> that used to work before?

The "used to work" bug fixed by this set is the fact that the kernel
used to force a 128MB (memory hotplug section size) alignment padding
on all persistent memory namespaces to enable DAX operation. The
support for sub-sections (2MB) dropped forced alignment padding, but
unfortunately introduced a regression for the case of trying to create
multiple unaligned namespaces. When that bug triggers namespace
creation for the region is disabled, iirc, previously that lockout
scenario was prevented.

Jeff, can you corroborate this?

I otherwise agree, if the above never worked then this can all wait
for v5.7 upgrades.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment
  2020-05-26 20:42     ` Dan Williams
@ 2020-05-26 20:49       ` Jeff Moyer
  2020-05-26 20:52         ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Moyer @ 2020-05-26 20:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: Greg KH, stable, Oliver O'Halloran, Vishal Verma,
	Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Christoph Hellwig, Aneesh Kumar K.V, linux-nvdimm

Dan Williams <dan.j.williams@intel.com> writes:

>> What problems with 5.4.y and 5.6.y is this series fixing
>> that used to work before?
>
> The "used to work" bug fixed by this set is the fact that the kernel
> used to force a 128MB (memory hotplug section size) alignment padding
> on all persistent memory namespaces to enable DAX operation. The
> support for sub-sections (2MB) dropped forced alignment padding, but
> unfortunately introduced a regression for the case of trying to create
> multiple unaligned namespaces. When that bug triggers namespace
> creation for the region is disabled, iirc, previously that lockout
> scenario was prevented.
>
> Jeff, can you corroborate this?

So, I don't pretend to remember the exact state of brokenness for each
iteration.  :)  As far as I can recall, though, the issue you describe
with a misaligned namespace preventing further namespace creation was
present in all kernels up until it was finally fixed.

> I otherwise agree, if the above never worked then this can all wait
> for v5.7 upgrades.

I can test specific kernel versions if that would help out.

Cheers,
Jeff


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment
  2020-05-26 20:49       ` Jeff Moyer
@ 2020-05-26 20:52         ` Dan Williams
  0 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2020-05-26 20:52 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Greg KH, stable, Oliver O'Halloran, Vishal Verma,
	Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Christoph Hellwig, Aneesh Kumar K.V, linux-nvdimm

On Tue, May 26, 2020 at 1:49 PM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> >> What problems with 5.4.y and 5.6.y is this series fixing
> >> that used to work before?
> >
> > The "used to work" bug fixed by this set is the fact that the kernel
> > used to force a 128MB (memory hotplug section size) alignment padding
> > on all persistent memory namespaces to enable DAX operation. The
> > support for sub-sections (2MB) dropped forced alignment padding, but
> > unfortunately introduced a regression for the case of trying to create
> > multiple unaligned namespaces. When that bug triggers namespace
> > creation for the region is disabled, iirc, previously that lockout
> > scenario was prevented.
> >
> > Jeff, can you corroborate this?
>
> So, I don't pretend to remember the exact state of brokenness for each
> iteration.  :)  As far as I can recall, though, the issue you describe
> with a misaligned namespace preventing further namespace creation was
> present in all kernels up until it was finally fixed.

Well, if it was always there, then there is nothing to fix, and I
misremembered that we went backwards.

> > I otherwise agree, if the above never worked then this can all wait
> > for v5.7 upgrades.
>
> I can test specific kernel versions if that would help out.

Thanks for that offer, but outside of a clear regression I don't think
this meets -stable criteria.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, back to index

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-21 23:37 [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Dan Williams
2020-05-21 23:37 ` [5.4-stable PATCH 1/7] mm/memremap_pages: Kill unused __devm_memremap_pages() Dan Williams
2020-05-21 23:37 ` [5.4-stable PATCH 2/7] mm/memremap_pages: Introduce memremap_compat_align() Dan Williams
2020-05-21 23:37 ` [5.4-stable PATCH 3/7] libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid Dan Williams
2020-05-21 23:38 ` [5.4-stable PATCH 4/7] libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe Dan Williams
2020-05-21 23:38 ` [5.4-stable PATCH 5/7] libnvdimm/namespace: Enforce memremap_compat_align() Dan Williams
2020-05-21 23:38 ` [5.4-stable PATCH 6/7] libnvdimm/region: Introduce NDD_LABELING Dan Williams
2020-05-21 23:38 ` [5.4-stable PATCH 7/7] libnvdimm/region: Introduce an 'align' attribute Dan Williams
2020-05-22 11:58 ` [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment Greg KH
2020-05-22 12:00   ` Greg KH
2020-05-26 20:42     ` Dan Williams
2020-05-26 20:49       ` Jeff Moyer
2020-05-26 20:52         ` Dan Williams

Stable Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/stable/0 stable/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 stable stable/ https://lore.kernel.org/stable \
		stable@vger.kernel.org
	public-inbox-index stable

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.stable


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git