linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/6] Validating namespace size and start address attributes.
@ 2020-01-20 14:07 Aneesh Kumar K.V
  2020-01-20 14:07 ` [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent Aneesh Kumar K.V
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-20 14:07 UTC (permalink / raw)
  To: dan.j.williams, vishal.l.verma, jmoyer; +Cc: linux-nvdimm, Aneesh Kumar K.V

When creating a new namespace, the kernel needs to make sure namespace size and
start address are correctly aligned to SUBSECTION_SIZE. This ensures kernel can
enable/disable namespace without conflicting with memory hotplug rules. Without this,
a namespace that partially covers a SUBSECTION can prevent the creation of an
adjacent namespace because the hotplug subsystem will find the subsection already active.

To make sure new kernel don't break an existing install with an unaligned start/size attribute,
while initializing the namespace the kernel validates these attribute against direct-map
mapping page size rather than subsection size.


Aneesh Kumar K.V (6):
  libnvdimm/namespace: Make namespace size validation arch dependent
  libnvdimm/namespace: Validate namespace start addr and size
  libnvdimm/namespace: Add arch dependent callback for namespace create
    time validation
  libnvdimm/namespace: Validate namespace size when creating a new
    namespace.
  libnvdimm/namespace: Align DPA based on arch restrictions
  libnvdimm/namespace: Expose arch specific supported size align value

 arch/arm64/mm/flush.c           | 13 +++++
 arch/powerpc/lib/pmem.c         | 20 ++++++++
 arch/x86/mm/pageattr.c          | 13 +++++
 drivers/nvdimm/namespace_devs.c | 85 +++++++++++++++++++++++++++++++--
 include/linux/libnvdimm.h       |  2 +
 5 files changed, 128 insertions(+), 5 deletions(-)

-- 
2.24.1
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
  2020-01-20 14:07 [PATCH v4 0/6] Validating namespace size and start address attributes Aneesh Kumar K.V
@ 2020-01-20 14:07 ` Aneesh Kumar K.V
  2020-01-24  5:57   ` Dan Williams
  2020-01-20 14:07 ` [PATCH v4 2/6] libnvdimm/namespace: Validate namespace start addr and size Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-20 14:07 UTC (permalink / raw)
  To: dan.j.williams, vishal.l.verma, jmoyer; +Cc: linux-nvdimm, Aneesh Kumar K.V

The page size used to map the namespace is arch dependent. For example
architectures like ppc64 use 16MB page size for direct-mapping. If the namespace
size is not aligned to the mapping page size, users can observe kernel crash
during namespace init and destroy.

This is due to kernel doing partial map/unmap of the resource range

BUG: Unable to handle kernel data access at 0xc001000406000000
Faulting instruction address: 0xc000000000090790
NIP [c000000000090790] arch_add_memory+0xc0/0x130
LR [c000000000090744] arch_add_memory+0x74/0x130
Call Trace:
 arch_add_memory+0x74/0x130 (unreliable)
 memremap_pages+0x74c/0xa30
 devm_memremap_pages+0x3c/0xa0
 pmem_attach_disk+0x188/0x770
 nvdimm_bus_probe+0xd8/0x470
 really_probe+0x148/0x570
 driver_probe_device+0x19c/0x1d0
 device_driver_attach+0xcc/0x100
 bind_store+0x134/0x1c0
 drv_attr_store+0x44/0x60
 sysfs_kf_write+0x74/0xc0
 kernfs_fop_write+0x1b4/0x290
 __vfs_write+0x3c/0x70
 vfs_write+0xd0/0x260
 ksys_write+0xdc/0x130
 system_call+0x5c/0x68

Kernel should also ensure that namespace size is also mulitple of subsection size.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/arm64/mm/flush.c     | 6 ++++++
 arch/powerpc/lib/pmem.c   | 9 +++++++++
 arch/x86/mm/pageattr.c    | 7 +++++++
 include/linux/libnvdimm.h | 1 +
 4 files changed, 23 insertions(+)

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index ac485163a4a7..95cb5538bc6e 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -91,4 +91,10 @@ void arch_invalidate_pmem(void *addr, size_t size)
 	__inval_dcache_area(addr, size);
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
+
+unsigned long arch_namespace_map_size(void)
+{
+	return PAGE_SIZE;
+}
+EXPORT_SYMBOL_GPL(arch_namespace_map_size);
 #endif
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..63dca24e4a18 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -26,6 +26,15 @@ void arch_invalidate_pmem(void *addr, size_t size)
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
 
+unsigned long arch_namespace_map_size(void)
+{
+	if (radix_enabled())
+		return PAGE_SIZE;
+	return (1UL << mmu_psize_defs[mmu_linear_psize].shift);
+
+}
+EXPORT_SYMBOL_GPL(arch_namespace_map_size);
+
 /*
  * CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE symbols
  */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 1b99ad05b117..d78b5082f376 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -310,6 +310,13 @@ void arch_invalidate_pmem(void *addr, size_t size)
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
 
+unsigned long arch_namespace_map_size(void)
+{
+	return PAGE_SIZE;
+}
+EXPORT_SYMBOL_GPL(arch_namespace_map_size);
+
+
 static void __cpa_flush_all(void *arg)
 {
 	unsigned long cache = (unsigned long)arg;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 9df091bd30ba..a3476dbd2656 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
 }
 #endif
 
+unsigned long arch_namespace_map_size(void);
 #endif /* __LIBNVDIMM_H__ */
-- 
2.24.1
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 2/6] libnvdimm/namespace: Validate namespace start addr and size
  2020-01-20 14:07 [PATCH v4 0/6] Validating namespace size and start address attributes Aneesh Kumar K.V
  2020-01-20 14:07 ` [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent Aneesh Kumar K.V
@ 2020-01-20 14:07 ` Aneesh Kumar K.V
  2020-01-25  1:55   ` Dan Williams
  2020-01-20 14:07 ` [PATCH v4 3/6] libnvdimm/namespace: Add arch dependent callback for namespace create time validation Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-20 14:07 UTC (permalink / raw)
  To: dan.j.williams, vishal.l.verma, jmoyer; +Cc: linux-nvdimm, Aneesh Kumar K.V

Make sure namespace start addr and size are properly aligned as per architecture
restrictions. If the namespace is not aligned kernel mark the namespace 'disabled'

Architectures like ppc64 use different page size than PAGE_SIZE to map
direct-map address range. The kernel needs to make sure the namespace size is aligned
correctly for the direct-map page size.

kernel log will contain information as below.

[    5.810939] nd_pmem namespace0.1: invalid size/SPA
[    5.810969] nd_pmem: probe of namespace0.1 failed with error -95

and the namespace will be marked 'disabled'

  {
    "dev":"namespace0.1",
    "mode":"fsdax",
    "map":"mem",
    "size":1071644672,
    "uuid":"25577a00-c012-421d-89ca-3ee189e08848",
    "sector_size":512,
    "state":"disabled"
  },

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/namespace_devs.c | 50 +++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 032dc61725ff..0e2c90730ce3 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1113,6 +1113,51 @@ resource_size_t nvdimm_namespace_capacity(struct nd_namespace_common *ndns)
 }
 EXPORT_SYMBOL(nvdimm_namespace_capacity);
 
+static bool nvdimm_valid_namespace(struct device *dev,
+		struct nd_namespace_common *ndns, resource_size_t size)
+{
+	struct device *ndns_dev = &ndns->dev;
+	struct nd_region *nd_region = to_nd_region(ndns->dev.parent);
+	unsigned long map_size = arch_namespace_map_size();
+	struct resource *res;
+	u32 remainder;
+
+	/*
+	 * Don't validate the start and size for blk namespace type
+	 */
+	if (is_namespace_blk(ndns_dev))
+		return true;
+
+	/*
+	 * For btt and raw namespace kernel use ioremap. Assume both can work
+	 * with PAGE_SIZE alignment.
+	 */
+	if (is_nd_btt(dev) || is_namespace_io(ndns_dev))
+		map_size = PAGE_SIZE;
+
+	div_u64_rem(size, map_size * nd_region->ndr_mappings, &remainder);
+	if (remainder)
+		return false;
+
+	if (is_namespace_pmem(ndns_dev)) {
+		struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(ndns_dev);
+
+		res = &nspm->nsio.res;
+	} else if (is_namespace_io(ndns_dev)) {
+		struct nd_namespace_io *nsio = to_nd_namespace_io(ndns_dev);
+
+		res = &nsio->res;
+	} else
+		/* cannot reach */
+		return false;
+
+	div_u64_rem(res->start, map_size * nd_region->ndr_mappings, &remainder);
+	if (remainder)
+		return false;
+
+	return true;
+}
+
 bool nvdimm_namespace_locked(struct nd_namespace_common *ndns)
 {
 	int i;
@@ -1739,6 +1784,11 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev)
 		return ERR_PTR(-ENODEV);
 	}
 
+	if (!nvdimm_valid_namespace(dev, ndns, size)) {
+		dev_err(&ndns->dev, "invalid size/SPA");
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+
 	if (is_namespace_pmem(&ndns->dev)) {
 		struct nd_namespace_pmem *nspm;
 
-- 
2.24.1
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 3/6] libnvdimm/namespace: Add arch dependent callback for namespace create time validation
  2020-01-20 14:07 [PATCH v4 0/6] Validating namespace size and start address attributes Aneesh Kumar K.V
  2020-01-20 14:07 ` [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent Aneesh Kumar K.V
  2020-01-20 14:07 ` [PATCH v4 2/6] libnvdimm/namespace: Validate namespace start addr and size Aneesh Kumar K.V
@ 2020-01-20 14:07 ` Aneesh Kumar K.V
  2020-01-25  1:59   ` Dan Williams
  2020-01-20 14:07 ` [PATCH v4 4/6] libnvdimm/namespace: Validate namespace size when creating a new namespace Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-20 14:07 UTC (permalink / raw)
  To: dan.j.williams, vishal.l.verma, jmoyer; +Cc: linux-nvdimm, Aneesh Kumar K.V

Namespace start address and size should be multiple of subsection size to avoid
handling complex partial subsection addition and removal of memory. Even though
subsection size is arch independent (2M), architectures do impose further
restrictions w.r.t namespace size/start addr based on direct-mapping  page size.

This patch adds an arch callback for supporting arch specific restrictions w.r.t
namespace size. This is different from arch_namespace_map_size() in that this
validates against SUBSECTION_SIZE. Ideally, kernel should use the same restrictions
during namespace initialization too. But that prevents an existing unaligned
namespace initialization to fail. The kernel now allows such namespace initialization
so that an existing installation is not broken.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/arm64/mm/flush.c     |  7 +++++++
 arch/powerpc/lib/pmem.c   | 11 +++++++++++
 arch/x86/mm/pageattr.c    |  6 ++++++
 include/linux/libnvdimm.h |  1 +
 4 files changed, 25 insertions(+)

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 95cb5538bc6e..32b4ba57cc95 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -97,4 +97,11 @@ unsigned long arch_namespace_map_size(void)
 	return PAGE_SIZE;
 }
 EXPORT_SYMBOL_GPL(arch_namespace_map_size);
+
+unsigned long arch_namespace_align_size(void)
+{
+	return (1UL << SUBSECTION_SHIFT);
+}
+EXPORT_SYMBOL_GPL(arch_namespace_align_size);
+
 #endif
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 63dca24e4a18..cdf4248c536c 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -35,6 +35,17 @@ unsigned long arch_namespace_map_size(void)
 }
 EXPORT_SYMBOL_GPL(arch_namespace_map_size);
 
+unsigned long arch_namespace_align_size(void)
+{
+	unsigned long sub_section_size = (1UL << SUBSECTION_SHIFT);
+
+	if (radix_enabled())
+		return sub_section_size;
+	return max(sub_section_size, (1UL << mmu_psize_defs[mmu_linear_psize].shift));
+
+}
+EXPORT_SYMBOL_GPL(arch_namespace_align_size);
+
 /*
  * CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE symbols
  */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index d78b5082f376..1dea3e822e1a 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -316,6 +316,12 @@ unsigned long arch_namespace_map_size(void)
 }
 EXPORT_SYMBOL_GPL(arch_namespace_map_size);
 
+unsigned long arch_namespace_align_size(void)
+{
+	return (1UL << SUBSECTION_SHIFT);
+}
+EXPORT_SYMBOL_GPL(arch_namespace_align_size);
+
 
 static void __cpa_flush_all(void *arg)
 {
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index a3476dbd2656..0f366706b0aa 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -285,4 +285,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
 #endif
 
 unsigned long arch_namespace_map_size(void);
+unsigned long arch_namespace_align_size(void);
 #endif /* __LIBNVDIMM_H__ */
-- 
2.24.1
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 4/6] libnvdimm/namespace: Validate namespace size when creating a new namespace.
  2020-01-20 14:07 [PATCH v4 0/6] Validating namespace size and start address attributes Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2020-01-20 14:07 ` [PATCH v4 3/6] libnvdimm/namespace: Add arch dependent callback for namespace create time validation Aneesh Kumar K.V
@ 2020-01-20 14:07 ` Aneesh Kumar K.V
  2020-01-25  2:22   ` Dan Williams
  2020-01-20 14:07 ` [PATCH v4 5/6] libnvdimm/namespace: Align DPA based on arch restrictions Aneesh Kumar K.V
  2020-01-20 14:07 ` [PATCH v4 6/6] libnvdimm/namespace: Expose arch specific supported size align value Aneesh Kumar K.V
  5 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-20 14:07 UTC (permalink / raw)
  To: dan.j.williams, vishal.l.verma, jmoyer; +Cc: linux-nvdimm, Aneesh Kumar K.V

Kernel should validate the namespace size against SUBSECTION_SIZE rather than
PAGE_SIZE. blk namespace continues to use old rule so that the new kernel don't
break the creation of blk namespaces.

Use the new helper arch_namespace_align_size() so that architecture specific
restrictions are also taken care of.

kernel log will contain the below details
[  939.620064] nd namespace0.3: 1071644672 is not 16384K aligned

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/namespace_devs.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 0e2c90730ce3..e318566ae135 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -891,6 +891,17 @@ static int grow_dpa_allocation(struct nd_region *nd_region,
 	return 0;
 }
 
+static unsigned long nvdimm_validate_namespace_size(struct nd_region *nd_region,
+		unsigned long size, unsigned long align_size)
+{
+	u32 remainder;
+
+	div_u64_rem(size, align_size * nd_region->ndr_mappings, &remainder);
+	if (remainder)
+		return align_size * nd_region->ndr_mappings;
+	return 0;
+}
+
 static void nd_namespace_pmem_set_resource(struct nd_region *nd_region,
 		struct nd_namespace_pmem *nspm, resource_size_t size)
 {
@@ -945,11 +956,13 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
 {
 	resource_size_t allocated = 0, available = 0;
 	struct nd_region *nd_region = to_nd_region(dev->parent);
+	unsigned long align_size = arch_namespace_align_size();
 	struct nd_namespace_common *ndns = to_ndns(dev);
 	struct nd_mapping *nd_mapping;
 	struct nvdimm_drvdata *ndd;
 	struct nd_label_id label_id;
-	u32 flags = 0, remainder;
+	unsigned long map_size;
+	u32 flags = 0;
 	int rc, i, id = -1;
 	u8 *uuid = NULL;
 
@@ -967,6 +980,8 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
 		uuid = nsblk->uuid;
 		flags = NSLABEL_FLAG_LOCAL;
 		id = nsblk->id;
+		/* Let's not break blk region */
+		align_size = PAGE_SIZE;
 	}
 
 	/*
@@ -980,10 +995,9 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
 		return -ENXIO;
 	}
 
-	div_u64_rem(val, PAGE_SIZE * nd_region->ndr_mappings, &remainder);
-	if (remainder) {
-		dev_dbg(dev, "%llu is not %ldK aligned\n", val,
-				(PAGE_SIZE * nd_region->ndr_mappings) / SZ_1K);
+	map_size = nvdimm_validate_namespace_size(nd_region, val, align_size);
+	if (map_size) {
+		dev_err(dev, "%llu is not %ldK aligned\n", val, map_size / SZ_1K);
 		return -EINVAL;
 	}
 
-- 
2.24.1
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 5/6] libnvdimm/namespace: Align DPA based on arch restrictions
  2020-01-20 14:07 [PATCH v4 0/6] Validating namespace size and start address attributes Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2020-01-20 14:07 ` [PATCH v4 4/6] libnvdimm/namespace: Validate namespace size when creating a new namespace Aneesh Kumar K.V
@ 2020-01-20 14:07 ` Aneesh Kumar K.V
  2020-01-20 14:07 ` [PATCH v4 6/6] libnvdimm/namespace: Expose arch specific supported size align value Aneesh Kumar K.V
  5 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-20 14:07 UTC (permalink / raw)
  To: dan.j.williams, vishal.l.verma, jmoyer; +Cc: linux-nvdimm, Aneesh Kumar K.V

When creating new namespace make sure DPA address are properly aligned based
on arch restrictions.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/namespace_devs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index e318566ae135..59169d914b86 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -563,6 +563,8 @@ static void space_valid(struct nd_region *nd_region, struct nvdimm_drvdata *ndd,
 		return;
 	}
 
+	valid->start = ALIGN(valid->start, arch_namespace_align_size());
+
 	/* allocation needs to be contiguous, so this is all or nothing */
 	if (resource_size(valid) < n)
 		goto invalid;
-- 
2.24.1
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 6/6] libnvdimm/namespace: Expose arch specific supported size align value
  2020-01-20 14:07 [PATCH v4 0/6] Validating namespace size and start address attributes Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2020-01-20 14:07 ` [PATCH v4 5/6] libnvdimm/namespace: Align DPA based on arch restrictions Aneesh Kumar K.V
@ 2020-01-20 14:07 ` Aneesh Kumar K.V
  5 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-20 14:07 UTC (permalink / raw)
  To: dan.j.williams, vishal.l.verma, jmoyer; +Cc: linux-nvdimm, Aneesh Kumar K.V

Expose supported size align as a namespace RO attribute. Usespace can use this
to validate the size argument specified when creating a new namespace.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/namespace_devs.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 59169d914b86..ea76270c87d2 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1660,6 +1660,14 @@ static ssize_t force_raw_show(struct device *dev,
 }
 static DEVICE_ATTR_RW(force_raw);
 
+static ssize_t supported_size_align_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%ld\n", arch_namespace_align_size());
+}
+static DEVICE_ATTR_RO(supported_size_align);
+
+
 static struct attribute *nd_namespace_attributes[] = {
 	&dev_attr_nstype.attr,
 	&dev_attr_size.attr,
@@ -1672,6 +1680,7 @@ static struct attribute *nd_namespace_attributes[] = {
 	&dev_attr_sector_size.attr,
 	&dev_attr_dpa_extents.attr,
 	&dev_attr_holder_class.attr,
+	&dev_attr_supported_size_align.attr,
 	NULL,
 };
 
-- 
2.24.1
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
  2020-01-20 14:07 ` [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent Aneesh Kumar K.V
@ 2020-01-24  5:57   ` Dan Williams
  2020-01-24  7:34     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2020-01-24  5:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm

On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> The page size used to map the namespace is arch dependent. For example
> architectures like ppc64 use 16MB page size for direct-mapping. If the namespace
> size is not aligned to the mapping page size, users can observe kernel crash
> during namespace init and destroy.
>
> This is due to kernel doing partial map/unmap of the resource range
>
> BUG: Unable to handle kernel data access at 0xc001000406000000
> Faulting instruction address: 0xc000000000090790
> NIP [c000000000090790] arch_add_memory+0xc0/0x130
> LR [c000000000090744] arch_add_memory+0x74/0x130
> Call Trace:
>  arch_add_memory+0x74/0x130 (unreliable)
>  memremap_pages+0x74c/0xa30
>  devm_memremap_pages+0x3c/0xa0
>  pmem_attach_disk+0x188/0x770
>  nvdimm_bus_probe+0xd8/0x470
>  really_probe+0x148/0x570
>  driver_probe_device+0x19c/0x1d0
>  device_driver_attach+0xcc/0x100
>  bind_store+0x134/0x1c0
>  drv_attr_store+0x44/0x60
>  sysfs_kf_write+0x74/0xc0
>  kernfs_fop_write+0x1b4/0x290
>  __vfs_write+0x3c/0x70
>  vfs_write+0xd0/0x260
>  ksys_write+0xdc/0x130
>  system_call+0x5c/0x68
>
> Kernel should also ensure that namespace size is also mulitple of subsection size.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/arm64/mm/flush.c     | 6 ++++++
>  arch/powerpc/lib/pmem.c   | 9 +++++++++
>  arch/x86/mm/pageattr.c    | 7 +++++++
>  include/linux/libnvdimm.h | 1 +
>  4 files changed, 23 insertions(+)
>
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index ac485163a4a7..95cb5538bc6e 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -91,4 +91,10 @@ void arch_invalidate_pmem(void *addr, size_t size)
>         __inval_dcache_area(addr, size);
>  }
>  EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
> +
> +unsigned long arch_namespace_map_size(void)
> +{
> +       return PAGE_SIZE;
> +}
> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
>  #endif
> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
> index 0666a8d29596..63dca24e4a18 100644
> --- a/arch/powerpc/lib/pmem.c
> +++ b/arch/powerpc/lib/pmem.c
> @@ -26,6 +26,15 @@ void arch_invalidate_pmem(void *addr, size_t size)
>  }
>  EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
>
> +unsigned long arch_namespace_map_size(void)
> +{
> +       if (radix_enabled())
> +               return PAGE_SIZE;
> +       return (1UL << mmu_psize_defs[mmu_linear_psize].shift);
> +
> +}
> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
> +
>  /*
>   * CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE symbols
>   */
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 1b99ad05b117..d78b5082f376 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -310,6 +310,13 @@ void arch_invalidate_pmem(void *addr, size_t size)
>  }
>  EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
>
> +unsigned long arch_namespace_map_size(void)
> +{
> +       return PAGE_SIZE;
> +}
> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
> +
> +
>  static void __cpa_flush_all(void *arg)
>  {
>         unsigned long cache = (unsigned long)arg;
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 9df091bd30ba..a3476dbd2656 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>  }
>  #endif
>
> +unsigned long arch_namespace_map_size(void);

This property is more generic than the nvdimm namespace mapping size,
it's more the fundamental remap granularity that the architecture
supports. So I would expect this to be defined in core header files.
Something like:

diff --git a/include/linux/io.h b/include/linux/io.h
index a59834bc0a11..58b3b2091dbb 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -155,6 +155,13 @@ enum {
 void *memremap(resource_size_t offset, size_t size, unsigned long flags);
 void memunmap(void *addr);

+#ifndef memremap_min_align
+static inline unsigned int memremap_min_align(void)
+{
+       return PAGE_SIZE;
+}
+#endif
+
 /*
  * On x86 PAT systems we have memory tracking that keeps track of
  * the allowed mappings on memory ranges. This tracking works for

...and then have a definition is asm/io.h like this:

unsigned int memremap_min_align(void);
#define memremap_min_align memremap_min_align

That way only architectures that want to opt out of the default need
to define something in their local header.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
  2020-01-24  5:57   ` Dan Williams
@ 2020-01-24  7:34     ` Aneesh Kumar K.V
  2020-01-24 16:45       ` Dan Williams
  0 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-24  7:34 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm

On 1/24/20 11:27 AM, Dan Williams wrote:
> On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
>

....

>>
>> +unsigned long arch_namespace_map_size(void)
>> +{
>> +       return PAGE_SIZE;
>> +}
>> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
>> +
>> +
>>   static void __cpa_flush_all(void *arg)
>>   {
>>          unsigned long cache = (unsigned long)arg;
>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>> index 9df091bd30ba..a3476dbd2656 100644
>> --- a/include/linux/libnvdimm.h
>> +++ b/include/linux/libnvdimm.h
>> @@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>>   }
>>   #endif
>>
>> +unsigned long arch_namespace_map_size(void);
> 
> This property is more generic than the nvdimm namespace mapping size,
> it's more the fundamental remap granularity that the architecture
> supports. So I would expect this to be defined in core header files.
> Something like:
> 
> diff --git a/include/linux/io.h b/include/linux/io.h
> index a59834bc0a11..58b3b2091dbb 100644
> --- a/include/linux/io.h
> +++ b/include/linux/io.h
> @@ -155,6 +155,13 @@ enum {
>   void *memremap(resource_size_t offset, size_t size, unsigned long flags);
>   void memunmap(void *addr);
> 
> +#ifndef memremap_min_align
> +static inline unsigned int memremap_min_align(void)
> +{
> +       return PAGE_SIZE;
> +}
> +#endif
> +


Should that be memremap_pages_min_align()?

>   /*
>    * On x86 PAT systems we have memory tracking that keeps track of
>    * the allowed mappings on memory ranges. This tracking works for
> 
> ...and then have a definition is asm/io.h like this:
> 
> unsigned int memremap_min_align(void);
> #define memremap_min_align memremap_min_align
> 
> That way only architectures that want to opt out of the default need
> to define something in their local header.
> 

-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
  2020-01-24  7:34     ` Aneesh Kumar K.V
@ 2020-01-24 16:45       ` Dan Williams
  2020-01-24 17:07         ` Aneesh Kumar K.V
  2020-01-24 17:08         ` Aneesh Kumar K.V
  0 siblings, 2 replies; 17+ messages in thread
From: Dan Williams @ 2020-01-24 16:45 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm

On Thu, Jan 23, 2020 at 11:34 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 1/24/20 11:27 AM, Dan Williams wrote:
> > On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
> >
>
> ....
>
> >>
> >> +unsigned long arch_namespace_map_size(void)
> >> +{
> >> +       return PAGE_SIZE;
> >> +}
> >> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
> >> +
> >> +
> >>   static void __cpa_flush_all(void *arg)
> >>   {
> >>          unsigned long cache = (unsigned long)arg;
> >> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> >> index 9df091bd30ba..a3476dbd2656 100644
> >> --- a/include/linux/libnvdimm.h
> >> +++ b/include/linux/libnvdimm.h
> >> @@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
> >>   }
> >>   #endif
> >>
> >> +unsigned long arch_namespace_map_size(void);
> >
> > This property is more generic than the nvdimm namespace mapping size,
> > it's more the fundamental remap granularity that the architecture
> > supports. So I would expect this to be defined in core header files.
> > Something like:
> >
> > diff --git a/include/linux/io.h b/include/linux/io.h
> > index a59834bc0a11..58b3b2091dbb 100644
> > --- a/include/linux/io.h
> > +++ b/include/linux/io.h
> > @@ -155,6 +155,13 @@ enum {
> >   void *memremap(resource_size_t offset, size_t size, unsigned long flags);
> >   void memunmap(void *addr);
> >
> > +#ifndef memremap_min_align
> > +static inline unsigned int memremap_min_align(void)
> > +{
> > +       return PAGE_SIZE;
> > +}
> > +#endif
> > +
>
>
> Should that be memremap_pages_min_align()?

No, and on second look it needs to be a common value that results in
properly aligned / sized namespaces across architectures.

What would it take for Power to make it's minimum mapping granularity
SUBSECTION_SIZE? The minute that the minimum alignment changes across
architectures we lose compatibility.

The namespaces need to be sized such that the mode can be changed freely.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
  2020-01-24 16:45       ` Dan Williams
@ 2020-01-24 17:07         ` Aneesh Kumar K.V
  2020-01-24 18:25           ` Dan Williams
  2020-01-24 17:08         ` Aneesh Kumar K.V
  1 sibling, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-24 17:07 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, linuxppc-dev

On 1/24/20 10:15 PM, Dan Williams wrote:
> On Thu, Jan 23, 2020 at 11:34 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> On 1/24/20 11:27 AM, Dan Williams wrote:
>>> On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
>>>
>>
>> ....
>>
>>>>
>>>> +unsigned long arch_namespace_map_size(void)
>>>> +{
>>>> +       return PAGE_SIZE;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
>>>> +
>>>> +
>>>>    static void __cpa_flush_all(void *arg)
>>>>    {
>>>>           unsigned long cache = (unsigned long)arg;
>>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>>>> index 9df091bd30ba..a3476dbd2656 100644
>>>> --- a/include/linux/libnvdimm.h
>>>> +++ b/include/linux/libnvdimm.h
>>>> @@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>>>>    }
>>>>    #endif
>>>>
>>>> +unsigned long arch_namespace_map_size(void);
>>>
>>> This property is more generic than the nvdimm namespace mapping size,
>>> it's more the fundamental remap granularity that the architecture
>>> supports. So I would expect this to be defined in core header files.
>>> Something like:
>>>
>>> diff --git a/include/linux/io.h b/include/linux/io.h
>>> index a59834bc0a11..58b3b2091dbb 100644
>>> --- a/include/linux/io.h
>>> +++ b/include/linux/io.h
>>> @@ -155,6 +155,13 @@ enum {
>>>    void *memremap(resource_size_t offset, size_t size, unsigned long flags);
>>>    void memunmap(void *addr);
>>>
>>> +#ifndef memremap_min_align
>>> +static inline unsigned int memremap_min_align(void)
>>> +{
>>> +       return PAGE_SIZE;
>>> +}
>>> +#endif
>>> +
>>
>>
>> Should that be memremap_pages_min_align()?
> 
> No, and on second look it needs to be a common value that results in
> properly aligned / sized namespaces across architectures.
> 
> What would it take for Power to make it's minimum mapping granularity
> SUBSECTION_SIZE? The minute that the minimum alignment changes across
> architectures we lose compatibility.
> 
> The namespaces need to be sized such that the mode can be changed freely.
> 

Linux on ppc64 with hash translation use just one page size for direct 
mapping and that is 16MB.

-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
  2020-01-24 16:45       ` Dan Williams
  2020-01-24 17:07         ` Aneesh Kumar K.V
@ 2020-01-24 17:08         ` Aneesh Kumar K.V
  1 sibling, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-24 17:08 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm

On 1/24/20 10:15 PM, Dan Williams wrote:
> On Thu, Jan 23, 2020 at 11:34 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> On 1/24/20 11:27 AM, Dan Williams wrote:
>>> On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
>>>
>>
>> ....
>>
>>>>
>>>> +unsigned long arch_namespace_map_size(void)
>>>> +{
>>>> +       return PAGE_SIZE;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
>>>> +
>>>> +
>>>>    static void __cpa_flush_all(void *arg)
>>>>    {
>>>>           unsigned long cache = (unsigned long)arg;
>>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>>>> index 9df091bd30ba..a3476dbd2656 100644
>>>> --- a/include/linux/libnvdimm.h
>>>> +++ b/include/linux/libnvdimm.h
>>>> @@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>>>>    }
>>>>    #endif
>>>>
>>>> +unsigned long arch_namespace_map_size(void);
>>>
>>> This property is more generic than the nvdimm namespace mapping size,
>>> it's more the fundamental remap granularity that the architecture
>>> supports. So I would expect this to be defined in core header files.
>>> Something like:
>>>
>>> diff --git a/include/linux/io.h b/include/linux/io.h
>>> index a59834bc0a11..58b3b2091dbb 100644
>>> --- a/include/linux/io.h
>>> +++ b/include/linux/io.h
>>> @@ -155,6 +155,13 @@ enum {
>>>    void *memremap(resource_size_t offset, size_t size, unsigned long flags);
>>>    void memunmap(void *addr);
>>>
>>> +#ifndef memremap_min_align
>>> +static inline unsigned int memremap_min_align(void)
>>> +{
>>> +       return PAGE_SIZE;
>>> +}
>>> +#endif
>>> +
>>
>>
>> Should that be memremap_pages_min_align()?
> 
> No, and on second look it needs to be a common value that results in
> properly aligned / sized namespaces across architectures.
> 
> What would it take for Power to make it's minimum mapping granularity
> SUBSECTION_SIZE? The minute that the minimum alignment changes across
> architectures we lose compatibility.
> 
> The namespaces need to be sized such that the mode can be changed freely.
> 

We should do a discussion for LSF/MM for architecture compatibility 
challenges with NVDIMM. I did post a request to attend on that. But 
never got a response on that submission.



-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
  2020-01-24 17:07         ` Aneesh Kumar K.V
@ 2020-01-24 18:25           ` Dan Williams
  2020-01-26 11:41             ` Aneesh Kumar K.V
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2020-01-24 18:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, linuxppc-dev

On Fri, Jan 24, 2020 at 9:07 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 1/24/20 10:15 PM, Dan Williams wrote:
> > On Thu, Jan 23, 2020 at 11:34 PM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
> >>
> >> On 1/24/20 11:27 AM, Dan Williams wrote:
> >>> On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
> >>>
> >>
> >> ....
> >>
> >>>>
> >>>> +unsigned long arch_namespace_map_size(void)
> >>>> +{
> >>>> +       return PAGE_SIZE;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
> >>>> +
> >>>> +
> >>>>    static void __cpa_flush_all(void *arg)
> >>>>    {
> >>>>           unsigned long cache = (unsigned long)arg;
> >>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> >>>> index 9df091bd30ba..a3476dbd2656 100644
> >>>> --- a/include/linux/libnvdimm.h
> >>>> +++ b/include/linux/libnvdimm.h
> >>>> @@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
> >>>>    }
> >>>>    #endif
> >>>>
> >>>> +unsigned long arch_namespace_map_size(void);
> >>>
> >>> This property is more generic than the nvdimm namespace mapping size,
> >>> it's more the fundamental remap granularity that the architecture
> >>> supports. So I would expect this to be defined in core header files.
> >>> Something like:
> >>>
> >>> diff --git a/include/linux/io.h b/include/linux/io.h
> >>> index a59834bc0a11..58b3b2091dbb 100644
> >>> --- a/include/linux/io.h
> >>> +++ b/include/linux/io.h
> >>> @@ -155,6 +155,13 @@ enum {
> >>>    void *memremap(resource_size_t offset, size_t size, unsigned long flags);
> >>>    void memunmap(void *addr);
> >>>
> >>> +#ifndef memremap_min_align
> >>> +static inline unsigned int memremap_min_align(void)
> >>> +{
> >>> +       return PAGE_SIZE;
> >>> +}
> >>> +#endif
> >>> +
> >>
> >>
> >> Should that be memremap_pages_min_align()?
> >
> > No, and on second look it needs to be a common value that results in
> > properly aligned / sized namespaces across architectures.
> >
> > What would it take for Power to make it's minimum mapping granularity
> > SUBSECTION_SIZE? The minute that the minimum alignment changes across
> > architectures we lose compatibility.
> >
> > The namespaces need to be sized such that the mode can be changed freely.
> >
>
> Linux on ppc64 with hash translation use just one page size for direct
> mapping and that is 16MB.

Ok, I think this means that the dream of SUBSECTION_SIZE being the
minimum compat alignment is dead, or at least a dream deferred.

Let's do this, change the name of this function to:

    memremap_compat_align()

...and define it to be the max() of all the alignment constraints that
the arch may require through either memremap(), or memremap_pages().
Then, teach ndctl to make its default alignment compatible by default,
16MiB, with an override to allow namespace creation with the current
architecture's memremap_compat_align(), exported via sysfs, if it
happens to be less then 16MiB. Finally, cross our fingers and hope
that Power remains the only arch that violates the SUBSECTION_SIZE
minimum value for memremap_compat_align().
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/6] libnvdimm/namespace: Validate namespace start addr and size
  2020-01-20 14:07 ` [PATCH v4 2/6] libnvdimm/namespace: Validate namespace start addr and size Aneesh Kumar K.V
@ 2020-01-25  1:55   ` Dan Williams
  0 siblings, 0 replies; 17+ messages in thread
From: Dan Williams @ 2020-01-25  1:55 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm

On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Make sure namespace start addr and size are properly aligned as per architecture
> restrictions. If the namespace is not aligned kernel mark the namespace 'disabled'
>
> Architectures like ppc64 use different page size than PAGE_SIZE to map
> direct-map address range. The kernel needs to make sure the namespace size is aligned
> correctly for the direct-map page size.
>
> kernel log will contain information as below.
>
> [    5.810939] nd_pmem namespace0.1: invalid size/SPA
> [    5.810969] nd_pmem: probe of namespace0.1 failed with error -95
>
> and the namespace will be marked 'disabled'
>
>   {
>     "dev":"namespace0.1",
>     "mode":"fsdax",
>     "map":"mem",
>     "size":1071644672,
>     "uuid":"25577a00-c012-421d-89ca-3ee189e08848",
>     "sector_size":512,
>     "state":"disabled"
>   },
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/nvdimm/namespace_devs.c | 50 +++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index 032dc61725ff..0e2c90730ce3 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1113,6 +1113,51 @@ resource_size_t nvdimm_namespace_capacity(struct nd_namespace_common *ndns)
>  }
>  EXPORT_SYMBOL(nvdimm_namespace_capacity);
>
> +static bool nvdimm_valid_namespace(struct device *dev,
> +               struct nd_namespace_common *ndns, resource_size_t size)
> +{
> +       struct device *ndns_dev = &ndns->dev;
> +       struct nd_region *nd_region = to_nd_region(ndns->dev.parent);
> +       unsigned long map_size = arch_namespace_map_size();
> +       struct resource *res;
> +       u32 remainder;
> +
> +       /*
> +        * Don't validate the start and size for blk namespace type
> +        */

If you're going to comment I'd prefer a comment that explains why, not
what. E.g. "blk namespaces are ioremap()'d with small page aligned
apertures"

> +       if (is_namespace_blk(ndns_dev))
> +               return true;
> +
> +       /*
> +        * For btt and raw namespace kernel use ioremap. Assume both can work
> +        * with PAGE_SIZE alignment.
> +        */
> +       if (is_nd_btt(dev) || is_namespace_io(ndns_dev))
> +               map_size = PAGE_SIZE;
> +
> +       div_u64_rem(size, map_size * nd_region->ndr_mappings, &remainder);

Not all regions have mappings see e820_register_one(), so this is a
divide by zero.

I think this is mixing 2 different constraints. There's the soft
constraint of wanting aliased pmem capacity to remain aligned relative
to blk capacity, and then there's the hard constraint where the kernel
just can't use devm_memremap_pages() for the given capacity alignment.

I think this routine should be called:

    nvdimm_validate_geometry()

...and only check for:

    if (pmem_should_map_pages() || is_nd_pfn() || is_nd_dax())

...i.e namespaces are mapped by memremap_pages() and compare that
against memremap_compat_align().
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/6] libnvdimm/namespace: Add arch dependent callback for namespace create time validation
  2020-01-20 14:07 ` [PATCH v4 3/6] libnvdimm/namespace: Add arch dependent callback for namespace create time validation Aneesh Kumar K.V
@ 2020-01-25  1:59   ` Dan Williams
  0 siblings, 0 replies; 17+ messages in thread
From: Dan Williams @ 2020-01-25  1:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm

On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Namespace start address and size should be multiple of subsection size to avoid
> handling complex partial subsection addition and removal of memory. Even though
> subsection size is arch independent (2M), architectures do impose further
> restrictions w.r.t namespace size/start addr based on direct-mapping  page size.
>
> This patch adds an arch callback for supporting arch specific restrictions w.r.t
> namespace size. This is different from arch_namespace_map_size() in that this
> validates against SUBSECTION_SIZE. Ideally, kernel should use the same restrictions
> during namespace initialization too. But that prevents an existing unaligned
> namespace initialization to fail. The kernel now allows such namespace initialization
> so that an existing installation is not broken.

I think we only need memremap_compat_align() to default to the
subsection size with the option for Power to override it to 16MiB.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/6] libnvdimm/namespace: Validate namespace size when creating a new namespace.
  2020-01-20 14:07 ` [PATCH v4 4/6] libnvdimm/namespace: Validate namespace size when creating a new namespace Aneesh Kumar K.V
@ 2020-01-25  2:22   ` Dan Williams
  0 siblings, 0 replies; 17+ messages in thread
From: Dan Williams @ 2020-01-25  2:22 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm

On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Kernel should validate the namespace size against SUBSECTION_SIZE rather than
> PAGE_SIZE. blk namespace continues to use old rule so that the new kernel don't
> break the creation of blk namespaces.
>
> Use the new helper arch_namespace_align_size() so that architecture specific
> restrictions are also taken care of.
>
> kernel log will contain the below details
> [  939.620064] nd namespace0.3: 1071644672 is not 16384K aligned

This can't be enforced at size_store() time because the size is only
invalid relative to whether someone wants to create more namespaces or
assign a personality. Instead the free space allocation code needs to
pick the correct alignment by default, and it needs to assume that the
boot-to-boot physical address alignment variation is something greater
than memremap_compat_align(). For x86 it's typically 64MiB, do you
happen to know if Power firmware would ever physically remap a
resource range in a way that would violate that 16MiB minimum
alignment guarantee?

If we have that guarantee then the new kernel enabling needed is a way
to adjust the free-space allocator to align new allocations to
memremap_compat_align(), but that ndctl would likely bump that up to
16MiB across the board since it knows that Power needs more alignment
than x86 and it is the agent that wants to enforce cross-arch
compatibility. Otherwise the kernel should do it's best to support
requested size_store().
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent
  2020-01-24 18:25           ` Dan Williams
@ 2020-01-26 11:41             ` Aneesh Kumar K.V
  0 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2020-01-26 11:41 UTC (permalink / raw)
  To: Dan Williams, Jeff Moyer; +Cc: linux-nvdimm, linuxppc-dev

Dan Williams <dan.j.williams@intel.com> writes:

> On Fri, Jan 24, 2020 at 9:07 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> On 1/24/20 10:15 PM, Dan Williams wrote:
>> > On Thu, Jan 23, 2020 at 11:34 PM Aneesh Kumar K.V
>> > <aneesh.kumar@linux.ibm.com> wrote:
>> >>
>> >> On 1/24/20 11:27 AM, Dan Williams wrote:
>> >>> On Mon, Jan 20, 2020 at 6:08 AM Aneesh Kumar K.V
>> >>>
>> >>
>> >> ....
>> >>
>> >>>>
>> >>>> +unsigned long arch_namespace_map_size(void)
>> >>>> +{
>> >>>> +       return PAGE_SIZE;
>> >>>> +}
>> >>>> +EXPORT_SYMBOL_GPL(arch_namespace_map_size);
>> >>>> +
>> >>>> +
>> >>>>    static void __cpa_flush_all(void *arg)
>> >>>>    {
>> >>>>           unsigned long cache = (unsigned long)arg;
>> >>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>> >>>> index 9df091bd30ba..a3476dbd2656 100644
>> >>>> --- a/include/linux/libnvdimm.h
>> >>>> +++ b/include/linux/libnvdimm.h
>> >>>> @@ -284,4 +284,5 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>> >>>>    }
>> >>>>    #endif
>> >>>>
>> >>>> +unsigned long arch_namespace_map_size(void);
>> >>>
>> >>> This property is more generic than the nvdimm namespace mapping size,
>> >>> it's more the fundamental remap granularity that the architecture
>> >>> supports. So I would expect this to be defined in core header files.
>> >>> Something like:
>> >>>
>> >>> diff --git a/include/linux/io.h b/include/linux/io.h
>> >>> index a59834bc0a11..58b3b2091dbb 100644
>> >>> --- a/include/linux/io.h
>> >>> +++ b/include/linux/io.h
>> >>> @@ -155,6 +155,13 @@ enum {
>> >>>    void *memremap(resource_size_t offset, size_t size, unsigned long flags);
>> >>>    void memunmap(void *addr);
>> >>>
>> >>> +#ifndef memremap_min_align
>> >>> +static inline unsigned int memremap_min_align(void)
>> >>> +{
>> >>> +       return PAGE_SIZE;
>> >>> +}
>> >>> +#endif
>> >>> +
>> >>
>> >>
>> >> Should that be memremap_pages_min_align()?
>> >
>> > No, and on second look it needs to be a common value that results in
>> > properly aligned / sized namespaces across architectures.
>> >
>> > What would it take for Power to make it's minimum mapping granularity
>> > SUBSECTION_SIZE? The minute that the minimum alignment changes across
>> > architectures we lose compatibility.
>> >
>> > The namespaces need to be sized such that the mode can be changed freely.
>> >
>>
>> Linux on ppc64 with hash translation use just one page size for direct
>> mapping and that is 16MB.
>
> Ok, I think this means that the dream of SUBSECTION_SIZE being the
> minimum compat alignment is dead, or at least a dream deferred.
>
> Let's do this, change the name of this function to:
>
>     memremap_compat_align()
>
> ...and define it to be the max() of all the alignment constraints that
> the arch may require through either memremap(), or memremap_pages().
> Then, teach ndctl to make its default alignment compatible by default,
> 16MiB, with an override to allow namespace creation with the current
> architecture's memremap_compat_align(), exported via sysfs, if it
> happens to be less then 16MiB. Finally, cross our fingers and hope
> that Power remains the only arch that violates the SUBSECTION_SIZE
> minimum value for memremap_compat_align().

We do have two issues related to alignment here.

1) With upstream kernel, we don't validate the namespace start and size
value and hence we can end up creating namespace that is not aligned to
SUBSECTION_SIZE. This was observed by Jeff Moyer in his test. That means
we will fail to enable already created namespace if we use
SUBSECTION_SIZE to validate their alignment.

The solution I came up with was arch_namespace_map_size() that depends on the
direct-map mapping page size. On architecture like ppc64, this value can
be 16MB. 

3) For new namespaces, we can now ensure they are properly
aligned. For architectures other than ppc64 that value is SUBSECTION_SIZE;
ie, the resource start address and the size value should be aligned to
SUBSECTION_SIZE. For ppc64 this value should be 16MB because if they are
not 16MB aligned we cannot direct-map them.

I guess this can be memremap_compat_align() and we expose this value via
namespace attribute. By default, all architecture will now to try to
align things to 16MB unless specified --nocompat as ndctl
create-namespace command-line option. When used we use the
architecture-specific sysfs value (SUBSECTION_SIZE) to align things
correctly.


-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-01-26 11:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-20 14:07 [PATCH v4 0/6] Validating namespace size and start address attributes Aneesh Kumar K.V
2020-01-20 14:07 ` [PATCH v4 1/6] libnvdimm/namespace: Make namespace size validation arch dependent Aneesh Kumar K.V
2020-01-24  5:57   ` Dan Williams
2020-01-24  7:34     ` Aneesh Kumar K.V
2020-01-24 16:45       ` Dan Williams
2020-01-24 17:07         ` Aneesh Kumar K.V
2020-01-24 18:25           ` Dan Williams
2020-01-26 11:41             ` Aneesh Kumar K.V
2020-01-24 17:08         ` Aneesh Kumar K.V
2020-01-20 14:07 ` [PATCH v4 2/6] libnvdimm/namespace: Validate namespace start addr and size Aneesh Kumar K.V
2020-01-25  1:55   ` Dan Williams
2020-01-20 14:07 ` [PATCH v4 3/6] libnvdimm/namespace: Add arch dependent callback for namespace create time validation Aneesh Kumar K.V
2020-01-25  1:59   ` Dan Williams
2020-01-20 14:07 ` [PATCH v4 4/6] libnvdimm/namespace: Validate namespace size when creating a new namespace Aneesh Kumar K.V
2020-01-25  2:22   ` Dan Williams
2020-01-20 14:07 ` [PATCH v4 5/6] libnvdimm/namespace: Align DPA based on arch restrictions Aneesh Kumar K.V
2020-01-20 14:07 ` [PATCH v4 6/6] libnvdimm/namespace: Expose arch specific supported size align value Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).