linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/7] Mark the namespace disabled on pfn superblock mismatch
@ 2019-08-19 13:34 Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 1/7] libnvdimm/region: Rewrite _probe_success() to _advance_seeds() Aneesh Kumar K.V
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19 13:34 UTC (permalink / raw)
  To: dan.j.williams; +Cc: Aneesh Kumar K.V, linux-nvdimm

We add new members to pfn superblock (PAGE_SIZE and struct page size) in this series.
This is now checked while initializing the namespace. If we find a mismatch we mark
the namespace disabled.

This series also handle configs where hugepage support is not enabled by default.
This can result in different align restrictions for dax namespace. We mark the
dax namespace disabled if we find the alignment not supported.

Changes from v5:
* Split patch 3
* Update commit message
* Add MAX_STRUCT_PAGE_SIZE with value 64 and use that when allocating reserve block
* Add BUILD_BUG_ON if we find sizeof(struct page) > 64


Aneesh Kumar K.V (6):
  libnvdimm/pmem: Advance namespace seed for specific probe errors
  libnvdimm/pfn_dev: Add a build check to make sure we notice when
    struct page size change
  libnvdimm/pfn_dev: Add page size and struct page size to pfn
    superblock
  libnvdimm/label: Remove the dpa align check
  libnvdimm: Use PAGE_SIZE instead of SZ_4K for align check
  libnvdimm/dax: Pick the right alignment default when creating dax
    devices

Dan Williams (1):
  libnvdimm/region: Rewrite _probe_success() to _advance_seeds()

 arch/powerpc/include/asm/libnvdimm.h |  9 ++++
 arch/powerpc/mm/Makefile             |  1 +
 arch/powerpc/mm/nvdimm.c             | 34 +++++++++++++
 arch/x86/include/asm/libnvdimm.h     | 19 +++++++
 drivers/nvdimm/bus.c                 |  8 ++-
 drivers/nvdimm/label.c               |  5 --
 drivers/nvdimm/namespace_devs.c      | 40 +++++++++++----
 drivers/nvdimm/nd-core.h             |  3 +-
 drivers/nvdimm/nd.h                  | 10 ++--
 drivers/nvdimm/pfn.h                 |  5 +-
 drivers/nvdimm/pfn_devs.c            | 67 ++++++++++++++++++++++--
 drivers/nvdimm/pmem.c                | 29 +++++++++--
 drivers/nvdimm/region_devs.c         | 76 +++++-----------------------
 include/linux/huge_mm.h              |  7 ++-
 14 files changed, 215 insertions(+), 98 deletions(-)
 create mode 100644 arch/powerpc/include/asm/libnvdimm.h
 create mode 100644 arch/powerpc/mm/nvdimm.c
 create mode 100644 arch/x86/include/asm/libnvdimm.h

-- 
2.21.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v6 1/7] libnvdimm/region: Rewrite _probe_success() to _advance_seeds()
  2019-08-19 13:34 [PATCH v6 0/7] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
@ 2019-08-19 13:34 ` Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 2/7] libnvdimm/pmem: Advance namespace seed for specific probe errors Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19 13:34 UTC (permalink / raw)
  To: dan.j.williams; +Cc: Aneesh Kumar K.V, linux-nvdimm

From: Dan Williams <dan.j.williams@intel.com>

The nd_region_probe_success() helper collides seed management with
nvdimm->busy tracking. Given the 'busy' increment is handled internal to the
nd_region driver 'probe' path move the decrement to the 'remove' path.
With that cleanup the routine can be renamed to the more descriptive
nd_region_advance_seeds().

The change is prompted by an incoming need to optionally advance the
seeds on other events besides 'probe' success.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/bus.c            |  7 +---
 drivers/nvdimm/namespace_devs.c | 34 ++++++++++++++---
 drivers/nvdimm/nd-core.h        |  3 +-
 drivers/nvdimm/region_devs.c    | 68 +++++----------------------------
 4 files changed, 41 insertions(+), 71 deletions(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 798c5c4aea9c..9b64e68a20b8 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -95,10 +95,8 @@ static int nvdimm_bus_probe(struct device *dev)
 	rc = nd_drv->probe(dev);
 	debug_nvdimm_unlock(dev);
 
-	if (rc == 0)
-		nd_region_probe_success(nvdimm_bus, dev);
-	else
-		nd_region_disable(nvdimm_bus, dev);
+	if (rc == 0 && dev->parent && is_nd_region(dev->parent))
+		nd_region_advance_seeds(to_nd_region(dev->parent), dev);
 	nvdimm_bus_probe_end(nvdimm_bus);
 
 	dev_dbg(&nvdimm_bus->dev, "END: %s.probe(%s) = %d\n", dev->driver->name,
@@ -121,7 +119,6 @@ static int nvdimm_bus_remove(struct device *dev)
 		rc = nd_drv->remove(dev);
 		debug_nvdimm_unlock(dev);
 	}
-	nd_region_disable(nvdimm_bus, dev);
 
 	dev_dbg(&nvdimm_bus->dev, "%s.remove(%s) = %d\n", dev->driver->name,
 			dev_name(dev), rc);
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index a16e52251a30..3be81f7b9ed3 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2462,6 +2462,27 @@ static struct device **create_namespaces(struct nd_region *nd_region)
 	return devs;
 }
 
+static void deactivate_labels(void *region)
+{
+	struct nd_region *nd_region = region;
+	int i;
+
+	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+		struct nvdimm_drvdata *ndd = nd_mapping->ndd;
+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+		mutex_lock(&nd_mapping->lock);
+		nd_mapping_free_labels(nd_mapping);
+		mutex_unlock(&nd_mapping->lock);
+
+		put_ndd(ndd);
+		nd_mapping->ndd = NULL;
+		if (ndd)
+			atomic_dec(&nvdimm->busy);
+	}
+}
+
 static int init_active_labels(struct nd_region *nd_region)
 {
 	int i;
@@ -2519,16 +2540,17 @@ static int init_active_labels(struct nd_region *nd_region)
 			mutex_unlock(&nd_mapping->lock);
 		}
 
-		if (j >= count)
-			continue;
+		if (j < count)
+			break;
+	}
 
-		mutex_lock(&nd_mapping->lock);
-		nd_mapping_free_labels(nd_mapping);
-		mutex_unlock(&nd_mapping->lock);
+	if (i < nd_region->ndr_mappings) {
+		deactivate_labels(nd_region);
 		return -ENOMEM;
 	}
 
-	return 0;
+	return devm_add_action_or_reset(&nd_region->dev, deactivate_labels,
+			nd_region);
 }
 
 int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 0ac52b6eb00e..945658cc32ec 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -128,13 +128,12 @@ int __init nvdimm_bus_init(void);
 void nvdimm_bus_exit(void);
 void nvdimm_devs_exit(void);
 void nd_region_devs_exit(void);
-void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev);
 struct nd_region;
+void nd_region_advance_seeds(struct nd_region *nd_region, struct device *dev);
 void nd_region_create_ns_seed(struct nd_region *nd_region);
 void nd_region_create_btt_seed(struct nd_region *nd_region);
 void nd_region_create_pfn_seed(struct nd_region *nd_region);
 void nd_region_create_dax_seed(struct nd_region *nd_region);
-void nd_region_disable(struct nvdimm_bus *nvdimm_bus, struct device *dev);
 int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
 void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus);
 void nd_synchronize(void);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index af30cbe7a8ea..57de49b79d7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -715,85 +715,37 @@ void nd_mapping_free_labels(struct nd_mapping *nd_mapping)
 }
 
 /*
- * Upon successful probe/remove, take/release a reference on the
- * associated interleave set (if present), and plant new btt + namespace
- * seeds.  Also, on the removal of a BLK region, notify the provider to
- * disable the region.
+ * When a namespace is activated create new seeds for the next
+ * namespace, or namespace-personality to be configured.
  */
-static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
-		struct device *dev, bool probe)
+void nd_region_advance_seeds(struct nd_region *nd_region, struct device *dev)
 {
-	struct nd_region *nd_region;
-
-	if (!probe && is_nd_region(dev)) {
-		int i;
-
-		nd_region = to_nd_region(dev);
-		for (i = 0; i < nd_region->ndr_mappings; i++) {
-			struct nd_mapping *nd_mapping = &nd_region->mapping[i];
-			struct nvdimm_drvdata *ndd = nd_mapping->ndd;
-			struct nvdimm *nvdimm = nd_mapping->nvdimm;
-
-			mutex_lock(&nd_mapping->lock);
-			nd_mapping_free_labels(nd_mapping);
-			mutex_unlock(&nd_mapping->lock);
-
-			put_ndd(ndd);
-			nd_mapping->ndd = NULL;
-			if (ndd)
-				atomic_dec(&nvdimm->busy);
-		}
-	}
-	if (dev->parent && is_nd_region(dev->parent) && probe) {
-		nd_region = to_nd_region(dev->parent);
-		nvdimm_bus_lock(dev);
-		if (nd_region->ns_seed == dev)
-			nd_region_create_ns_seed(nd_region);
-		nvdimm_bus_unlock(dev);
-	}
-	if (is_nd_btt(dev) && probe) {
+	nvdimm_bus_lock(dev);
+	if (nd_region->ns_seed == dev) {
+		nd_region_create_ns_seed(nd_region);
+	} else if (is_nd_btt(dev)) {
 		struct nd_btt *nd_btt = to_nd_btt(dev);
 
-		nd_region = to_nd_region(dev->parent);
-		nvdimm_bus_lock(dev);
 		if (nd_region->btt_seed == dev)
 			nd_region_create_btt_seed(nd_region);
 		if (nd_region->ns_seed == &nd_btt->ndns->dev)
 			nd_region_create_ns_seed(nd_region);
-		nvdimm_bus_unlock(dev);
-	}
-	if (is_nd_pfn(dev) && probe) {
+	} else if (is_nd_pfn(dev)) {
 		struct nd_pfn *nd_pfn = to_nd_pfn(dev);
 
-		nd_region = to_nd_region(dev->parent);
-		nvdimm_bus_lock(dev);
 		if (nd_region->pfn_seed == dev)
 			nd_region_create_pfn_seed(nd_region);
 		if (nd_region->ns_seed == &nd_pfn->ndns->dev)
 			nd_region_create_ns_seed(nd_region);
-		nvdimm_bus_unlock(dev);
-	}
-	if (is_nd_dax(dev) && probe) {
+	} else if (is_nd_dax(dev)) {
 		struct nd_dax *nd_dax = to_nd_dax(dev);
 
-		nd_region = to_nd_region(dev->parent);
-		nvdimm_bus_lock(dev);
 		if (nd_region->dax_seed == dev)
 			nd_region_create_dax_seed(nd_region);
 		if (nd_region->ns_seed == &nd_dax->nd_pfn.ndns->dev)
 			nd_region_create_ns_seed(nd_region);
-		nvdimm_bus_unlock(dev);
 	}
-}
-
-void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev)
-{
-	nd_region_notify_driver_action(nvdimm_bus, dev, true);
-}
-
-void nd_region_disable(struct nvdimm_bus *nvdimm_bus, struct device *dev)
-{
-	nd_region_notify_driver_action(nvdimm_bus, dev, false);
+	nvdimm_bus_unlock(dev);
 }
 
 static ssize_t mappingN(struct device *dev, char *buf, int n)
-- 
2.21.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 2/7] libnvdimm/pmem: Advance namespace seed for specific probe errors
  2019-08-19 13:34 [PATCH v6 0/7] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 1/7] libnvdimm/region: Rewrite _probe_success() to _advance_seeds() Aneesh Kumar K.V
@ 2019-08-19 13:34 ` Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 3/7] libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19 13:34 UTC (permalink / raw)
  To: dan.j.williams; +Cc: Aneesh Kumar K.V, linux-nvdimm

In order to support marking namespaces with unsupported feature/versions
disabled, nvdimm core should advance the namespace seed on these
probe failures. Otherwise, these failed namespaces will be considered a
seed namespace and will be wrongly used while creating new namespaces.

Add -EOPNOTSUPP as return from pmem probe callback to indicate a namespace
initialization failures due to pfn superblock feature/version mismatch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/bus.c  |  3 ++-
 drivers/nvdimm/pmem.c | 29 +++++++++++++++++++++++++----
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 9b64e68a20b8..69f784bf9744 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -95,7 +95,8 @@ static int nvdimm_bus_probe(struct device *dev)
 	rc = nd_drv->probe(dev);
 	debug_nvdimm_unlock(dev);
 
-	if (rc == 0 && dev->parent && is_nd_region(dev->parent))
+	if ((rc == 0 || rc == -EOPNOTSUPP) &&
+	    dev->parent && is_nd_region(dev->parent))
 		nd_region_advance_seeds(to_nd_region(dev->parent), dev);
 	nvdimm_bus_probe_end(nvdimm_bus);
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4c121dd03dd9..f9f76f6ba07b 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -490,6 +490,7 @@ static int pmem_attach_disk(struct device *dev,
 
 static int nd_pmem_probe(struct device *dev)
 {
+	int ret;
 	struct nd_namespace_common *ndns;
 
 	ndns = nvdimm_namespace_common_probe(dev);
@@ -505,12 +506,32 @@ static int nd_pmem_probe(struct device *dev)
 	if (is_nd_pfn(dev))
 		return pmem_attach_disk(dev, ndns);
 
-	/* if we find a valid info-block we'll come back as that personality */
-	if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
-			|| nd_dax_probe(dev, ndns) == 0)
+	ret = nd_btt_probe(dev, ndns);
+	if (ret == 0)
 		return -ENXIO;
 
-	/* ...otherwise we're just a raw pmem device */
+	/*
+	 * We have two failure conditions here, there is no
+	 * info reserver block or we found a valid info reserve block
+	 * but failed to initialize the pfn superblock.
+	 *
+	 * For the first case consider namespace as a raw pmem namespace
+	 * and attach a disk.
+	 *
+	 * For the latter, consider this a success and advance the namespace
+	 * seed.
+	 */
+	ret = nd_pfn_probe(dev, ndns);
+	if (ret == 0)
+		return -ENXIO;
+	else if (ret == -EOPNOTSUPP)
+		return ret;
+
+	ret = nd_dax_probe(dev, ndns);
+	if (ret == 0)
+		return -ENXIO;
+	else if (ret == -EOPNOTSUPP)
+		return ret;
 	return pmem_attach_disk(dev, ndns);
 }
 
-- 
2.21.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 3/7] libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change
  2019-08-19 13:34 [PATCH v6 0/7] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 1/7] libnvdimm/region: Rewrite _probe_success() to _advance_seeds() Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 2/7] libnvdimm/pmem: Advance namespace seed for specific probe errors Aneesh Kumar K.V
@ 2019-08-19 13:34 ` Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 4/7] libnvdimm/pfn_dev: Add page size and struct page size to pfn superblock Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19 13:34 UTC (permalink / raw)
  To: dan.j.williams; +Cc: Aneesh Kumar K.V, linux-nvdimm

Namespaces created with PFN_MODE_PMEM mode stores struct page in the reserve
block area. We need to make sure we account for the right struct page
size while doing this. Instead of directly depending on sizeof(struct page)
which can change based on different kernel config option, use the max struct
page size (64) while calculating the reserve block area. This makes sure pmem
device can be used across kernels built with different configs.

If the above assumption of max struct page size change, we need to update the
reserve block allocation space for new namespaces created.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/nd.h       |  4 ++++
 drivers/nvdimm/pfn_devs.c | 10 +++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 1b9955651379..e89af4b2d8e9 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -375,6 +375,10 @@ unsigned int pmem_sector_size(struct nd_namespace_common *ndns);
 void nvdimm_badblocks_populate(struct nd_region *nd_region,
 		struct badblocks *bb, const struct resource *res);
 #if IS_ENABLED(CONFIG_ND_CLAIM)
+
+/* max struct page size independent of kernel config */
+#define MAX_STRUCT_PAGE_SIZE 64
+
 int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
 int devm_nsio_enable(struct device *dev, struct nd_namespace_io *nsio);
 void devm_nsio_disable(struct device *dev, struct nd_namespace_io *nsio);
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 3e7b11cf1aae..cd120feb9213 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -701,8 +701,16 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		 * The altmap should be padded out to the block size used
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
+		 *
+		 * Also make sure size of struct page is less than 64. We
+		 * want to make sure we use large enough size here so that
+		 * we don't have a dynamic reserve space depending on
+		 * struct page size. But we also want to make sure we notice
+		 * when we end up adding new elements to struct page.
 		 */
-		offset = ALIGN(start + SZ_8K + 64 * npfns, align) - start;
+		BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
+		offset = ALIGN(start + SZ_8K + MAX_STRUCT_PAGE_SIZE * npfns, align)
+			- start;
 	} else if (nd_pfn->mode == PFN_MODE_RAM)
 		offset = ALIGN(start + SZ_8K, align) - start;
 	else
-- 
2.21.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 4/7] libnvdimm/pfn_dev: Add page size and struct page size to pfn superblock
  2019-08-19 13:34 [PATCH v6 0/7] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2019-08-19 13:34 ` [PATCH v6 3/7] libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change Aneesh Kumar K.V
@ 2019-08-19 13:34 ` Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 5/7] libnvdimm/label: Remove the dpa align check Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19 13:34 UTC (permalink / raw)
  To: dan.j.williams; +Cc: Aneesh Kumar K.V, linux-nvdimm

This is needed so that pmem probe don't wrongly initialize a namespace
which doesn't have enough space reserved for holding struct pages
with the current kernel.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/pfn.h      |  5 ++++-
 drivers/nvdimm/pfn_devs.c | 25 ++++++++++++++++++++++++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index 7381673b7b70..acb19517f678 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -29,7 +29,10 @@ struct nd_pfn_sb {
 	/* minor-version-2 record the base alignment of the mapping */
 	__le32 align;
 	/* minor-version-3 guarantee the padding and flags are zero */
-	u8 padding[4000];
+	/* minor-version-4 record the page size and struct page size */
+	__le32 page_size;
+	__le16 page_struct_size;
+	u8 padding[3994];
 	__le64 checksum;
 };
 
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index cd120feb9213..22db77ac3d0e 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -460,6 +460,11 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 	if (__le16_to_cpu(pfn_sb->version_minor) < 2)
 		pfn_sb->align = 0;
 
+	if (__le16_to_cpu(pfn_sb->version_minor) < 4) {
+		pfn_sb->page_struct_size = cpu_to_le16(64);
+		pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
+	}
+
 	switch (le32_to_cpu(pfn_sb->mode)) {
 	case PFN_MODE_RAM:
 	case PFN_MODE_PMEM:
@@ -475,6 +480,22 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 		align = 1UL << ilog2(offset);
 	mode = le32_to_cpu(pfn_sb->mode);
 
+	if ((le32_to_cpu(pfn_sb->page_size) > PAGE_SIZE) &&
+	    (mode == PFN_MODE_PMEM)) {
+		dev_err(&nd_pfn->dev,
+			"init failed, page size mismatch %d\n",
+			le32_to_cpu(pfn_sb->page_size));
+		return -EOPNOTSUPP;
+	}
+
+	if ((le16_to_cpu(pfn_sb->page_struct_size) < sizeof(struct page)) &&
+	     (mode == PFN_MODE_PMEM)) {
+		dev_err(&nd_pfn->dev,
+			"init failed, struct page size mismatch %d\n",
+			le16_to_cpu(pfn_sb->page_struct_size));
+		return -EOPNOTSUPP;
+	}
+
 	if (!nd_pfn->uuid) {
 		/*
 		 * When probing a namepace via nd_pfn_probe() the uuid
@@ -730,8 +751,10 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
 	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
 	pfn_sb->version_major = cpu_to_le16(1);
-	pfn_sb->version_minor = cpu_to_le16(3);
+	pfn_sb->version_minor = cpu_to_le16(4);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
+	pfn_sb->page_struct_size = cpu_to_le16(MAX_STRUCT_PAGE_SIZE);
+	pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
 	checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
 	pfn_sb->checksum = cpu_to_le64(checksum);
 
-- 
2.21.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 5/7] libnvdimm/label: Remove the dpa align check
  2019-08-19 13:34 [PATCH v6 0/7] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2019-08-19 13:34 ` [PATCH v6 4/7] libnvdimm/pfn_dev: Add page size and struct page size to pfn superblock Aneesh Kumar K.V
@ 2019-08-19 13:34 ` Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 6/7] libnvdimm: Use PAGE_SIZE instead of SZ_4K for " Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 7/7] libnvdimm/dax: Pick the right alignment default when creating dax devices Aneesh Kumar K.V
  6 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19 13:34 UTC (permalink / raw)
  To: dan.j.williams; +Cc: Aneesh Kumar K.V, linux-nvdimm

There's no strict requirement why slot_valid() needs to check for page alignment
and it would seem to actively hurt cross-page-size compatibility. Let's
delete the check and rely on checksum validation.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/label.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index 73e197babc2f..47a4828b8b31 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -353,11 +353,6 @@ static bool slot_valid(struct nvdimm_drvdata *ndd,
 	if (slot != __le32_to_cpu(nd_label->slot))
 		return false;
 
-	/* check that DPA allocations are page aligned */
-	if ((__le64_to_cpu(nd_label->dpa)
-				| __le64_to_cpu(nd_label->rawsize)) % SZ_4K)
-		return false;
-
 	/* check checksum */
 	if (namespace_label_has(ndd, checksum)) {
 		u64 sum, sum_save;
-- 
2.21.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 6/7] libnvdimm: Use PAGE_SIZE instead of SZ_4K for align check
  2019-08-19 13:34 [PATCH v6 0/7] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2019-08-19 13:34 ` [PATCH v6 5/7] libnvdimm/label: Remove the dpa align check Aneesh Kumar K.V
@ 2019-08-19 13:34 ` Aneesh Kumar K.V
  2019-08-19 13:34 ` [PATCH v6 7/7] libnvdimm/dax: Pick the right alignment default when creating dax devices Aneesh Kumar K.V
  6 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19 13:34 UTC (permalink / raw)
  To: dan.j.williams; +Cc: Aneesh Kumar K.V, linux-nvdimm

Architectures have different page size than 4K. Use the PAGE_SIZE
to make sure ranges are correctly aligned.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/namespace_devs.c | 6 +++---
 drivers/nvdimm/region_devs.c    | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 3be81f7b9ed3..43401325c874 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1006,10 +1006,10 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
 		return -ENXIO;
 	}
 
-	div_u64_rem(val, SZ_4K * nd_region->ndr_mappings, &remainder);
+	div_u64_rem(val, PAGE_SIZE * nd_region->ndr_mappings, &remainder);
 	if (remainder) {
-		dev_dbg(dev, "%llu is not %dK aligned\n", val,
-				(SZ_4K * nd_region->ndr_mappings) / SZ_1K);
+		dev_dbg(dev, "%llu is not %ldK aligned\n", val,
+				(PAGE_SIZE * nd_region->ndr_mappings) / SZ_1K);
 		return -EINVAL;
 	}
 
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 57de49b79d7d..9550202aa7be 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -944,10 +944,10 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 		struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
 		struct nvdimm *nvdimm = mapping->nvdimm;
 
-		if ((mapping->start | mapping->size) % SZ_4K) {
-			dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n",
-					caller, dev_name(&nvdimm->dev), i);
-
+		if ((mapping->start | mapping->size) % PAGE_SIZE) {
+			dev_err(&nvdimm_bus->dev,
+				"%s: %s mapping%d is not %ld aligned\n",
+				caller, dev_name(&nvdimm->dev), i, PAGE_SIZE);
 			return NULL;
 		}
 
-- 
2.21.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 7/7] libnvdimm/dax: Pick the right alignment default when creating dax devices
  2019-08-19 13:34 [PATCH v6 0/7] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2019-08-19 13:34 ` [PATCH v6 6/7] libnvdimm: Use PAGE_SIZE instead of SZ_4K for " Aneesh Kumar K.V
@ 2019-08-19 13:34 ` Aneesh Kumar K.V
  2019-09-04  0:18   ` Dan Williams
  6 siblings, 1 reply; 11+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19 13:34 UTC (permalink / raw)
  To: dan.j.williams; +Cc: Aneesh Kumar K.V, linux-nvdimm

Allow arch to provide the supported alignments and use hugepage alignment only
if we support hugepage. Right now we depend on compile time configs whereas this
patch switch this to runtime discovery.

Architectures like ppc64 can have THP enabled in code, but then can have
hugepage size disabled by the hypervisor. This allows us to create dax devices
with PAGE_SIZE alignment in this case.

Existing dax namespace with alignment larger than PAGE_SIZE will fail to
initialize in this specific case. We still allow fsdax namespace initialization.

With respect to identifying whether to enable hugepage fault for a dax device,
if THP is enabled during compile, we default to taking hugepage fault and in dax
fault handler if we find the fault size > alignment we retry with PAGE_SIZE
fault size.

This also addresses the below failure scenario on ppc64

ndctl create-namespace --mode=devdax  | grep align
 "align":16777216,
 "align":16777216

cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments
 65536 16777216

daxio.static-debug  -z -o /dev/dax0.0
  Bus error (core dumped)

  $ dmesg | tail
   lpar: Failed hash pte insert with error -4
   hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 access=0x8000000000000006 current=daxio
   hash-mmu:     trap=0x300 vsid=0x22cb7a3 ssize=1 base psize=2 psize 10 pte=0xc000000501002b86
   daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000]
   daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 4800012c e93f0088 f93f0120
   daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 <f9490000> e93f0088 39290008 f93f0110

The failure was due to guest kernel using wrong page size.

The namespaces created with 16M alignment will appear as below on a config with
16M page size disabled.

$ ndctl list -Ni
[
  {
    "dev":"namespace0.1",
    "mode":"fsdax",
    "map":"dev",
    "size":5351931904,
    "uuid":"fc6e9667-461a-4718-82b4-69b24570bddb",
    "align":16777216,
    "blockdev":"pmem0.1",
    "supported_alignments":[
      65536
    ]
  },
  {
    "dev":"namespace0.0",
    "mode":"fsdax",    <==== devdax 16M alignment marked disabled.
    "map":"mem",
    "size":5368709120,
    "uuid":"a4bdf81a-f2ee-4bc6-91db-7b87eddd0484",
    "state":"disabled"
  }
]

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/libnvdimm.h |  9 ++++++++
 arch/powerpc/mm/Makefile             |  1 +
 arch/powerpc/mm/nvdimm.c             | 34 ++++++++++++++++++++++++++++
 arch/x86/include/asm/libnvdimm.h     | 19 ++++++++++++++++
 drivers/nvdimm/nd.h                  |  6 -----
 drivers/nvdimm/pfn_devs.c            | 32 +++++++++++++++++++++++++-
 include/linux/huge_mm.h              |  7 +++++-
 7 files changed, 100 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/include/asm/libnvdimm.h
 create mode 100644 arch/powerpc/mm/nvdimm.c
 create mode 100644 arch/x86/include/asm/libnvdimm.h

diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..d35fd7f48603
--- /dev/null
+++ b/arch/powerpc/include/asm/libnvdimm.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_LIBNVDIMM_H
+#define _ASM_POWERPC_LIBNVDIMM_H
+
+#define nd_pfn_supported_alignments nd_pfn_supported_alignments
+extern unsigned long *nd_pfn_supported_alignments(void);
+extern unsigned long nd_pfn_default_alignment(void);
+
+#endif
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 0f499db315d6..42e4a399ba5d 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM)		+= highmem.o
 obj-$(CONFIG_PPC_COPRO_BASE)	+= copro_fault.o
 obj-$(CONFIG_PPC_PTDUMP)	+= ptdump/
 obj-$(CONFIG_KASAN)		+= kasan/
+obj-$(CONFIG_NVDIMM_PFN)		+= nvdimm.o
diff --git a/arch/powerpc/mm/nvdimm.c b/arch/powerpc/mm/nvdimm.c
new file mode 100644
index 000000000000..a29a4510715e
--- /dev/null
+++ b/arch/powerpc/mm/nvdimm.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <asm/pgtable.h>
+#include <asm/page.h>
+
+#include <linux/mm.h>
+/*
+ * We support only pte and pmd mappings for now.
+ */
+const unsigned long *nd_pfn_supported_alignments(void)
+{
+	static unsigned long supported_alignments[3];
+
+	supported_alignments[0] = PAGE_SIZE;
+
+	if (has_transparent_hugepage())
+		supported_alignments[1] = HPAGE_PMD_SIZE;
+	else
+		supported_alignments[1] = 0;
+
+	supported_alignments[2] = 0;
+	return supported_alignments;
+}
+
+/*
+ * Use pmd mapping if supported as default alignment
+ */
+unsigned long nd_pfn_default_alignment(void)
+{
+
+	if (has_transparent_hugepage())
+		return HPAGE_PMD_SIZE;
+	return PAGE_SIZE;
+}
diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..3d5361db9164
--- /dev/null
+++ b/arch/x86/include/asm/libnvdimm.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_LIBNVDIMM_H
+#define _ASM_X86_LIBNVDIMM_H
+
+static inline unsigned long nd_pfn_default_alignment(void)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	return HPAGE_PMD_SIZE;
+#else
+	return PAGE_SIZE;
+#endif
+}
+
+static inline unsigned long nd_altmap_align_size(unsigned long nd_align)
+{
+	return PMD_SIZE;
+}
+
+#endif
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index e89af4b2d8e9..401a78b02116 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -289,12 +289,6 @@ static inline struct device *nd_btt_create(struct nd_region *nd_region)
 struct nd_pfn *to_nd_pfn(struct device *dev);
 #if IS_ENABLED(CONFIG_NVDIMM_PFN)
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE
-#else
-#define PFN_DEFAULT_ALIGNMENT PAGE_SIZE
-#endif
-
 int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns);
 bool is_nd_pfn(struct device *dev);
 struct device *nd_pfn_create(struct nd_region *nd_region);
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 22db77ac3d0e..7dc0b5da4c6b 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -10,6 +10,7 @@
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
+#include <asm/libnvdimm.h>
 #include "nd-core.h"
 #include "pfn.h"
 #include "nd.h"
@@ -103,6 +104,8 @@ static ssize_t align_show(struct device *dev,
 	return sprintf(buf, "%ld\n", nd_pfn->align);
 }
 
+#ifndef nd_pfn_supported_alignments
+#define nd_pfn_supported_alignments nd_pfn_supported_alignments
 static const unsigned long *nd_pfn_supported_alignments(void)
 {
 	/*
@@ -125,6 +128,7 @@ static const unsigned long *nd_pfn_supported_alignments(void)
 
 	return data;
 }
+#endif
 
 static ssize_t align_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
@@ -302,7 +306,7 @@ struct device *nd_pfn_devinit(struct nd_pfn *nd_pfn,
 		return NULL;
 
 	nd_pfn->mode = PFN_MODE_NONE;
-	nd_pfn->align = PFN_DEFAULT_ALIGNMENT;
+	nd_pfn->align = nd_pfn_default_alignment();
 	dev = &nd_pfn->dev;
 	device_initialize(&nd_pfn->dev);
 	if (ndns && !__nd_attach_ndns(&nd_pfn->dev, ndns, &nd_pfn->ndns)) {
@@ -412,6 +416,20 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
 	return 0;
 }
 
+static bool nd_supported_alignment(unsigned long align)
+{
+	int i;
+	const unsigned long *supported = nd_pfn_supported_alignments();
+
+	if (align == 0)
+		return false;
+
+	for (i = 0; supported[i]; i++)
+		if (align == supported[i])
+			return true;
+	return false;
+}
+
 /**
  * nd_pfn_validate - read and validate info-block
  * @nd_pfn: fsdax namespace runtime state / properties
@@ -496,6 +514,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 		return -EOPNOTSUPP;
 	}
 
+	/*
+	 * Check whether the we support the alignment. For Dax if the
+	 * superblock alignment is not matching, we won't initialize
+	 * the device.
+	 */
+	if (!nd_supported_alignment(align) &&
+	    !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
+		dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
+			"%ld:%ld\n", nd_pfn->align, align);
+		return -EOPNOTSUPP;
+	}
+
 	if (!nd_pfn->uuid) {
 		/*
 		 * When probing a namepace via nd_pfn_probe() the uuid
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 45ede62aa85b..4fa91f9bd0da 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -108,7 +108,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 
 	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
 		return true;
-
+	/*
+	 * For dax let's try to do hugepage fault always. If we don't support
+	 * hugepages we will not have enabled namespaces with hugepage alignment.
+	 * This also means we try to handle hugepage fault on device with
+	 * smaller alignment. But for then we will return with VM_FAULT_FALLBACK
+	 */
 	if (vma_is_dax(vma))
 		return true;
 
-- 
2.21.0

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 7/7] libnvdimm/dax: Pick the right alignment default when creating dax devices
  2019-08-19 13:34 ` [PATCH v6 7/7] libnvdimm/dax: Pick the right alignment default when creating dax devices Aneesh Kumar K.V
@ 2019-09-04  0:18   ` Dan Williams
  2019-09-04  5:17     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2019-09-04  0:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm

Hi Aneesh,

Patches 1 through 6 look ok, but I feel compelled to point out that
the patches deploy a different indentation style than the surrounding
code. I rely on the vim "=" operator to indent 2 tabs when for
multi-line if statements, i.e. this:

        if (!nd_supported_alignment(align) &&
                        !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
                dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
                                "%ld:%ld\n", nd_pfn->align, align);
                return -EOPNOTSUPP;
        }

...instead of this:

        if (!nd_supported_alignment(align) &&
            !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
                dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
                        "%ld:%ld\n", nd_pfn->align, align);
                return -EOPNOTSUPP;
        }

Could you reflow / resend for v7 after addressing the below comments?
To be clear I don't mind the lined up style, but mixed style within
the same file is distracting.

On Mon, Aug 19, 2019 at 6:35 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Allow arch to provide the supported alignments and use hugepage alignment only
> if we support hugepage. Right now we depend on compile time configs whereas this
> patch switch this to runtime discovery.
>
> Architectures like ppc64 can have THP enabled in code, but then can have
> hugepage size disabled by the hypervisor. This allows us to create dax devices
> with PAGE_SIZE alignment in this case.
>
> Existing dax namespace with alignment larger than PAGE_SIZE will fail to
> initialize in this specific case. We still allow fsdax namespace initialization.
>
> With respect to identifying whether to enable hugepage fault for a dax device,
> if THP is enabled during compile, we default to taking hugepage fault and in dax
> fault handler if we find the fault size > alignment we retry with PAGE_SIZE
> fault size.
>
> This also addresses the below failure scenario on ppc64
>
> ndctl create-namespace --mode=devdax  | grep align
>  "align":16777216,
>  "align":16777216
>
> cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments
>  65536 16777216
>
> daxio.static-debug  -z -o /dev/dax0.0
>   Bus error (core dumped)
>
>   $ dmesg | tail
>    lpar: Failed hash pte insert with error -4
>    hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 access=0x8000000000000006 current=daxio
>    hash-mmu:     trap=0x300 vsid=0x22cb7a3 ssize=1 base psize=2 psize 10 pte=0xc000000501002b86
>    daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000]
>    daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 4800012c e93f0088 f93f0120
>    daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 <f9490000> e93f0088 39290008 f93f0110
>
> The failure was due to guest kernel using wrong page size.
>
> The namespaces created with 16M alignment will appear as below on a config with
> 16M page size disabled.
>
> $ ndctl list -Ni
> [
>   {
>     "dev":"namespace0.1",
>     "mode":"fsdax",
>     "map":"dev",
>     "size":5351931904,
>     "uuid":"fc6e9667-461a-4718-82b4-69b24570bddb",
>     "align":16777216,
>     "blockdev":"pmem0.1",
>     "supported_alignments":[
>       65536
>     ]
>   },
>   {
>     "dev":"namespace0.0",
>     "mode":"fsdax",    <==== devdax 16M alignment marked disabled.

Why is fsdax disabled? It's fine for the fsdax case to fall back to
smaller faults. Or, this running into the case where the stored page
size is larger than the currently running PAGE_SIZE?

>     "map":"mem",
>     "size":5368709120,
>     "uuid":"a4bdf81a-f2ee-4bc6-91db-7b87eddd0484",
>     "state":"disabled"
>   }
> ]
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/libnvdimm.h |  9 ++++++++
>  arch/powerpc/mm/Makefile             |  1 +
>  arch/powerpc/mm/nvdimm.c             | 34 ++++++++++++++++++++++++++++
>  arch/x86/include/asm/libnvdimm.h     | 19 ++++++++++++++++
>  drivers/nvdimm/nd.h                  |  6 -----
>  drivers/nvdimm/pfn_devs.c            | 32 +++++++++++++++++++++++++-
>  include/linux/huge_mm.h              |  7 +++++-
>  7 files changed, 100 insertions(+), 8 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/libnvdimm.h
>  create mode 100644 arch/powerpc/mm/nvdimm.c
>  create mode 100644 arch/x86/include/asm/libnvdimm.h
>
> diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h
> new file mode 100644
> index 000000000000..d35fd7f48603
> --- /dev/null
> +++ b/arch/powerpc/include/asm/libnvdimm.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_LIBNVDIMM_H
> +#define _ASM_POWERPC_LIBNVDIMM_H
> +
> +#define nd_pfn_supported_alignments nd_pfn_supported_alignments
> +extern unsigned long *nd_pfn_supported_alignments(void);
> +extern unsigned long nd_pfn_default_alignment(void);
> +
> +#endif
> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> index 0f499db315d6..42e4a399ba5d 100644
> --- a/arch/powerpc/mm/Makefile
> +++ b/arch/powerpc/mm/Makefile
> @@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM)         += highmem.o
>  obj-$(CONFIG_PPC_COPRO_BASE)   += copro_fault.o
>  obj-$(CONFIG_PPC_PTDUMP)       += ptdump/
>  obj-$(CONFIG_KASAN)            += kasan/
> +obj-$(CONFIG_NVDIMM_PFN)               += nvdimm.o
> diff --git a/arch/powerpc/mm/nvdimm.c b/arch/powerpc/mm/nvdimm.c
> new file mode 100644
> index 000000000000..a29a4510715e
> --- /dev/null
> +++ b/arch/powerpc/mm/nvdimm.c
> @@ -0,0 +1,34 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <asm/pgtable.h>
> +#include <asm/page.h>
> +
> +#include <linux/mm.h>
> +/*
> + * We support only pte and pmd mappings for now.
> + */
> +const unsigned long *nd_pfn_supported_alignments(void)
> +{
> +       static unsigned long supported_alignments[3];
> +
> +       supported_alignments[0] = PAGE_SIZE;
> +
> +       if (has_transparent_hugepage())
> +               supported_alignments[1] = HPAGE_PMD_SIZE;
> +       else
> +               supported_alignments[1] = 0;
> +
> +       supported_alignments[2] = 0;
> +       return supported_alignments;
> +}

I'd prefer to not create a powerpc specific object file especially
because nothing in the above looks powerpc specific. Anything that
prevents just making this the common implementation?

> +
> +/*
> + * Use pmd mapping if supported as default alignment
> + */
> +unsigned long nd_pfn_default_alignment(void)
> +{
> +
> +       if (has_transparent_hugepage())
> +               return HPAGE_PMD_SIZE;
> +       return PAGE_SIZE;
> +}
> diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h
> new file mode 100644
> index 000000000000..3d5361db9164
> --- /dev/null
> +++ b/arch/x86/include/asm/libnvdimm.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_LIBNVDIMM_H
> +#define _ASM_X86_LIBNVDIMM_H
> +
> +static inline unsigned long nd_pfn_default_alignment(void)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +       return HPAGE_PMD_SIZE;
> +#else
> +       return PAGE_SIZE;
> +#endif
> +}
> +
> +static inline unsigned long nd_altmap_align_size(unsigned long nd_align)
> +{
> +       return PMD_SIZE;
> +}
> +
> +#endif
> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> index e89af4b2d8e9..401a78b02116 100644
> --- a/drivers/nvdimm/nd.h
> +++ b/drivers/nvdimm/nd.h
> @@ -289,12 +289,6 @@ static inline struct device *nd_btt_create(struct nd_region *nd_region)
>  struct nd_pfn *to_nd_pfn(struct device *dev);
>  #if IS_ENABLED(CONFIG_NVDIMM_PFN)
>
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -#define PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE
> -#else
> -#define PFN_DEFAULT_ALIGNMENT PAGE_SIZE
> -#endif
> -
>  int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns);
>  bool is_nd_pfn(struct device *dev);
>  struct device *nd_pfn_create(struct nd_region *nd_region);
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 22db77ac3d0e..7dc0b5da4c6b 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -10,6 +10,7 @@
>  #include <linux/slab.h>
>  #include <linux/fs.h>
>  #include <linux/mm.h>
> +#include <asm/libnvdimm.h>
>  #include "nd-core.h"
>  #include "pfn.h"
>  #include "nd.h"
> @@ -103,6 +104,8 @@ static ssize_t align_show(struct device *dev,
>         return sprintf(buf, "%ld\n", nd_pfn->align);
>  }
>
> +#ifndef nd_pfn_supported_alignments
> +#define nd_pfn_supported_alignments nd_pfn_supported_alignments
>  static const unsigned long *nd_pfn_supported_alignments(void)
>  {
>         /*
> @@ -125,6 +128,7 @@ static const unsigned long *nd_pfn_supported_alignments(void)
>
>         return data;
>  }
> +#endif
>
>  static ssize_t align_store(struct device *dev,
>                 struct device_attribute *attr, const char *buf, size_t len)
> @@ -302,7 +306,7 @@ struct device *nd_pfn_devinit(struct nd_pfn *nd_pfn,
>                 return NULL;
>
>         nd_pfn->mode = PFN_MODE_NONE;
> -       nd_pfn->align = PFN_DEFAULT_ALIGNMENT;
> +       nd_pfn->align = nd_pfn_default_alignment();
>         dev = &nd_pfn->dev;
>         device_initialize(&nd_pfn->dev);
>         if (ndns && !__nd_attach_ndns(&nd_pfn->dev, ndns, &nd_pfn->ndns)) {
> @@ -412,6 +416,20 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
>         return 0;
>  }
>
> +static bool nd_supported_alignment(unsigned long align)
> +{
> +       int i;
> +       const unsigned long *supported = nd_pfn_supported_alignments();
> +
> +       if (align == 0)
> +               return false;
> +
> +       for (i = 0; supported[i]; i++)
> +               if (align == supported[i])
> +                       return true;
> +       return false;
> +}
> +
>  /**
>   * nd_pfn_validate - read and validate info-block
>   * @nd_pfn: fsdax namespace runtime state / properties
> @@ -496,6 +514,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
>                 return -EOPNOTSUPP;
>         }
>
> +       /*
> +        * Check whether the we support the alignment. For Dax if the
> +        * superblock alignment is not matching, we won't initialize
> +        * the device.
> +        */
> +       if (!nd_supported_alignment(align) &&
> +           !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {

^^^ more indentation reflow.

> +               dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
> +                       "%ld:%ld\n", nd_pfn->align, align);
> +               return -EOPNOTSUPP;
> +       }
> +
>         if (!nd_pfn->uuid) {
>                 /*
>                  * When probing a namepace via nd_pfn_probe() the uuid
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 45ede62aa85b..4fa91f9bd0da 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -108,7 +108,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>
>         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>                 return true;
> -
> +       /*
> +        * For dax let's try to do hugepage fault always. If we don't support
> +        * hugepages we will not have enabled namespaces with hugepage alignment.
> +        * This also means we try to handle hugepage fault on device with
> +        * smaller alignment. But for then we will return with VM_FAULT_FALLBACK
> +        */

Please fix this comment up to not use the word "we". This should also
get an ack from linux-mm folks. "We" in the above alternately means
"we: the runtime/compile-tims platform setting", or "we: the kernel
implementation".
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 7/7] libnvdimm/dax: Pick the right alignment default when creating dax devices
  2019-09-04  0:18   ` Dan Williams
@ 2019-09-04  5:17     ` Aneesh Kumar K.V
       [not found]       ` <ae11dcb0-beaf-a6cf-9456-ac6a7ec84743-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Aneesh Kumar K.V @ 2019-09-04  5:17 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm

On 9/4/19 5:48 AM, Dan Williams wrote:
> Hi Aneesh,
> 
> Patches 1 through 6 look ok, but I feel compelled to point out that
> the patches deploy a different indentation style than the surrounding
> code. I rely on the vim "=" operator to indent 2 tabs when for
> multi-line if statements, i.e. this:
> 
>          if (!nd_supported_alignment(align) &&
>                          !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
>                  dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
>                                  "%ld:%ld\n", nd_pfn->align, align);
>                  return -EOPNOTSUPP;
>          }
> 
> ...instead of this:
> 
>          if (!nd_supported_alignment(align) &&
>              !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
>                  dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
>                          "%ld:%ld\n", nd_pfn->align, align);
>                  return -EOPNOTSUPP;
>          }
> 
> Could you reflow / resend for v7 after addressing the below comments?
> To be clear I don't mind the lined up style, but mixed style within
> the same file is distracting.
>


I sent a v7 and then realized there are more feedback below. Let me 
reply here and then possibly send only this patch updated if needed

> On Mon, Aug 19, 2019 at 6:35 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Allow arch to provide the supported alignments and use hugepage alignment only
>> if we support hugepage. Right now we depend on compile time configs whereas this
>> patch switch this to runtime discovery.
>>
>> Architectures like ppc64 can have THP enabled in code, but then can have
>> hugepage size disabled by the hypervisor. This allows us to create dax devices
>> with PAGE_SIZE alignment in this case.
>>
>> Existing dax namespace with alignment larger than PAGE_SIZE will fail to
>> initialize in this specific case. We still allow fsdax namespace initialization.
>>
>> With respect to identifying whether to enable hugepage fault for a dax device,
>> if THP is enabled during compile, we default to taking hugepage fault and in dax
>> fault handler if we find the fault size > alignment we retry with PAGE_SIZE
>> fault size.
>>
>> This also addresses the below failure scenario on ppc64
>>
>> ndctl create-namespace --mode=devdax  | grep align
>>   "align":16777216,
>>   "align":16777216
>>
>> cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments
>>   65536 16777216
>>
>> daxio.static-debug  -z -o /dev/dax0.0
>>    Bus error (core dumped)
>>
>>    $ dmesg | tail
>>     lpar: Failed hash pte insert with error -4
>>     hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 access=0x8000000000000006 current=daxio
>>     hash-mmu:     trap=0x300 vsid=0x22cb7a3 ssize=1 base psize=2 psize 10 pte=0xc000000501002b86
>>     daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000]
>>     daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 4800012c e93f0088 f93f0120
>>     daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 <f9490000> e93f0088 39290008 f93f0110
>>
>> The failure was due to guest kernel using wrong page size.
>>
>> The namespaces created with 16M alignment will appear as below on a config with
>> 16M page size disabled.
>>
>> $ ndctl list -Ni
>> [
>>    {
>>      "dev":"namespace0.1",
>>      "mode":"fsdax",
>>      "map":"dev",
>>      "size":5351931904,
>>      "uuid":"fc6e9667-461a-4718-82b4-69b24570bddb",
>>      "align":16777216,
>>      "blockdev":"pmem0.1",
>>      "supported_alignments":[
>>        65536
>>      ]
>>    },
>>    {
>>      "dev":"namespace0.0",
>>      "mode":"fsdax",    <==== devdax 16M alignment marked disabled.
> 
> Why is fsdax disabled? It's fine for the fsdax case to fall back to
> smaller faults. Or, this running into the case where the stored page
> size is larger than the currently running PAGE_SIZE?
> 
>>      "map":"mem",
>>      "size":5368709120,
>>      "uuid":"a4bdf81a-f2ee-4bc6-91db-7b87eddd0484",
>>      "state":"disabled"
>>    }
>> ]
>>


That is how ndctl print the output. It is actually the devdax namespace 
which is marked disabled. That is why I added this as a comment next to 
that "mode": line


>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/libnvdimm.h |  9 ++++++++
>>   arch/powerpc/mm/Makefile             |  1 +
>>   arch/powerpc/mm/nvdimm.c             | 34 ++++++++++++++++++++++++++++
>>   arch/x86/include/asm/libnvdimm.h     | 19 ++++++++++++++++
>>   drivers/nvdimm/nd.h                  |  6 -----
>>   drivers/nvdimm/pfn_devs.c            | 32 +++++++++++++++++++++++++-
>>   include/linux/huge_mm.h              |  7 +++++-
>>   7 files changed, 100 insertions(+), 8 deletions(-)
>>   create mode 100644 arch/powerpc/include/asm/libnvdimm.h
>>   create mode 100644 arch/powerpc/mm/nvdimm.c
>>   create mode 100644 arch/x86/include/asm/libnvdimm.h
>>
>> diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h
>> new file mode 100644
>> index 000000000000..d35fd7f48603
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/libnvdimm.h
>> @@ -0,0 +1,9 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _ASM_POWERPC_LIBNVDIMM_H
>> +#define _ASM_POWERPC_LIBNVDIMM_H
>> +
>> +#define nd_pfn_supported_alignments nd_pfn_supported_alignments
>> +extern unsigned long *nd_pfn_supported_alignments(void);
>> +extern unsigned long nd_pfn_default_alignment(void);
>> +
>> +#endif
>> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
>> index 0f499db315d6..42e4a399ba5d 100644
>> --- a/arch/powerpc/mm/Makefile
>> +++ b/arch/powerpc/mm/Makefile
>> @@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM)         += highmem.o
>>   obj-$(CONFIG_PPC_COPRO_BASE)   += copro_fault.o
>>   obj-$(CONFIG_PPC_PTDUMP)       += ptdump/
>>   obj-$(CONFIG_KASAN)            += kasan/
>> +obj-$(CONFIG_NVDIMM_PFN)               += nvdimm.o
>> diff --git a/arch/powerpc/mm/nvdimm.c b/arch/powerpc/mm/nvdimm.c
>> new file mode 100644
>> index 000000000000..a29a4510715e
>> --- /dev/null
>> +++ b/arch/powerpc/mm/nvdimm.c
>> @@ -0,0 +1,34 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include <asm/pgtable.h>
>> +#include <asm/page.h>
>> +
>> +#include <linux/mm.h>
>> +/*
>> + * We support only pte and pmd mappings for now.
>> + */
>> +const unsigned long *nd_pfn_supported_alignments(void)
>> +{
>> +       static unsigned long supported_alignments[3];
>> +
>> +       supported_alignments[0] = PAGE_SIZE;
>> +
>> +       if (has_transparent_hugepage())
>> +               supported_alignments[1] = HPAGE_PMD_SIZE;
>> +       else
>> +               supported_alignments[1] = 0;
>> +
>> +       supported_alignments[2] = 0;
>> +       return supported_alignments;
>> +}
> 
> I'd prefer to not create a powerpc specific object file especially
> because nothing in the above looks powerpc specific. Anything that
> prevents just making this the common implementation?
> 


Support for PUD size alignment. The alignment details of what can be 
supported is largely a arch specific details. We could do that with

#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
		HPAGE_PUD_SIZE,
#endif

But I was looking at arch specific functions rather than having all 
those #ifdef there.

>> +
>> +/*
>> + * Use pmd mapping if supported as default alignment
>> + */
>> +unsigned long nd_pfn_default_alignment(void)
>> +{
>> +
>> +       if (has_transparent_hugepage())
>> +               return HPAGE_PMD_SIZE;
>> +       return PAGE_SIZE;
>> +}
>> diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h
>> new file mode 100644
>> index 000000000000..3d5361db9164
>> --- /dev/null
>> +++ b/arch/x86/include/asm/libnvdimm.h
>> @@ -0,0 +1,19 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _ASM_X86_LIBNVDIMM_H
>> +#define _ASM_X86_LIBNVDIMM_H
>> +
>> +static inline unsigned long nd_pfn_default_alignment(void)
>> +{
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +       return HPAGE_PMD_SIZE;
>> +#else
>> +       return PAGE_SIZE;
>> +#endif
>> +}
>> +
>> +static inline unsigned long nd_altmap_align_size(unsigned long nd_align)
>> +{
>> +       return PMD_SIZE;
>> +}
>> +
>> +#endif
>> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
>> index e89af4b2d8e9..401a78b02116 100644
>> --- a/drivers/nvdimm/nd.h
>> +++ b/drivers/nvdimm/nd.h
>> @@ -289,12 +289,6 @@ static inline struct device *nd_btt_create(struct nd_region *nd_region)
>>   struct nd_pfn *to_nd_pfn(struct device *dev);
>>   #if IS_ENABLED(CONFIG_NVDIMM_PFN)
>>
>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -#define PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE
>> -#else
>> -#define PFN_DEFAULT_ALIGNMENT PAGE_SIZE
>> -#endif
>> -
>>   int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns);
>>   bool is_nd_pfn(struct device *dev);
>>   struct device *nd_pfn_create(struct nd_region *nd_region);
>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
>> index 22db77ac3d0e..7dc0b5da4c6b 100644
>> --- a/drivers/nvdimm/pfn_devs.c
>> +++ b/drivers/nvdimm/pfn_devs.c
>> @@ -10,6 +10,7 @@
>>   #include <linux/slab.h>
>>   #include <linux/fs.h>
>>   #include <linux/mm.h>
>> +#include <asm/libnvdimm.h>
>>   #include "nd-core.h"
>>   #include "pfn.h"
>>   #include "nd.h"
>> @@ -103,6 +104,8 @@ static ssize_t align_show(struct device *dev,
>>          return sprintf(buf, "%ld\n", nd_pfn->align);
>>   }
>>
>> +#ifndef nd_pfn_supported_alignments
>> +#define nd_pfn_supported_alignments nd_pfn_supported_alignments
>>   static const unsigned long *nd_pfn_supported_alignments(void)
>>   {
>>          /*
>> @@ -125,6 +128,7 @@ static const unsigned long *nd_pfn_supported_alignments(void)
>>
>>          return data;
>>   }
>> +#endif
>>
>>   static ssize_t align_store(struct device *dev,
>>                  struct device_attribute *attr, const char *buf, size_t len)
>> @@ -302,7 +306,7 @@ struct device *nd_pfn_devinit(struct nd_pfn *nd_pfn,
>>                  return NULL;
>>
>>          nd_pfn->mode = PFN_MODE_NONE;
>> -       nd_pfn->align = PFN_DEFAULT_ALIGNMENT;
>> +       nd_pfn->align = nd_pfn_default_alignment();
>>          dev = &nd_pfn->dev;
>>          device_initialize(&nd_pfn->dev);
>>          if (ndns && !__nd_attach_ndns(&nd_pfn->dev, ndns, &nd_pfn->ndns)) {
>> @@ -412,6 +416,20 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
>>          return 0;
>>   }
>>
>> +static bool nd_supported_alignment(unsigned long align)
>> +{
>> +       int i;
>> +       const unsigned long *supported = nd_pfn_supported_alignments();
>> +
>> +       if (align == 0)
>> +               return false;
>> +
>> +       for (i = 0; supported[i]; i++)
>> +               if (align == supported[i])
>> +                       return true;
>> +       return false;
>> +}
>> +
>>   /**
>>    * nd_pfn_validate - read and validate info-block
>>    * @nd_pfn: fsdax namespace runtime state / properties
>> @@ -496,6 +514,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
>>                  return -EOPNOTSUPP;
>>          }
>>
>> +       /*
>> +        * Check whether the we support the alignment. For Dax if the
>> +        * superblock alignment is not matching, we won't initialize
>> +        * the device.
>> +        */
>> +       if (!nd_supported_alignment(align) &&
>> +           !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
> 
> ^^^ more indentation reflow.
> 

Both .clang-format and earlier emacs customization recommendation used 
the above style. I sent a  v7 with this fixup alone.


>> +               dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
>> +                       "%ld:%ld\n", nd_pfn->align, align);
>> +               return -EOPNOTSUPP;
>> +       }
>> +
>>          if (!nd_pfn->uuid) {
>>                  /*
>>                   * When probing a namepace via nd_pfn_probe() the uuid
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 45ede62aa85b..4fa91f9bd0da 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -108,7 +108,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>>
>>          if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>>                  return true;
>> -
>> +       /*
>> +        * For dax let's try to do hugepage fault always. If we don't support
>> +        * hugepages we will not have enabled namespaces with hugepage alignment.
>> +        * This also means we try to handle hugepage fault on device with
>> +        * smaller alignment. But for then we will return with VM_FAULT_FALLBACK
>> +        */
> 
> Please fix this comment up to not use the word "we". This should also
> get an ack from linux-mm folks. "We" in the above alternately means
> "we: the runtime/compile-tims platform setting", or "we: the kernel
> implementation".
> 

I can update the comment to indicate runtime part.

-aneesh
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 7/7] libnvdimm/dax: Pick the right alignment default when creating dax devices
       [not found]       ` <ae11dcb0-beaf-a6cf-9456-ac6a7ec84743-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
@ 2019-09-04 18:10         ` Dan Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2019-09-04 18:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm

On Tue, Sep 3, 2019 at 10:18 PM Aneesh Kumar K.V
<aneesh.kumar-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org> wrote:
>
> On 9/4/19 5:48 AM, Dan Williams wrote:
> > Hi Aneesh,
> >
> > Patches 1 through 6 look ok, but I feel compelled to point out that
> > the patches deploy a different indentation style than the surrounding
> > code. I rely on the vim "=" operator to indent 2 tabs when for
> > multi-line if statements, i.e. this:
> >
> >          if (!nd_supported_alignment(align) &&
> >                          !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
> >                  dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
> >                                  "%ld:%ld\n", nd_pfn->align, align);
> >                  return -EOPNOTSUPP;
> >          }
> >
> > ...instead of this:
> >
> >          if (!nd_supported_alignment(align) &&
> >              !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
> >                  dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
> >                          "%ld:%ld\n", nd_pfn->align, align);
> >                  return -EOPNOTSUPP;
> >          }
> >
> > Could you reflow / resend for v7 after addressing the below comments?
> > To be clear I don't mind the lined up style, but mixed style within
> > the same file is distracting.
> >
>
>
> I sent a v7 and then realized there are more feedback below. Let me
> reply here and then possibly send only this patch updated if needed

Ok.

> > On Mon, Aug 19, 2019 at 6:35 AM Aneesh Kumar K.V
> > <aneesh.kumar-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org> wrote:
> >>
> >> Allow arch to provide the supported alignments and use hugepage alignment only
> >> if we support hugepage. Right now we depend on compile time configs whereas this
> >> patch switch this to runtime discovery.
> >>
> >> Architectures like ppc64 can have THP enabled in code, but then can have
> >> hugepage size disabled by the hypervisor. This allows us to create dax devices
> >> with PAGE_SIZE alignment in this case.
> >>
> >> Existing dax namespace with alignment larger than PAGE_SIZE will fail to
> >> initialize in this specific case. We still allow fsdax namespace initialization.
> >>
> >> With respect to identifying whether to enable hugepage fault for a dax device,
> >> if THP is enabled during compile, we default to taking hugepage fault and in dax
> >> fault handler if we find the fault size > alignment we retry with PAGE_SIZE
> >> fault size.
> >>
> >> This also addresses the below failure scenario on ppc64
> >>
> >> ndctl create-namespace --mode=devdax  | grep align
> >>   "align":16777216,
> >>   "align":16777216
> >>
> >> cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments
> >>   65536 16777216
> >>
> >> daxio.static-debug  -z -o /dev/dax0.0
> >>    Bus error (core dumped)
> >>
> >>    $ dmesg | tail
> >>     lpar: Failed hash pte insert with error -4
> >>     hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 access=0x8000000000000006 current=daxio
> >>     hash-mmu:     trap=0x300 vsid=0x22cb7a3 ssize=1 base psize=2 psize 10 pte=0xc000000501002b86
> >>     daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000]
> >>     daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 4800012c e93f0088 f93f0120
> >>     daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 <f9490000> e93f0088 39290008 f93f0110
> >>
> >> The failure was due to guest kernel using wrong page size.
> >>
> >> The namespaces created with 16M alignment will appear as below on a config with
> >> 16M page size disabled.
> >>
> >> $ ndctl list -Ni
> >> [
> >>    {
> >>      "dev":"namespace0.1",
> >>      "mode":"fsdax",
> >>      "map":"dev",
> >>      "size":5351931904,
> >>      "uuid":"fc6e9667-461a-4718-82b4-69b24570bddb",
> >>      "align":16777216,
> >>      "blockdev":"pmem0.1",
> >>      "supported_alignments":[
> >>        65536
> >>      ]
> >>    },
> >>    {
> >>      "dev":"namespace0.0",
> >>      "mode":"fsdax",    <==== devdax 16M alignment marked disabled.
> >
> > Why is fsdax disabled? It's fine for the fsdax case to fall back to
> > smaller faults. Or, this running into the case where the stored page
> > size is larger than the currently running PAGE_SIZE?
> >
> >>      "map":"mem",
> >>      "size":5368709120,
> >>      "uuid":"a4bdf81a-f2ee-4bc6-91db-7b87eddd0484",
> >>      "state":"disabled"
> >>    }
> >> ]
> >>
>
>
> That is how ndctl print the output. It is actually the devdax namespace
> which is marked disabled. That is why I added this as a comment next to
> that "mode": line

Oh, thanks for the clarification. That is indeed confusing. I suspect
this is due to the fact that of_pmem sets ND_REGION_PAGEMAP, so the
default state is "fsdax" instead of "raw" like other namespace types.

I wonder if we should hide "mode" by default when the namespace is
disabled? Otherwise just might need to document that "mode" is
unreliable when the namespace is disabled.

>
>
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
> >> ---
> >>   arch/powerpc/include/asm/libnvdimm.h |  9 ++++++++
> >>   arch/powerpc/mm/Makefile             |  1 +
> >>   arch/powerpc/mm/nvdimm.c             | 34 ++++++++++++++++++++++++++++
> >>   arch/x86/include/asm/libnvdimm.h     | 19 ++++++++++++++++
> >>   drivers/nvdimm/nd.h                  |  6 -----
> >>   drivers/nvdimm/pfn_devs.c            | 32 +++++++++++++++++++++++++-
> >>   include/linux/huge_mm.h              |  7 +++++-
> >>   7 files changed, 100 insertions(+), 8 deletions(-)
> >>   create mode 100644 arch/powerpc/include/asm/libnvdimm.h
> >>   create mode 100644 arch/powerpc/mm/nvdimm.c
> >>   create mode 100644 arch/x86/include/asm/libnvdimm.h
> >>
> >> diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h
> >> new file mode 100644
> >> index 000000000000..d35fd7f48603
> >> --- /dev/null
> >> +++ b/arch/powerpc/include/asm/libnvdimm.h
> >> @@ -0,0 +1,9 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_POWERPC_LIBNVDIMM_H
> >> +#define _ASM_POWERPC_LIBNVDIMM_H
> >> +
> >> +#define nd_pfn_supported_alignments nd_pfn_supported_alignments
> >> +extern unsigned long *nd_pfn_supported_alignments(void);
> >> +extern unsigned long nd_pfn_default_alignment(void);
> >> +
> >> +#endif
> >> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> >> index 0f499db315d6..42e4a399ba5d 100644
> >> --- a/arch/powerpc/mm/Makefile
> >> +++ b/arch/powerpc/mm/Makefile
> >> @@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM)         += highmem.o
> >>   obj-$(CONFIG_PPC_COPRO_BASE)   += copro_fault.o
> >>   obj-$(CONFIG_PPC_PTDUMP)       += ptdump/
> >>   obj-$(CONFIG_KASAN)            += kasan/
> >> +obj-$(CONFIG_NVDIMM_PFN)               += nvdimm.o
> >> diff --git a/arch/powerpc/mm/nvdimm.c b/arch/powerpc/mm/nvdimm.c
> >> new file mode 100644
> >> index 000000000000..a29a4510715e
> >> --- /dev/null
> >> +++ b/arch/powerpc/mm/nvdimm.c
> >> @@ -0,0 +1,34 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +
> >> +#include <asm/pgtable.h>
> >> +#include <asm/page.h>
> >> +
> >> +#include <linux/mm.h>
> >> +/*
> >> + * We support only pte and pmd mappings for now.
> >> + */
> >> +const unsigned long *nd_pfn_supported_alignments(void)
> >> +{
> >> +       static unsigned long supported_alignments[3];
> >> +
> >> +       supported_alignments[0] = PAGE_SIZE;
> >> +
> >> +       if (has_transparent_hugepage())
> >> +               supported_alignments[1] = HPAGE_PMD_SIZE;
> >> +       else
> >> +               supported_alignments[1] = 0;
> >> +
> >> +       supported_alignments[2] = 0;
> >> +       return supported_alignments;
> >> +}
> >
> > I'd prefer to not create a powerpc specific object file especially
> > because nothing in the above looks powerpc specific. Anything that
> > prevents just making this the common implementation?
> >
>
>
> Support for PUD size alignment. The alignment details of what can be
> supported is largely a arch specific details. We could do that with
>
> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>                 HPAGE_PUD_SIZE,
> #endif
>
> But I was looking at arch specific functions rather than having all
> those #ifdef there.

The ifdefs could still move to a header with a:

    static const unsigned long supported_alignments[3];

...definition.

>
> >> +
> >> +/*
> >> + * Use pmd mapping if supported as default alignment
> >> + */
> >> +unsigned long nd_pfn_default_alignment(void)
> >> +{
> >> +
> >> +       if (has_transparent_hugepage())
> >> +               return HPAGE_PMD_SIZE;
> >> +       return PAGE_SIZE;
> >> +}
> >> diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h
> >> new file mode 100644
> >> index 000000000000..3d5361db9164
> >> --- /dev/null
> >> +++ b/arch/x86/include/asm/libnvdimm.h
> >> @@ -0,0 +1,19 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_X86_LIBNVDIMM_H
> >> +#define _ASM_X86_LIBNVDIMM_H
> >> +
> >> +static inline unsigned long nd_pfn_default_alignment(void)
> >> +{
> >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >> +       return HPAGE_PMD_SIZE;
> >> +#else
> >> +       return PAGE_SIZE;
> >> +#endif
> >> +}
> >> +
> >> +static inline unsigned long nd_altmap_align_size(unsigned long nd_align)
> >> +{
> >> +       return PMD_SIZE;
> >> +}
> >> +
> >> +#endif
> >> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> >> index e89af4b2d8e9..401a78b02116 100644
> >> --- a/drivers/nvdimm/nd.h
> >> +++ b/drivers/nvdimm/nd.h
> >> @@ -289,12 +289,6 @@ static inline struct device *nd_btt_create(struct nd_region *nd_region)
> >>   struct nd_pfn *to_nd_pfn(struct device *dev);
> >>   #if IS_ENABLED(CONFIG_NVDIMM_PFN)
> >>
> >> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >> -#define PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE
> >> -#else
> >> -#define PFN_DEFAULT_ALIGNMENT PAGE_SIZE
> >> -#endif
> >> -
> >>   int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns);
> >>   bool is_nd_pfn(struct device *dev);
> >>   struct device *nd_pfn_create(struct nd_region *nd_region);
> >> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> >> index 22db77ac3d0e..7dc0b5da4c6b 100644
> >> --- a/drivers/nvdimm/pfn_devs.c
> >> +++ b/drivers/nvdimm/pfn_devs.c
> >> @@ -10,6 +10,7 @@
> >>   #include <linux/slab.h>
> >>   #include <linux/fs.h>
> >>   #include <linux/mm.h>
> >> +#include <asm/libnvdimm.h>
> >>   #include "nd-core.h"
> >>   #include "pfn.h"
> >>   #include "nd.h"
> >> @@ -103,6 +104,8 @@ static ssize_t align_show(struct device *dev,
> >>          return sprintf(buf, "%ld\n", nd_pfn->align);
> >>   }
> >>
> >> +#ifndef nd_pfn_supported_alignments
> >> +#define nd_pfn_supported_alignments nd_pfn_supported_alignments
> >>   static const unsigned long *nd_pfn_supported_alignments(void)
> >>   {
> >>          /*
> >> @@ -125,6 +128,7 @@ static const unsigned long *nd_pfn_supported_alignments(void)
> >>
> >>          return data;
> >>   }
> >> +#endif
> >>
> >>   static ssize_t align_store(struct device *dev,
> >>                  struct device_attribute *attr, const char *buf, size_t len)
> >> @@ -302,7 +306,7 @@ struct device *nd_pfn_devinit(struct nd_pfn *nd_pfn,
> >>                  return NULL;
> >>
> >>          nd_pfn->mode = PFN_MODE_NONE;
> >> -       nd_pfn->align = PFN_DEFAULT_ALIGNMENT;
> >> +       nd_pfn->align = nd_pfn_default_alignment();
> >>          dev = &nd_pfn->dev;
> >>          device_initialize(&nd_pfn->dev);
> >>          if (ndns && !__nd_attach_ndns(&nd_pfn->dev, ndns, &nd_pfn->ndns)) {
> >> @@ -412,6 +416,20 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
> >>          return 0;
> >>   }
> >>
> >> +static bool nd_supported_alignment(unsigned long align)
> >> +{
> >> +       int i;
> >> +       const unsigned long *supported = nd_pfn_supported_alignments();
> >> +
> >> +       if (align == 0)
> >> +               return false;
> >> +
> >> +       for (i = 0; supported[i]; i++)
> >> +               if (align == supported[i])
> >> +                       return true;
> >> +       return false;
> >> +}
> >> +
> >>   /**
> >>    * nd_pfn_validate - read and validate info-block
> >>    * @nd_pfn: fsdax namespace runtime state / properties
> >> @@ -496,6 +514,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
> >>                  return -EOPNOTSUPP;
> >>          }
> >>
> >> +       /*
> >> +        * Check whether the we support the alignment. For Dax if the
> >> +        * superblock alignment is not matching, we won't initialize
> >> +        * the device.
> >> +        */
> >> +       if (!nd_supported_alignment(align) &&
> >> +           !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
> >
> > ^^^ more indentation reflow.
> >
>
> Both .clang-format and earlier emacs customization recommendation used
> the above style. I sent a  v7 with this fixup alone.

Thanks for that I appreciate the fix up on something that's a
reasonable choice either way.

>
>
> >> +               dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
> >> +                       "%ld:%ld\n", nd_pfn->align, align);
> >> +               return -EOPNOTSUPP;
> >> +       }
> >> +
> >>          if (!nd_pfn->uuid) {
> >>                  /*
> >>                   * When probing a namepace via nd_pfn_probe() the uuid
> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> index 45ede62aa85b..4fa91f9bd0da 100644
> >> --- a/include/linux/huge_mm.h
> >> +++ b/include/linux/huge_mm.h
> >> @@ -108,7 +108,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> >>
> >>          if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> >>                  return true;
> >> -
> >> +       /*
> >> +        * For dax let's try to do hugepage fault always. If we don't support
> >> +        * hugepages we will not have enabled namespaces with hugepage alignment.
> >> +        * This also means we try to handle hugepage fault on device with
> >> +        * smaller alignment. But for then we will return with VM_FAULT_FALLBACK
> >> +        */
> >
> > Please fix this comment up to not use the word "we". This should also
> > get an ack from linux-mm folks. "We" in the above alternately means
> > "we: the runtime/compile-tims platform setting", or "we: the kernel
> > implementation".
> >
>
> I can update the comment to indicate runtime part.

Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-09-04 18:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-19 13:34 [PATCH v6 0/7] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
2019-08-19 13:34 ` [PATCH v6 1/7] libnvdimm/region: Rewrite _probe_success() to _advance_seeds() Aneesh Kumar K.V
2019-08-19 13:34 ` [PATCH v6 2/7] libnvdimm/pmem: Advance namespace seed for specific probe errors Aneesh Kumar K.V
2019-08-19 13:34 ` [PATCH v6 3/7] libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change Aneesh Kumar K.V
2019-08-19 13:34 ` [PATCH v6 4/7] libnvdimm/pfn_dev: Add page size and struct page size to pfn superblock Aneesh Kumar K.V
2019-08-19 13:34 ` [PATCH v6 5/7] libnvdimm/label: Remove the dpa align check Aneesh Kumar K.V
2019-08-19 13:34 ` [PATCH v6 6/7] libnvdimm: Use PAGE_SIZE instead of SZ_4K for " Aneesh Kumar K.V
2019-08-19 13:34 ` [PATCH v6 7/7] libnvdimm/dax: Pick the right alignment default when creating dax devices Aneesh Kumar K.V
2019-09-04  0:18   ` Dan Williams
2019-09-04  5:17     ` Aneesh Kumar K.V
     [not found]       ` <ae11dcb0-beaf-a6cf-9456-ac6a7ec84743-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
2019-09-04 18:10         ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).