Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v5 0/4] Mark the namespace disabled on pfn superblock mismatch
@ 2019-08-09  7:45 Aneesh Kumar K.V
  2019-08-09  7:45 ` [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success Aneesh Kumar K.V
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-09  7:45 UTC (permalink / raw)
  To: dan.j.williams; +Cc: linux-nvdimm, linux-mm, linuxppc-dev, Aneesh Kumar K.V

We add new members to pfn superblock (PAGE_SIZE and struct page size) in this series.
This is now checked while initializing the namespace. If we find a mismatch we mark
the namespace disabled.

This series also handle configs where hugepage support is not enabled by default.
This can result in different align restrictions for dax namespace. We mark the
dax namespace disabled if we find the alignment not supported.

Aneesh Kumar K.V (4):
  nvdimm: Consider probe return -EOPNOTSUPP as success
  mm/nvdimm: Add page size and struct page size to pfn superblock
  mm/nvdimm: Use correct #defines instead of open coding
  mm/nvdimm: Pick the right alignment default when creating dax devices

 arch/powerpc/include/asm/libnvdimm.h |  9 ++++
 arch/powerpc/mm/Makefile             |  1 +
 arch/powerpc/mm/nvdimm.c             | 34 +++++++++++++++
 arch/x86/include/asm/libnvdimm.h     | 19 +++++++++
 drivers/nvdimm/bus.c                 |  2 +-
 drivers/nvdimm/label.c               |  2 +-
 drivers/nvdimm/namespace_devs.c      |  6 +--
 drivers/nvdimm/nd.h                  |  6 ---
 drivers/nvdimm/pfn.h                 |  5 ++-
 drivers/nvdimm/pfn_devs.c            | 62 ++++++++++++++++++++++++++--
 drivers/nvdimm/pmem.c                | 26 ++++++++++--
 drivers/nvdimm/region_devs.c         |  8 ++--
 include/linux/huge_mm.h              |  7 +++-
 13 files changed, 163 insertions(+), 24 deletions(-)
 create mode 100644 arch/powerpc/include/asm/libnvdimm.h
 create mode 100644 arch/powerpc/mm/nvdimm.c
 create mode 100644 arch/x86/include/asm/libnvdimm.h

-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success
  2019-08-09  7:45 [PATCH v5 0/4] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
@ 2019-08-09  7:45 ` Aneesh Kumar K.V
  2019-08-14  4:22   ` Dan Williams
  2019-08-09  7:45 ` [PATCH v5 2/4] mm/nvdimm: Add page size and struct page size to pfn superblock Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-09  7:45 UTC (permalink / raw)
  To: dan.j.williams; +Cc: linux-nvdimm, linux-mm, linuxppc-dev, Aneesh Kumar K.V

This patch add -EOPNOTSUPP as return from probe callback to
indicate we were not able to initialize a namespace due to pfn superblock
feature/version mismatch. We want to consider this a probe success so that
we can create new namesapce seed and there by avoid marking the failed
namespace as the seed namespace.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/bus.c  |  2 +-
 drivers/nvdimm/pmem.c | 26 ++++++++++++++++++++++----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 798c5c4aea9c..16c35e6446a7 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -95,7 +95,7 @@ static int nvdimm_bus_probe(struct device *dev)
 	rc = nd_drv->probe(dev);
 	debug_nvdimm_unlock(dev);
 
-	if (rc == 0)
+	if (rc == 0 || rc == -EOPNOTSUPP)
 		nd_region_probe_success(nvdimm_bus, dev);
 	else
 		nd_region_disable(nvdimm_bus, dev);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4c121dd03dd9..3f498881dd28 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -490,6 +490,7 @@ static int pmem_attach_disk(struct device *dev,
 
 static int nd_pmem_probe(struct device *dev)
 {
+	int ret;
 	struct nd_namespace_common *ndns;
 
 	ndns = nvdimm_namespace_common_probe(dev);
@@ -505,12 +506,29 @@ static int nd_pmem_probe(struct device *dev)
 	if (is_nd_pfn(dev))
 		return pmem_attach_disk(dev, ndns);
 
-	/* if we find a valid info-block we'll come back as that personality */
-	if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
-			|| nd_dax_probe(dev, ndns) == 0)
+	ret = nd_btt_probe(dev, ndns);
+	if (ret == 0)
 		return -ENXIO;
+	else if (ret == -EOPNOTSUPP)
+		return ret;
 
-	/* ...otherwise we're just a raw pmem device */
+	ret = nd_pfn_probe(dev, ndns);
+	if (ret == 0)
+		return -ENXIO;
+	else if (ret == -EOPNOTSUPP)
+		return ret;
+
+	ret = nd_dax_probe(dev, ndns);
+	if (ret == 0)
+		return -ENXIO;
+	else if (ret == -EOPNOTSUPP)
+		return ret;
+	/*
+	 * We have two failure conditions here, there is no
+	 * info reserver block or we found a valid info reserve block
+	 * but failed to initialize the pfn superblock.
+	 * Don't create a raw pmem disk for the second case.
+	 */
 	return pmem_attach_disk(dev, ndns);
 }
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v5 2/4] mm/nvdimm: Add page size and struct page size to pfn superblock
  2019-08-09  7:45 [PATCH v5 0/4] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
  2019-08-09  7:45 ` [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success Aneesh Kumar K.V
@ 2019-08-09  7:45 ` Aneesh Kumar K.V
  2019-08-09  7:45 ` [PATCH v5 3/4] mm/nvdimm: Use correct #defines instead of open coding Aneesh Kumar K.V
  2019-08-09  7:45 ` [PATCH v5 4/4] mm/nvdimm: Pick the right alignment default when creating dax devices Aneesh Kumar K.V
  3 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-09  7:45 UTC (permalink / raw)
  To: dan.j.williams; +Cc: linux-nvdimm, linux-mm, linuxppc-dev, Aneesh Kumar K.V

This is needed so that we don't wrongly initialize a namespace
which doesn't have enough space reserved for holding struct pages
with the current kernel.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/pfn.h      |  5 ++++-
 drivers/nvdimm/pfn_devs.c | 27 ++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index 7381673b7b70..acb19517f678 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -29,7 +29,10 @@ struct nd_pfn_sb {
 	/* minor-version-2 record the base alignment of the mapping */
 	__le32 align;
 	/* minor-version-3 guarantee the padding and flags are zero */
-	u8 padding[4000];
+	/* minor-version-4 record the page size and struct page size */
+	__le32 page_size;
+	__le16 page_struct_size;
+	u8 padding[3994];
 	__le64 checksum;
 };
 
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 3e7b11cf1aae..37e96811c2fc 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -460,6 +460,15 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 	if (__le16_to_cpu(pfn_sb->version_minor) < 2)
 		pfn_sb->align = 0;
 
+	if (__le16_to_cpu(pfn_sb->version_minor) < 4) {
+		/*
+		 * For a large part we use PAGE_SIZE. But we
+		 * do have some accounting code using SZ_4K.
+		 */
+		pfn_sb->page_struct_size = cpu_to_le16(64);
+		pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
+	}
+
 	switch (le32_to_cpu(pfn_sb->mode)) {
 	case PFN_MODE_RAM:
 	case PFN_MODE_PMEM:
@@ -475,6 +484,20 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 		align = 1UL << ilog2(offset);
 	mode = le32_to_cpu(pfn_sb->mode);
 
+	if (le32_to_cpu(pfn_sb->page_size) != PAGE_SIZE) {
+		dev_err(&nd_pfn->dev,
+			"init failed, page size mismatch %d\n",
+			le32_to_cpu(pfn_sb->page_size));
+		return -EOPNOTSUPP;
+	}
+
+	if (le16_to_cpu(pfn_sb->page_struct_size) < sizeof(struct page)) {
+		dev_err(&nd_pfn->dev,
+			"init failed, struct page size mismatch %d\n",
+			le16_to_cpu(pfn_sb->page_struct_size));
+		return -EOPNOTSUPP;
+	}
+
 	if (!nd_pfn->uuid) {
 		/*
 		 * When probing a namepace via nd_pfn_probe() the uuid
@@ -722,8 +745,10 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
 	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
 	pfn_sb->version_major = cpu_to_le16(1);
-	pfn_sb->version_minor = cpu_to_le16(3);
+	pfn_sb->version_minor = cpu_to_le16(4);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
+	pfn_sb->page_struct_size = cpu_to_le16(sizeof(struct page));
+	pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
 	checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
 	pfn_sb->checksum = cpu_to_le64(checksum);
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v5 3/4] mm/nvdimm: Use correct #defines instead of open coding
  2019-08-09  7:45 [PATCH v5 0/4] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
  2019-08-09  7:45 ` [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success Aneesh Kumar K.V
  2019-08-09  7:45 ` [PATCH v5 2/4] mm/nvdimm: Add page size and struct page size to pfn superblock Aneesh Kumar K.V
@ 2019-08-09  7:45 ` Aneesh Kumar K.V
  2019-08-15 21:05   ` Dan Williams
  2019-08-09  7:45 ` [PATCH v5 4/4] mm/nvdimm: Pick the right alignment default when creating dax devices Aneesh Kumar K.V
  3 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-09  7:45 UTC (permalink / raw)
  To: dan.j.williams; +Cc: linux-nvdimm, linux-mm, linuxppc-dev, Aneesh Kumar K.V

Use PAGE_SIZE instead of SZ_4K and sizeof(struct page) instead of 64.
If we have a kernel built with different struct page size the previous
patch should handle marking the namespace disabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/label.c          | 2 +-
 drivers/nvdimm/namespace_devs.c | 6 +++---
 drivers/nvdimm/pfn_devs.c       | 3 ++-
 drivers/nvdimm/region_devs.c    | 8 ++++----
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index 73e197babc2f..7ee037063be7 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -355,7 +355,7 @@ static bool slot_valid(struct nvdimm_drvdata *ndd,
 
 	/* check that DPA allocations are page aligned */
 	if ((__le64_to_cpu(nd_label->dpa)
-				| __le64_to_cpu(nd_label->rawsize)) % SZ_4K)
+				| __le64_to_cpu(nd_label->rawsize)) % PAGE_SIZE)
 		return false;
 
 	/* check checksum */
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index a16e52251a30..a9c76df12cb9 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1006,10 +1006,10 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
 		return -ENXIO;
 	}
 
-	div_u64_rem(val, SZ_4K * nd_region->ndr_mappings, &remainder);
+	div_u64_rem(val, PAGE_SIZE * nd_region->ndr_mappings, &remainder);
 	if (remainder) {
-		dev_dbg(dev, "%llu is not %dK aligned\n", val,
-				(SZ_4K * nd_region->ndr_mappings) / SZ_1K);
+		dev_dbg(dev, "%llu is not %ldK aligned\n", val,
+				(PAGE_SIZE * nd_region->ndr_mappings) / SZ_1K);
 		return -EINVAL;
 	}
 
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 37e96811c2fc..c1d9be609322 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -725,7 +725,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
 		 */
-		offset = ALIGN(start + SZ_8K + 64 * npfns, align) - start;
+		offset = ALIGN(start + SZ_8K + sizeof(struct page) * npfns,
+			       align) - start;
 	} else if (nd_pfn->mode == PFN_MODE_RAM)
 		offset = ALIGN(start + SZ_8K, align) - start;
 	else
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index af30cbe7a8ea..20e265a534f8 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -992,10 +992,10 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 		struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
 		struct nvdimm *nvdimm = mapping->nvdimm;
 
-		if ((mapping->start | mapping->size) % SZ_4K) {
-			dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n",
-					caller, dev_name(&nvdimm->dev), i);
-
+		if ((mapping->start | mapping->size) % PAGE_SIZE) {
+			dev_err(&nvdimm_bus->dev,
+				"%s: %s mapping%d is not %ld aligned\n",
+				caller, dev_name(&nvdimm->dev), i, PAGE_SIZE);
 			return NULL;
 		}
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v5 4/4] mm/nvdimm: Pick the right alignment default when creating dax devices
  2019-08-09  7:45 [PATCH v5 0/4] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2019-08-09  7:45 ` [PATCH v5 3/4] mm/nvdimm: Use correct #defines instead of open coding Aneesh Kumar K.V
@ 2019-08-09  7:45 ` Aneesh Kumar K.V
  3 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-09  7:45 UTC (permalink / raw)
  To: dan.j.williams; +Cc: linux-nvdimm, linux-mm, linuxppc-dev, Aneesh Kumar K.V

Allow arch to provide the supported alignments and use hugepage alignment only
if we support hugepage. Right now we depend on compile time configs whereas this
patch switch this to runtime discovery.

Architectures like ppc64 can have THP enabled in code, but then can have
hugepage size disabled by the hypervisor. This allows us to create dax devices
with PAGE_SIZE alignment in this case.

Existing dax namespace with alignment larger than PAGE_SIZE will fail to
initialize in this specific case. We still allow fsdax namespace initialization.

With respect to identifying whether to enable hugepage fault for a dax device,
if THP is enabled during compile, we default to taking hugepage fault and in dax
fault handler if we find the fault size > alignment we retry with PAGE_SIZE
fault size.

This also addresses the below failure scenario on ppc64

ndctl create-namespace --mode=devdax  | grep align
 "align":16777216,
 "align":16777216

cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments
 65536 16777216

daxio.static-debug  -z -o /dev/dax0.0
  Bus error (core dumped)

  $ dmesg | tail
   lpar: Failed hash pte insert with error -4
   hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 access=0x8000000000000006 current=daxio
   hash-mmu:     trap=0x300 vsid=0x22cb7a3 ssize=1 base psize=2 psize 10 pte=0xc000000501002b86
   daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000]
   daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 4800012c e93f0088 f93f0120
   daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 <f9490000> e93f0088 39290008 f93f0110

The failure was due to guest kernel using wrong page size.

The namespaces created with 16M alignment will appear as below on a config with
16M page size disabled.

$ ndctl list -Ni
[
  {
    "dev":"namespace0.1",
    "mode":"fsdax",
    "map":"dev",
    "size":5351931904,
    "uuid":"fc6e9667-461a-4718-82b4-69b24570bddb",
    "align":16777216,
    "blockdev":"pmem0.1",
    "supported_alignments":[
      65536
    ]
  },
  {
    "dev":"namespace0.0",
    "mode":"fsdax",    <==== devdax 16M alignment marked disabled.
    "map":"mem",
    "size":5368709120,
    "uuid":"a4bdf81a-f2ee-4bc6-91db-7b87eddd0484",
    "state":"disabled"
  }
]

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/libnvdimm.h |  9 ++++++++
 arch/powerpc/mm/Makefile             |  1 +
 arch/powerpc/mm/nvdimm.c             | 34 ++++++++++++++++++++++++++++
 arch/x86/include/asm/libnvdimm.h     | 19 ++++++++++++++++
 drivers/nvdimm/nd.h                  |  6 -----
 drivers/nvdimm/pfn_devs.c            | 32 +++++++++++++++++++++++++-
 include/linux/huge_mm.h              |  7 +++++-
 7 files changed, 100 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/include/asm/libnvdimm.h
 create mode 100644 arch/powerpc/mm/nvdimm.c
 create mode 100644 arch/x86/include/asm/libnvdimm.h

diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..d35fd7f48603
--- /dev/null
+++ b/arch/powerpc/include/asm/libnvdimm.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_LIBNVDIMM_H
+#define _ASM_POWERPC_LIBNVDIMM_H
+
+#define nd_pfn_supported_alignments nd_pfn_supported_alignments
+extern unsigned long *nd_pfn_supported_alignments(void);
+extern unsigned long nd_pfn_default_alignment(void);
+
+#endif
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 0f499db315d6..42e4a399ba5d 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM)		+= highmem.o
 obj-$(CONFIG_PPC_COPRO_BASE)	+= copro_fault.o
 obj-$(CONFIG_PPC_PTDUMP)	+= ptdump/
 obj-$(CONFIG_KASAN)		+= kasan/
+obj-$(CONFIG_NVDIMM_PFN)		+= nvdimm.o
diff --git a/arch/powerpc/mm/nvdimm.c b/arch/powerpc/mm/nvdimm.c
new file mode 100644
index 000000000000..a29a4510715e
--- /dev/null
+++ b/arch/powerpc/mm/nvdimm.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <asm/pgtable.h>
+#include <asm/page.h>
+
+#include <linux/mm.h>
+/*
+ * We support only pte and pmd mappings for now.
+ */
+const unsigned long *nd_pfn_supported_alignments(void)
+{
+	static unsigned long supported_alignments[3];
+
+	supported_alignments[0] = PAGE_SIZE;
+
+	if (has_transparent_hugepage())
+		supported_alignments[1] = HPAGE_PMD_SIZE;
+	else
+		supported_alignments[1] = 0;
+
+	supported_alignments[2] = 0;
+	return supported_alignments;
+}
+
+/*
+ * Use pmd mapping if supported as default alignment
+ */
+unsigned long nd_pfn_default_alignment(void)
+{
+
+	if (has_transparent_hugepage())
+		return HPAGE_PMD_SIZE;
+	return PAGE_SIZE;
+}
diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..3d5361db9164
--- /dev/null
+++ b/arch/x86/include/asm/libnvdimm.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_LIBNVDIMM_H
+#define _ASM_X86_LIBNVDIMM_H
+
+static inline unsigned long nd_pfn_default_alignment(void)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	return HPAGE_PMD_SIZE;
+#else
+	return PAGE_SIZE;
+#endif
+}
+
+static inline unsigned long nd_altmap_align_size(unsigned long nd_align)
+{
+	return PMD_SIZE;
+}
+
+#endif
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 1b9955651379..5d095a19ff2c 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -289,12 +289,6 @@ static inline struct device *nd_btt_create(struct nd_region *nd_region)
 struct nd_pfn *to_nd_pfn(struct device *dev);
 #if IS_ENABLED(CONFIG_NVDIMM_PFN)
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE
-#else
-#define PFN_DEFAULT_ALIGNMENT PAGE_SIZE
-#endif
-
 int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns);
 bool is_nd_pfn(struct device *dev);
 struct device *nd_pfn_create(struct nd_region *nd_region);
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index c1d9be609322..f43d1baa6f33 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -10,6 +10,7 @@
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
+#include <asm/libnvdimm.h>
 #include "nd-core.h"
 #include "pfn.h"
 #include "nd.h"
@@ -103,6 +104,8 @@ static ssize_t align_show(struct device *dev,
 	return sprintf(buf, "%ld\n", nd_pfn->align);
 }
 
+#ifndef nd_pfn_supported_alignments
+#define nd_pfn_supported_alignments nd_pfn_supported_alignments
 static const unsigned long *nd_pfn_supported_alignments(void)
 {
 	/*
@@ -125,6 +128,7 @@ static const unsigned long *nd_pfn_supported_alignments(void)
 
 	return data;
 }
+#endif
 
 static ssize_t align_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
@@ -302,7 +306,7 @@ struct device *nd_pfn_devinit(struct nd_pfn *nd_pfn,
 		return NULL;
 
 	nd_pfn->mode = PFN_MODE_NONE;
-	nd_pfn->align = PFN_DEFAULT_ALIGNMENT;
+	nd_pfn->align = nd_pfn_default_alignment();
 	dev = &nd_pfn->dev;
 	device_initialize(&nd_pfn->dev);
 	if (ndns && !__nd_attach_ndns(&nd_pfn->dev, ndns, &nd_pfn->ndns)) {
@@ -412,6 +416,20 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
 	return 0;
 }
 
+static bool nd_supported_alignment(unsigned long align)
+{
+	int i;
+	const unsigned long *supported = nd_pfn_supported_alignments();
+
+	if (align == 0)
+		return false;
+
+	for (i = 0; supported[i]; i++)
+		if (align == supported[i])
+			return true;
+	return false;
+}
+
 /**
  * nd_pfn_validate - read and validate info-block
  * @nd_pfn: fsdax namespace runtime state / properties
@@ -498,6 +516,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 		return -EOPNOTSUPP;
 	}
 
+	/*
+	 * Check whether the we support the alignment. For Dax if the
+	 * superblock alignment is not matching, we won't initialize
+	 * the device.
+	 */
+	if (!nd_supported_alignment(align) &&
+	    !memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
+		dev_err(&nd_pfn->dev, "init failed, alignment mismatch: "
+			"%ld:%ld\n", nd_pfn->align, align);
+		return -EOPNOTSUPP;
+	}
+
 	if (!nd_pfn->uuid) {
 		/*
 		 * When probing a namepace via nd_pfn_probe() the uuid
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 45ede62aa85b..4fa91f9bd0da 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -108,7 +108,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 
 	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
 		return true;
-
+	/*
+	 * For dax let's try to do hugepage fault always. If we don't support
+	 * hugepages we will not have enabled namespaces with hugepage alignment.
+	 * This also means we try to handle hugepage fault on device with
+	 * smaller alignment. But for then we will return with VM_FAULT_FALLBACK
+	 */
 	if (vma_is_dax(vma))
 		return true;
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success
  2019-08-09  7:45 ` [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success Aneesh Kumar K.V
@ 2019-08-14  4:22   ` Dan Williams
  2019-08-15 19:54     ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2019-08-14  4:22 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, Linux MM, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 3190 bytes --]

Hi Aneesh, logic looks correct but there are some cleanups I'd like to
see and a lead-in patch that I attached.

I've started prefixing nvdimm patches with:

    libnvdimm/$component:

...since this patch mostly impacts the pmem driver lets prefix it
"libnvdimm/pmem: "

On Fri, Aug 9, 2019 at 12:45 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> This patch add -EOPNOTSUPP as return from probe callback to

s/This patch add/Add/

No need to say "this patch" it's obviously a patch.

> indicate we were not able to initialize a namespace due to pfn superblock
> feature/version mismatch. We want to consider this a probe success so that
> we can create new namesapce seed and there by avoid marking the failed
> namespace as the seed namespace.

Please replace usage of "we" with the exact agent involved as which
"we" is being referred to gets confusing for the reader.

i.e. "indicate that the pmem driver was not..." "The nvdimm core wants
to consider this...".

>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/nvdimm/bus.c  |  2 +-
>  drivers/nvdimm/pmem.c | 26 ++++++++++++++++++++++----
>  2 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index 798c5c4aea9c..16c35e6446a7 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -95,7 +95,7 @@ static int nvdimm_bus_probe(struct device *dev)
>         rc = nd_drv->probe(dev);
>         debug_nvdimm_unlock(dev);
>
> -       if (rc == 0)
> +       if (rc == 0 || rc == -EOPNOTSUPP)
>                 nd_region_probe_success(nvdimm_bus, dev);

This now makes the nd_region_probe_success() helper obviously misnamed
since it now wants to take actions on non-probe success. I attached a
lead-in cleanup that you can pull into your series that renames that
routine to nd_region_advance_seeds().

When you rebase this needs a comment about why EOPNOTSUPP has special handling.

>         else
>                 nd_region_disable(nvdimm_bus, dev);
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 4c121dd03dd9..3f498881dd28 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -490,6 +490,7 @@ static int pmem_attach_disk(struct device *dev,
>
>  static int nd_pmem_probe(struct device *dev)
>  {
> +       int ret;
>         struct nd_namespace_common *ndns;
>
>         ndns = nvdimm_namespace_common_probe(dev);
> @@ -505,12 +506,29 @@ static int nd_pmem_probe(struct device *dev)
>         if (is_nd_pfn(dev))
>                 return pmem_attach_disk(dev, ndns);
>
> -       /* if we find a valid info-block we'll come back as that personality */
> -       if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
> -                       || nd_dax_probe(dev, ndns) == 0)

Similar need for an updated comment here to explain the special
translation of error codes.

> +       ret = nd_btt_probe(dev, ndns);
> +       if (ret == 0)
>                 return -ENXIO;
> +       else if (ret == -EOPNOTSUPP)

Are there cases where the btt driver needs to return EOPNOTSUPP? I'd
otherwise like to keep this special casing constrained to the pfn /
dax info block cases.

[-- Attachment #2: 0001-libnvdimm-region-Rewrite-_probe_success-to-_advance_.patch --]
[-- Type: text/x-patch, Size: 7584 bytes --]

From 9ec13a8672e87e0b1c5b9427ab926168e53d55bc Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Tue, 13 Aug 2019 13:09:27 -0700
Subject: [PATCH] libnvdimm/region: Rewrite _probe_success() to
 _advance_seeds()

The nd_region_probe_success() helper collides seed management with
nvdimm->busy tracking. Given the 'busy' increment is handled internal to the
nd_region driver 'probe' path move the decrement to the 'remove' path.
With that cleanup the routine can be renamed to the more descriptive
nd_region_advance_seeds().

The change is prompted by an incoming need to optionally advance the
seeds on other events besides 'probe' success.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/bus.c            |  7 +---
 drivers/nvdimm/namespace_devs.c | 34 ++++++++++++++---
 drivers/nvdimm/nd-core.h        |  3 +-
 drivers/nvdimm/region_devs.c    | 68 +++++----------------------------
 4 files changed, 41 insertions(+), 71 deletions(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 29479d3b01b0..ee6de34ae525 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -95,10 +95,8 @@ static int nvdimm_bus_probe(struct device *dev)
 	rc = nd_drv->probe(dev);
 	debug_nvdimm_unlock(dev);
 
-	if (rc == 0)
-		nd_region_probe_success(nvdimm_bus, dev);
-	else
-		nd_region_disable(nvdimm_bus, dev);
+	if (rc == 0 && dev->parent && is_nd_region(dev->parent))
+		nd_region_advance_seeds(to_nd_region(dev->parent), dev);
 	nvdimm_bus_probe_end(nvdimm_bus);
 
 	dev_dbg(&nvdimm_bus->dev, "END: %s.probe(%s) = %d\n", dev->driver->name,
@@ -121,7 +119,6 @@ static int nvdimm_bus_remove(struct device *dev)
 		rc = nd_drv->remove(dev);
 		debug_nvdimm_unlock(dev);
 	}
-	nd_region_disable(nvdimm_bus, dev);
 
 	dev_dbg(&nvdimm_bus->dev, "%s.remove(%s) = %d\n", dev->driver->name,
 			dev_name(dev), rc);
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index a16e52251a30..3be81f7b9ed3 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2462,6 +2462,27 @@ static struct device **create_namespaces(struct nd_region *nd_region)
 	return devs;
 }
 
+static void deactivate_labels(void *region)
+{
+	struct nd_region *nd_region = region;
+	int i;
+
+	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+		struct nvdimm_drvdata *ndd = nd_mapping->ndd;
+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+		mutex_lock(&nd_mapping->lock);
+		nd_mapping_free_labels(nd_mapping);
+		mutex_unlock(&nd_mapping->lock);
+
+		put_ndd(ndd);
+		nd_mapping->ndd = NULL;
+		if (ndd)
+			atomic_dec(&nvdimm->busy);
+	}
+}
+
 static int init_active_labels(struct nd_region *nd_region)
 {
 	int i;
@@ -2519,16 +2540,17 @@ static int init_active_labels(struct nd_region *nd_region)
 			mutex_unlock(&nd_mapping->lock);
 		}
 
-		if (j >= count)
-			continue;
+		if (j < count)
+			break;
+	}
 
-		mutex_lock(&nd_mapping->lock);
-		nd_mapping_free_labels(nd_mapping);
-		mutex_unlock(&nd_mapping->lock);
+	if (i < nd_region->ndr_mappings) {
+		deactivate_labels(nd_region);
 		return -ENOMEM;
 	}
 
-	return 0;
+	return devm_add_action_or_reset(&nd_region->dev, deactivate_labels,
+			nd_region);
 }
 
 int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 454454ba1738..25fa121104d0 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -115,13 +115,12 @@ int __init nvdimm_bus_init(void);
 void nvdimm_bus_exit(void);
 void nvdimm_devs_exit(void);
 void nd_region_devs_exit(void);
-void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev);
 struct nd_region;
+void nd_region_advance_seeds(struct nd_region *nd_region, struct device *dev);
 void nd_region_create_ns_seed(struct nd_region *nd_region);
 void nd_region_create_btt_seed(struct nd_region *nd_region);
 void nd_region_create_pfn_seed(struct nd_region *nd_region);
 void nd_region_create_dax_seed(struct nd_region *nd_region);
-void nd_region_disable(struct nvdimm_bus *nvdimm_bus, struct device *dev);
 int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
 void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus);
 void nd_synchronize(void);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index af30cbe7a8ea..57de49b79d7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -715,85 +715,37 @@ void nd_mapping_free_labels(struct nd_mapping *nd_mapping)
 }
 
 /*
- * Upon successful probe/remove, take/release a reference on the
- * associated interleave set (if present), and plant new btt + namespace
- * seeds.  Also, on the removal of a BLK region, notify the provider to
- * disable the region.
+ * When a namespace is activated create new seeds for the next
+ * namespace, or namespace-personality to be configured.
  */
-static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
-		struct device *dev, bool probe)
+void nd_region_advance_seeds(struct nd_region *nd_region, struct device *dev)
 {
-	struct nd_region *nd_region;
-
-	if (!probe && is_nd_region(dev)) {
-		int i;
-
-		nd_region = to_nd_region(dev);
-		for (i = 0; i < nd_region->ndr_mappings; i++) {
-			struct nd_mapping *nd_mapping = &nd_region->mapping[i];
-			struct nvdimm_drvdata *ndd = nd_mapping->ndd;
-			struct nvdimm *nvdimm = nd_mapping->nvdimm;
-
-			mutex_lock(&nd_mapping->lock);
-			nd_mapping_free_labels(nd_mapping);
-			mutex_unlock(&nd_mapping->lock);
-
-			put_ndd(ndd);
-			nd_mapping->ndd = NULL;
-			if (ndd)
-				atomic_dec(&nvdimm->busy);
-		}
-	}
-	if (dev->parent && is_nd_region(dev->parent) && probe) {
-		nd_region = to_nd_region(dev->parent);
-		nvdimm_bus_lock(dev);
-		if (nd_region->ns_seed == dev)
-			nd_region_create_ns_seed(nd_region);
-		nvdimm_bus_unlock(dev);
-	}
-	if (is_nd_btt(dev) && probe) {
+	nvdimm_bus_lock(dev);
+	if (nd_region->ns_seed == dev) {
+		nd_region_create_ns_seed(nd_region);
+	} else if (is_nd_btt(dev)) {
 		struct nd_btt *nd_btt = to_nd_btt(dev);
 
-		nd_region = to_nd_region(dev->parent);
-		nvdimm_bus_lock(dev);
 		if (nd_region->btt_seed == dev)
 			nd_region_create_btt_seed(nd_region);
 		if (nd_region->ns_seed == &nd_btt->ndns->dev)
 			nd_region_create_ns_seed(nd_region);
-		nvdimm_bus_unlock(dev);
-	}
-	if (is_nd_pfn(dev) && probe) {
+	} else if (is_nd_pfn(dev)) {
 		struct nd_pfn *nd_pfn = to_nd_pfn(dev);
 
-		nd_region = to_nd_region(dev->parent);
-		nvdimm_bus_lock(dev);
 		if (nd_region->pfn_seed == dev)
 			nd_region_create_pfn_seed(nd_region);
 		if (nd_region->ns_seed == &nd_pfn->ndns->dev)
 			nd_region_create_ns_seed(nd_region);
-		nvdimm_bus_unlock(dev);
-	}
-	if (is_nd_dax(dev) && probe) {
+	} else if (is_nd_dax(dev)) {
 		struct nd_dax *nd_dax = to_nd_dax(dev);
 
-		nd_region = to_nd_region(dev->parent);
-		nvdimm_bus_lock(dev);
 		if (nd_region->dax_seed == dev)
 			nd_region_create_dax_seed(nd_region);
 		if (nd_region->ns_seed == &nd_dax->nd_pfn.ndns->dev)
 			nd_region_create_ns_seed(nd_region);
-		nvdimm_bus_unlock(dev);
 	}
-}
-
-void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev)
-{
-	nd_region_notify_driver_action(nvdimm_bus, dev, true);
-}
-
-void nd_region_disable(struct nvdimm_bus *nvdimm_bus, struct device *dev)
-{
-	nd_region_notify_driver_action(nvdimm_bus, dev, false);
+	nvdimm_bus_unlock(dev);
 }
 
 static ssize_t mappingN(struct device *dev, char *buf, int n)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success
  2019-08-14  4:22   ` Dan Williams
@ 2019-08-15 19:54     ` Dan Williams
  2019-08-19  7:05       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2019-08-15 19:54 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, Linux MM, linuxppc-dev

On Tue, Aug 13, 2019 at 9:22 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Hi Aneesh, logic looks correct but there are some cleanups I'd like to
> see and a lead-in patch that I attached.
>
> I've started prefixing nvdimm patches with:
>
>     libnvdimm/$component:
>
> ...since this patch mostly impacts the pmem driver lets prefix it
> "libnvdimm/pmem: "
>
> On Fri, Aug 9, 2019 at 12:45 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
> >
> > This patch add -EOPNOTSUPP as return from probe callback to
>
> s/This patch add/Add/
>
> No need to say "this patch" it's obviously a patch.
>
> > indicate we were not able to initialize a namespace due to pfn superblock
> > feature/version mismatch. We want to consider this a probe success so that
> > we can create new namesapce seed and there by avoid marking the failed
> > namespace as the seed namespace.
>
> Please replace usage of "we" with the exact agent involved as which
> "we" is being referred to gets confusing for the reader.
>
> i.e. "indicate that the pmem driver was not..." "The nvdimm core wants
> to consider this...".
>
> >
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > ---
> >  drivers/nvdimm/bus.c  |  2 +-
> >  drivers/nvdimm/pmem.c | 26 ++++++++++++++++++++++----
> >  2 files changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> > index 798c5c4aea9c..16c35e6446a7 100644
> > --- a/drivers/nvdimm/bus.c
> > +++ b/drivers/nvdimm/bus.c
> > @@ -95,7 +95,7 @@ static int nvdimm_bus_probe(struct device *dev)
> >         rc = nd_drv->probe(dev);
> >         debug_nvdimm_unlock(dev);
> >
> > -       if (rc == 0)
> > +       if (rc == 0 || rc == -EOPNOTSUPP)
> >                 nd_region_probe_success(nvdimm_bus, dev);
>
> This now makes the nd_region_probe_success() helper obviously misnamed
> since it now wants to take actions on non-probe success. I attached a
> lead-in cleanup that you can pull into your series that renames that
> routine to nd_region_advance_seeds().
>
> When you rebase this needs a comment about why EOPNOTSUPP has special handling.
>
> >         else
> >                 nd_region_disable(nvdimm_bus, dev);
> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > index 4c121dd03dd9..3f498881dd28 100644
> > --- a/drivers/nvdimm/pmem.c
> > +++ b/drivers/nvdimm/pmem.c
> > @@ -490,6 +490,7 @@ static int pmem_attach_disk(struct device *dev,
> >
> >  static int nd_pmem_probe(struct device *dev)
> >  {
> > +       int ret;
> >         struct nd_namespace_common *ndns;
> >
> >         ndns = nvdimm_namespace_common_probe(dev);
> > @@ -505,12 +506,29 @@ static int nd_pmem_probe(struct device *dev)
> >         if (is_nd_pfn(dev))
> >                 return pmem_attach_disk(dev, ndns);
> >
> > -       /* if we find a valid info-block we'll come back as that personality */
> > -       if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
> > -                       || nd_dax_probe(dev, ndns) == 0)
>
> Similar need for an updated comment here to explain the special
> translation of error codes.
>
> > +       ret = nd_btt_probe(dev, ndns);
> > +       if (ret == 0)
> >                 return -ENXIO;
> > +       else if (ret == -EOPNOTSUPP)
>
> Are there cases where the btt driver needs to return EOPNOTSUPP? I'd
> otherwise like to keep this special casing constrained to the pfn /
> dax info block cases.

In fact I think EOPNOTSUPP is only something that the device-dax case
would be concerned with because that's the only interface that
attempts to guarantee a given mapping granularity.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] mm/nvdimm: Use correct #defines instead of open coding
  2019-08-09  7:45 ` [PATCH v5 3/4] mm/nvdimm: Use correct #defines instead of open coding Aneesh Kumar K.V
@ 2019-08-15 21:05   ` Dan Williams
  2019-08-19  7:11     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2019-08-15 21:05 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, Linux MM, linuxppc-dev

On Fri, Aug 9, 2019 at 12:45 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Use PAGE_SIZE instead of SZ_4K and sizeof(struct page) instead of 64.
> If we have a kernel built with different struct page size the previous
> patch should handle marking the namespace disabled.

Each of these changes carry independent non-overlapping regression
risk, so lets split them into separate patches. Others might

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/nvdimm/label.c          | 2 +-
>  drivers/nvdimm/namespace_devs.c | 6 +++---
>  drivers/nvdimm/pfn_devs.c       | 3 ++-
>  drivers/nvdimm/region_devs.c    | 8 ++++----
>  4 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
> index 73e197babc2f..7ee037063be7 100644
> --- a/drivers/nvdimm/label.c
> +++ b/drivers/nvdimm/label.c
> @@ -355,7 +355,7 @@ static bool slot_valid(struct nvdimm_drvdata *ndd,
>
>         /* check that DPA allocations are page aligned */
>         if ((__le64_to_cpu(nd_label->dpa)
> -                               | __le64_to_cpu(nd_label->rawsize)) % SZ_4K)
> +                               | __le64_to_cpu(nd_label->rawsize)) % PAGE_SIZE)

The UEFI label specification has no concept of PAGE_SIZE, so this
check is a pure Linux-ism. There's no strict requirement why
slot_valid() needs to check for page alignment and it would seem to
actively hurt cross-page-size compatibility, so let's delete the check
and rely on checksum validation.

>                 return false;
>
>         /* check checksum */
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index a16e52251a30..a9c76df12cb9 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1006,10 +1006,10 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
>                 return -ENXIO;
>         }
>
> -       div_u64_rem(val, SZ_4K * nd_region->ndr_mappings, &remainder);
> +       div_u64_rem(val, PAGE_SIZE * nd_region->ndr_mappings, &remainder);
>         if (remainder) {
> -               dev_dbg(dev, "%llu is not %dK aligned\n", val,
> -                               (SZ_4K * nd_region->ndr_mappings) / SZ_1K);
> +               dev_dbg(dev, "%llu is not %ldK aligned\n", val,
> +                               (PAGE_SIZE * nd_region->ndr_mappings) / SZ_1K);
>                 return -EINVAL;

Yes, looks good, but this deserves its own independent patch.

>         }
>
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 37e96811c2fc..c1d9be609322 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -725,7 +725,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>                  * when populating the vmemmap. This *should* be equal to
>                  * PMD_SIZE for most architectures.
>                  */
> -               offset = ALIGN(start + SZ_8K + 64 * npfns, align) - start;
> +               offset = ALIGN(start + SZ_8K + sizeof(struct page) * npfns,

I'd prefer if this was not dynamic and was instead set to the maximum
size of 'struct page' across all archs just to enhance cross-arch
compatibility. I think that answer is '64'.
> +                              align) - start;
>         } else if (nd_pfn->mode == PFN_MODE_RAM)
>                 offset = ALIGN(start + SZ_8K, align) - start;
>         else
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index af30cbe7a8ea..20e265a534f8 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -992,10 +992,10 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
>                 struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
>                 struct nvdimm *nvdimm = mapping->nvdimm;
>
> -               if ((mapping->start | mapping->size) % SZ_4K) {
> -                       dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n",
> -                                       caller, dev_name(&nvdimm->dev), i);
> -
> +               if ((mapping->start | mapping->size) % PAGE_SIZE) {
> +                       dev_err(&nvdimm_bus->dev,
> +                               "%s: %s mapping%d is not %ld aligned\n",
> +                               caller, dev_name(&nvdimm->dev), i, PAGE_SIZE);
>                         return NULL;
>                 }
>
> --
> 2.21.0
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success
  2019-08-15 19:54     ` Dan Williams
@ 2019-08-19  7:05       ` Aneesh Kumar K.V
  2019-08-19 16:57         ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19  7:05 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, Linux MM, linuxppc-dev

Dan Williams <dan.j.williams@intel.com> writes:

> On Tue, Aug 13, 2019 at 9:22 PM Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> Hi Aneesh, logic looks correct but there are some cleanups I'd like to
>> see and a lead-in patch that I attached.
>>
>> I've started prefixing nvdimm patches with:
>>
>>     libnvdimm/$component:
>>
>> ...since this patch mostly impacts the pmem driver lets prefix it
>> "libnvdimm/pmem: "
>>
>> On Fri, Aug 9, 2019 at 12:45 AM Aneesh Kumar K.V
>> <aneesh.kumar@linux.ibm.com> wrote:
>> >
>> > This patch add -EOPNOTSUPP as return from probe callback to
>>
>> s/This patch add/Add/
>>
>> No need to say "this patch" it's obviously a patch.
>>
>> > indicate we were not able to initialize a namespace due to pfn superblock
>> > feature/version mismatch. We want to consider this a probe success so that
>> > we can create new namesapce seed and there by avoid marking the failed
>> > namespace as the seed namespace.
>>
>> Please replace usage of "we" with the exact agent involved as which
>> "we" is being referred to gets confusing for the reader.
>>
>> i.e. "indicate that the pmem driver was not..." "The nvdimm core wants
>> to consider this...".
>>
>> >
>> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> > ---
>> >  drivers/nvdimm/bus.c  |  2 +-
>> >  drivers/nvdimm/pmem.c | 26 ++++++++++++++++++++++----
>> >  2 files changed, 23 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
>> > index 798c5c4aea9c..16c35e6446a7 100644
>> > --- a/drivers/nvdimm/bus.c
>> > +++ b/drivers/nvdimm/bus.c
>> > @@ -95,7 +95,7 @@ static int nvdimm_bus_probe(struct device *dev)
>> >         rc = nd_drv->probe(dev);
>> >         debug_nvdimm_unlock(dev);
>> >
>> > -       if (rc == 0)
>> > +       if (rc == 0 || rc == -EOPNOTSUPP)
>> >                 nd_region_probe_success(nvdimm_bus, dev);
>>
>> This now makes the nd_region_probe_success() helper obviously misnamed
>> since it now wants to take actions on non-probe success. I attached a
>> lead-in cleanup that you can pull into your series that renames that
>> routine to nd_region_advance_seeds().
>>
>> When you rebase this needs a comment about why EOPNOTSUPP has special handling.
>>
>> >         else
>> >                 nd_region_disable(nvdimm_bus, dev);
>> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>> > index 4c121dd03dd9..3f498881dd28 100644
>> > --- a/drivers/nvdimm/pmem.c
>> > +++ b/drivers/nvdimm/pmem.c
>> > @@ -490,6 +490,7 @@ static int pmem_attach_disk(struct device *dev,
>> >
>> >  static int nd_pmem_probe(struct device *dev)
>> >  {
>> > +       int ret;
>> >         struct nd_namespace_common *ndns;
>> >
>> >         ndns = nvdimm_namespace_common_probe(dev);
>> > @@ -505,12 +506,29 @@ static int nd_pmem_probe(struct device *dev)
>> >         if (is_nd_pfn(dev))
>> >                 return pmem_attach_disk(dev, ndns);
>> >
>> > -       /* if we find a valid info-block we'll come back as that personality */
>> > -       if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
>> > -                       || nd_dax_probe(dev, ndns) == 0)
>>
>> Similar need for an updated comment here to explain the special
>> translation of error codes.
>>
>> > +       ret = nd_btt_probe(dev, ndns);
>> > +       if (ret == 0)
>> >                 return -ENXIO;
>> > +       else if (ret == -EOPNOTSUPP)
>>
>> Are there cases where the btt driver needs to return EOPNOTSUPP? I'd
>> otherwise like to keep this special casing constrained to the pfn /
>> dax info block cases.
>
> In fact I think EOPNOTSUPP is only something that the device-dax case
> would be concerned with because that's the only interface that
> attempts to guarantee a given mapping granularity.

We need to do similar error handling w.r.t fsdax when the pfn superblock
indicates different PAGE_SIZE and struct page size? I don't think btt
needs to support EOPNOTSUPP. But we can keep it for consistency?

-aneesh



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] mm/nvdimm: Use correct #defines instead of open coding
  2019-08-15 21:05   ` Dan Williams
@ 2019-08-19  7:11     ` Aneesh Kumar K.V
  2019-08-19  9:30       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19  7:11 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, Linux MM, linuxppc-dev

Dan Williams <dan.j.williams@intel.com> writes:

> On Fri, Aug 9, 2019 at 12:45 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Use PAGE_SIZE instead of SZ_4K and sizeof(struct page) instead of 64.
>> If we have a kernel built with different struct page size the previous
>> patch should handle marking the namespace disabled.
>
> Each of these changes carry independent non-overlapping regression
> risk, so lets split them into separate patches. Others might
>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  drivers/nvdimm/label.c          | 2 +-
>>  drivers/nvdimm/namespace_devs.c | 6 +++---
>>  drivers/nvdimm/pfn_devs.c       | 3 ++-
>>  drivers/nvdimm/region_devs.c    | 8 ++++----
>>  4 files changed, 10 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
>> index 73e197babc2f..7ee037063be7 100644
>> --- a/drivers/nvdimm/label.c
>> +++ b/drivers/nvdimm/label.c
>> @@ -355,7 +355,7 @@ static bool slot_valid(struct nvdimm_drvdata *ndd,
>>
>>         /* check that DPA allocations are page aligned */
>>         if ((__le64_to_cpu(nd_label->dpa)
>> -                               | __le64_to_cpu(nd_label->rawsize)) % SZ_4K)
>> +                               | __le64_to_cpu(nd_label->rawsize)) % PAGE_SIZE)
>
> The UEFI label specification has no concept of PAGE_SIZE, so this
> check is a pure Linux-ism. There's no strict requirement why
> slot_valid() needs to check for page alignment and it would seem to
> actively hurt cross-page-size compatibility, so let's delete the check
> and rely on checksum validation.


Will do a separate patch to drop that check.

>
>>                 return false;
>>
>>         /* check checksum */
>> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
>> index a16e52251a30..a9c76df12cb9 100644
>> --- a/drivers/nvdimm/namespace_devs.c
>> +++ b/drivers/nvdimm/namespace_devs.c
>> @@ -1006,10 +1006,10 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
>>                 return -ENXIO;
>>         }
>>
>> -       div_u64_rem(val, SZ_4K * nd_region->ndr_mappings, &remainder);
>> +       div_u64_rem(val, PAGE_SIZE * nd_region->ndr_mappings, &remainder);
>>         if (remainder) {
>> -               dev_dbg(dev, "%llu is not %dK aligned\n", val,
>> -                               (SZ_4K * nd_region->ndr_mappings) / SZ_1K);
>> +               dev_dbg(dev, "%llu is not %ldK aligned\n", val,
>> +                               (PAGE_SIZE * nd_region->ndr_mappings) / SZ_1K);
>>                 return -EINVAL;
>
> Yes, looks good, but this deserves its own independent patch.
>
>>         }
>>
>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
>> index 37e96811c2fc..c1d9be609322 100644
>> --- a/drivers/nvdimm/pfn_devs.c
>> +++ b/drivers/nvdimm/pfn_devs.c
>> @@ -725,7 +725,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>>                  * when populating the vmemmap. This *should* be equal to
>>                  * PMD_SIZE for most architectures.
>>                  */
>> -               offset = ALIGN(start + SZ_8K + 64 * npfns, align) - start;
>> +               offset = ALIGN(start + SZ_8K + sizeof(struct page) * npfns,
>
> I'd prefer if this was not dynamic and was instead set to the maximum
> size of 'struct page' across all archs just to enhance cross-arch
> compatibility. I think that answer is '64'.


That still doesn't take care of the case where we add new elements to
struct page later. If we have struct page size changing across
architectures, we should still be ok as long as new size is less than what is
stored in pfn superblock? I understand the desire to keep it
non-dynamic. But we also need to make sure we don't reserve less space
when creating a new namespace on a config that got struct page size >
64? 


>> +                              align) - start;
>>         } else if (nd_pfn->mode == PFN_MODE_RAM)
>>                 offset = ALIGN(start + SZ_8K, align) - start;
>>         else
>> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>> index af30cbe7a8ea..20e265a534f8 100644
>> --- a/drivers/nvdimm/region_devs.c
>> +++ b/drivers/nvdimm/region_devs.c
>> @@ -992,10 +992,10 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
>>                 struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
>>                 struct nvdimm *nvdimm = mapping->nvdimm;
>>
>> -               if ((mapping->start | mapping->size) % SZ_4K) {
>> -                       dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n",
>> -                                       caller, dev_name(&nvdimm->dev), i);
>> -
>> +               if ((mapping->start | mapping->size) % PAGE_SIZE) {
>> +                       dev_err(&nvdimm_bus->dev,
>> +                               "%s: %s mapping%d is not %ld aligned\n",
>> +                               caller, dev_name(&nvdimm->dev), i, PAGE_SIZE);
>>                         return NULL;
>>                 }
>>
>> --
>> 2.21.0
>>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] mm/nvdimm: Use correct #defines instead of open coding
  2019-08-19  7:11     ` Aneesh Kumar K.V
@ 2019-08-19  9:30       ` Aneesh Kumar K.V
  2019-08-19 20:33         ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2019-08-19  9:30 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, Linux MM, linuxppc-dev

Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:

> Dan Williams <dan.j.williams@intel.com> writes:
>
>> On Fri, Aug 9, 2019 at 12:45 AM Aneesh Kumar K.V
>> <aneesh.kumar@linux.ibm.com> wrote:
>>>
>>

...

>>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
>>> index 37e96811c2fc..c1d9be609322 100644
>>> --- a/drivers/nvdimm/pfn_devs.c
>>> +++ b/drivers/nvdimm/pfn_devs.c
>>> @@ -725,7 +725,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>>>                  * when populating the vmemmap. This *should* be equal to
>>>                  * PMD_SIZE for most architectures.
>>>                  */
>>> -               offset = ALIGN(start + SZ_8K + 64 * npfns, align) - start;
>>> +               offset = ALIGN(start + SZ_8K + sizeof(struct page) * npfns,
>>
>> I'd prefer if this was not dynamic and was instead set to the maximum
>> size of 'struct page' across all archs just to enhance cross-arch
>> compatibility. I think that answer is '64'.
>
>
> That still doesn't take care of the case where we add new elements to
> struct page later. If we have struct page size changing across
> architectures, we should still be ok as long as new size is less than what is
> stored in pfn superblock? I understand the desire to keep it
> non-dynamic. But we also need to make sure we don't reserve less space
> when creating a new namespace on a config that got struct page size >
> 64? 


How about

libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change

When namespace is created with map device as pmem device, struct page is stored in the
reserve block area. We need to make sure we account for the right struct page
size while doing this. Instead of directly depending on sizeof(struct page)
which can change based on different kernel config option, use the max struct
page size (64) while calculating the reserve block area. This makes sure pmem
device can be used across kernels built with different configs.

If the above assumption of max struct page size change, we need to update the
reserve block allocation space for new namespaces created.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

1 file changed, 7 insertions(+)
drivers/nvdimm/pfn_devs.c | 7 +++++++

modified   drivers/nvdimm/pfn_devs.c
@@ -722,7 +722,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		 * The altmap should be padded out to the block size used
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
+		 *
+		 * Also make sure size of struct page is less than 64. We
+		 * want to make sure we use large enough size here so that
+		 * we don't have a dynamic reserve space depending on
+		 * struct page size. But we also want to make sure we notice
+		 * if we end up adding new elements to struct page.
 		 */
+		BUILD_BUG_ON(64 < sizeof(struct page));
 		offset = ALIGN(start + SZ_8K + 64 * npfns, align) - start;
 	} else if (nd_pfn->mode == PFN_MODE_RAM)
 		offset = ALIGN(start + SZ_8K, align) - start;


-aneesh



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success
  2019-08-19  7:05       ` Aneesh Kumar K.V
@ 2019-08-19 16:57         ` Dan Williams
  0 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2019-08-19 16:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, Linux MM, linuxppc-dev

On Mon, Aug 19, 2019 at 12:07 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > On Tue, Aug 13, 2019 at 9:22 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >>
> >> Hi Aneesh, logic looks correct but there are some cleanups I'd like to
> >> see and a lead-in patch that I attached.
> >>
> >> I've started prefixing nvdimm patches with:
> >>
> >>     libnvdimm/$component:
> >>
> >> ...since this patch mostly impacts the pmem driver lets prefix it
> >> "libnvdimm/pmem: "
> >>
> >> On Fri, Aug 9, 2019 at 12:45 AM Aneesh Kumar K.V
> >> <aneesh.kumar@linux.ibm.com> wrote:
> >> >
> >> > This patch add -EOPNOTSUPP as return from probe callback to
> >>
> >> s/This patch add/Add/
> >>
> >> No need to say "this patch" it's obviously a patch.
> >>
> >> > indicate we were not able to initialize a namespace due to pfn superblock
> >> > feature/version mismatch. We want to consider this a probe success so that
> >> > we can create new namesapce seed and there by avoid marking the failed
> >> > namespace as the seed namespace.
> >>
> >> Please replace usage of "we" with the exact agent involved as which
> >> "we" is being referred to gets confusing for the reader.
> >>
> >> i.e. "indicate that the pmem driver was not..." "The nvdimm core wants
> >> to consider this...".
> >>
> >> >
> >> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> >> > ---
> >> >  drivers/nvdimm/bus.c  |  2 +-
> >> >  drivers/nvdimm/pmem.c | 26 ++++++++++++++++++++++----
> >> >  2 files changed, 23 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> >> > index 798c5c4aea9c..16c35e6446a7 100644
> >> > --- a/drivers/nvdimm/bus.c
> >> > +++ b/drivers/nvdimm/bus.c
> >> > @@ -95,7 +95,7 @@ static int nvdimm_bus_probe(struct device *dev)
> >> >         rc = nd_drv->probe(dev);
> >> >         debug_nvdimm_unlock(dev);
> >> >
> >> > -       if (rc == 0)
> >> > +       if (rc == 0 || rc == -EOPNOTSUPP)
> >> >                 nd_region_probe_success(nvdimm_bus, dev);
> >>
> >> This now makes the nd_region_probe_success() helper obviously misnamed
> >> since it now wants to take actions on non-probe success. I attached a
> >> lead-in cleanup that you can pull into your series that renames that
> >> routine to nd_region_advance_seeds().
> >>
> >> When you rebase this needs a comment about why EOPNOTSUPP has special handling.
> >>
> >> >         else
> >> >                 nd_region_disable(nvdimm_bus, dev);
> >> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> >> > index 4c121dd03dd9..3f498881dd28 100644
> >> > --- a/drivers/nvdimm/pmem.c
> >> > +++ b/drivers/nvdimm/pmem.c
> >> > @@ -490,6 +490,7 @@ static int pmem_attach_disk(struct device *dev,
> >> >
> >> >  static int nd_pmem_probe(struct device *dev)
> >> >  {
> >> > +       int ret;
> >> >         struct nd_namespace_common *ndns;
> >> >
> >> >         ndns = nvdimm_namespace_common_probe(dev);
> >> > @@ -505,12 +506,29 @@ static int nd_pmem_probe(struct device *dev)
> >> >         if (is_nd_pfn(dev))
> >> >                 return pmem_attach_disk(dev, ndns);
> >> >
> >> > -       /* if we find a valid info-block we'll come back as that personality */
> >> > -       if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
> >> > -                       || nd_dax_probe(dev, ndns) == 0)
> >>
> >> Similar need for an updated comment here to explain the special
> >> translation of error codes.
> >>
> >> > +       ret = nd_btt_probe(dev, ndns);
> >> > +       if (ret == 0)
> >> >                 return -ENXIO;
> >> > +       else if (ret == -EOPNOTSUPP)
> >>
> >> Are there cases where the btt driver needs to return EOPNOTSUPP? I'd
> >> otherwise like to keep this special casing constrained to the pfn /
> >> dax info block cases.
> >
> > In fact I think EOPNOTSUPP is only something that the device-dax case
> > would be concerned with because that's the only interface that
> > attempts to guarantee a given mapping granularity.
>
> We need to do similar error handling w.r.t fsdax when the pfn superblock
> indicates different PAGE_SIZE and struct page size?

Only in the case where PAGE_SIZE is less than the pfn superblock page
size, the memmap is stored on pmem, and the reservation is too small.
Otherwise the PAGE_SIZE difference does not matter in practice for the
fsdax case... unless I'm overlooking another failure case?

> I don't think btt
> needs to support EOPNOTSUPP. But we can keep it for consistency?

That's not a sufficient argument in my mind. The comment about why
EOPNOTSUPP is treated specially should have a note about the known
usages, and since there is no BTT case for it lets leave it out.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] mm/nvdimm: Use correct #defines instead of open coding
  2019-08-19  9:30       ` Aneesh Kumar K.V
@ 2019-08-19 20:33         ` Dan Williams
  0 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2019-08-19 20:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, Linux MM, linuxppc-dev

On Mon, Aug 19, 2019 at 2:32 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
>
> > Dan Williams <dan.j.williams@intel.com> writes:
> >
> >> On Fri, Aug 9, 2019 at 12:45 AM Aneesh Kumar K.V
> >> <aneesh.kumar@linux.ibm.com> wrote:
> >>>
> >>
>
> ...
>
> >>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> >>> index 37e96811c2fc..c1d9be609322 100644
> >>> --- a/drivers/nvdimm/pfn_devs.c
> >>> +++ b/drivers/nvdimm/pfn_devs.c
> >>> @@ -725,7 +725,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> >>>                  * when populating the vmemmap. This *should* be equal to
> >>>                  * PMD_SIZE for most architectures.
> >>>                  */
> >>> -               offset = ALIGN(start + SZ_8K + 64 * npfns, align) - start;
> >>> +               offset = ALIGN(start + SZ_8K + sizeof(struct page) * npfns,
> >>
> >> I'd prefer if this was not dynamic and was instead set to the maximum
> >> size of 'struct page' across all archs just to enhance cross-arch
> >> compatibility. I think that answer is '64'.
> >
> >
> > That still doesn't take care of the case where we add new elements to
> > struct page later. If we have struct page size changing across
> > architectures, we should still be ok as long as new size is less than what is
> > stored in pfn superblock? I understand the desire to keep it
> > non-dynamic. But we also need to make sure we don't reserve less space
> > when creating a new namespace on a config that got struct page size >
> > 64?
>
>
> How about
>
> libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change
>
> When namespace is created with map device as pmem device, struct page is stored in the
> reserve block area. We need to make sure we account for the right struct page
> size while doing this. Instead of directly depending on sizeof(struct page)
> which can change based on different kernel config option, use the max struct
> page size (64) while calculating the reserve block area. This makes sure pmem
> device can be used across kernels built with different configs.
>
> If the above assumption of max struct page size change, we need to update the
> reserve block allocation space for new namespaces created.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>
> 1 file changed, 7 insertions(+)
> drivers/nvdimm/pfn_devs.c | 7 +++++++
>
> modified   drivers/nvdimm/pfn_devs.c
> @@ -722,7 +722,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>                  * The altmap should be padded out to the block size used
>                  * when populating the vmemmap. This *should* be equal to
>                  * PMD_SIZE for most architectures.
> +                *
> +                * Also make sure size of struct page is less than 64. We
> +                * want to make sure we use large enough size here so that
> +                * we don't have a dynamic reserve space depending on
> +                * struct page size. But we also want to make sure we notice
> +                * if we end up adding new elements to struct page.
>                  */
> +               BUILD_BUG_ON(64 < sizeof(struct page));

Looks ok to me. There are ongoing heroic efforts to make sure 'struct
page' does not grown beyond the size of cacheline. The fact that
'struct page_ext' is allocated out of line makes it safe to assume
that 'struct page' will not be growing larger in the foreseeable
future.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, back to index

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-09  7:45 [PATCH v5 0/4] Mark the namespace disabled on pfn superblock mismatch Aneesh Kumar K.V
2019-08-09  7:45 ` [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success Aneesh Kumar K.V
2019-08-14  4:22   ` Dan Williams
2019-08-15 19:54     ` Dan Williams
2019-08-19  7:05       ` Aneesh Kumar K.V
2019-08-19 16:57         ` Dan Williams
2019-08-09  7:45 ` [PATCH v5 2/4] mm/nvdimm: Add page size and struct page size to pfn superblock Aneesh Kumar K.V
2019-08-09  7:45 ` [PATCH v5 3/4] mm/nvdimm: Use correct #defines instead of open coding Aneesh Kumar K.V
2019-08-15 21:05   ` Dan Williams
2019-08-19  7:11     ` Aneesh Kumar K.V
2019-08-19  9:30       ` Aneesh Kumar K.V
2019-08-19 20:33         ` Dan Williams
2019-08-09  7:45 ` [PATCH v5 4/4] mm/nvdimm: Pick the right alignment default when creating dax devices Aneesh Kumar K.V

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org linux-mm@archiver.kernel.org
	public-inbox-index linux-mm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox