linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER
@ 2020-06-10  6:23 Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 01/10] powerpc/pmem: Restrict papr_scm to P8 and above Aneesh Kumar K.V
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

This patch series enables the usage os new pmem flush and sync instructions on POWER
architecture. POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage. Additionally,
POWER10 also introduce phwsync and plwsync which can be used to establish order of these
writes to persistent storage.
    
This series exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions
are added as a variant of the old ones that old hardware won't differentiate.

On POWER10, pmem devices will be represented by a different device tree compat
strings. This ensures that older kernels won't initialize pmem devices on POWER10.

W.r.t userspace we want to make sure applications are enabled to use MAP_SYNC only
if they are using the new instructions. To avoid the wrong usage of MAP_SYNC on
newer hardware, we disable MAP_SYNC by default on newer hardware. The namespace specific
attribute /sys/block/pmem0/dax/sync_fault can be used to enable MAP_SYNC later.

With this:
1) vPMEM continues to work since it is a volatile region. That 
doesn't need any flush instructions.

2) pmdk and other user applications get updated to use new instructions
and updated packages are made available to all distributions

3) On newer hardware, the device will appear with a new compat string. 
Hence older distributions won't initialize pmem on newer hardware.

4) If we have a newer kernel with an older distro, we use the per 
namespace sysfs knob that prevents the usage of MAP_SYNC.

5) Sometime in the future, we mark the CONFIG_ARCH_MAP_SYNC_DISABLE=n
on ppc64 when we are confident that everybody is using the new flush 
instruction.

Chaanges from V4:
* Add namespace specific sychronous fault control.

Changes from V3:
* Add new compat string to be used for the device.
* Use arch_pmem_flush_barrier() in dm-writecache.

Aneesh Kumar K.V (10):
  powerpc/pmem: Restrict papr_scm to P8 and above.
  powerpc/pmem: Add new instructions for persistent storage and sync
  powerpc/pmem: Add flush routines using new pmem store and sync
    instruction
  libnvdimm/nvdimm/flush: Allow architecture to override the flush
    barrier
  powerpc/pmem/of_pmem: Update of_pmem to use the new barrier
    instruction.
  powerpc/pmem: Avoid the barrier in flush routines
  powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem
    flush functions.
  libnvdimm/dax: Add a dax flag to control synchronous fault support
  powerpc/pmem: Disable synchronous fault by default
  powerpc/pmem: Initialize pmem device on newer hardware

 arch/powerpc/include/asm/cacheflush.h     | 10 ++++
 arch/powerpc/include/asm/ppc-opcode.h     | 12 ++++
 arch/powerpc/lib/pmem.c                   | 46 ++++++++++++--
 arch/powerpc/platforms/Kconfig.cputype    |  9 +++
 arch/powerpc/platforms/pseries/papr_scm.c | 31 +++++++++-
 arch/powerpc/platforms/pseries/pmem.c     |  6 ++
 drivers/dax/bus.c                         |  2 +-
 drivers/dax/super.c                       | 73 +++++++++++++++++++++++
 drivers/md/dm-writecache.c                |  2 +-
 drivers/nvdimm/of_pmem.c                  |  8 +++
 drivers/nvdimm/pmem.c                     |  4 ++
 drivers/nvdimm/region_devs.c              | 24 ++++++--
 include/linux/dax.h                       | 16 +++++
 include/linux/libnvdimm.h                 |  8 +++
 mm/Kconfig                                |  3 +
 15 files changed, 243 insertions(+), 11 deletions(-)

-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 01/10] powerpc/pmem: Restrict papr_scm to P8 and above.
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 02/10] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

The PAPR based virtualized persistent memory devices are only supported on
POWER9 and above. In the followup patch, the kernel will switch the persistent
memory cache flush functions to use a new `dcbf` variant instruction. The new
instructions even though added in ISA 3.1 works even on P8 and P9 because these
are implemented as a variant of existing `dcbf` and `hwsync` and on P8 and
P9 behaves as such.

Considering these devices are only supported on P8 and above,  update the driver
to prevent a P7-compat guest from using persistent memory devices.

We don't update of_pmem driver with the same condition, because, on bare-metal,
the firmware enables pmem support only on P9 and above. There the kernel depends
on OPAL firmware to restrict exposing persistent memory related device tree
entries on older hardware. of_pmem.ko is written without any arch dependency and
we don't want to add ppc64 specific cpu feature check in of_pmem driver.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/pmem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pmem.c b/arch/powerpc/platforms/pseries/pmem.c
index f860a897a9e0..2347e1038f58 100644
--- a/arch/powerpc/platforms/pseries/pmem.c
+++ b/arch/powerpc/platforms/pseries/pmem.c
@@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
 
 static int pseries_pmem_init(void)
 {
+	/*
+	 * Only supported on POWER8 and above.
+	 */
+	if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+		return 0;
+
 	pmem_node = of_find_node_by_type(NULL, "ibm,persistent-memory");
 	if (!pmem_node)
 		return 0;
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 02/10] powerpc/pmem: Add new instructions for persistent storage and sync
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 01/10] powerpc/pmem: Restrict papr_scm to P8 and above Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 03/10] powerpc/pmem: Add flush routines using new pmem store and sync instruction Aneesh Kumar K.V
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage.

Additionally, POWER10 also introduce phwsync and plwsync which can be used
to establish order of these writes to persistent storage.

This patch exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as a variant of the old ones that old hardware
won't differentiate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/ppc-opcode.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 2a39c716c343..1ad014e4633e 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -219,6 +219,8 @@
 #define PPC_INST_STWCX			0x7c00012d
 #define PPC_INST_LWSYNC			0x7c2004ac
 #define PPC_INST_SYNC			0x7c0004ac
+#define PPC_INST_PHWSYNC		0x7c8004ac
+#define PPC_INST_PLWSYNC		0x7ca004ac
 #define PPC_INST_SYNC_MASK		0xfc0007fe
 #define PPC_INST_ISYNC			0x4c00012c
 #define PPC_INST_LXVD2X			0x7c000698
@@ -284,6 +286,8 @@
 #define PPC_INST_TABORT			0x7c00071d
 #define PPC_INST_TSR			0x7c0005dd
 
+#define PPC_INST_DCBF			0x7c0000ac
+
 #define PPC_INST_NAP			0x4c000364
 #define PPC_INST_SLEEP			0x4c0003a4
 #define PPC_INST_WINKLE			0x4c0003e4
@@ -532,6 +536,14 @@
 #define STBCIX(s,a,b)		stringify_in_c(.long PPC_INST_STBCIX | \
 				       __PPC_RS(s) | __PPC_RA(a) | __PPC_RB(b))
 
+#define	PPC_DCBFPS(a, b)	stringify_in_c(.long PPC_INST_DCBF |	\
+				       ___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
+#define	PPC_DCBSTPS(a, b)	stringify_in_c(.long PPC_INST_DCBF |	\
+				       ___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
+
+#define	PPC_PHWSYNC		stringify_in_c(.long PPC_INST_PHWSYNC)
+#define	PPC_PLWSYNC		stringify_in_c(.long PPC_INST_PLWSYNC)
+
 /*
  * Define what the VSX XX1 form instructions will look like, then add
  * the 128 bit load store instructions based on that.
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 03/10] powerpc/pmem: Add flush routines using new pmem store and sync instruction
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 01/10] powerpc/pmem: Restrict papr_scm to P8 and above Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 02/10] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 04/10] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier Aneesh Kumar K.V
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

Start using dcbstps; phwsync; sequence for flushing persistent memory range.
The new instructions are implemented as a variant of dcbf and hwsync and on
P8 and P9 they will be executed as those instructions. We avoid using them on
older hardware. This helps to avoid difficult to debug bugs.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/cacheflush.h |  1 +
 arch/powerpc/lib/pmem.c               | 50 ++++++++++++++++++++++++---
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index e92191b390f3..81808d1b54ca 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -8,6 +8,7 @@
 
 #include <linux/mm.h>
 #include <asm/cputable.h>
+#include <asm/cpu_has_feature.h>
 
 /*
  * No cache flushing is required when address mappings are changed,
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..5a61aaeb6930 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,20 +9,62 @@
 
 #include <asm/cacheflush.h>
 
+static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
+
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+	asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
+
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+	asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void clean_pmem_range(unsigned long start, unsigned long stop)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_207S))
+		return __clean_pmem_range(start, stop);
+}
+
+static inline void flush_pmem_range(unsigned long start, unsigned long stop)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_207S))
+		return __flush_pmem_range(start, stop);
+}
+
 /*
  * CONFIG_ARCH_HAS_PMEM_API symbols
  */
 void arch_wb_cache_pmem(void *addr, size_t size)
 {
 	unsigned long start = (unsigned long) addr;
-	flush_dcache_range(start, start + size);
+	clean_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
 
 void arch_invalidate_pmem(void *addr, size_t size)
 {
 	unsigned long start = (unsigned long) addr;
-	flush_dcache_range(start, start + size);
+	flush_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
 
@@ -35,7 +77,7 @@ long __copy_from_user_flushcache(void *dest, const void __user *src,
 	unsigned long copied, start = (unsigned long) dest;
 
 	copied = __copy_from_user(dest, src, size);
-	flush_dcache_range(start, start + size);
+	clean_pmem_range(start, start + size);
 
 	return copied;
 }
@@ -45,7 +87,7 @@ void *memcpy_flushcache(void *dest, const void *src, size_t size)
 	unsigned long start = (unsigned long) dest;
 
 	memcpy(dest, src, size);
-	flush_dcache_range(start, start + size);
+	clean_pmem_range(start, start + size);
 
 	return dest;
 }
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 04/10] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2020-06-10  6:23 ` [PATCH v5 03/10] powerpc/pmem: Add flush routines using new pmem store and sync instruction Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 05/10] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction Aneesh Kumar K.V
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()

Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/md/dm-writecache.c   | 2 +-
 drivers/nvdimm/region_devs.c | 8 ++++----
 include/linux/libnvdimm.h    | 4 ++++
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 613c171b1b6d..904fdbf2b089 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -540,7 +540,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
 static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
 {
 	if (WC_MODE_PMEM(wc))
-		wmb();
+		arch_pmem_flush_barrier();
 	else
 		ssd_commit_flushed(wc, wait_for_ios);
 }
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..88ea34a9c7fd 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1216,13 +1216,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
 	idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
 	/*
-	 * The first wmb() is needed to 'sfence' all previous writes
-	 * such that they are architecturally visible for the platform
-	 * buffer flush.  Note that we've already arranged for pmem
+	 * The first arch_pmem_flush_barrier() is needed to 'sfence' all
+	 * previous writes such that they are architecturally visible for
+	 * the platform buffer flush. Note that we've already arranged for pmem
 	 * writes to avoid the cache via memcpy_flushcache().  The final
 	 * wmb() ensures ordering for the NVDIMM flush write.
 	 */
-	wmb();
+	arch_pmem_flush_barrier();
 	for (i = 0; i < nd_region->ndr_mappings; i++)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..66f6c65bd789 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
 }
 #endif
 
+#ifndef arch_pmem_flush_barrier
+#define arch_pmem_flush_barrier() wmb()
+#endif
+
 #endif /* __LIBNVDIMM_H__ */
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 05/10] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2020-06-10  6:23 ` [PATCH v5 04/10] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 06/10] powerpc/pmem: Avoid the barrier in flush routines Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

of_pmem on POWER10 can now use phwsync instead of hwsync to ensure
all previous writes are architecturally visible for the platform
buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/cacheflush.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 81808d1b54ca..bb56a49c9a66 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -120,6 +120,13 @@ static inline void invalidate_dcache_range(unsigned long start,
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
 	memcpy(dst, src, len)
 
+
+#define arch_pmem_flush_barrier arch_pmem_flush_barrier
+static inline void  arch_pmem_flush_barrier(void)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_207S))
+		asm volatile(PPC_PHWSYNC ::: "memory");
+}
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_POWERPC_CACHEFLUSH_H */
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 06/10] powerpc/pmem: Avoid the barrier in flush routines
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2020-06-10  6:23 ` [PATCH v5 05/10] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 07/10] powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem flush functions Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

nvdimm expect the flush routines to just mark the cache clean. The barrier
that mark the store globally visible is done in nvdimm_flush().

Update the papr_scm driver to a simplified nvdim_flush callback that do
only the required barrier.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/lib/pmem.c                   |  6 ------
 arch/powerpc/platforms/pseries/papr_scm.c | 13 +++++++++++++
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 5a61aaeb6930..21210fa676e5 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
 
 	for (i = 0; i < size >> shift; i++, addr += bytes)
 		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-	asm volatile(PPC_PHWSYNC ::: "memory");
 }
 
 static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
@@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
 
 	for (i = 0; i < size >> shift; i++, addr += bytes)
 		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-	asm volatile(PPC_PHWSYNC ::: "memory");
 }
 
 static inline void clean_pmem_range(unsigned long start, unsigned long stop)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..ad506e7003c9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -285,6 +285,18 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
 
 	return 0;
 }
+/*
+ * We have made sure the pmem writes are done such that before calling this
+ * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. Here
+ * we just need to add the necessary barrier to make sure the above flushes
+ * are have updated persistent storage before any data access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio *bio)
+{
+	arch_pmem_flush_barrier();
+	return 0;
+}
 
 static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 {
@@ -340,6 +352,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 	ndr_desc.mapping = &mapping;
 	ndr_desc.num_mappings = 1;
 	ndr_desc.nd_set = &p->nd_set;
+	ndr_desc.flush = papr_scm_flush_sync;
 
 	if (p->is_volatile)
 		p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 07/10] powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem flush functions.
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2020-06-10  6:23 ` [PATCH v5 06/10] powerpc/pmem: Avoid the barrier in flush routines Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 08/10] libnvdimm/dax: Add a dax flag to control synchronous fault support Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

We only support persistent memory on P8 and above. This is enforced by the
firmware and further checked on virtualzied platform during platform init.
Add WARN_ONCE in pmem flush routines to catch the wrong usage of these.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/cacheflush.h | 2 ++
 arch/powerpc/lib/pmem.c               | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index bb56a49c9a66..6dad92bd4be3 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -126,6 +126,8 @@ static inline void  arch_pmem_flush_barrier(void)
 {
 	if (cpu_has_feature(CPU_FTR_ARCH_207S))
 		asm volatile(PPC_PHWSYNC ::: "memory");
+	else
+		WARN_ONCE(1, "Using pmem flush on older hardware.");
 }
 #endif /* __KERNEL__ */
 
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 21210fa676e5..f40bd908d28d 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -37,12 +37,14 @@ static inline void clean_pmem_range(unsigned long start, unsigned long stop)
 {
 	if (cpu_has_feature(CPU_FTR_ARCH_207S))
 		return __clean_pmem_range(start, stop);
+	WARN_ONCE(1, "Using pmem flush on older hardware.");
 }
 
 static inline void flush_pmem_range(unsigned long start, unsigned long stop)
 {
 	if (cpu_has_feature(CPU_FTR_ARCH_207S))
 		return __flush_pmem_range(start, stop);
+	WARN_ONCE(1, "Using pmem flush on older hardware.");
 }
 
 /*
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 08/10] libnvdimm/dax: Add a dax flag to control synchronous fault support
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
                   ` (6 preceding siblings ...)
  2020-06-10  6:23 ` [PATCH v5 07/10] powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem flush functions Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 09/10] powerpc/pmem: Disable synchronous fault by default Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

With POWER10, architecture is adding new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware

This patch adds a dax attribute (/sys/bus/nd/devices/region0/pfn0.1/block/pmem0/dax/sync_fault)
which can be used to control this flag. If the device supports synchronous flush
then userspace can update this attribute to enable/disable the synchronous
fault. The attribute is only visible if there is write cache enabled on the device.

In a followup patch on ppc64 device with compat string "ibm,pmemory-v2"
will disable the sync fault feature.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/dax/bus.c            |  2 +-
 drivers/dax/super.c          | 73 ++++++++++++++++++++++++++++++++++++
 drivers/nvdimm/pmem.c        |  4 ++
 drivers/nvdimm/region_devs.c | 16 ++++++++
 include/linux/dax.h          | 16 ++++++++
 include/linux/libnvdimm.h    |  4 ++
 mm/Kconfig                   |  3 ++
 7 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index df238c8b6ef2..8a825ecff49b 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -420,7 +420,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id,
 	 * No 'host' or dax_operations since there is no access to this
 	 * device outside of mmap of the resulting character device.
 	 */
-	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC | DAXDEV_F_SYNC_ENABLED);
 	if (IS_ERR(dax_dev)) {
 		rc = PTR_ERR(dax_dev);
 		goto err;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 8e32345be0f7..f93e6649d452 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -198,6 +198,12 @@ enum dax_device_flags {
 	DAXDEV_WRITE_CACHE,
 	/* flag to check if device supports synchronous flush */
 	DAXDEV_SYNC,
+	/*
+	 * flag to indicate whether synchronous flush is enabled.
+	 * Some platform may want to disable synchronous flush support
+	 * even though device supports the same.
+	 */
+	DAXDEV_SYNC_ENABLED,
 };
 
 /**
@@ -254,6 +260,63 @@ static ssize_t write_cache_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(write_cache);
 
+bool __dax_synchronous_enabled(struct dax_device *dax_dev)
+{
+	return test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(__dax_synchronous_enabled);
+
+static void set_dax_synchronous_enable(struct dax_device *dax_dev, bool enable)
+{
+	if (!test_bit(DAXDEV_SYNC, &dax_dev->flags))
+		return;
+
+	if (enable)
+		set_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+	else
+		clear_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+}
+
+
+static ssize_t sync_fault_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	int enabled;
+	struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
+	ssize_t rc;
+
+	WARN_ON_ONCE(!dax_dev);
+	if (!dax_dev)
+		return -ENXIO;
+
+	enabled = (dax_synchronous(dax_dev) && dax_synchronous_enabled(dax_dev));
+	rc = sprintf(buf, "%d\n", enabled);
+	put_dax(dax_dev);
+	return rc;
+}
+
+static ssize_t sync_fault_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	bool enable_sync;
+	int rc = strtobool(buf, &enable_sync);
+	struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
+
+	WARN_ON_ONCE(!dax_dev);
+	if (!dax_dev)
+		return -ENXIO;
+
+	if (rc)
+		len = rc;
+	else
+		set_dax_synchronous_enable(dax_dev, enable_sync);
+
+	put_dax(dax_dev);
+	return len;
+}
+
+static DEVICE_ATTR_RW(sync_fault);
+
 static umode_t dax_visible(struct kobject *kobj, struct attribute *a, int n)
 {
 	struct device *dev = container_of(kobj, typeof(*dev), kobj);
@@ -267,11 +330,18 @@ static umode_t dax_visible(struct kobject *kobj, struct attribute *a, int n)
 	if (a == &dev_attr_write_cache.attr)
 		return 0;
 #endif
+	if (a == &dev_attr_sync_fault.attr) {
+		if (dax_write_cache_enabled(dax_dev))
+			return a->mode;
+		return 0;
+	}
+
 	return a->mode;
 }
 
 static struct attribute *dax_attributes[] = {
 	&dev_attr_write_cache.attr,
+	&dev_attr_sync_fault.attr,
 	NULL,
 };
 
@@ -594,6 +664,9 @@ struct dax_device *alloc_dax(void *private, const char *__host,
 	if (flags & DAXDEV_F_SYNC)
 		set_dax_synchronous(dax_dev);
 
+	if (flags & DAXDEV_F_SYNC_ENABLED)
+		set_dax_synchronous_enable(dax_dev, true);
+
 	return dax_dev;
 
  err_dev:
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 97f948f8f4e6..a738b237a3ff 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -487,6 +487,10 @@ static int pmem_attach_disk(struct device *dev,
 
 	if (is_nvdimm_sync(nd_region))
 		flags = DAXDEV_F_SYNC;
+
+	if (is_nvdimm_sync_enabled(nd_region))
+		flags |= DAXDEV_F_SYNC_ENABLED;
+
 	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops, flags);
 	if (IS_ERR(dax_dev)) {
 		put_disk(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 88ea34a9c7fd..d60d159b812b 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1283,6 +1283,22 @@ bool is_nvdimm_sync(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(is_nvdimm_sync);
 
+bool is_nvdimm_sync_enabled(struct nd_region *nd_region)
+{
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
+	if (is_nd_volatile(&nd_region->dev))
+		return true;
+
+	return is_nd_pmem(&nd_region->dev) &&
+		test_bit(ND_REGION_SYNC_ENABLED, &nd_region->flags);
+#else
+	return true;
+#endif
+
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync_enabled);
+
+
 struct conflict_context {
 	struct nd_region *nd_region;
 	resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d7af5d243f24..c4a3551557de 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -10,6 +10,9 @@
 /* Flag for synchronous flush */
 #define DAXDEV_F_SYNC (1UL << 0)
 
+/* flag for platform forcing synchronous flush disable */
+#define DAXDEV_F_SYNC_ENABLED (1UL << 1)
+
 typedef unsigned long dax_entry_t;
 
 struct iomap_ops;
@@ -59,6 +62,13 @@ static inline void set_dax_synchronous(struct dax_device *dax_dev)
 {
 	__set_dax_synchronous(dax_dev);
 }
+
+bool __dax_synchronous_enabled(struct dax_device *dax_dev);
+static inline bool dax_synchronous_enabled(struct dax_device *dax_dev)
+{
+	return  __dax_synchronous_enabled(dax_dev);
+}
+
 /*
  * Check if given mapping is supported by the file / underlying device.
  */
@@ -69,6 +79,12 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 		return true;
 	if (!IS_DAX(file_inode(vma->vm_file)))
 		return false;
+	/*
+	 * check MAP_SYNC is disabled by platform for this device.
+	 */
+	if (!dax_synchronous_enabled(dax_dev))
+		return false;
+
 	return dax_synchronous(dax_dev);
 }
 #else
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 66f6c65bd789..53d4d4b8b6b9 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -63,6 +63,9 @@ enum {
 	/* Platform provides asynchronous flush mechanism */
 	ND_REGION_ASYNC = 3,
 
+	/* Platform wants to disable synchronous flush mechanism */
+	ND_REGION_SYNC_ENABLED= 4,
+
 	/* mark newly adjusted resources as requiring a label update */
 	DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -262,6 +265,7 @@ int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
 bool is_nvdimm_sync(struct nd_region *nd_region);
+bool is_nvdimm_sync_enabled(struct nd_region *nd_region);
 
 static inline int nvdimm_ctl(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
 		unsigned int buf_len, int *cmd_rc)
diff --git a/mm/Kconfig b/mm/Kconfig
index cffc276fa19c..aebe7e9c079d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -868,4 +868,7 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
         bool
 
+config ARCH_MAP_SYNC_DISABLE
+	bool
+
 endmenu
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 09/10] powerpc/pmem: Disable synchronous fault by default
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
                   ` (7 preceding siblings ...)
  2020-06-10  6:23 ` [PATCH v5 08/10] libnvdimm/dax: Add a dax flag to control synchronous fault support Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-10  6:23 ` [PATCH v5 10/10] powerpc/pmem: Initialize pmem device on newer hardware Aneesh Kumar K.V
  2020-06-19 13:10 ` [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

This adds a kernel config option that controls whether MAP_SYNC is enabled by
default. With POWER10, architecture is adding new pmem flush and sync
instructions. The kernel should prevent the usage of MAP_SYNC if applications
are not using the new instructions on newer hardware.

This config allows user to control whether MAP_SYNC should be enabled by
default or not.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/platforms/Kconfig.cputype    |  9 +++++++++
 arch/powerpc/platforms/pseries/papr_scm.c | 17 ++++++++++++++++-
 drivers/nvdimm/of_pmem.c                  |  7 +++++++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index d349603fb889..abcc163b8dc6 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -383,6 +383,15 @@ config PPC_KUEP
 
 	  If you're unsure, say Y.
 
+config ARCH_MAP_SYNC_DISABLE
+	bool "Disable synchronous fault support (MAP_SYNC)"
+	default y
+	help
+	  Disable support for synchronous fault with nvdimm namespaces.
+
+	  If you're unsure, say Y.
+
+
 config PPC_HAVE_KUAP
 	bool
 
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index ad506e7003c9..b970d2dbe589 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -30,6 +30,7 @@ struct papr_scm_priv {
 	uint64_t block_size;
 	int metadata_size;
 	bool is_volatile;
+	bool disable_map_sync;
 
 	uint64_t bound_addr;
 
@@ -353,11 +354,18 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 	ndr_desc.num_mappings = 1;
 	ndr_desc.nd_set = &p->nd_set;
 	ndr_desc.flush = papr_scm_flush_sync;
+	set_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
 
 	if (p->is_volatile)
 		p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
 	else {
 		set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+		/*
+		 * for a persistent region, check if the platform needs to
+		 * force MAP_SYNC disable.
+		 */
+		if (p->disable_map_sync)
+			clear_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
 		p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
 	}
 	if (!p->region) {
@@ -378,7 +386,7 @@ err:	nvdimm_bus_unregister(p->bus);
 
 static int papr_scm_probe(struct platform_device *pdev)
 {
-	struct device_node *dn = pdev->dev.of_node;
+	struct device_node *dn;
 	u32 drc_index, metadata_size;
 	u64 blocks, block_size;
 	struct papr_scm_priv *p;
@@ -386,6 +394,10 @@ static int papr_scm_probe(struct platform_device *pdev)
 	u64 uuid[2];
 	int rc;
 
+	dn = dev_of_node(&pdev->dev);
+	if (!dn)
+		return -ENXIO;
+
 	/* check we have all the required DT properties */
 	if (of_property_read_u32(dn, "ibm,my-drc-index", &drc_index)) {
 		dev_err(&pdev->dev, "%pOF: missing drc-index!\n", dn);
@@ -415,6 +427,9 @@ static int papr_scm_probe(struct platform_device *pdev)
 	/* optional DT properties */
 	of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
 
+	if (of_device_is_compatible(dn, "ibm,pmemory-v2"))
+		p->disable_map_sync = true;
+
 	p->dn = dn;
 	p->drc_index = drc_index;
 	p->block_size = block_size;
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 6826a274a1f1..a6cc3488e552 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -59,12 +59,19 @@ static int of_pmem_region_probe(struct platform_device *pdev)
 		ndr_desc.res = &pdev->resource[i];
 		ndr_desc.of_node = np;
 		set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+		set_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
 
 		if (is_volatile)
 			region = nvdimm_volatile_region_create(bus, &ndr_desc);
 		else {
 			set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+			/*
+			 * for a persistent region, check for newer device
+			 */
+			if (of_device_is_compatible(np, "pmem-region-v2"))
+				clear_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
 			region = nvdimm_pmem_region_create(bus, &ndr_desc);
+
 		}
 
 		if (!region)
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 10/10] powerpc/pmem: Initialize pmem device on newer hardware
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
                   ` (8 preceding siblings ...)
  2020-06-10  6:23 ` [PATCH v5 09/10] powerpc/pmem: Disable synchronous fault by default Aneesh Kumar K.V
@ 2020-06-10  6:23 ` Aneesh Kumar K.V
  2020-06-19 13:10 ` [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-10  6:23 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

With kernel now supporting new pmem flush/sync instructions, we can now
enable the kernel to initialize the device. On P10 these devices would
appear with a new compatible string. For PAPR device we have

compatible       "ibm,pmemory-v2"

and for OF pmem device we have

compatible       "pmem-region-v2"

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/papr_scm.c | 1 +
 drivers/nvdimm/of_pmem.c                  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index b970d2dbe589..3efd827fe0ac 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -498,6 +498,7 @@ static int papr_scm_remove(struct platform_device *pdev)
 
 static const struct of_device_id papr_scm_match[] = {
 	{ .compatible = "ibm,pmemory" },
+	{ .compatible = "ibm,pmemory-v2" },
 	{ },
 };
 
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index a6cc3488e552..1e1585ab07c7 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -97,6 +97,7 @@ static int of_pmem_region_remove(struct platform_device *pdev)
 
 static const struct of_device_id of_pmem_region_match[] = {
 	{ .compatible = "pmem-region" },
+	{ .compatible = "pmem-region-v2" },
 	{ },
 };
 
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER
  2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
                   ` (9 preceding siblings ...)
  2020-06-10  6:23 ` [PATCH v5 10/10] powerpc/pmem: Initialize pmem device on newer hardware Aneesh Kumar K.V
@ 2020-06-19 13:10 ` Aneesh Kumar K.V
  10 siblings, 0 replies; 12+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-19 13:10 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams; +Cc: Jan Kara, msuchanek


"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> This patch series enables the usage os new pmem flush and sync instructions on POWER
> architecture. POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
> that can be used to write modified locations back to persistent storage. Additionally,
> POWER10 also introduce phwsync and plwsync which can be used to establish order of these
> writes to persistent storage.
>     
> This series exposes these instructions to the rest of the kernel. The existing
> dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
> synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions
> are added as a variant of the old ones that old hardware won't differentiate.
>
> On POWER10, pmem devices will be represented by a different device tree compat
> strings. This ensures that older kernels won't initialize pmem devices on POWER10.
>
> W.r.t userspace we want to make sure applications are enabled to use MAP_SYNC only
> if they are using the new instructions. To avoid the wrong usage of MAP_SYNC on
> newer hardware, we disable MAP_SYNC by default on newer hardware. The namespace specific
> attribute /sys/block/pmem0/dax/sync_fault can be used to enable MAP_SYNC later.
>
> With this:
> 1) vPMEM continues to work since it is a volatile region. That 
> doesn't need any flush instructions.
>
> 2) pmdk and other user applications get updated to use new instructions
> and updated packages are made available to all distributions
>
> 3) On newer hardware, the device will appear with a new compat string. 
> Hence older distributions won't initialize pmem on newer hardware.
>
> 4) If we have a newer kernel with an older distro, we use the per 
> namespace sysfs knob that prevents the usage of MAP_SYNC.
>
> 5) Sometime in the future, we mark the CONFIG_ARCH_MAP_SYNC_DISABLE=n
> on ppc64 when we are confident that everybody is using the new flush 
> instruction.
>
> Chaanges from V4:
> * Add namespace specific sychronous fault control.
>
> Changes from V3:
> * Add new compat string to be used for the device.
> * Use arch_pmem_flush_barrier() in dm-writecache.
>
> Aneesh Kumar K.V (10):
>   powerpc/pmem: Restrict papr_scm to P8 and above.
>   powerpc/pmem: Add new instructions for persistent storage and sync
>   powerpc/pmem: Add flush routines using new pmem store and sync
>     instruction
>   libnvdimm/nvdimm/flush: Allow architecture to override the flush
>     barrier
>   powerpc/pmem/of_pmem: Update of_pmem to use the new barrier
>     instruction.
>   powerpc/pmem: Avoid the barrier in flush routines
>   powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem
>     flush functions.
>   libnvdimm/dax: Add a dax flag to control synchronous fault support
>   powerpc/pmem: Disable synchronous fault by default
>   powerpc/pmem: Initialize pmem device on newer hardware
>
>  arch/powerpc/include/asm/cacheflush.h     | 10 ++++
>  arch/powerpc/include/asm/ppc-opcode.h     | 12 ++++
>  arch/powerpc/lib/pmem.c                   | 46 ++++++++++++--
>  arch/powerpc/platforms/Kconfig.cputype    |  9 +++
>  arch/powerpc/platforms/pseries/papr_scm.c | 31 +++++++++-
>  arch/powerpc/platforms/pseries/pmem.c     |  6 ++
>  drivers/dax/bus.c                         |  2 +-
>  drivers/dax/super.c                       | 73 +++++++++++++++++++++++
>  drivers/md/dm-writecache.c                |  2 +-
>  drivers/nvdimm/of_pmem.c                  |  8 +++
>  drivers/nvdimm/pmem.c                     |  4 ++
>  drivers/nvdimm/region_devs.c              | 24 ++++++--
>  include/linux/dax.h                       | 16 +++++
>  include/linux/libnvdimm.h                 |  8 +++
>  mm/Kconfig                                |  3 +
>  15 files changed, 243 insertions(+), 11 deletions(-)

Ping.

Are we good with the approach here? 

-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-06-19 13:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-10  6:23 [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 01/10] powerpc/pmem: Restrict papr_scm to P8 and above Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 02/10] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 03/10] powerpc/pmem: Add flush routines using new pmem store and sync instruction Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 04/10] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 05/10] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 06/10] powerpc/pmem: Avoid the barrier in flush routines Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 07/10] powerpc/book3s/pmem: Add WARN_ONCE to catch the wrong usage of pmem flush functions Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 08/10] libnvdimm/dax: Add a dax flag to control synchronous fault support Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 09/10] powerpc/pmem: Disable synchronous fault by default Aneesh Kumar K.V
2020-06-10  6:23 ` [PATCH v5 10/10] powerpc/pmem: Initialize pmem device on newer hardware Aneesh Kumar K.V
2020-06-19 13:10 ` [PATCH v5 00/10] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).