* [PATCH v7 0/7] Support new pmem flush and sync instructions for POWER
From: Aneesh Kumar K.V @ 2020-07-01  7:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

This patch series enables the use of the new pmem flush and sync instructions on the POWER
architecture. POWER10 introduces two new variants of the dcbf instruction (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage. Additionally,
POWER10 introduces phwsync and plwsync, which can be used to establish the order of these
writes to persistent storage.
    
This series exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions
are added as a variant of the old ones that old hardware won't differentiate.

On POWER10, pmem devices will be represented by a different device tree compat
string. This ensures that older kernels won't initialize pmem devices on POWER10.

With this:
1) vPMEM continues to work since it is a volatile region that
doesn't need any flush instructions.

2) pmdk and other user applications get updated to use the new instructions,
and updated packages are made available to all distributions.

3) On newer hardware, the device will appear with a new compat string. 
Hence older distributions won't initialize pmem on newer hardware.
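
Point 3 can be illustrated with a minimal, standalone sketch of why a new
compat string keeps an older kernel from binding to the device. This is not
kernel code: driver_binds(), the two tables, and the "example,pmem-v1" and
"example,pmem-v2" strings are hypothetical stand-ins, since the real compat
strings are only introduced later in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical match tables: an older kernel only knows the v1 string. */
const char *const old_kernel_table[] = { "example,pmem-v1", NULL };
const char *const new_kernel_table[] = {
	"example,pmem-v1", "example,pmem-v2", NULL
};

/* Return 1 if the device's compat string appears in the driver's table. */
int driver_binds(const char *const *table, const char *compat)
{
	size_t i;

	assert(table != NULL && compat != NULL);
	for (i = 0; table[i] != NULL; i++)
		if (strcmp(table[i], compat) == 0)
			return 1;
	return 0;
}
```

With these tables, a device advertising "example,pmem-v2" is skipped by the
older table and picked up by the newer one, which is the behaviour the new
compat string is meant to provide.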

Changes from v6:
* Rename flush barrier to pmem_wmb(). Update documentation.
* Drop the WARN_ON in flush routines.
* Drop papr_scm ndr_region flush callback.

Changes from v5:
* Drop CONFIG_ARCH_MAP_SYNC_DISABLE and related changes

Changes from V4:
* Add namespace-specific synchronous fault control.

Changes from V3:
* Add new compat string to be used for the device.
* Use arch_pmem_flush_barrier() in dm-writecache.

Aneesh Kumar K.V (7):
  powerpc/pmem: Restrict papr_scm to P8 and above.
  powerpc/pmem: Add new instructions for persistent storage and sync
  powerpc/pmem: Add flush routines using new pmem store and sync
    instruction
  libnvdimm/nvdimm/flush: Allow architecture to override the flush
    barrier
  powerpc/pmem: Update ppc64 to use the new barrier instruction.
  powerpc/pmem: Avoid the barrier in flush routines
  powerpc/pmem: Initialize pmem device on newer hardware

 Documentation/memory-barriers.txt         | 14 ++++++++
 arch/powerpc/include/asm/barrier.h        | 13 +++++++
 arch/powerpc/include/asm/cacheflush.h     |  1 +
 arch/powerpc/include/asm/ppc-opcode.h     | 12 +++++++
 arch/powerpc/lib/pmem.c                   | 44 ++++++++++++++++++++---
 arch/powerpc/platforms/pseries/papr_scm.c |  1 +
 arch/powerpc/platforms/pseries/pmem.c     |  6 ++++
 drivers/md/dm-writecache.c                |  2 +-
 drivers/nvdimm/of_pmem.c                  |  1 +
 drivers/nvdimm/region_devs.c              |  8 ++---
 include/asm-generic/barrier.h             | 10 ++++++
 11 files changed, 103 insertions(+), 9 deletions(-)

-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org


* [PATCH v7 1/7] powerpc/pmem: Restrict papr_scm to P8 and above.
From: Aneesh Kumar K.V @ 2020-07-01  7:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

The PAPR based virtualized persistent memory devices are only supported on
POWER9 and above. In the followup patch, the kernel will switch the persistent
memory cache flush functions to use a new `dcbf` variant instruction. The new
instructions, even though added in ISA 3.1, work even on P8 and P9 because they
are implemented as variants of the existing `dcbf` and `hwsync` instructions and
behave as such on P8 and P9.

Considering these devices are only supported on P8 and above, update the driver
to prevent a P7-compat guest from using persistent memory devices.

We don't update the of_pmem driver with the same condition because, on bare metal,
the firmware enables pmem support only on P9 and above. There, the kernel depends
on OPAL firmware to avoid exposing persistent memory related device tree
entries on older hardware. of_pmem.ko is written without any arch dependency, and
we don't want to add a ppc64-specific CPU feature check to the of_pmem driver.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/pmem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pmem.c b/arch/powerpc/platforms/pseries/pmem.c
index f860a897a9e0..2347e1038f58 100644
--- a/arch/powerpc/platforms/pseries/pmem.c
+++ b/arch/powerpc/platforms/pseries/pmem.c
@@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
 
 static int pseries_pmem_init(void)
 {
+	/*
+	 * Only supported on POWER8 and above.
+	 */
+	if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+		return 0;
+
 	pmem_node = of_find_node_by_type(NULL, "ibm,persistent-memory");
 	if (!pmem_node)
 		return 0;
-- 
2.26.2

* [PATCH v7 2/7] powerpc/pmem: Add new instructions for persistent storage and sync
From: Aneesh Kumar K.V @ 2020-07-01  7:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

POWER10 introduces two new variants of the dcbf instruction (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage.

Additionally, POWER10 introduces phwsync and plwsync, which can be used
to establish the order of these writes to persistent storage.

This patch exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as variants of the old ones that old hardware
won't differentiate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
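Note (illustration only, not part of the commit): the following standalone
sketch shows how the new encodings relate to the base dcbf and sync opcodes
in the hunks below, namely by OR-ing a value into the field at bit 21. The
encode_*() helpers are hypothetical names, and the RA/RB placement at bits
16 and 11 is an assumption about what ___PPC_RA() and ___PPC_RB() do, since
those macros are not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Base opcodes copied from the ppc-opcode.h hunks in this patch. */
#define INST_SYNC	0x7c0004acu
#define INST_DCBF	0x7c0000acu

/* Field helpers; the RA/RB shifts are assumed, the L shift is from the patch. */
static uint32_t l_field(uint32_t l)  { return l << 21; }
static uint32_t ra_field(uint32_t r) { assert(r < 32); return r << 16; }
static uint32_t rb_field(uint32_t r) { assert(r < 32); return r << 11; }

/* dcbfps: dcbf with 4 in the bit-21 field, as in PPC_DCBFPS(). */
uint32_t encode_dcbfps(uint32_t ra, uint32_t rb)
{
	return INST_DCBF | ra_field(ra) | rb_field(rb) | l_field(4);
}

/* dcbstps: dcbf with 6 in the bit-21 field, as in PPC_DCBSTPS(). */
uint32_t encode_dcbstps(uint32_t ra, uint32_t rb)
{
	return INST_DCBF | ra_field(ra) | rb_field(rb) | l_field(6);
}

/* phwsync and plwsync: sync with 4 and 5 in the same field. */
uint32_t encode_phwsync(void) { return INST_SYNC | l_field(4); }
uint32_t encode_plwsync(void) { return INST_SYNC | l_field(5); }
```

The sync variants reproduce the PPC_INST_PHWSYNC and PPC_INST_PLWSYNC
constants added below, which is a quick way to sanity-check the field
position used for the dcbf variants.
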
 arch/powerpc/include/asm/ppc-opcode.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 2a39c716c343..1ad014e4633e 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -219,6 +219,8 @@
 #define PPC_INST_STWCX			0x7c00012d
 #define PPC_INST_LWSYNC			0x7c2004ac
 #define PPC_INST_SYNC			0x7c0004ac
+#define PPC_INST_PHWSYNC		0x7c8004ac
+#define PPC_INST_PLWSYNC		0x7ca004ac
 #define PPC_INST_SYNC_MASK		0xfc0007fe
 #define PPC_INST_ISYNC			0x4c00012c
 #define PPC_INST_LXVD2X			0x7c000698
@@ -284,6 +286,8 @@
 #define PPC_INST_TABORT			0x7c00071d
 #define PPC_INST_TSR			0x7c0005dd
 
+#define PPC_INST_DCBF			0x7c0000ac
+
 #define PPC_INST_NAP			0x4c000364
 #define PPC_INST_SLEEP			0x4c0003a4
 #define PPC_INST_WINKLE			0x4c0003e4
@@ -532,6 +536,14 @@
 #define STBCIX(s,a,b)		stringify_in_c(.long PPC_INST_STBCIX | \
 				       __PPC_RS(s) | __PPC_RA(a) | __PPC_RB(b))
 
+#define	PPC_DCBFPS(a, b)	stringify_in_c(.long PPC_INST_DCBF |	\
+				       ___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
+#define	PPC_DCBSTPS(a, b)	stringify_in_c(.long PPC_INST_DCBF |	\
+				       ___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
+
+#define	PPC_PHWSYNC		stringify_in_c(.long PPC_INST_PHWSYNC)
+#define	PPC_PLWSYNC		stringify_in_c(.long PPC_INST_PLWSYNC)
+
 /*
  * Define what the VSX XX1 form instructions will look like, then add
  * the 128 bit load store instructions based on that.
-- 
2.26.2

* [PATCH v7 3/7] powerpc/pmem: Add flush routines using new pmem store and sync instruction
From: Aneesh Kumar K.V @ 2020-07-01  7:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

Start using the dcbstps; phwsync sequence for flushing a persistent memory range.
The new instructions are implemented as variants of dcbf and hwsync, and on
P8 and P9 they are executed as those instructions. We avoid using them on
hardware older than P8, which helps to avoid difficult-to-debug bugs.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
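Note (illustration only, not part of the commit): the range loops below round
the start address down to a cache block boundary and then issue one
instruction per block. The standalone sketch here reproduces just that
arithmetic so the block count can be checked by hand. pmem_blocks_covered()
is a hypothetical helper, and the 128-byte block size used in the usage note
is an assumption for the example; the kernel code takes the real values from
l1_dcache_shift() and l1_dcache_bytes().

```c
#include <assert.h>

/*
 * Number of cache blocks the flush loop would touch for [start, stop),
 * using the same rounding as __clean_pmem_range() in this patch.
 * block_shift is log2 of the cache block size in bytes.
 */
unsigned long pmem_blocks_covered(unsigned long start, unsigned long stop,
				  unsigned int block_shift)
{
	unsigned long bytes = 1UL << block_shift;
	unsigned long first = start & ~(bytes - 1);	/* round start down */
	unsigned long size;

	assert(stop >= start);
	size = stop - first + (bytes - 1);		/* round length up */
	return size >> block_shift;
}
```

For example, with 128-byte blocks (block_shift = 7), the two-byte range
[0x107f, 0x1081) straddles a block boundary and yields 2, while the range
[0x1000, 0x1080) fits exactly in one block and yields 1.
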
 arch/powerpc/include/asm/cacheflush.h |  1 +
 arch/powerpc/lib/pmem.c               | 50 ++++++++++++++++++++++++---
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index de600b915a3c..54764c6e922d 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -6,6 +6,7 @@
 
 #include <linux/mm.h>
 #include <asm/cputable.h>
+#include <asm/cpu_has_feature.h>
 
 #ifdef CONFIG_PPC_BOOK3S_64
 /*
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..5a61aaeb6930 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,20 +9,62 @@
 
 #include <asm/cacheflush.h>
 
+static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
+
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+	asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
+
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+	asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void clean_pmem_range(unsigned long start, unsigned long stop)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_207S))
+		return __clean_pmem_range(start, stop);
+}
+
+static inline void flush_pmem_range(unsigned long start, unsigned long stop)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_207S))
+		return __flush_pmem_range(start, stop);
+}
+
 /*
  * CONFIG_ARCH_HAS_PMEM_API symbols
  */
 void arch_wb_cache_pmem(void *addr, size_t size)
 {
 	unsigned long start = (unsigned long) addr;
-	flush_dcache_range(start, start + size);
+	clean_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
 
 void arch_invalidate_pmem(void *addr, size_t size)
 {
 	unsigned long start = (unsigned long) addr;
-	flush_dcache_range(start, start + size);
+	flush_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
 
@@ -35,7 +77,7 @@ long __copy_from_user_flushcache(void *dest, const void __user *src,
 	unsigned long copied, start = (unsigned long) dest;
 
 	copied = __copy_from_user(dest, src, size);
-	flush_dcache_range(start, start + size);
+	clean_pmem_range(start, start + size);
 
 	return copied;
 }
@@ -45,7 +87,7 @@ void *memcpy_flushcache(void *dest, const void *src, size_t size)
 	unsigned long start = (unsigned long) dest;
 
 	memcpy(dest, src, size);
-	flush_dcache_range(start, start + size);
+	clean_pmem_range(start, start + size);
 
 	return dest;
 }
-- 
2.26.2

* [PATCH v7 4/7] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
From: Aneesh Kumar K.V @ 2020-07-01  7:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

Architectures like ppc64 provide persistent-memory-specific barriers
that ensure that all stores whose modifications are written to
persistent storage by preceding dcbfps and dcbstps instructions have
updated persistent storage before any data access or data transfer
caused by subsequent instructions is initiated. This is in addition
to the ordering done by wmb().

Update the nvdimm core so that an architecture can use a barrier other
than wmb() to ensure all previous writes are architecturally visible for
the platform buffer flush.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
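Note (illustration only, not part of the commit): the override mechanism is
the usual "#ifndef ... #define" fallback. The standalone sketch below mimics
it in user space. wmb_stub(), arch_pmem_barrier_stub(), the call counters,
commit_flushed_demo() and DEMO_ARCH_HAS_PMEM_WMB are all hypothetical names
invented for the example; only the shape of the pmem_wmb() fallback is taken
from the asm-generic/barrier.h hunk below.

```c
#include <assert.h>

int generic_barrier_calls;	/* calls to the stand-in for wmb() */
int arch_barrier_calls;		/* calls to the stand-in arch barrier */

void wmb_stub(void) { generic_barrier_calls++; }
void arch_pmem_barrier_stub(void) { arch_barrier_calls++; }

/* An "architecture header" may define pmem_wmb() first. */
#ifdef DEMO_ARCH_HAS_PMEM_WMB
#define pmem_wmb()	arch_pmem_barrier_stub()
#endif

/* Generic fallback, same shape as the hunk in asm-generic/barrier.h. */
#ifndef pmem_wmb
#define pmem_wmb()	wmb_stub()
#endif

/* A caller only ever names pmem_wmb(); the preprocessor picks the barrier. */
void commit_flushed_demo(void)
{
	pmem_wmb();
	assert(generic_barrier_calls + arch_barrier_calls > 0);
}
```

Built as is, the caller ends up in wmb_stub(); built with
-DDEMO_ARCH_HAS_PMEM_WMB, the same caller ends up in the arch stub without
any change at the call site.
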
 Documentation/memory-barriers.txt | 14 ++++++++++++++
 drivers/md/dm-writecache.c        |  2 +-
 drivers/nvdimm/region_devs.c      |  8 ++++----
 include/asm-generic/barrier.h     | 10 ++++++++++
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index eaabc3134294..ff07cd3b2f82 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1935,6 +1935,20 @@ There are some more advanced barrier functions:
      relaxed I/O accessors and the Documentation/DMA-API.txt file for more
      information on consistent memory.
 
+ (*) pmem_wmb();
+
+     This is for use with persistent memory to ensure that stores for which
+     modifications are written to persistent storage reached a platform
+     durability domain.
+
+     For example, after a non-temporal write to pmem region, we use pmem_wmb()
+     to ensure that stores have reached a platform durability domain. This ensures
+     that stores have updated persistent storage before any data access or
+     data transfer caused by subsequent instructions is initiated. This is
+     in addition to the ordering done by wmb().
+
+     For load from persistent memory, existing read memory barriers are sufficient
+     to ensure read ordering.
 
 ===============================
 IMPLICIT KERNEL MEMORY BARRIERS
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 74f3c506f084..00534fa4a384 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
 static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
 {
 	if (WC_MODE_PMEM(wc))
-		wmb();
+		pmem_wmb();
 	else
 		ssd_commit_flushed(wc, wait_for_ios);
 }
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4502f9c4708d..c3237c2b03a6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
 	idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
 	/*
-	 * The first wmb() is needed to 'sfence' all previous writes
-	 * such that they are architecturally visible for the platform
-	 * buffer flush.  Note that we've already arranged for pmem
+	 * The pmem_wmb() is needed to 'sfence' all
+	 * previous writes such that they are architecturally visible for
+	 * the platform buffer flush. Note that we've already arranged for pmem
 	 * writes to avoid the cache via memcpy_flushcache().  The final
 	 * wmb() ensures ordering for the NVDIMM flush write.
 	 */
-	wmb();
+	pmem_wmb();
 	for (i = 0; i < nd_region->ndr_mappings; i++)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 2eacaf7d62f6..b589bb216ee5 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -257,5 +257,15 @@ do {									\
 })
 #endif
 
+/*
+ * pmem_wmb() ensures that all stores for which the modification
+ * are written to persistent storage by preceding instructions have
+ * updated persistent storage before any data  access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+#ifndef pmem_wmb
+#define pmem_wmb()	wmb()
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
-- 
2.26.2

* [PATCH v7 4/7] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
@ 2020-07-01  7:22   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 23+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-01  7:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, Jeff Moyer, msuchanek, oohall, Aneesh Kumar K.V

Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()

Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 Documentation/memory-barriers.txt | 14 ++++++++++++++
 drivers/md/dm-writecache.c        |  2 +-
 drivers/nvdimm/region_devs.c      |  8 ++++----
 include/asm-generic/barrier.h     | 10 ++++++++++
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index eaabc3134294..ff07cd3b2f82 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1935,6 +1935,20 @@ There are some more advanced barrier functions:
      relaxed I/O accessors and the Documentation/DMA-API.txt file for more
      information on consistent memory.
 
+ (*) pmem_wmb();
+
+     This is for use with persistent memory to ensure that stores for which
+     modifications are written to persistent storage reached a platform
+     durability domain.
+
+     For example, after a non-temporal write to pmem region, we use pmem_wmb()
+     to ensure that stores have reached a platform durability domain. This ensures
+     that stores have updated persistent storage before any data access or
+     data transfer caused by subsequent instructions is initiated. This is
+     in addition to the ordering done by wmb().
+
+     For load from persistent memory, existing read memory barriers are sufficient
+     to ensure read ordering.
 
 ===============================
 IMPLICIT KERNEL MEMORY BARRIERS
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 74f3c506f084..00534fa4a384 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -536,7 +536,7 @@ static void ssd_commit_superblock(struct dm_writecache *wc)
 static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
 {
 	if (WC_MODE_PMEM(wc))
-		wmb();
+		pmem_wmb();
 	else
 		ssd_commit_flushed(wc, wait_for_ios);
 }
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4502f9c4708d..c3237c2b03a6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1206,13 +1206,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
 	idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
 	/*
-	 * The first wmb() is needed to 'sfence' all previous writes
-	 * such that they are architecturally visible for the platform
-	 * buffer flush.  Note that we've already arranged for pmem
+	 * The pmem_wmb() is needed to 'sfence' all
+	 * previous writes such that they are architecturally visible for
+	 * the platform buffer flush. Note that we've already arranged for pmem
 	 * writes to avoid the cache via memcpy_flushcache().  The final
 	 * wmb() ensures ordering for the NVDIMM flush write.
 	 */
-	wmb();
+	pmem_wmb();
 	for (i = 0; i < nd_region->ndr_mappings; i++)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 2eacaf7d62f6..b589bb216ee5 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -257,5 +257,15 @@ do {									\
 })
 #endif
 
+/*
+ * pmem_wmb() ensures that all stores for which the modifications
+ * are written to persistent storage by preceding instructions have
+ * updated persistent storage before any data access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+#ifndef pmem_wmb
+#define pmem_wmb()	wmb()
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread
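
A minimal user-space sketch of the override pattern this patch adds to
include/asm-generic/barrier.h: the generic header defines pmem_wmb() as
wmb() only when the architecture has not already provided its own
definition. The demo_* names and the call counter are assumptions made
for illustration and do not exist in the kernel; wmb() is modelled here
with a compiler builtin fence rather than the kernel's real barrier.

```c
#include <assert.h>

/* Demo-only counter so the fallback path can be observed. */
static int generic_wmb_calls;

/* Stand-in for the kernel's wmb(): a full fence in user space. */
static inline void demo_wmb(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	generic_wmb_calls++;
}
#define wmb() demo_wmb()

/*
 * An architecture header would be included at this point and could
 * define pmem_wmb() itself. This demo defines nothing, so the generic
 * fallback below applies.
 */

#ifndef pmem_wmb
#define pmem_wmb()	wmb()
#endif

/* Hypothetical caller that commits previously flushed pmem stores. */
static void demo_commit_flushed(void)
{
	int before = generic_wmb_calls;

	pmem_wmb();
	/* With no arch override, the generic wmb() stand-in must have run. */
	assert(generic_wmb_calls == before + 1);
}
```

Because the fallback is guarded by #ifndef, callers such as
writecache_commit_flushed() can use pmem_wmb() unconditionally and
architectures without a dedicated instruction keep their existing
wmb() behaviour.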

* [PATCH v7 5/7] powerpc/pmem: Update ppc64 to use the new barrier instruction.
  2020-07-01  7:22 ` Aneesh Kumar K.V
@ 2020-07-01  7:22   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 23+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-01  7:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

pmem on POWER10 can now use phwsync instead of hwsync to ensure
all previous writes are architecturally visible for the platform
buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/barrier.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 123adcefd40f..35c1b8f3aa68 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -7,6 +7,10 @@
 
 #include <asm/asm-const.h>
 
+#ifndef __ASSEMBLY__
+#include <asm/ppc-opcode.h>
+#endif
+
 /*
  * Memory barrier.
  * The sync instruction guarantees that all memory accesses initiated
@@ -97,6 +101,15 @@ do {									\
 #define barrier_nospec()
 #endif /* CONFIG_PPC_BARRIER_NOSPEC */
 
+/*
+ * pmem_wmb() ensures that all stores for which the modifications
+ * are written to persistent storage by preceding dcbfps/dcbstps
+ * instructions have updated persistent storage before any data
+ * access or data transfer caused by subsequent instructions is
+ * initiated.
+ */
+#define pmem_wmb() __asm__ __volatile__(PPC_PHWSYNC ::: "memory")
+
 #include <asm-generic/barrier.h>
 
 #endif /* _ASM_POWERPC_BARRIER_H */
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v7 6/7] powerpc/pmem: Avoid the barrier in flush routines
  2020-07-01  7:22 ` Aneesh Kumar K.V
@ 2020-07-01  7:22   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 23+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-01  7:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

nvdimm expects the flush routines to just mark the cache clean. The barrier
that marks the stores globally visible is done in nvdimm_flush().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/lib/pmem.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 5a61aaeb6930..21210fa676e5 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -19,9 +19,6 @@ static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
 
 	for (i = 0; i < size >> shift; i++, addr += bytes)
 		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-	asm volatile(PPC_PHWSYNC ::: "memory");
 }
 
 static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
@@ -34,9 +31,6 @@ static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
 
 	for (i = 0; i < size >> shift; i++, addr += bytes)
 		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
-
-
-	asm volatile(PPC_PHWSYNC ::: "memory");
 }
 
 static inline void clean_pmem_range(unsigned long start, unsigned long stop)
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread
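
The division of labour described in this patch can be sketched in user
space as follows: the flush routine only writes back cache lines, and
the caller issues a single ordering barrier afterwards. All demo_*
names, the event log, and the assumption that the destination is
cache-line aligned are illustrative only; the real routines use
dcbstps/dcbfps and pmem_wmb().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_event { EV_STORE, EV_CLEAN_LINE, EV_BARRIER };

#define DEMO_MAX_EVENTS 64
static enum demo_event demo_log[DEMO_MAX_EVENTS];
static size_t demo_log_len;

static void demo_record(enum demo_event ev)
{
	assert(demo_log_len < DEMO_MAX_EVENTS);
	demo_log[demo_log_len++] = ev;
}

/* Stand-in for the dcbstps loop: one event per line, no barrier here. */
static void demo_clean_range(size_t nr_lines)
{
	for (size_t i = 0; i < nr_lines; i++)
		demo_record(EV_CLEAN_LINE);
}

/* Stand-in for pmem_wmb(); modelled with a builtin fence. */
static void demo_pmem_wmb(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	demo_record(EV_BARRIER);
}

/*
 * Copy a record, write back its lines, then order everything once.
 * Assumes dst is aligned to the cache line size given in 'line'.
 */
static void demo_persist(void *dst, const void *src, size_t len, size_t line)
{
	assert(line != 0);
	memcpy(dst, src, len);
	demo_record(EV_STORE);
	demo_clean_range((len + line - 1) / line);
	demo_pmem_wmb();
}
```

Keeping the barrier out of the per-range routine means a caller that
flushes several ranges pays for one barrier at the end instead of one
per range.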

* [PATCH v7 7/7] powerpc/pmem: Initialize pmem device on newer hardware
  2020-07-01  7:22 ` Aneesh Kumar K.V
@ 2020-07-01  7:22   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 23+ messages in thread
From: Aneesh Kumar K.V @ 2020-07-01  7:22 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: Jan Kara, msuchanek, Aneesh Kumar K.V

With the kernel now supporting the new pmem flush/sync instructions, we can
enable the kernel to initialize the device. On P10 these devices
appear with a new compatible string. For PAPR devices we have

compatible       "ibm,pmemory-v2"

and for OF pmem devices we have

compatible       "pmem-region-v2"

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/papr_scm.c | 1 +
 drivers/nvdimm/of_pmem.c                  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 9c569078a09f..66c19c0fe566 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -876,6 +876,7 @@ static int papr_scm_remove(struct platform_device *pdev)
 
 static const struct of_device_id papr_scm_match[] = {
 	{ .compatible = "ibm,pmemory" },
+	{ .compatible = "ibm,pmemory-v2" },
 	{ },
 };
 
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 6826a274a1f1..10dbdcdfb9ce 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -90,6 +90,7 @@ static int of_pmem_region_remove(struct platform_device *pdev)
 
 static const struct of_device_id of_pmem_region_match[] = {
 	{ .compatible = "pmem-region" },
+	{ .compatible = "pmem-region-v2" },
 	{ },
 };
 
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread
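
The effect of the new compatible strings can be illustrated with a
simplified matcher: a driver whose table lacks the "-v2" entry does not
bind to a device that advertises only the new string. This is a sketch
under stated assumptions, not the kernel's OF matching code; struct
demo_device_id, demo_matches() and the NULL sentinel are hypothetical
simplifications of struct of_device_id and its empty terminating entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct of_device_id. */
struct demo_device_id {
	const char *compatible;	/* NULL marks the end of the table */
};

/* Table as an older kernel would have it. */
static const struct demo_device_id demo_old_match[] = {
	{ .compatible = "ibm,pmemory" },
	{ .compatible = NULL },
};

/* Table after this patch. */
static const struct demo_device_id demo_new_match[] = {
	{ .compatible = "ibm,pmemory" },
	{ .compatible = "ibm,pmemory-v2" },
	{ .compatible = NULL },
};

/* Return true if 'compat' exactly equals any entry in 'table'. */
static bool demo_matches(const struct demo_device_id *table, const char *compat)
{
	assert(table != NULL && compat != NULL);
	for (; table->compatible != NULL; table++)
		if (strcmp(table->compatible, compat) == 0)
			return true;
	return false;
}
```

With exact string comparison, "ibm,pmemory-v2" is not matched by an
"ibm,pmemory" entry, which is what keeps older kernels from
initializing the device on newer hardware.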

* Re: [PATCH v7 0/7] Support new pmem flush and sync instructions for POWER
  2020-07-01  7:22 ` Aneesh Kumar K.V
@ 2020-07-16 12:55   ` Michael Ellerman
  -1 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2020-07-16 12:55 UTC (permalink / raw)
  To: mpe, linux-nvdimm, Aneesh Kumar K.V, dan.j.williams, linuxppc-dev
  Cc: msuchanek, Jan Kara

On Wed, 1 Jul 2020 12:52:28 +0530, Aneesh Kumar K.V wrote:
> This patch series enables the usage of new pmem flush and sync instructions on the POWER
> architecture. POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
> that can be used to write modified locations back to persistent storage. Additionally,
> POWER10 introduces phwsync and plwsync, which can be used to establish the order of these
> writes to persistent storage.
> 
> This series exposes these instructions to the rest of the kernel. The existing
> dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
> synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions
> are added as a variant of the old ones that old hardware won't differentiate.
> 
> [...]

Applied to powerpc/next.

[1/7] powerpc/pmem: Restrict papr_scm to P8 and above.
      https://git.kernel.org/powerpc/c/c83040192f3763b243ece26073d61a895b4a230f
[2/7] powerpc/pmem: Add new instructions for persistent storage and sync
      https://git.kernel.org/powerpc/c/32db09d992ddc7d145595cff49cccfe14e018266
[3/7] powerpc/pmem: Add flush routines using new pmem store and sync instruction
      https://git.kernel.org/powerpc/c/d358042793183a57094dac45a44116e1165ac593
[4/7] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
      https://git.kernel.org/powerpc/c/3e79f082ebfc130360bcee23e4dd74729dcafdf4
[5/7] powerpc/pmem: Update ppc64 to use the new barrier instruction.
      https://git.kernel.org/powerpc/c/76e6c73f33d4e1cc4de4f25c0bf66d59e42113c4
[6/7] powerpc/pmem: Avoid the barrier in flush routines
      https://git.kernel.org/powerpc/c/436499ab868f1a9e497cfdbf641affe8a122c571
[7/7] powerpc/pmem: Initialize pmem device on newer hardware
      https://git.kernel.org/powerpc/c/8c26ab72663b4affc31e47cdf77d61d0172d1033

cheers

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 3/7] powerpc/pmem: Add flush routines using new pmem store and sync instruction
  2020-07-01  7:22   ` Aneesh Kumar K.V
  (?)
@ 2022-01-21  7:36   ` Christophe Leroy
  2022-01-21  9:07     ` Aneesh Kumar K.V
  -1 siblings, 1 reply; 23+ messages in thread
From: Christophe Leroy @ 2022-01-21  7:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: oohall, Jeff Moyer, msuchanek, Jan Kara

On 01/07/2020 at 09:22, Aneesh Kumar K.V wrote:
> Start using dcbstps; phwsync; sequence for flushing persistent memory range.
> The new instructions are implemented as a variant of dcbf and hwsync and on
> P8 and P9 they will be executed as those instructions. We avoid using them on
> older hardware. This helps to avoid difficult to debug bugs.
> 

Before this patch, the flush was done on all CPUs.
After this patch, IIUC, the flush is done only on CPUs that have the
CPU_FTR_ARCH_207S feature.

What about other CPUs?

I don't know much about PMEM; my concern is the UACCESS_FLUSHCACHE
API introduced by commit 6c44741d75a2 ("powerpc/lib: Implement
UACCESS_FLUSHCACHE API").

After your patch, __copy_from_user_flushcache() and memcpy_flushcache()
no longer flush the cache.

Is that intended?

I'm trying to optimise an ALSA driver that does copy_from_user +
cache_flush for DMA, and I was wondering whether
__copy_from_user_flushcache() could be an alternative.

Or is __copy_from_user_inatomic_nocache() what should be used for that?

Thanks
Christophe


> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/cacheflush.h |  1 +
>   arch/powerpc/lib/pmem.c               | 50 ++++++++++++++++++++++++---
>   2 files changed, 47 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> index de600b915a3c..54764c6e922d 100644
> --- a/arch/powerpc/include/asm/cacheflush.h
> +++ b/arch/powerpc/include/asm/cacheflush.h
> @@ -6,6 +6,7 @@
>   
>   #include <linux/mm.h>
>   #include <asm/cputable.h>
> +#include <asm/cpu_has_feature.h>
>   
>   #ifdef CONFIG_PPC_BOOK3S_64
>   /*
> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
> index 0666a8d29596..5a61aaeb6930 100644
> --- a/arch/powerpc/lib/pmem.c
> +++ b/arch/powerpc/lib/pmem.c
> @@ -9,20 +9,62 @@
>   
>   #include <asm/cacheflush.h>
>   
> +static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
> +{
> +	unsigned long shift = l1_dcache_shift();
> +	unsigned long bytes = l1_dcache_bytes();
> +	void *addr = (void *)(start & ~(bytes - 1));
> +	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
> +	unsigned long i;
> +
> +	for (i = 0; i < size >> shift; i++, addr += bytes)
> +		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
> +
> +
> +	asm volatile(PPC_PHWSYNC ::: "memory");
> +}
> +
> +static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
> +{
> +	unsigned long shift = l1_dcache_shift();
> +	unsigned long bytes = l1_dcache_bytes();
> +	void *addr = (void *)(start & ~(bytes - 1));
> +	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
> +	unsigned long i;
> +
> +	for (i = 0; i < size >> shift; i++, addr += bytes)
> +		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
> +
> +
> +	asm volatile(PPC_PHWSYNC ::: "memory");
> +}
> +
> +static inline void clean_pmem_range(unsigned long start, unsigned long stop)
> +{
> +	if (cpu_has_feature(CPU_FTR_ARCH_207S))
> +		return __clean_pmem_range(start, stop);
> +}
> +
> +static inline void flush_pmem_range(unsigned long start, unsigned long stop)
> +{
> +	if (cpu_has_feature(CPU_FTR_ARCH_207S))
> +		return __flush_pmem_range(start, stop);
> +}
> +
>   /*
>    * CONFIG_ARCH_HAS_PMEM_API symbols
>    */
>   void arch_wb_cache_pmem(void *addr, size_t size)
>   {
>   	unsigned long start = (unsigned long) addr;
> -	flush_dcache_range(start, start + size);
> +	clean_pmem_range(start, start + size);
>   }
>   EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
>   
>   void arch_invalidate_pmem(void *addr, size_t size)
>   {
>   	unsigned long start = (unsigned long) addr;
> -	flush_dcache_range(start, start + size);
> +	flush_pmem_range(start, start + size);
>   }
>   EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
>   
> @@ -35,7 +77,7 @@ long __copy_from_user_flushcache(void *dest, const void __user *src,
>   	unsigned long copied, start = (unsigned long) dest;
>   
>   	copied = __copy_from_user(dest, src, size);
> -	flush_dcache_range(start, start + size);
> +	clean_pmem_range(start, start + size);
>   
>   	return copied;
>   }
> @@ -45,7 +87,7 @@ void *memcpy_flushcache(void *dest, const void *src, size_t size)
>   	unsigned long start = (unsigned long) dest;
>   
>   	memcpy(dest, src, size);
> -	flush_dcache_range(start, start + size);
> +	clean_pmem_range(start, start + size);
>   
>   	return dest;
>   }

^ permalink raw reply	[flat|nested] 23+ messages in thread
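
The loop bounds in the quoted __clean_pmem_range() and
__flush_pmem_range() can be checked with a small stand-alone function.
It reproduces the same arithmetic: round start down to a cache-line
boundary, then count how many lines are needed to reach stop. The name
demo_nr_lines() and the explicit shift parameter are assumptions for
illustration; the kernel obtains the line size from l1_dcache_shift()
and l1_dcache_bytes().

```c
#include <assert.h>

/*
 * Number of loop iterations the quoted routines perform for
 * [start, stop), given a cache line of (1 << shift) bytes.
 */
static unsigned long demo_nr_lines(unsigned long start, unsigned long stop,
				   unsigned long shift)
{
	unsigned long bytes = 1UL << shift;
	/* Round the start address down to its cache-line boundary. */
	unsigned long addr = start & ~(bytes - 1);
	unsigned long size;

	assert(stop >= start);
	/* Adding (bytes - 1) rounds the length up to whole lines. */
	size = stop - addr + (bytes - 1);
	return size >> shift;
}
```

For example, with 128-byte lines a range from byte 130 to byte 260
starts in the line at 128 and ends in the line at 256, so two lines
are written back.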

* Re: [PATCH v7 1/7] powerpc/pmem: Restrict papr_scm to P8 and above.
  2020-07-01  7:22   ` Aneesh Kumar K.V
  (?)
@ 2022-01-21  8:40   ` Michal Suchánek
  2022-01-21  9:18     ` Aneesh Kumar K.V
  -1 siblings, 1 reply; 23+ messages in thread
From: Michal Suchánek @ 2022-01-21  8:40 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Jan Kara, linux-nvdimm, Jeff Moyer, oohall, dan.j.williams, linuxppc-dev

Hello,

On Wed, Jul 01, 2020 at 12:52:29PM +0530, Aneesh Kumar K.V wrote:
> The PAPR based virtualized persistent memory devices are only supported on
> POWER9 and above. In the followup patch, the kernel will switch the persistent
> memory cache flush functions to use a new `dcbf` variant instruction. The new
> instructions, even though added in ISA 3.1, work even on P8 and P9 because they
> are implemented as variants of the existing `dcbf` and `hwsync` and on P8 and
> P9 behave as such.
> 
> Considering these devices are only supported on P8 and above, update the driver
> to prevent a P7-compat guest from using persistent memory devices.
> 
> We don't update of_pmem driver with the same condition, because, on bare-metal,
> the firmware enables pmem support only on P9 and above. There the kernel depends
> on OPAL firmware to restrict exposing persistent memory related device tree
> entries on older hardware. of_pmem.ko is written without any arch dependency and
> we don't want to add ppc64 specific cpu feature check in of_pmem driver.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/pmem.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/pmem.c b/arch/powerpc/platforms/pseries/pmem.c
> index f860a897a9e0..2347e1038f58 100644
> --- a/arch/powerpc/platforms/pseries/pmem.c
> +++ b/arch/powerpc/platforms/pseries/pmem.c
> @@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
>  
>  static int pseries_pmem_init(void)
>  {
> +	/*
> +	 * Only supported on POWER8 and above.
> +	 */
> +	if (!cpu_has_feature(CPU_FTR_ARCH_207S))
> +		return 0;
> +

This looks superfluous.

The hypervisor is responsible for publishing the pmem in the device tree
when present; the kernel is responsible for using it when the kernel
supports it.

Or is the problem that the flush instruction is not available in P7
compat mode?

Even then, volatile regions should still work.

Thanks

Michal

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v7 3/7] powerpc/pmem: Add flush routines using new pmem store and sync instruction
  2022-01-21  7:36   ` Christophe Leroy
@ 2022-01-21  9:07     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 23+ messages in thread
From: Aneesh Kumar K.V @ 2022-01-21  9:07 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, mpe, linux-nvdimm, dan.j.williams
  Cc: oohall, Jeff Moyer, msuchanek, Jan Kara

Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> On 01/07/2020 at 09:22, Aneesh Kumar K.V wrote:
>> Start using dcbstps; phwsync; sequence for flushing persistent memory range.
>> The new instructions are implemented as a variant of dcbf and hwsync and on
>> P8 and P9 they will be executed as those instructions. We avoid using them on
>> older hardware. This helps to avoid difficult to debug bugs.
>> 
>
> Before this patch, the flush was done for all.
> After this patch, IIUC the flush is done only on CPUs having feature 
> CPU_FTR_ARCH_207S.
>
> What about other CPUs ?
>
> I don't know much about PMEM, my concern is about the UACCESS_FLUSHCACHE 
> API introduced by commit 6c44741d75a2 ("powerpc/lib: Implement 
> UACCESS_FLUSHCACHE API")
>
> After your patch, __copy_from_user_flushcache() and memcpy_flushcache() 
> are not doing cache flush anymore.
>
> Is that intended ?

Yes, with the understanding that these functions are used with
persistent memory. We restrict persistent memory usage to P8 and
above via commit c83040192f3763b243ece26073d61a895b4a230f.

>
> I'm trying to optimise some ALSA driver that does copy_from_user + 
> cache_flush for DMA, and I was wondering if using 
> __copy_from_user_flushcache() was an alternative.
>
> Or is it __copy_from_user_inatomic_nocache() which has to be done for that ?
>
> Thanks
> Christophe
>
>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/cacheflush.h |  1 +
>>   arch/powerpc/lib/pmem.c               | 50 ++++++++++++++++++++++++---
>>   2 files changed, 47 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
>> index de600b915a3c..54764c6e922d 100644
>> --- a/arch/powerpc/include/asm/cacheflush.h
>> +++ b/arch/powerpc/include/asm/cacheflush.h
>> @@ -6,6 +6,7 @@
>>   
>>   #include <linux/mm.h>
>>   #include <asm/cputable.h>
>> +#include <asm/cpu_has_feature.h>
>>   
>>   #ifdef CONFIG_PPC_BOOK3S_64
>>   /*
>> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
>> index 0666a8d29596..5a61aaeb6930 100644
>> --- a/arch/powerpc/lib/pmem.c
>> +++ b/arch/powerpc/lib/pmem.c
>> @@ -9,20 +9,62 @@
>>   
>>   #include <asm/cacheflush.h>
>>   
>> +static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
>> +{
>> +	unsigned long shift = l1_dcache_shift();
>> +	unsigned long bytes = l1_dcache_bytes();
>> +	void *addr = (void *)(start & ~(bytes - 1));
>> +	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
>> +	unsigned long i;
>> +
>> +	for (i = 0; i < size >> shift; i++, addr += bytes)
>> +		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
>> +
>> +
>> +	asm volatile(PPC_PHWSYNC ::: "memory");
>> +}
>> +
>> +static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
>> +{
>> +	unsigned long shift = l1_dcache_shift();
>> +	unsigned long bytes = l1_dcache_bytes();
>> +	void *addr = (void *)(start & ~(bytes - 1));
>> +	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
>> +	unsigned long i;
>> +
>> +	for (i = 0; i < size >> shift; i++, addr += bytes)
>> +		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
>> +
>> +
>> +	asm volatile(PPC_PHWSYNC ::: "memory");
>> +}
>> +
>> +static inline void clean_pmem_range(unsigned long start, unsigned long stop)
>> +{
>> +	if (cpu_has_feature(CPU_FTR_ARCH_207S))
>> +		return __clean_pmem_range(start, stop);
>> +}
>> +
>> +static inline void flush_pmem_range(unsigned long start, unsigned long stop)
>> +{
>> +	if (cpu_has_feature(CPU_FTR_ARCH_207S))
>> +		return __flush_pmem_range(start, stop);
>> +}
>> +
>>   /*
>>    * CONFIG_ARCH_HAS_PMEM_API symbols
>>    */
>>   void arch_wb_cache_pmem(void *addr, size_t size)
>>   {
>>   	unsigned long start = (unsigned long) addr;
>> -	flush_dcache_range(start, start + size);
>> +	clean_pmem_range(start, start + size);
>>   }
>>   EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
>>   
>>   void arch_invalidate_pmem(void *addr, size_t size)
>>   {
>>   	unsigned long start = (unsigned long) addr;
>> -	flush_dcache_range(start, start + size);
>> +	flush_pmem_range(start, start + size);
>>   }
>>   EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
>>   
>> @@ -35,7 +77,7 @@ long __copy_from_user_flushcache(void *dest, const void __user *src,
>>   	unsigned long copied, start = (unsigned long) dest;
>>   
>>   	copied = __copy_from_user(dest, src, size);
>> -	flush_dcache_range(start, start + size);
>> +	clean_pmem_range(start, start + size);
>>   
>>   	return copied;
>>   }
>> @@ -45,7 +87,7 @@ void *memcpy_flushcache(void *dest, const void *src, size_t size)
>>   	unsigned long start = (unsigned long) dest;
>>   
>>   	memcpy(dest, src, size);
>> -	flush_dcache_range(start, start + size);
>> +	clean_pmem_range(start, start + size);
>>   
>>   	return dest;
>>   }

^ permalink raw reply	[flat|nested] 23+ messages in thread
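
The behaviour discussed in this exchange can be modelled in user space:
when the CPU feature is absent, the copy still happens but the
write-back step is silently skipped. This is a sketch only; the
demo_cpu_has_207s flag stands in for cpu_has_feature(CPU_FTR_ARCH_207S),
and the counter replaces the real dcbstps loop. All demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for cpu_has_feature(CPU_FTR_ARCH_207S). */
static bool demo_cpu_has_207s;
/* Counts how many ranges were actually written back. */
static unsigned long demo_flushed_ranges;

static void demo_clean_pmem_range(unsigned long start, unsigned long stop)
{
	assert(stop >= start);
	if (!demo_cpu_has_207s)
		return;	/* no write-back at all on older CPUs */
	demo_flushed_ranges++;
}

/* Mirrors the shape of the quoted memcpy_flushcache(). */
static void *demo_memcpy_flushcache(void *dest, const void *src, size_t size)
{
	unsigned long start = (unsigned long)dest;

	memcpy(dest, src, size);
	demo_clean_pmem_range(start, start + size);
	return dest;
}
```

This is why a caller outside the pmem path, such as a driver preparing
a buffer for DMA, cannot rely on these helpers to flush the cache on
CPUs without the feature.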

* Re: [PATCH v7 1/7] powerpc/pmem: Restrict papr_scm to P8 and above.
  2022-01-21  8:40   ` Michal Suchánek
@ 2022-01-21  9:18     ` Aneesh Kumar K.V
  2022-01-21 13:27       ` Michal Suchánek
  0 siblings, 1 reply; 23+ messages in thread
From: Aneesh Kumar K.V @ 2022-01-21  9:18 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Jan Kara, linux-nvdimm, Jeff Moyer, oohall, dan.j.williams, linuxppc-dev

Michal Suchánek <msuchanek@suse.de> writes:

> Hello,
>
> On Wed, Jul 01, 2020 at 12:52:29PM +0530, Aneesh Kumar K.V wrote:
>> The PAPR based virtualized persistent memory devices are only supported on
>> POWER9 and above. In the followup patch, the kernel will switch the persistent
>> memory cache flush functions to use a new `dcbf` variant instruction. The new
>> instructions, even though added in ISA 3.1, work even on P8 and P9 because they
>> are implemented as variants of the existing `dcbf` and `hwsync` and on P8 and
>> P9 behave as such.
>> 
>> Considering these devices are only supported on P8 and above, update the driver
>> to prevent a P7-compat guest from using persistent memory devices.
>> 
>> We don't update of_pmem driver with the same condition, because, on bare-metal,
>> the firmware enables pmem support only on P9 and above. There the kernel depends
>> on OPAL firmware to restrict exposing persistent memory related device tree
>> entries on older hardware. of_pmem.ko is written without any arch dependency and
>> we don't want to add ppc64 specific cpu feature check in of_pmem driver.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/pmem.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/pmem.c b/arch/powerpc/platforms/pseries/pmem.c
>> index f860a897a9e0..2347e1038f58 100644
>> --- a/arch/powerpc/platforms/pseries/pmem.c
>> +++ b/arch/powerpc/platforms/pseries/pmem.c
>> @@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
>>  
>>  static int pseries_pmem_init(void)
>>  {
>> +	/*
>> +	 * Only supported on POWER8 and above.
>> +	 */
>> +	if (!cpu_has_feature(CPU_FTR_ARCH_207S))
>> +		return 0;
>> +
>
> This looks superfluous.
>
> The hypervisor is responsible for publishing the pmem in devicetree when
> present, kernel is responsible for using it when supported by the
> kernel.
>
> Or is there a problem that the flush instruction is not available in P7
> compat mode?

We want to avoid the usage of persistent memory in p7 compat mode
because such a guest can LPM-migrate to p7 systems. Ideally, I would
expect the hypervisor to prevent such a migration, that is, a p7 compat mode
guest running on p10 and using persistent memory migrating to p7
(considering p7 never really had support for persistent memory).

There was also the complexity w.r.t. which instructions userspace will
use. So it was discussed at that point that we could comfortably declare
persistent memory unsupported on p7 and below and prevent its use there.

>
> Even then volatile regions should still work.

That is a different problem altogether. We could really kill the usage of
cache flush w.r.t. volatile regions from the nvdimm driver, right?

For all these reasons, disabling pmem on p7 was found to be the simplest solution.

-aneesh
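
The gating discussed above can be illustrated with a small userspace model.
This is only a sketch under stated assumptions, not the kernel code: the
feature-bit values, `struct guest`, and the `model_*` helpers are
hypothetical stand-ins (the real `cpu_has_feature()` takes a single argument
and reads global CPU state), and the "register" step stands for whatever the
rest of `pseries_pmem_init()` does, which the quoted hunk does not show.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real CPU_FTR_* masks are defined by the kernel. */
#define CPU_FTR_ARCH_206  (UINT64_C(1) << 0)  /* stands for ISA 2.06 (POWER7) */
#define CPU_FTR_ARCH_207S (UINT64_C(1) << 1)  /* stands for ISA 2.07 (POWER8) */

/* Hypothetical model of a guest: its feature mask and whether pmem was set up. */
struct guest {
	uint64_t cpu_features;
	bool pmem_registered;
};

/* Model of the feature test: true if every bit in 'feature' is set. */
static bool model_cpu_has_feature(const struct guest *g, uint64_t feature)
{
	return (g->cpu_features & feature) == feature;
}

/*
 * Model of the init path from the patch: a guest without the ISA 2.07
 * feature (for example, one in P7 compat mode) returns early with 0,
 * so init does not fail but pmem is never registered.
 */
static int model_pseries_pmem_init(struct guest *g)
{
	if (!model_cpu_has_feature(g, CPU_FTR_ARCH_207S))
		return 0;

	g->pmem_registered = true;  /* assumed: remainder of the real init */
	return 0;
}
```

In this model, a guest whose mask holds only `CPU_FTR_ARCH_206` leaves
`pmem_registered` false, while a guest that also has `CPU_FTR_ARCH_207S`
ends up with it true. Both calls return 0, which mirrors the patch
returning 0 rather than an error on the early exit.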


* Re: [PATCH v7 1/7] powerpc/pmem: Restrict papr_scm to P8 and above.
  2022-01-21  9:18     ` Aneesh Kumar K.V
@ 2022-01-21 13:27       ` Michal Suchánek
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Suchánek @ 2022-01-21 13:27 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Jan Kara, linux-nvdimm, Jeff Moyer, oohall, dan.j.williams, linuxppc-dev

On Fri, Jan 21, 2022 at 02:48:32PM +0530, Aneesh Kumar K.V wrote:
> Michal Suchánek <msuchanek@suse.de> writes:
> 
> > Hello,
> >
> > On Wed, Jul 01, 2020 at 12:52:29PM +0530, Aneesh Kumar K.V wrote:
> >> The PAPR based virtualized persistent memory devices are only supported on
> >> POWER9 and above. In the followup patch, the kernel will switch the persistent
> >> memory cache flush functions to use a new `dcbf` variant instruction. The new
> >> instructions even though added in ISA 3.1 works even on P8 and P9 because these
> >> are implemented as a variant of existing `dcbf` and `hwsync` and on P8 and
> >> P9 behaves as such.
> >> 
> >> Considering these devices are only supported on P8 and above,  update the driver
> >> to prevent a P7-compat guest from using persistent memory devices.
> >> 
> >> We don't update of_pmem driver with the same condition, because, on bare-metal,
> >> the firmware enables pmem support only on P9 and above. There the kernel depends
> >> on OPAL firmware to restrict exposing persistent memory related device tree
> >> entries on older hardware. of_pmem.ko is written without any arch dependency and
> >> we don't want to add ppc64 specific cpu feature check in of_pmem driver.
> >> 
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> >> ---
> >>  arch/powerpc/platforms/pseries/pmem.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >> 
> >> diff --git a/arch/powerpc/platforms/pseries/pmem.c b/arch/powerpc/platforms/pseries/pmem.c
> >> index f860a897a9e0..2347e1038f58 100644
> >> --- a/arch/powerpc/platforms/pseries/pmem.c
> >> +++ b/arch/powerpc/platforms/pseries/pmem.c
> >> @@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
> >>  
> >>  static int pseries_pmem_init(void)
> >>  {
> >> +	/*
> >> +	 * Only supported on POWER8 and above.
> >> +	 */
> >> +	if (!cpu_has_feature(CPU_FTR_ARCH_207S))
> >> +		return 0;
> >> +
> >
> > This looks superfluous.
> >
> > The hypervisor is responsible for publishing the pmem in devicetree when
> > present, kernel is responsible for using it when supported by the
> > kernel.
> >
> > Or is there a problem that the flush instruction is not available in P7
> > compat mode?
> 
> We want to avoid the usage of persistent memory in p7 compat mode
> because such a guest can LPM-migrate to p7 systems. Ideally, I would
> expect the hypervisor to prevent such a migration, that is, a p7 compat mode
> guest running on p10 and using persistent memory migrating to p7
> (considering p7 never really had support for persistent memory).

Yes, I would expect the hypervisor to prevent migration to a host that
does not have all the hardware that the guest uses. It could still
migrate to P8 or whatever in compat mode.

> 
> There was also the complexity w.r.t what instructions the userspace will
> use. So it was discussed at that point that we could comfortably state
> and prevent the usage of persistent memory on p7 and below. 

But is that arbitrary, or does POWER7 not support the pmem sync instructions?

If that is true, then how does POWER7 compat mode behave WRT those
instructions?

Thanks

Michal


end of thread, other threads:[~2022-01-21 13:28 UTC | newest]

Thread overview: 23+ messages
2020-07-01  7:22 [PATCH v7 0/7] Support new pmem flush and sync instructions for POWER Aneesh Kumar K.V
2020-07-01  7:22 ` Aneesh Kumar K.V
2020-07-01  7:22 ` [PATCH v7 1/7] powerpc/pmem: Restrict papr_scm to P8 and above Aneesh Kumar K.V
2020-07-01  7:22   ` Aneesh Kumar K.V
2022-01-21  8:40   ` Michal Suchánek
2022-01-21  9:18     ` Aneesh Kumar K.V
2022-01-21 13:27       ` Michal Suchánek
2020-07-01  7:22 ` [PATCH v7 2/7] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
2020-07-01  7:22   ` Aneesh Kumar K.V
2020-07-01  7:22 ` [PATCH v7 3/7] powerpc/pmem: Add flush routines using new pmem store and sync instruction Aneesh Kumar K.V
2020-07-01  7:22   ` Aneesh Kumar K.V
2022-01-21  7:36   ` Christophe Leroy
2022-01-21  9:07     ` Aneesh Kumar K.V
2020-07-01  7:22 ` [PATCH v7 4/7] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier Aneesh Kumar K.V
2020-07-01  7:22   ` Aneesh Kumar K.V
2020-07-01  7:22 ` [PATCH v7 5/7] powerpc/pmem: Update ppc64 to use the new barrier instruction Aneesh Kumar K.V
2020-07-01  7:22   ` Aneesh Kumar K.V
2020-07-01  7:22 ` [PATCH v7 6/7] powerpc/pmem: Avoid the barrier in flush routines Aneesh Kumar K.V
2020-07-01  7:22   ` Aneesh Kumar K.V
2020-07-01  7:22 ` [PATCH v7 7/7] powerpc/pmem: Initialize pmem device on newer hardware Aneesh Kumar K.V
2020-07-01  7:22   ` Aneesh Kumar K.V
2020-07-16 12:55 ` [PATCH v7 0/7] Support new pmem flush and sync instructions for POWER Michael Ellerman
2020-07-16 12:55   ` Michael Ellerman
