linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/5] powerpc/pmem: Add new instructions for persistent storage and sync
@ 2020-05-13  3:47 Aneesh Kumar K.V
  2020-05-13  3:47 ` [PATCH v2 2/5] powerpc/pmem: Add flush routines using new pmem store and sync instruction Aneesh Kumar K.V
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-13  3:47 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm; +Cc: alistair, Aneesh Kumar K.V

POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage.

Additionally, POWER10 also introduce phwsync and plwsync which can be used
to establish order of these writes to persistent storage.

This patch exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as a variant of the old ones that old hardware
won't differentiate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/ppc-opcode.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index c1df75edde44..45eccd842f84 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -216,6 +216,8 @@
 #define PPC_INST_STWCX			0x7c00012d
 #define PPC_INST_LWSYNC			0x7c2004ac
 #define PPC_INST_SYNC			0x7c0004ac
+#define PPC_INST_PHWSYNC		0x7c8004ac
+#define PPC_INST_PLWSYNC		0x7ca004ac
 #define PPC_INST_SYNC_MASK		0xfc0007fe
 #define PPC_INST_ISYNC			0x4c00012c
 #define PPC_INST_LXVD2X			0x7c000698
@@ -281,6 +283,8 @@
 #define PPC_INST_TABORT			0x7c00071d
 #define PPC_INST_TSR			0x7c0005dd
 
+#define PPC_INST_DCBF			0x7c0000ac
+
 #define PPC_INST_NAP			0x4c000364
 #define PPC_INST_SLEEP			0x4c0003a4
 #define PPC_INST_WINKLE			0x4c0003e4
@@ -529,6 +533,14 @@
 #define STBCIX(s,a,b)		stringify_in_c(.long PPC_INST_STBCIX | \
 				       __PPC_RS(s) | __PPC_RA(a) | __PPC_RB(b))
 
+#define	PPC_DCBFPS(a, b)	stringify_in_c(.long PPC_INST_DCBF |	\
+				       ___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
+#define	PPC_DCBSTPS(a, b)	stringify_in_c(.long PPC_INST_DCBF |	\
+				       ___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
+
+#define	PPC_PHWSYNC		stringify_in_c(.long PPC_INST_PHWSYNC)
+#define	PPC_PLWSYNC		stringify_in_c(.long PPC_INST_PLWSYNC)
+
 /*
  * Define what the VSX XX1 form instructions will look like, then add
  * the 128 bit load store instructions based on that.
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/5] powerpc/pmem: Add flush routines using new pmem store and sync instruction
  2020-05-13  3:47 [PATCH v2 1/5] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
@ 2020-05-13  3:47 ` Aneesh Kumar K.V
  2020-05-13  3:47 ` [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-13  3:47 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm; +Cc: alistair, Aneesh Kumar K.V

Start using dcbstps; phwsync; sequence for flushing persistent memory range.
Even though the new instructions are implemented as a variant of dcbf and hwsync and on
POWER9 they will be executed as those instructions, we still avoid using them on
older hardware. This helps to avoid difficult to debug bugs.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/lib/pmem.c | 52 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..076d75efb57a 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,20 +9,64 @@
 
 #include <asm/cacheflush.h>
 
+static inline void clean_pmem_range_isa310(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
+
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+	asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void flush_pmem_range_isa310(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
+
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
+
+
+	asm volatile(PPC_PHWSYNC ::: "memory");
+}
+
+static inline void clean_pmem_range(unsigned long start, unsigned long stop)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_31))
+		return clean_pmem_range_isa310(start, stop);
+	return flush_dcache_range(start, stop);
+}
+
+static inline void flush_pmem_range(unsigned long start, unsigned long stop)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_31))
+		return flush_pmem_range_isa310(start, stop);
+	return flush_dcache_range(start, stop);
+}
+
 /*
  * CONFIG_ARCH_HAS_PMEM_API symbols
  */
 void arch_wb_cache_pmem(void *addr, size_t size)
 {
 	unsigned long start = (unsigned long) addr;
-	flush_dcache_range(start, start + size);
+	clean_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
 
 void arch_invalidate_pmem(void *addr, size_t size)
 {
 	unsigned long start = (unsigned long) addr;
-	flush_dcache_range(start, start + size);
+	flush_pmem_range(start, start + size);
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
 
@@ -35,7 +79,7 @@ long __copy_from_user_flushcache(void *dest, const void __user *src,
 	unsigned long copied, start = (unsigned long) dest;
 
 	copied = __copy_from_user(dest, src, size);
-	flush_dcache_range(start, start + size);
+	clean_pmem_range(start, start + size);
 
 	return copied;
 }
@@ -45,7 +89,7 @@ void *memcpy_flushcache(void *dest, const void *src, size_t size)
 	unsigned long start = (unsigned long) dest;
 
 	memcpy(dest, src, size);
-	flush_dcache_range(start, start + size);
+	clean_pmem_range(start, start + size);
 
 	return dest;
 }
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-13  3:47 [PATCH v2 1/5] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
  2020-05-13  3:47 ` [PATCH v2 2/5] powerpc/pmem: Add flush routines using new pmem store and sync instruction Aneesh Kumar K.V
@ 2020-05-13  3:47 ` Aneesh Kumar K.V
  2020-05-13 16:14   ` Dan Williams
  2020-05-13  3:47 ` [PATCH v2 4/5] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction Aneesh Kumar K.V
  2020-05-13  3:47 ` [PATCH v2 5/5] powerpc/pmem: Avoid the barrier in flush routines Aneesh Kumar K.V
  3 siblings, 1 reply; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-13  3:47 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm; +Cc: alistair, Aneesh Kumar K.V

Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()

Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/region_devs.c | 8 ++++----
 include/linux/libnvdimm.h    | 4 ++++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..88ea34a9c7fd 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1216,13 +1216,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
 	idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
 	/*
-	 * The first wmb() is needed to 'sfence' all previous writes
-	 * such that they are architecturally visible for the platform
-	 * buffer flush.  Note that we've already arranged for pmem
+	 * The first arch_pmem_flush_barrier() is needed to 'sfence' all
+	 * previous writes such that they are architecturally visible for
+	 * the platform buffer flush. Note that we've already arranged for pmem
 	 * writes to avoid the cache via memcpy_flushcache().  The final
 	 * wmb() ensures ordering for the NVDIMM flush write.
 	 */
-	wmb();
+	arch_pmem_flush_barrier();
 	for (i = 0; i < nd_region->ndr_mappings; i++)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..66f6c65bd789 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
 }
 #endif
 
+#ifndef arch_pmem_flush_barrier
+#define arch_pmem_flush_barrier() wmb()
+#endif
+
 #endif /* __LIBNVDIMM_H__ */
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/5] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.
  2020-05-13  3:47 [PATCH v2 1/5] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
  2020-05-13  3:47 ` [PATCH v2 2/5] powerpc/pmem: Add flush routines using new pmem store and sync instruction Aneesh Kumar K.V
  2020-05-13  3:47 ` [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier Aneesh Kumar K.V
@ 2020-05-13  3:47 ` Aneesh Kumar K.V
  2020-05-13  6:44   ` kbuild test robot
  2020-05-13  3:47 ` [PATCH v2 5/5] powerpc/pmem: Avoid the barrier in flush routines Aneesh Kumar K.V
  3 siblings, 1 reply; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-13  3:47 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm; +Cc: alistair, Aneesh Kumar K.V

of_pmem on POWER10 can now use phwsync instead of hwsync to ensure
all previous writes are architecturally visible for the platform
buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/cacheflush.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index e92191b390f3..f22057dc9dd0 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -119,6 +119,16 @@ static inline void invalidate_dcache_range(unsigned long start,
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
 	memcpy(dst, src, len)
 
+
+#define arch_pmem_flush_barrier arch_pmem_flush_barrier
+static inline void  arch_pmem_flush_barrier(void)
+{
+	if (cpu_has_feature(CPU_FTR_ARCH_31))
+		asm volatile(PPC_PHWSYNC ::: "memory");
+	else
+		asm volatile("hwsync" ::: "memory");
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_POWERPC_CACHEFLUSH_H */
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/5] powerpc/pmem: Avoid the barrier in flush routines
  2020-05-13  3:47 [PATCH v2 1/5] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2020-05-13  3:47 ` [PATCH v2 4/5] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction Aneesh Kumar K.V
@ 2020-05-13  3:47 ` Aneesh Kumar K.V
  3 siblings, 0 replies; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-13  3:47 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm; +Cc: alistair, Aneesh Kumar K.V

nvdimm expect the flush routines to just mark the cache clean. The barrier
that mark the store globally visible is done in nvdimm_flush().

Update the papr_scm driver to a simplified nvdim_flush callback that do
only the required barrier.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/lib/pmem.c                   | 34 +++++++++++++++++------
 arch/powerpc/platforms/pseries/papr_scm.c | 13 +++++++++
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 076d75efb57a..3ef15cfa925b 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,7 +9,7 @@
 
 #include <asm/cacheflush.h>
 
-static inline void clean_pmem_range_isa310(unsigned long start, unsigned long stop)
+static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
 {
 	unsigned long shift = l1_dcache_shift();
 	unsigned long bytes = l1_dcache_bytes();
@@ -18,13 +18,22 @@ static inline void clean_pmem_range_isa310(unsigned long start, unsigned long st
 	unsigned long i;
 
 	for (i = 0; i < size >> shift; i++, addr += bytes)
-		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+		dcbf(addr);
+}
 
+static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
 
-	asm volatile(PPC_PHWSYNC ::: "memory");
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		dcbf(addr);
 }
 
-static inline void flush_pmem_range_isa310(unsigned long start, unsigned long stop)
+static inline void clean_pmem_range_isa310(unsigned long start, unsigned long stop)
 {
 	unsigned long shift = l1_dcache_shift();
 	unsigned long bytes = l1_dcache_bytes();
@@ -33,24 +42,33 @@ static inline void flush_pmem_range_isa310(unsigned long start, unsigned long st
 	unsigned long i;
 
 	for (i = 0; i < size >> shift; i++, addr += bytes)
-		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
+		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+}
 
+static inline void flush_pmem_range_isa310(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
 
-	asm volatile(PPC_PHWSYNC ::: "memory");
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
 }
 
 static inline void clean_pmem_range(unsigned long start, unsigned long stop)
 {
 	if (cpu_has_feature(CPU_FTR_ARCH_31))
 		return clean_pmem_range_isa310(start, stop);
-	return flush_dcache_range(start, stop);
+	return __clean_pmem_range(start, stop);
 }
 
 static inline void flush_pmem_range(unsigned long start, unsigned long stop)
 {
 	if (cpu_has_feature(CPU_FTR_ARCH_31))
 		return flush_pmem_range_isa310(start, stop);
-	return flush_dcache_range(start, stop);
+	return __flush_pmem_range(start, stop);
 }
 
 /*
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..ad506e7003c9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -285,6 +285,18 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
 
 	return 0;
 }
+/*
+ * We have made sure the pmem writes are done such that before calling this
+ * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. Here
+ * we just need to add the necessary barrier to make sure the above flushes
+ * are have updated persistent storage before any data access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio *bio)
+{
+	arch_pmem_flush_barrier();
+	return 0;
+}
 
 static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 {
@@ -340,6 +352,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 	ndr_desc.mapping = &mapping;
 	ndr_desc.num_mappings = 1;
 	ndr_desc.nd_set = &p->nd_set;
+	ndr_desc.flush = papr_scm_flush_sync;
 
 	if (p->is_volatile)
 		p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/5] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.
  2020-05-13  3:47 ` [PATCH v2 4/5] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction Aneesh Kumar K.V
@ 2020-05-13  6:44   ` kbuild test robot
  0 siblings, 0 replies; 21+ messages in thread
From: kbuild test robot @ 2020-05-13  6:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe, linux-nvdimm
  Cc: kbuild-all, alistair, Aneesh Kumar K.V

Hi "Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linux-nvdimm/libnvdimm-for-next v5.7-rc5 next-20200512]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/powerpc-pmem-Add-new-instructions-for-persistent-storage-and-sync/20200513-133938
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-storcenter_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

WARNING: unmet direct dependencies detected for PPC_INDIRECT_PCI
Depends on PCI
Selected by
- MPC10X_BRIDGE
In file included from include/linux/highmem.h:12,
from include/linux/pagemap.h:11,
from include/linux/blkdev.h:16,
from include/linux/blk-cgroup.h:23,
from include/linux/writeback.h:14,
from include/linux/memcontrol.h:22,
from include/linux/swap.h:9,
from include/linux/suspend.h:5,
from arch/powerpc/kernel/asm-offsets.c:23:
arch/powerpc/include/asm/cacheflush.h: In function 'arch_pmem_flush_barrier':
>> arch/powerpc/include/asm/cacheflush.h:126:22: error: 'CPU_FTR_ARCH_31' undeclared (first use in this function); did you mean
126 | if (cpu_has_feature(CPU_FTR_ARCH_31))
| ^~~~~~~~~~~~~~~
| CPU_FTR_ARCH_300
arch/powerpc/include/asm/cacheflush.h:126:22: note: each undeclared identifier is reported only once for each function it appears in
Makefile arch block certs crypto drivers fs include init ipc kernel lib mm net scripts security sound source usr virt [scripts/Makefile.build:100: arch/powerpc/kernel/asm-offsets.s] Error 1
Target '__build' not remade because of errors.
Makefile arch block certs crypto drivers fs include init ipc kernel lib mm net scripts security sound source usr virt [Makefile:1141: prepare0] Error 2
Target 'prepare' not remade because of errors.
make: Makefile arch block certs crypto drivers fs include init ipc kernel lib mm net scripts security sound source usr virt [Makefile:180: sub-make] Error 2

vim +/CPU_FTR_ARCH_31 +126 arch/powerpc/include/asm/cacheflush.h

   113	
   114	#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
   115		do { \
   116			memcpy(dst, src, len); \
   117			flush_icache_user_range(vma, page, vaddr, len); \
   118		} while (0)
   119	#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
   120		memcpy(dst, src, len)
   121	
   122	
   123	#define arch_pmem_flush_barrier arch_pmem_flush_barrier
   124	static inline void  arch_pmem_flush_barrier(void)
   125	{
 > 126		if (cpu_has_feature(CPU_FTR_ARCH_31))
   127			asm volatile(PPC_PHWSYNC ::: "memory");
   128		else
   129			asm volatile("hwsync" ::: "memory");
   130	}
   131	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-13  3:47 ` [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier Aneesh Kumar K.V
@ 2020-05-13 16:14   ` Dan Williams
  2020-05-19  5:30     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Williams @ 2020-05-13 16:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Michael Ellerman, linux-nvdimm, alistair

On Tue, May 12, 2020 at 8:47 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Architectures like ppc64 provide persistent memory specific barriers
> that will ensure that all stores for which the modifications are
> written to persistent storage by preceding dcbfps and dcbstps
> instructions have updated persistent storage before any data
> access or data transfer caused by subsequent instructions is initiated.
> This is in addition to the ordering done by wmb()
>
> Update nvdimm core such that architecture can use barriers other than
> wmb to ensure all previous writes are architecturally visible for
> the platform buffer flush.

This seems like an exceedingly bad idea, maybe I'm missing something.
This implies that the deployed base of DAX applications using the old
instruction sequence are going to regress on new hardware that
requires the new instructions to be deployed. I'm thinking the kernel
should go as far as to disable DAX operation by default on new
hardware until userspace asserts that it is prepared to switch to the
new implementation. Is there any other way to ensure the forward
compatibility of deployed ppc64 DAX applications?

>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/nvdimm/region_devs.c | 8 ++++----
>  include/linux/libnvdimm.h    | 4 ++++
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index ccbb5b43b8b2..88ea34a9c7fd 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -1216,13 +1216,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
>         idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
>
>         /*
> -        * The first wmb() is needed to 'sfence' all previous writes
> -        * such that they are architecturally visible for the platform
> -        * buffer flush.  Note that we've already arranged for pmem
> +        * The first arch_pmem_flush_barrier() is needed to 'sfence' all
> +        * previous writes such that they are architecturally visible for
> +        * the platform buffer flush. Note that we've already arranged for pmem
>          * writes to avoid the cache via memcpy_flushcache().  The final
>          * wmb() ensures ordering for the NVDIMM flush write.
>          */
> -       wmb();
> +       arch_pmem_flush_barrier();
>         for (i = 0; i < nd_region->ndr_mappings; i++)
>                 if (ndrd_get_flush_wpq(ndrd, i, 0))
>                         writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 18da4059be09..66f6c65bd789 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>  }
>  #endif
>
> +#ifndef arch_pmem_flush_barrier
> +#define arch_pmem_flush_barrier() wmb()
> +#endif
> +
>  #endif /* __LIBNVDIMM_H__ */
> --
> 2.26.2
>
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-13 16:14   ` Dan Williams
@ 2020-05-19  5:30     ` Aneesh Kumar K.V
  2020-05-19  7:09       ` Dan Williams
  0 siblings, 1 reply; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-19  5:30 UTC (permalink / raw)
  To: Dan Williams; +Cc: linuxppc-dev, Michael Ellerman, linux-nvdimm, alistair


Hi Dan,

Apologies for the delay in response. I was waiting for feedback from
hardware team before responding to this email.


Dan Williams <dan.j.williams@intel.com> writes:

> On Tue, May 12, 2020 at 8:47 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Architectures like ppc64 provide persistent memory specific barriers
>> that will ensure that all stores for which the modifications are
>> written to persistent storage by preceding dcbfps and dcbstps
>> instructions have updated persistent storage before any data
>> access or data transfer caused by subsequent instructions is initiated.
>> This is in addition to the ordering done by wmb()
>>
>> Update nvdimm core such that architecture can use barriers other than
>> wmb to ensure all previous writes are architecturally visible for
>> the platform buffer flush.
>
> This seems like an exceedingly bad idea, maybe I'm missing something.
> This implies that the deployed base of DAX applications using the old
> instruction sequence are going to regress on new hardware that
> requires the new instructions to be deployed.


pmdk support for ppc64 is still work in progress and there is pull
request to switch pmdk to use new instruction.

https://github.com/tuliom/pmdk/commit/fix-flush

All userspace applications will be switched to use the new
instructions. The new instructions are designed such that when running on P8
and P9 they behave as 'dcbf' and 'hwsync'.

Applications using new instructions will behave as expected when running
on P8 and P9. Only future hardware will differentiate between 'dcbf' and
'dcbfps' 

> I'm thinking the kernel
> should go as far as to disable DAX operation by default on new
> hardware until userspace asserts that it is prepared to switch to the
> new implementation. Is there any other way to ensure the forward
> compatibility of deployed ppc64 DAX applications?

AFAIU there is no released persistent memory hardware on ppc64 platform
and we need to make sure before applications get enabled to use these
persistent memory devices, they should switch to use the new
instruction?


>
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  drivers/nvdimm/region_devs.c | 8 ++++----
>>  include/linux/libnvdimm.h    | 4 ++++
>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>> index ccbb5b43b8b2..88ea34a9c7fd 100644
>> --- a/drivers/nvdimm/region_devs.c
>> +++ b/drivers/nvdimm/region_devs.c
>> @@ -1216,13 +1216,13 @@ int generic_nvdimm_flush(struct nd_region *nd_region)
>>         idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
>>
>>         /*
>> -        * The first wmb() is needed to 'sfence' all previous writes
>> -        * such that they are architecturally visible for the platform
>> -        * buffer flush.  Note that we've already arranged for pmem
>> +        * The first arch_pmem_flush_barrier() is needed to 'sfence' all
>> +        * previous writes such that they are architecturally visible for
>> +        * the platform buffer flush. Note that we've already arranged for pmem
>>          * writes to avoid the cache via memcpy_flushcache().  The final
>>          * wmb() ensures ordering for the NVDIMM flush write.
>>          */
>> -       wmb();
>> +       arch_pmem_flush_barrier();
>>         for (i = 0; i < nd_region->ndr_mappings; i++)
>>                 if (ndrd_get_flush_wpq(ndrd, i, 0))
>>                         writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>> index 18da4059be09..66f6c65bd789 100644
>> --- a/include/linux/libnvdimm.h
>> +++ b/include/linux/libnvdimm.h
>> @@ -286,4 +286,8 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
>>  }
>>  #endif
>>
>> +#ifndef arch_pmem_flush_barrier
>> +#define arch_pmem_flush_barrier() wmb()
>> +#endif
>> +
>>  #endif /* __LIBNVDIMM_H__ */
>> --
>> 2.26.2
>>
> _______________________________________________
> Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
> To unsubscribe send an email to linux-nvdimm-leave@lists.01.org
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-19  5:30     ` Aneesh Kumar K.V
@ 2020-05-19  7:09       ` Dan Williams
  2020-05-19 13:52         ` Aneesh Kumar K.V
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Williams @ 2020-05-19  7:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Michael Ellerman, linux-nvdimm, alistair

On Mon, May 18, 2020 at 10:30 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
>
> Hi Dan,
>
> Apologies for the delay in response. I was waiting for feedback from
> hardware team before responding to this email.
>
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > On Tue, May 12, 2020 at 8:47 PM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
> >>
> >> Architectures like ppc64 provide persistent memory specific barriers
> >> that will ensure that all stores for which the modifications are
> >> written to persistent storage by preceding dcbfps and dcbstps
> >> instructions have updated persistent storage before any data
> >> access or data transfer caused by subsequent instructions is initiated.
> >> This is in addition to the ordering done by wmb()
> >>
> >> Update nvdimm core such that architecture can use barriers other than
> >> wmb to ensure all previous writes are architecturally visible for
> >> the platform buffer flush.
> >
> > This seems like an exceedingly bad idea, maybe I'm missing something.
> > This implies that the deployed base of DAX applications using the old
> > instruction sequence are going to regress on new hardware that
> > requires the new instructions to be deployed.
>
>
> pmdk support for ppc64 is still work in progress and there is pull
> request to switch pmdk to use new instruction.

Ok.

>
> https://github.com/tuliom/pmdk/commit/fix-flush
>
> All userspace applications will be switched to use the new
> instructions. The new instructions are designed such that when running on P8
> and P9 they behave as 'dcbf' and 'hwsync'.

Sure, makes sense.

> Applications using new instructions will behave as expected when running
> on P8 and P9. Only future hardware will differentiate between 'dcbf' and
> 'dcbfps'

Right, this is the problem. Applications using new instructions behave
as expected, the kernel has been shipping of_pmem and papr_scm for
several cycles now, you're saying that the DAX applications written
against those platforms are going to be broken on P8 and P9?

> > I'm thinking the kernel
> > should go as far as to disable DAX operation by default on new
> > hardware until userspace asserts that it is prepared to switch to the
> > new implementation. Is there any other way to ensure the forward
> > compatibility of deployed ppc64 DAX applications?
>
> AFAIU there is no released persistent memory hardware on ppc64 platform
> and we need to make sure before applications get enabled to use these
> persistent memory devices, they should switch to use the new
> instruction?

Right, I want the kernel to offer some level of safety here because
everything you are describing sounds like a flag day conversion. Am I
misreading? Is there some other gate that prevents existing users of
of_pmem and papr_scm from having their expectations violated when
running on P8 / P9 hardware? Maybe there's tighter ecosystem control
that I'm just not familiar with, I'm only going off the fact that the
kernel has shipped a non-zero number of NVDIMM drivers that build with
ARCH=ppc64 for several cycles.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-19  7:09       ` Dan Williams
@ 2020-05-19 13:52         ` Aneesh Kumar K.V
  2020-05-19 18:59           ` Dan Williams
  0 siblings, 1 reply; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-19 13:52 UTC (permalink / raw)
  To: Dan Williams; +Cc: linuxppc-dev, Michael Ellerman, linux-nvdimm, alistair

Dan Williams <dan.j.williams@intel.com> writes:

> On Mon, May 18, 2020 at 10:30 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:

...

>> Applications using new instructions will behave as expected when running
>> on P8 and P9. Only future hardware will differentiate between 'dcbf' and
>> 'dcbfps'
>
> Right, this is the problem. Applications using new instructions behave
> as expected, the kernel has been shipping of_pmem and papr_scm for
> several cycles now, you're saying that the DAX applications written
> against those platforms are going to be broken on P8 and P9?

The expecation is that both kernel and userspace would get upgraded to
use the new instruction before actual persistent memory devices are
made available.

>
>> > I'm thinking the kernel
>> > should go as far as to disable DAX operation by default on new
>> > hardware until userspace asserts that it is prepared to switch to the
>> > new implementation. Is there any other way to ensure the forward
>> > compatibility of deployed ppc64 DAX applications?
>>
>> AFAIU there is no released persistent memory hardware on ppc64 platform
>> and we need to make sure before applications get enabled to use these
>> persistent memory devices, they should switch to use the new
>> instruction?
>
> Right, I want the kernel to offer some level of safety here because
> everything you are describing sounds like a flag day conversion. Am I
> misreading? Is there some other gate that prevents existing users of
> of_pmem and papr_scm from having their expectations violated when
> running on P8 / P9 hardware? Maybe there's tighter ecosystem control
> that I'm just not familiar with, I'm only going off the fact that the
> kernel has shipped a non-zero number of NVDIMM drivers that build with
> ARCH=ppc64 for several cycles.

If we are looking at adding changes to kernel that will prevent a kernel
from running on newer hardware in a specific case, we could as well take
the changes to get the kernel use the newer instructions right?

But I agree with your concern that if we have older kernel/applications
that continue to use `dcbf` on future hardware we will end up
having issues w.r.t powerfail consistency. The plan is what you outlined
above as tighter ecosystem control. Considering we don't have a pmem
device generally available, we get both kernel and userspace upgraded
to use these new instructions before such a device is made available.

-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-19 13:52         ` Aneesh Kumar K.V
@ 2020-05-19 18:59           ` Dan Williams
  2020-05-20 18:43             ` Aneesh Kumar K.V
  2020-05-21 14:38             ` Jeff Moyer
  0 siblings, 2 replies; 21+ messages in thread
From: Dan Williams @ 2020-05-19 18:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Michael Ellerman, linux-nvdimm, alistair

On Tue, May 19, 2020 at 6:53 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > On Mon, May 18, 2020 at 10:30 PM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
>
> ...
>
> >> Applications using new instructions will behave as expected when running
> >> on P8 and P9. Only future hardware will differentiate between 'dcbf' and
> >> 'dcbfps'
> >
> > Right, this is the problem. Applications using new instructions behave
> > as expected, the kernel has been shipping of_pmem and papr_scm for
> > several cycles now, you're saying that the DAX applications written
> > against those platforms are going to be broken on P8 and P9?
>
> The expecation is that both kernel and userspace would get upgraded to
> use the new instruction before actual persistent memory devices are
> made available.
>
> >
> >> > I'm thinking the kernel
> >> > should go as far as to disable DAX operation by default on new
> >> > hardware until userspace asserts that it is prepared to switch to the
> >> > new implementation. Is there any other way to ensure the forward
> >> > compatibility of deployed ppc64 DAX applications?
> >>
> >> AFAIU there is no released persistent memory hardware on ppc64 platform
> >> and we need to make sure before applications get enabled to use these
> >> persistent memory devices, they should switch to use the new
> >> instruction?
> >
> > Right, I want the kernel to offer some level of safety here because
> > everything you are describing sounds like a flag day conversion. Am I
> > misreading? Is there some other gate that prevents existing users of
> > of_pmem and papr_scm from having their expectations violated when
> > running on P8 / P9 hardware? Maybe there's tighter ecosystem control
> > that I'm just not familiar with, I'm only going off the fact that the
> > kernel has shipped a non-zero number of NVDIMM drivers that build with
> > ARCH=ppc64 for several cycles.
>
> If we are looking at adding changes to kernel that will prevent a kernel
> from running on newer hardware in a specific case, we could as well take
> the changes to get the kernel use the newer instructions right?

Oh, no, I'm not talking about stopping the kernel from running. I'm
simply recommending that support for MAP_SYNC mappings (userspace
managed flushing) be disabled by default on PPC with either a
compile-time or run-time default to assert that userspace has been
audited for legacy applications or that the platform owner is
otherwise willing to take the risk.

> But I agree with your concern that if we have older kernel/applications
> that continue to use `dcbf` on future hardware we will end up
> having issues w.r.t powerfail consistency. The plan is what you outlined
> above as tighter ecosystem control. Considering we don't have a pmem
> device generally available, we get both kernel and userspace upgraded
> to use these new instructions before such a device is made available.

Ok, I think a compile time kernel option with a runtime override
satisfies my concern. Does that work for you?
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-19 18:59           ` Dan Williams
@ 2020-05-20 18:43             ` Aneesh Kumar K.V
  2020-05-21 14:38             ` Jeff Moyer
  1 sibling, 0 replies; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-20 18:43 UTC (permalink / raw)
  To: Dan Williams; +Cc: linuxppc-dev, Michael Ellerman, linux-nvdimm, alistair

Dan Williams <dan.j.williams@intel.com> writes:

> On Tue, May 19, 2020 at 6:53 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>> > On Mon, May 18, 2020 at 10:30 PM Aneesh Kumar K.V
>> > <aneesh.kumar@linux.ibm.com> wrote:
>>
>> ...
>>
>> >> Applications using new instructions will behave as expected when running
>> >> on P8 and P9. Only future hardware will differentiate between 'dcbf' and
>> >> 'dcbfps'
>> >
>> > Right, this is the problem. Applications using new instructions behave
>> > as expected, the kernel has been shipping of_pmem and papr_scm for
>> > several cycles now, you're saying that the DAX applications written
>> > against those platforms are going to be broken on P8 and P9?
>>
>> The expecation is that both kernel and userspace would get upgraded to
>> use the new instruction before actual persistent memory devices are
>> made available.
>>
>> >
>> >> > I'm thinking the kernel
>> >> > should go as far as to disable DAX operation by default on new
>> >> > hardware until userspace asserts that it is prepared to switch to the
>> >> > new implementation. Is there any other way to ensure the forward
>> >> > compatibility of deployed ppc64 DAX applications?
>> >>
>> >> AFAIU there is no released persistent memory hardware on ppc64 platform
>> >> and we need to make sure before applications get enabled to use these
>> >> persistent memory devices, they should switch to use the new
>> >> instruction?
>> >
>> > Right, I want the kernel to offer some level of safety here because
>> > everything you are describing sounds like a flag day conversion. Am I
>> > misreading? Is there some other gate that prevents existing users of
>> > of_pmem and papr_scm from having their expectations violated when
>> > running on P8 / P9 hardware? Maybe there's tighter ecosystem control
>> > that I'm just not familiar with, I'm only going off the fact that the
>> > kernel has shipped a non-zero number of NVDIMM drivers that build with
>> > ARCH=ppc64 for several cycles.
>>
>> If we are looking at adding changes to kernel that will prevent a kernel
>> from running on newer hardware in a specific case, we could as well take
>> the changes to get the kernel use the newer instructions right?
>
> Oh, no, I'm not talking about stopping the kernel from running. I'm
> simply recommending that support for MAP_SYNC mappings (userspace
> managed flushing) be disabled by default on PPC with either a
> compile-time or run-time default to assert that userspace has been
> audited for legacy applications or that the platform owner is
> otherwise willing to take the risk.
>
>> But I agree with your concern that if we have older kernel/applications
>> that continue to use `dcbf` on future hardware we will end up
>> having issues w.r.t powerfail consistency. The plan is what you outlined
>> above as tighter ecosystem control. Considering we don't have a pmem
>> device generally available, we get both kernel and userspace upgraded
>> to use these new instructions before such a device is made available.
>
> Ok, I think a compile time kernel option with a runtime override
> satisfies my concern. Does that work for you?

something like below? But this still won't handle devdax mmap right?

diff --git a/arch/arm64/include/asm/libnvdimm.h b/arch/arm64/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..aee697a72537
--- /dev/null
+++ b/arch/arm64/include/asm/libnvdimm.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARCH_LIBNVDIMM_H__
+#define __ARCH_LIBNVDIMM_H__
+
+#endif /* __ARCH_LIBNVDIMM_H__  */
diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..da479200bfb8
--- /dev/null
+++ b/arch/powerpc/include/asm/libnvdimm.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARCH_LIBNVDIMM_H__
+#define __ARCH_LIBNVDIMM_H__
+
+#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm
+extern bool arch_disable_sync_nvdimm(void);
+
+#endif /* __ARCH_LIBNVDIMM_H__  */
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..3ce4fb4f167b 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,6 +9,8 @@
 
 #include <asm/cacheflush.h>
 
+
+static bool sync_fault = IS_ENABLED(CONFIG_PPC_NVDIMM_SYNC_FAULT);
 /*
  * CONFIG_ARCH_HAS_PMEM_API symbols
  */
@@ -57,3 +59,16 @@ void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
 	memcpy_flushcache(to, page_to_virt(page) + offset, len);
 }
 EXPORT_SYMBOL(memcpy_page_flushcache);
+
+bool arch_disable_sync_nvdimm(void)
+{
+	return !sync_fault;
+}
+
+static int __init parse_sync_fault(char *p)
+{
+	sync_fault = true;
+	return 0;
+}
+early_param("enable_sync_fault", parse_sync_fault);
+
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 27a81c291be8..dde11d75a746 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -383,6 +383,15 @@ config PPC_KUEP
 
 	  If you're unsure, say Y.
 
+config PPC_NVDIMM_SYNC_FAULT
+	bool "Synchronous fault support (MAP_SYNC)"
+	default n
+	help
+	  Enable support for synchronous fault with nvdimm namespaces.
+
+	  If you're unsure, say N.
+
+
 config PPC_HAVE_KUAP
 	bool
 
diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..aee697a72537
--- /dev/null
+++ b/arch/x86/include/asm/libnvdimm.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARCH_LIBNVDIMM_H__
+#define __ARCH_LIBNVDIMM_H__
+
+#endif /* __ARCH_LIBNVDIMM_H__  */
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..74a0809491af 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1278,6 +1278,13 @@ bool is_nvdimm_sync(struct nd_region *nd_region)
 	if (is_nd_volatile(&nd_region->dev))
 		return true;
 
+	/*
+	 * If arch is forcing a synchronous fault
+	 * disable.
+	 */
+	if (arch_disable_sync_nvdimm())
+		return false;
+
 	return is_nd_pmem(&nd_region->dev) &&
 		!test_bit(ND_REGION_ASYNC, &nd_region->flags);
 }
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..891449aebe91 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -13,6 +13,8 @@
 #include <linux/spinlock.h>
 #include <linux/bio.h>
 
+#include <asm/libnvdimm.h>
+
 struct badrange_entry {
 	u64 start;
 	u64 length;
@@ -286,4 +288,12 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
 }
 #endif
 
+#ifndef arch_disable_sync_nvdimm
+#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm
+static inline bool arch_disable_sync_nvdimm()
+{
+	return false;
+}
+#endif
+
 #endif /* __LIBNVDIMM_H__ */
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-19 18:59           ` Dan Williams
  2020-05-20 18:43             ` Aneesh Kumar K.V
@ 2020-05-21 14:38             ` Jeff Moyer
  2020-05-21 17:02               ` Aneesh Kumar K.V
  2020-05-21 18:34               ` Dan Williams
  1 sibling, 2 replies; 21+ messages in thread
From: Jeff Moyer @ 2020-05-21 14:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Dan Williams
  Cc: linuxppc-dev, Michael Ellerman, linux-nvdimm, alistair, mpatocka

Dan Williams <dan.j.williams@intel.com> writes:

>> But I agree with your concern that if we have older kernel/applications
>> that continue to use `dcbf` on future hardware we will end up
>> having issues w.r.t powerfail consistency. The plan is what you outlined
>> above as tighter ecosystem control. Considering we don't have a pmem
>> device generally available, we get both kernel and userspace upgraded
>> to use these new instructions before such a device is made available.

I thought power already supported NVDIMM-N, no?  So are you saying that
those devices will continue to work with the existing flushing and
fencing mechanisms?

> Ok, I think a compile time kernel option with a runtime override
> satisfies my concern. Does that work for you?

The compile time option only helps when running newer kernels.  I'm not
sure how you would even begin to audit userspace applications (keep in
mind, not every application is open source, and not every application
uses pmdk).  I also question the merits of forcing the administrator to
make the determination of whether all applications on the system will
work properly.  Really, you have to rely on the vendor to tell you the
platform is supported, and at that point, why put further hurdles in the
way?

The decision to require different instructions on ppc is unfortunate,
but one I'm sure we have no control over.  I don't see any merit in the
kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
have some way of ensuring older kernels don't work with these new
platforms, but I don't think that's possible.

Moving on to the patch itself--Aneesh, have you audited other persistent
memory users in the kernel?  For example, drivers/md/dm-writecache.c does
this:

static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
{
 	if (WC_MODE_PMEM(wc))
	        wmb(); <==========
        else
                ssd_commit_flushed(wc, wait_for_ios);
}

I believe you'll need to make modifications there.

Cheers,
Jeff
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-21 14:38             ` Jeff Moyer
@ 2020-05-21 17:02               ` Aneesh Kumar K.V
  2020-05-21 18:25                 ` Dan Williams
  2020-05-21 18:34               ` Dan Williams
  1 sibling, 1 reply; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-21 17:02 UTC (permalink / raw)
  To: Jeff Moyer, Dan Williams
  Cc: linuxppc-dev, Michael Ellerman, linux-nvdimm, alistair, mpatocka,
	Jan Kara

On 5/21/20 8:08 PM, Jeff Moyer wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
> 
>>> But I agree with your concern that if we have older kernel/applications
>>> that continue to use `dcbf` on future hardware we will end up
>>> having issues w.r.t powerfail consistency. The plan is what you outlined
>>> above as tighter ecosystem control. Considering we don't have a pmem
>>> device generally available, we get both kernel and userspace upgraded
>>> to use these new instructions before such a device is made available.
> 
> I thought power already supported NVDIMM-N, no?  So are you saying that
> those devices will continue to work with the existing flushing and
> fencing mechanisms?
> 

yes. these devices can continue to use 'dcbf + hwsync' as long as we are 
running them on P9.


>> Ok, I think a compile time kernel option with a runtime override
>> satisfies my concern. Does that work for you?
> 
> The compile time option only helps when running newer kernels.  I'm not
> sure how you would even begin to audit userspace applications (keep in
> mind, not every application is open source, and not every application
> uses pmdk).  I also question the merits of forcing the administrator to
> make the determination of whether all applications on the system will
> work properly.  Really, you have to rely on the vendor to tell you the
> platform is supported, and at that point, why put further hurdles in the
> way?
> 
> The decision to require different instructions on ppc is unfortunate,
> but one I'm sure we have no control over.  I don't see any merit in the
> kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
> have some way of ensuring older kernels don't work with these new
> platforms, but I don't think that's possible.
> 


I am currently looking at the possibility of firmware present these 
devices with different device-tree compat values. So that older 
/existing kernel won't initialize the device on newer systems. Is that a 
good compromise? We still can end up with older userspace and newer 
kernel. One of the option suggested by Jan Kara is to use a prctl flag 
to control that? (intead of kernel parameter option I posted before)


> Moving on to the patch itself--Aneesh, have you audited other persistent
> memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> this:
> 
> static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
> {
>   	if (WC_MODE_PMEM(wc))
> 	        wmb(); <==========
>          else
>                  ssd_commit_flushed(wc, wait_for_ios);
> }
> 
> I believe you'll need to make modifications there.
> 

Correct. Thanks for catching that.


I don't understand dm much, wondering how this will work with 
non-synchronous DAX device?

-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-21 17:02               ` Aneesh Kumar K.V
@ 2020-05-21 18:25                 ` Dan Williams
  2020-05-21 18:52                   ` Mikulas Patocka
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Williams @ 2020-05-21 18:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linuxppc-dev, Michael Ellerman, linux-nvdimm, alistair,
	Mikulas Patocka, Jan Kara

On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 5/21/20 8:08 PM, Jeff Moyer wrote:
> > Dan Williams <dan.j.williams@intel.com> writes:
> >
> >>> But I agree with your concern that if we have older kernel/applications
> >>> that continue to use `dcbf` on future hardware we will end up
> >>> having issues w.r.t powerfail consistency. The plan is what you outlined
> >>> above as tighter ecosystem control. Considering we don't have a pmem
> >>> device generally available, we get both kernel and userspace upgraded
> >>> to use these new instructions before such a device is made available.
> >
> > I thought power already supported NVDIMM-N, no?  So are you saying that
> > those devices will continue to work with the existing flushing and
> > fencing mechanisms?
> >
>
> yes. these devices can continue to use 'dcbf + hwsync' as long as we are
> running them on P9.
>
>
> >> Ok, I think a compile time kernel option with a runtime override
> >> satisfies my concern. Does that work for you?
> >
> > The compile time option only helps when running newer kernels.  I'm not
> > sure how you would even begin to audit userspace applications (keep in
> > mind, not every application is open source, and not every application
> > uses pmdk).  I also question the merits of forcing the administrator to
> > make the determination of whether all applications on the system will
> > work properly.  Really, you have to rely on the vendor to tell you the
> > platform is supported, and at that point, why put further hurdles in the
> > way?
> >
> > The decision to require different instructions on ppc is unfortunate,
> > but one I'm sure we have no control over.  I don't see any merit in the
> > kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
> > have some way of ensuring older kernels don't work with these new
> > platforms, but I don't think that's possible.
> >
>
>
> I am currently looking at the possibility of firmware present these
> devices with different device-tree compat values. So that older
> /existing kernel won't initialize the device on newer systems. Is that a
> good compromise? We still can end up with older userspace and newer
> kernel. One of the option suggested by Jan Kara is to use a prctl flag
> to control that? (intead of kernel parameter option I posted before)
>
>
> > Moving on to the patch itself--Aneesh, have you audited other persistent
> > memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> > this:
> >
> > static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
> > {
> >       if (WC_MODE_PMEM(wc))
> >               wmb(); <==========
> >          else
> >                  ssd_commit_flushed(wc, wait_for_ios);
> > }
> >
> > I believe you'll need to make modifications there.
> >
>
> Correct. Thanks for catching that.
>
>
> I don't understand dm much, wondering how this will work with
> non-synchronous DAX device?

That's a good point. DM-writecache needs to be cognizant of things
like virtio-pmem that violate the rule that persisent memory writes
can be flushed by CPU functions rather than calling back into the
driver. It seems we need to always make the flush case a dax_operation
callback to account for this.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-21 14:38             ` Jeff Moyer
  2020-05-21 17:02               ` Aneesh Kumar K.V
@ 2020-05-21 18:34               ` Dan Williams
  1 sibling, 0 replies; 21+ messages in thread
From: Dan Williams @ 2020-05-21 18:34 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Aneesh Kumar K.V, linuxppc-dev, Michael Ellerman, linux-nvdimm,
	alistair, Mikulas Patocka

On Thu, May 21, 2020 at 7:39 AM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> >> But I agree with your concern that if we have older kernel/applications
> >> that continue to use `dcbf` on future hardware we will end up
> >> having issues w.r.t powerfail consistency. The plan is what you outlined
> >> above as tighter ecosystem control. Considering we don't have a pmem
> >> device generally available, we get both kernel and userspace upgraded
> >> to use these new instructions before such a device is made available.
>
> I thought power already supported NVDIMM-N, no?  So are you saying that
> those devices will continue to work with the existing flushing and
> fencing mechanisms?
>
> > Ok, I think a compile time kernel option with a runtime override
> > satisfies my concern. Does that work for you?
>
> The compile time option only helps when running newer kernels.  I'm not
> sure how you would even begin to audit userspace applications (keep in
> mind, not every application is open source, and not every application
> uses pmdk).  I also question the merits of forcing the administrator to
> make the determination of whether all applications on the system will
> work properly.  Really, you have to rely on the vendor to tell you the
> platform is supported, and at that point, why put further hurdles in the
> way?

I'm thoroughly confused by this. I thought this was exactly the role
of a Linux distribution vendor. ISVs qualify their application on a
hardware-platform + distribution combination and the distribution owns
picking ABI defaults like CONFIG_SYSFS_DEPRECATED regardless of
whether they can guarantee that all apps are updated to the new
semantics.

The administrator is not forced, the administrator if afforded an
override in the extreme case that they find an exception to what was
qualified and need to override the distribution's compile-time choice.

>
> The decision to require different instructions on ppc is unfortunate,
> but one I'm sure we have no control over.  I don't see any merit in the
> kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
> have some way of ensuring older kernels don't work with these new
> platforms, but I don't think that's possible.

I see disabling MAP_SYNC as the more targeted form of "ensursing older
kernels don't work.

So I guess we agree that something should break when baseline
assumptions change, we just don't yet agree on where that break should
happen?
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-21 18:25                 ` Dan Williams
@ 2020-05-21 18:52                   ` Mikulas Patocka
  2020-05-22  9:31                     ` Michal Suchánek
  0 siblings, 1 reply; 21+ messages in thread
From: Mikulas Patocka @ 2020-05-21 18:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Aneesh Kumar K.V, linuxppc-dev, Michael Ellerman, linux-nvdimm,
	alistair, Jan Kara



On Thu, 21 May 2020, Dan Williams wrote:

> On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
> >
> > > Moving on to the patch itself--Aneesh, have you audited other persistent
> > > memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> > > this:
> > >
> > > static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
> > > {
> > >       if (WC_MODE_PMEM(wc))
> > >               wmb(); <==========
> > >          else
> > >                  ssd_commit_flushed(wc, wait_for_ios);
> > > }
> > >
> > > I believe you'll need to make modifications there.
> > >
> >
> > Correct. Thanks for catching that.
> >
> >
> > I don't understand dm much, wondering how this will work with
> > non-synchronous DAX device?
> 
> That's a good point. DM-writecache needs to be cognizant of things
> like virtio-pmem that violate the rule that persisent memory writes
> can be flushed by CPU functions rather than calling back into the
> driver. It seems we need to always make the flush case a dax_operation
> callback to account for this.

dm-writecache is normally sitting on the top of dm-linear, so it would 
need to pass the wmb() call through the dm core and dm-linear target ... 
that would slow it down ... I remember that you already did it this way 
some times ago and then removed it.

What's the exact problem with POWER? Could the POWER system have two types 
of persistent memory that need two different ways of flushing?

Mikulas
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-21 18:52                   ` Mikulas Patocka
@ 2020-05-22  9:31                     ` Michal Suchánek
  2020-05-22 10:08                       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Suchánek @ 2020-05-22  9:31 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Jan Kara, linux-nvdimm, Aneesh Kumar K.V, alistair, linuxppc-dev

On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
> 
> 
> On Thu, 21 May 2020, Dan Williams wrote:
> 
> > On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
> > >
> > > > Moving on to the patch itself--Aneesh, have you audited other persistent
> > > > memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> > > > this:
> > > >
> > > > static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
> > > > {
> > > >       if (WC_MODE_PMEM(wc))
> > > >               wmb(); <==========
> > > >          else
> > > >                  ssd_commit_flushed(wc, wait_for_ios);
> > > > }
> > > >
> > > > I believe you'll need to make modifications there.
> > > >
> > >
> > > Correct. Thanks for catching that.
> > >
> > >
> > > I don't understand dm much, wondering how this will work with
> > > non-synchronous DAX device?
> > 
> > That's a good point. DM-writecache needs to be cognizant of things
> > like virtio-pmem that violate the rule that persisent memory writes
> > can be flushed by CPU functions rather than calling back into the
> > driver. It seems we need to always make the flush case a dax_operation
> > callback to account for this.
> 
> dm-writecache is normally sitting on the top of dm-linear, so it would 
> need to pass the wmb() call through the dm core and dm-linear target ... 
> that would slow it down ... I remember that you already did it this way 
> some times ago and then removed it.
> 
> What's the exact problem with POWER? Could the POWER system have two types 
> of persistent memory that need two different ways of flushing?

As far as I understand the discussion so far

 - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
 - on POWER $newhardware uses $newinstruction to ensure pmem consistency
   (compatible with $oldinstruction on $oldhardware)
 - on some platforms instead of barrier instruction a callback into the
   driver is issued to ensure consistency

None of this is reflected by the dm driver.

Thanks

Michal
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-22  9:31                     ` Michal Suchánek
@ 2020-05-22 10:08                       ` Aneesh Kumar K.V
  2020-05-22 13:01                         ` Mikulas Patocka
  0 siblings, 1 reply; 21+ messages in thread
From: Aneesh Kumar K.V @ 2020-05-22 10:08 UTC (permalink / raw)
  To: Michal Suchánek, Mikulas Patocka
  Cc: Jan Kara, linux-nvdimm, alistair, linuxppc-dev

On 5/22/20 3:01 PM, Michal Suchánek wrote:
> On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
>>
>>
>> On Thu, 21 May 2020, Dan Williams wrote:
>>
>>> On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>
>>>>> Moving on to the patch itself--Aneesh, have you audited other persistent
>>>>> memory users in the kernel?  For example, drivers/md/dm-writecache.c does
>>>>> this:
>>>>>
>>>>> static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
>>>>> {
>>>>>        if (WC_MODE_PMEM(wc))
>>>>>                wmb(); <==========
>>>>>           else
>>>>>                   ssd_commit_flushed(wc, wait_for_ios);
>>>>> }
>>>>>
>>>>> I believe you'll need to make modifications there.
>>>>>
>>>>
>>>> Correct. Thanks for catching that.
>>>>
>>>>
>>>> I don't understand dm much, wondering how this will work with
>>>> non-synchronous DAX device?
>>>
>>> That's a good point. DM-writecache needs to be cognizant of things
>>> like virtio-pmem that violate the rule that persisent memory writes
>>> can be flushed by CPU functions rather than calling back into the
>>> driver. It seems we need to always make the flush case a dax_operation
>>> callback to account for this.
>>
>> dm-writecache is normally sitting on the top of dm-linear, so it would
>> need to pass the wmb() call through the dm core and dm-linear target ...
>> that would slow it down ... I remember that you already did it this way
>> some times ago and then removed it.
>>
>> What's the exact problem with POWER? Could the POWER system have two types
>> of persistent memory that need two different ways of flushing?
> 
> As far as I understand the discussion so far
> 
>   - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
>   - on POWER $newhardware uses $newinstruction to ensure pmem consistency
>     (compatible with $oldinstruction on $oldhardware)

Correct.

>   - on some platforms instead of barrier instruction a callback into the
>     driver is issued to ensure consistency 

This is virtio-pmem only at this point IIUC.


> 
> None of this is reflected by the dm driver.
> 

-aneesh
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-22 10:08                       ` Aneesh Kumar K.V
@ 2020-05-22 13:01                         ` Mikulas Patocka
  2020-06-26 10:20                           ` Michal Suchánek
  0 siblings, 1 reply; 21+ messages in thread
From: Mikulas Patocka @ 2020-05-22 13:01 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michal Suchánek, Jan Kara, linux-nvdimm, alistair, linuxppc-dev



On Fri, 22 May 2020, Aneesh Kumar K.V wrote:

> On 5/22/20 3:01 PM, Michal Suchánek wrote:
> > On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Thu, 21 May 2020, Dan Williams wrote:
> > > 
> > > > On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> > > > <aneesh.kumar@linux.ibm.com> wrote:
> > > > > 
> > > > > > Moving on to the patch itself--Aneesh, have you audited other
> > > > > > persistent
> > > > > > memory users in the kernel?  For example, drivers/md/dm-writecache.c
> > > > > > does
> > > > > > this:
> > > > > > 
> > > > > > static void writecache_commit_flushed(struct dm_writecache *wc, bool
> > > > > > wait_for_ios)
> > > > > > {
> > > > > >        if (WC_MODE_PMEM(wc))
> > > > > >                wmb(); <==========
> > > > > >           else
> > > > > >                   ssd_commit_flushed(wc, wait_for_ios);
> > > > > > }
> > > > > > 
> > > > > > I believe you'll need to make modifications there.
> > > > > > 
> > > > > 
> > > > > Correct. Thanks for catching that.
> > > > > 
> > > > > 
> > > > > I don't understand dm much, wondering how this will work with
> > > > > non-synchronous DAX device?
> > > > 
> > > > That's a good point. DM-writecache needs to be cognizant of things
> > > > like virtio-pmem that violate the rule that persisent memory writes
> > > > can be flushed by CPU functions rather than calling back into the
> > > > driver. It seems we need to always make the flush case a dax_operation
> > > > callback to account for this.
> > > 
> > > dm-writecache is normally sitting on the top of dm-linear, so it would
> > > need to pass the wmb() call through the dm core and dm-linear target ...
> > > that would slow it down ... I remember that you already did it this way
> > > some times ago and then removed it.
> > > 
> > > What's the exact problem with POWER? Could the POWER system have two types
> > > of persistent memory that need two different ways of flushing?
> > 
> > As far as I understand the discussion so far
> > 
> >   - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
> >   - on POWER $newhardware uses $newinstruction to ensure pmem consistency
> >     (compatible with $oldinstruction on $oldhardware)
> 
> Correct.
> 
> >   - on some platforms instead of barrier instruction a callback into the
> >     driver is issued to ensure consistency 
> 
> This is virtio-pmem only at this point IIUC.
> 
> -aneesh

And does the virtio-pmem driver track which pages are dirty? Or does it 
need to specify the range of pages to flush in the flush function?

> > None of this is reflected by the dm driver.

We could make a new dax method:
void *(dax_get_flush_function)(void);

This would return a pointer to "wmb()" on x86 and something else on Power.

The method "dax_get_flush_function" would be called only once when 
initializing the writecache driver (because the call would be slow because 
it would have to go through the DM stack) and then, the returned function 
would be called each time we need write ordering. The returned function 
would do just "sfence; ret".

Mikulas
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
  2020-05-22 13:01                         ` Mikulas Patocka
@ 2020-06-26 10:20                           ` Michal Suchánek
  0 siblings, 0 replies; 21+ messages in thread
From: Michal Suchánek @ 2020-06-26 10:20 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Aneesh Kumar K.V, Jan Kara, linux-nvdimm, alistair, linuxppc-dev

On Fri, May 22, 2020 at 09:01:17AM -0400, Mikulas Patocka wrote:
> 
> 
> On Fri, 22 May 2020, Aneesh Kumar K.V wrote:
> 
> > On 5/22/20 3:01 PM, Michal Suchánek wrote:
> > > On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
> > > > 
> > > > 
> > > > On Thu, 21 May 2020, Dan Williams wrote:
> > > > 
> > > > > On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> > > > > <aneesh.kumar@linux.ibm.com> wrote:
> > > > > > 
> > > > > > > Moving on to the patch itself--Aneesh, have you audited other
> > > > > > > persistent
> > > > > > > memory users in the kernel?  For example, drivers/md/dm-writecache.c
> > > > > > > does
> > > > > > > this:
> > > > > > > 
> > > > > > > static void writecache_commit_flushed(struct dm_writecache *wc, bool
> > > > > > > wait_for_ios)
> > > > > > > {
> > > > > > >        if (WC_MODE_PMEM(wc))
> > > > > > >                wmb(); <==========
> > > > > > >           else
> > > > > > >                   ssd_commit_flushed(wc, wait_for_ios);
> > > > > > > }
> > > > > > > 
> > > > > > > I believe you'll need to make modifications there.
> > > > > > > 
> > > > > > 
> > > > > > Correct. Thanks for catching that.
> > > > > > 
> > > > > > 
> > > > > > I don't understand dm much, wondering how this will work with
> > > > > > non-synchronous DAX device?
> > > > > 
> > > > > That's a good point. DM-writecache needs to be cognizant of things
> > > > > like virtio-pmem that violate the rule that persisent memory writes
> > > > > can be flushed by CPU functions rather than calling back into the
> > > > > driver. It seems we need to always make the flush case a dax_operation
> > > > > callback to account for this.
> > > > 
> > > > dm-writecache is normally sitting on the top of dm-linear, so it would
> > > > need to pass the wmb() call through the dm core and dm-linear target ...
> > > > that would slow it down ... I remember that you already did it this way
> > > > some times ago and then removed it.
> > > > 
> > > > What's the exact problem with POWER? Could the POWER system have two types
> > > > of persistent memory that need two different ways of flushing?
> > > 
> > > As far as I understand the discussion so far
> > > 
> > >   - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
> > >   - on POWER $newhardware uses $newinstruction to ensure pmem consistency
> > >     (compatible with $oldinstruction on $oldhardware)
> > 
> > Correct.
> > 
> > >   - on some platforms instead of barrier instruction a callback into the
> > >     driver is issued to ensure consistency 
> > 
> > This is virtio-pmem only at this point IIUC.
> > 
> > -aneesh
> 
> And does the virtio-pmem driver track which pages are dirty? Or does it 
> need to specify the range of pages to flush in the flush function?
> 
> > > None of this is reflected by the dm driver.
> 
> We could make a new dax method:
> void *(dax_get_flush_function)(void);
> 
> This would return a pointer to "wmb()" on x86 and something else on Power.
> 
> The method "dax_get_flush_function" would be called only once when 
> initializing the writecache driver (because the call would be slow because 
> it would have to go through the DM stack) and then, the returned function 
> would be called each time we need write ordering. The returned function 
> would do just "sfence; ret".

Hello,

as far as I understand the code virtio_pmem has a fush function defined
which indeed can make use of the region properties, such as memory
range. If such function exists you need quivalent of sync() - call into
the device in question. If it does not calling arch_pmem_flush_barrier()
instead of wmb() should suffice.

I am not aware of an interface to determine if the flush function exists
for a particular region.

Thanks

Michal
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-06-26 10:20 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-13  3:47 [PATCH v2 1/5] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
2020-05-13  3:47 ` [PATCH v2 2/5] powerpc/pmem: Add flush routines using new pmem store and sync instruction Aneesh Kumar K.V
2020-05-13  3:47 ` [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier Aneesh Kumar K.V
2020-05-13 16:14   ` Dan Williams
2020-05-19  5:30     ` Aneesh Kumar K.V
2020-05-19  7:09       ` Dan Williams
2020-05-19 13:52         ` Aneesh Kumar K.V
2020-05-19 18:59           ` Dan Williams
2020-05-20 18:43             ` Aneesh Kumar K.V
2020-05-21 14:38             ` Jeff Moyer
2020-05-21 17:02               ` Aneesh Kumar K.V
2020-05-21 18:25                 ` Dan Williams
2020-05-21 18:52                   ` Mikulas Patocka
2020-05-22  9:31                     ` Michal Suchánek
2020-05-22 10:08                       ` Aneesh Kumar K.V
2020-05-22 13:01                         ` Mikulas Patocka
2020-06-26 10:20                           ` Michal Suchánek
2020-05-21 18:34               ` Dan Williams
2020-05-13  3:47 ` [PATCH v2 4/5] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction Aneesh Kumar K.V
2020-05-13  6:44   ` kbuild test robot
2020-05-13  3:47 ` [PATCH v2 5/5] powerpc/pmem: Avoid the barrier in flush routines Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).