* [PATCH v2 0/7] dax: I/O path enhancements
@ 2015-08-13 16:51 Ross Zwisler
  2015-08-13 16:51 ` [PATCH v2 1/7] brd: make rd_size static Ross Zwisler
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Ross Zwisler @ 2015-08-13 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-nvdimm, Dan Williams
  Cc: Ross Zwisler, Alexander Viro, Ameen Ali, Andrew Morton,
	Arnd Bergmann, Benjamin Herrenschmidt, Boaz Harrosh,
	Borislav Petkov, Christoph Hellwig, David S. Miller,
	Gerald Schaefer, Greg KH, Heiko Carstens, H. Peter Anvin,
	Ingo Molnar, Jan Kara, Jeff Layton, Jens Axboe, Jiri Slaby,
	Joe Perches, Jonathan Corbet, Juergen Gross, linux390, linux-doc,
	linux-fsdevel, linuxppc-dev, linux-s390, Martin K. Petersen,
	Martin Schwidefsky, Matthew Wilcox, Mauro Carvalho Chehab,
	Michael Ellerman, Mike Snitzer, Miklos Szeredi, Ming Lei,
	Omar Sandoval, Paul Mackerras, Sagi Grimberg, Shaohua Li,
	Tejun Heo, Thomas Gleixner, Toshi Kani, Uwe Kleine-König,
	Wolfram Sang, x86, Dave Chinner

The goal of this series is to enhance the DAX I/O path so that all operations
that store data (I/O writes, zeroing blocks, punching holes, etc.) properly
synchronize the stores to media using the PMEM API.  This ensures that the data
DAX is writing is durable on media before the operation completes.
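
For illustration, the store-then-commit pattern the series converts DAX to
looks roughly like this (a sketch only, using the existing wmb_pmem() call
plus the clear_pmem() and wb_cache_pmem() calls added in patch 5; the helper
name is made up):

#include <linux/pmem.h>
#include <linux/string.h>

/* Sketch, not code from any patch below. */
static void dax_durable_store_sketch(void __pmem *addr, const void *buf,
				     size_t len, size_t blocksize)
{
	memcpy((void __force *)addr, buf, len);  /* plain stores may be cached */
	wb_cache_pmem(addr, len);                /* write back dirtied cache lines */
	clear_pmem(addr + len, blocksize - len); /* zero the rest, unordered */
	wmb_pmem();    /* one commit point: everything above is now durable */
}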

Patches 1-4 are a few random cleanups.

Changes from v1:
 - Removed patches to PMEM for the "read flush" _DSM flag.  These are different
   enough that they deserve their own series, and they have a separate baseline
   which is currently moving (Dan's memremap() series).
 - Added clear_pmem() PMEM API to zero DAX memory and flush it in one call.
   (Dave)
 - Open coded flushing in arch_wb_cache_pmem() instead of adding a generic
   clwb_flush_range().  This allowed me to avoid having extra memory barriers
   and instead rely completely on arch_wmb_pmem() for ordering. (Dave)
 - Moved the arch implementation of the PMEM API into its own arch header
   (Christoph).

Ross Zwisler (7):
  brd: make rd_size static
  pmem, x86: move x86 PMEM API to new pmem.h header
  pmem: remove layer when calling arch_has_wmb_pmem()
  pmem, x86: clean up conditional pmem includes
  pmem: add wb_cache_pmem() and clear_pmem()
  dax: update I/O path to do proper PMEM flushing
  pmem, dax: have direct_access use __pmem annotation

 Documentation/filesystems/Locking |   3 +-
 MAINTAINERS                       |   1 +
 arch/powerpc/sysdev/axonram.c     |   7 ++-
 arch/x86/include/asm/cacheflush.h |  71 ----------------------
 arch/x86/include/asm/pmem.h       | 123 ++++++++++++++++++++++++++++++++++++++
 drivers/block/brd.c               |   6 +-
 drivers/nvdimm/pmem.c             |   4 +-
 drivers/s390/block/dcssblk.c      |  10 ++--
 fs/block_dev.c                    |   2 +-
 fs/dax.c                          |  73 ++++++++++++++--------
 include/linux/blkdev.h            |   8 +--
 include/linux/pmem.h              |  66 ++++++++++++++++----
 12 files changed, 247 insertions(+), 127 deletions(-)
 create mode 100644 arch/x86/include/asm/pmem.h

-- 
2.1.0



* [PATCH v2 1/7] brd: make rd_size static
  2015-08-13 16:51 [PATCH v2 0/7] dax: I/O path enhancements Ross Zwisler
@ 2015-08-13 16:51 ` Ross Zwisler
  2015-08-13 16:51 ` [PATCH v2 2/7] pmem, x86: move x86 PMEM API to new pmem.h header Ross Zwisler
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Ross Zwisler @ 2015-08-13 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-nvdimm, Dan Williams; +Cc: Ross Zwisler, Jens Axboe

Make rd_size static because it is local to drivers/block/brd.c.

This was reported by sparse:

drivers/block/brd.c:445:5: warning: symbol 'rd_size' was not declared.
Should it be static?

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 drivers/block/brd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 64ab495..5750b39 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -442,7 +442,7 @@ static int rd_nr = CONFIG_BLK_DEV_RAM_COUNT;
 module_param(rd_nr, int, S_IRUGO);
 MODULE_PARM_DESC(rd_nr, "Maximum number of brd devices");
 
-int rd_size = CONFIG_BLK_DEV_RAM_SIZE;
+static int rd_size = CONFIG_BLK_DEV_RAM_SIZE;
 module_param(rd_size, int, S_IRUGO);
 MODULE_PARM_DESC(rd_size, "Size of each RAM disk in kbytes.");
 
-- 
2.1.0



* [PATCH v2 2/7] pmem, x86: move x86 PMEM API to new pmem.h header
  2015-08-13 16:51 [PATCH v2 0/7] dax: I/O path enhancements Ross Zwisler
  2015-08-13 16:51 ` [PATCH v2 1/7] brd: make rd_size static Ross Zwisler
@ 2015-08-13 16:51 ` Ross Zwisler
  2015-08-15  8:58   ` Christoph Hellwig
  2015-08-13 16:51 ` [PATCH v2 3/7] pmem: remove layer when calling arch_has_wmb_pmem() Ross Zwisler
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Ross Zwisler @ 2015-08-13 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-nvdimm, Dan Williams
  Cc: Ross Zwisler, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andrew Morton, Arnd Bergmann, Mauro Carvalho Chehab, Greg KH,
	David S. Miller, Joe Perches, Jiri Slaby, Tejun Heo,
	Borislav Petkov, Juergen Gross, Toshi Kani, Christoph Hellwig

Move the x86 PMEM API implementation out of asm/cacheflush.h and into
its own header asm/pmem.h.  This will allow members of the PMEM API to
be more easily identified on this and other architectures.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
---
 MAINTAINERS                       |  1 +
 arch/x86/include/asm/cacheflush.h | 71 ------------------------------
 arch/x86/include/asm/pmem.h       | 92 +++++++++++++++++++++++++++++++++++++++
 include/linux/pmem.h              |  2 +-
 4 files changed, 94 insertions(+), 72 deletions(-)
 create mode 100644 arch/x86/include/asm/pmem.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a9ae6c1..d0b5a08 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6165,6 +6165,7 @@ Q:	https://patchwork.kernel.org/project/linux-nvdimm/list/
 S:	Supported
 F:	drivers/nvdimm/pmem.c
 F:	include/linux/pmem.h
+F:	arch/*/include/asm/pmem.h
 
 LINUX FOR IBM pSERIES (RS/6000)
 M:	Paul Mackerras <paulus@au.ibm.com>
diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 9bf3ea1..471418a 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -109,75 +109,4 @@ static inline int rodata_test(void)
 }
 #endif
 
-#ifdef ARCH_HAS_NOCACHE_UACCESS
-
-/**
- * arch_memcpy_to_pmem - copy data to persistent memory
- * @dst: destination buffer for the copy
- * @src: source buffer for the copy
- * @n: length of the copy in bytes
- *
- * Copy data to persistent memory media via non-temporal stores so that
- * a subsequent arch_wmb_pmem() can flush cpu and memory controller
- * write buffers to guarantee durability.
- */
-static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
-		size_t n)
-{
-	int unwritten;
-
-	/*
-	 * We are copying between two kernel buffers, if
-	 * __copy_from_user_inatomic_nocache() returns an error (page
-	 * fault) we would have already reported a general protection fault
-	 * before the WARN+BUG.
-	 */
-	unwritten = __copy_from_user_inatomic_nocache((void __force *) dst,
-			(void __user *) src, n);
-	if (WARN(unwritten, "%s: fault copying %p <- %p unwritten: %d\n",
-				__func__, dst, src, unwritten))
-		BUG();
-}
-
-/**
- * arch_wmb_pmem - synchronize writes to persistent memory
- *
- * After a series of arch_memcpy_to_pmem() operations this drains data
- * from cpu write buffers and any platform (memory controller) buffers
- * to ensure that written data is durable on persistent memory media.
- */
-static inline void arch_wmb_pmem(void)
-{
-	/*
-	 * wmb() to 'sfence' all previous writes such that they are
-	 * architecturally visible to 'pcommit'.  Note, that we've
-	 * already arranged for pmem writes to avoid the cache via
-	 * arch_memcpy_to_pmem().
-	 */
-	wmb();
-	pcommit_sfence();
-}
-
-static inline bool __arch_has_wmb_pmem(void)
-{
-#ifdef CONFIG_X86_64
-	/*
-	 * We require that wmb() be an 'sfence', that is only guaranteed on
-	 * 64-bit builds
-	 */
-	return static_cpu_has(X86_FEATURE_PCOMMIT);
-#else
-	return false;
-#endif
-}
-#else /* ARCH_HAS_NOCACHE_UACCESS i.e. ARCH=um */
-extern void arch_memcpy_to_pmem(void __pmem *dst, const void *src, size_t n);
-extern void arch_wmb_pmem(void);
-
-static inline bool __arch_has_wmb_pmem(void)
-{
-	return false;
-}
-#endif
-
 #endif /* _ASM_X86_CACHEFLUSH_H */
diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
new file mode 100644
index 0000000..f43462c
--- /dev/null
+++ b/arch/x86/include/asm/pmem.h
@@ -0,0 +1,92 @@
+/*
+ * Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#ifndef __ASM_X86_PMEM_H__
+#define __ASM_X86_PMEM_H__
+
+#include <linux/uaccess.h>
+#include <asm/cacheflush.h>
+#include <asm/cpufeature.h>
+#include <asm/special_insns.h>
+
+#ifdef ARCH_HAS_NOCACHE_UACCESS
+
+/**
+ * arch_memcpy_to_pmem - copy data to persistent memory
+ * @dst: destination buffer for the copy
+ * @src: source buffer for the copy
+ * @n: length of the copy in bytes
+ *
+ * Copy data to persistent memory media via non-temporal stores so that
+ * a subsequent arch_wmb_pmem() can flush cpu and memory controller
+ * write buffers to guarantee durability.
+ */
+static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
+		size_t n)
+{
+	int unwritten;
+
+	/*
+	 * We are copying between two kernel buffers, if
+	 * __copy_from_user_inatomic_nocache() returns an error (page
+	 * fault) we would have already reported a general protection fault
+	 * before the WARN+BUG.
+	 */
+	unwritten = __copy_from_user_inatomic_nocache((void __force *) dst,
+			(void __user *) src, n);
+	if (WARN(unwritten, "%s: fault copying %p <- %p unwritten: %d\n",
+				__func__, dst, src, unwritten))
+		BUG();
+}
+
+/**
+ * arch_wmb_pmem - synchronize writes to persistent memory
+ *
+ * After a series of arch_memcpy_to_pmem() operations this drains data
+ * from cpu write buffers and any platform (memory controller) buffers
+ * to ensure that written data is durable on persistent memory media.
+ */
+static inline void arch_wmb_pmem(void)
+{
+	/*
+	 * wmb() to 'sfence' all previous writes such that they are
+	 * architecturally visible to 'pcommit'.  Note, that we've
+	 * already arranged for pmem writes to avoid the cache via
+	 * arch_memcpy_to_pmem().
+	 */
+	wmb();
+	pcommit_sfence();
+}
+
+static inline bool __arch_has_wmb_pmem(void)
+{
+#ifdef CONFIG_X86_64
+	/*
+	 * We require that wmb() be an 'sfence', that is only guaranteed on
+	 * 64-bit builds
+	 */
+	return static_cpu_has(X86_FEATURE_PCOMMIT);
+#else
+	return false;
+#endif
+}
+#else /* ARCH_HAS_NOCACHE_UACCESS i.e. ARCH=um */
+extern void arch_memcpy_to_pmem(void __pmem *dst, const void *src, size_t n);
+extern void arch_wmb_pmem(void);
+
+static inline bool __arch_has_wmb_pmem(void)
+{
+	return false;
+}
+#endif
+
+#endif /* __ASM_X86_PMEM_H__ */
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index d211404..5dad6d1 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -16,7 +16,7 @@
 #include <linux/io.h>
 
 #ifdef CONFIG_ARCH_HAS_PMEM_API
-#include <asm/cacheflush.h>
+#include <asm/pmem.h>
 #else
 static inline void arch_wmb_pmem(void)
 {
-- 
2.1.0



* [PATCH v2 3/7] pmem: remove layer when calling arch_has_wmb_pmem()
  2015-08-13 16:51 [PATCH v2 0/7] dax: I/O path enhancements Ross Zwisler
  2015-08-13 16:51 ` [PATCH v2 1/7] brd: make rd_size static Ross Zwisler
  2015-08-13 16:51 ` [PATCH v2 2/7] pmem, x86: move x86 PMEM API to new pmem.h header Ross Zwisler
@ 2015-08-13 16:51 ` Ross Zwisler
  2015-08-13 16:51 ` [PATCH v2 4/7] pmem, x86: clean up conditional pmem includes Ross Zwisler
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Ross Zwisler @ 2015-08-13 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-nvdimm, Dan Williams
  Cc: Ross Zwisler, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Christoph Hellwig

Prior to this change arch_has_wmb_pmem() was only called by
arch_has_pmem_api().  Both arch_has_wmb_pmem() and arch_has_pmem_api()
checked to make sure that CONFIG_ARCH_HAS_PMEM_API was enabled.

Instead, remove the old arch_has_wmb_pmem() wrapper to be rid of one
extra layer of indirection and the redundant CONFIG_ARCH_HAS_PMEM_API
check. Rename __arch_has_wmb_pmem() to arch_has_wmb_pmem() since we no
longer have a wrapper, and just have arch_has_pmem_api() call the
architecture specific arch_has_wmb_pmem() directly.
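
In call-chain form the change is roughly (sketch):

/*
 * before:
 *   arch_has_pmem_api()
 *     -> arch_has_wmb_pmem()      generic wrapper, rechecks CONFIG_ARCH_HAS_PMEM_API
 *       -> __arch_has_wmb_pmem()  arch implementation
 *
 * after:
 *   arch_has_pmem_api()
 *     -> arch_has_wmb_pmem()      arch implementation (renamed)
 */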

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/include/asm/pmem.h |  2 +-
 include/linux/pmem.h        | 13 +++----------
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index f43462c..1e8dbb7 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -67,7 +67,7 @@ static inline void arch_wmb_pmem(void)
 	pcommit_sfence();
 }
 
-static inline bool __arch_has_wmb_pmem(void)
+static inline bool arch_has_wmb_pmem(void)
 {
 #ifdef CONFIG_X86_64
 	/*
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 5dad6d1..9d619d2 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -23,7 +23,7 @@ static inline void arch_wmb_pmem(void)
 	BUG();
 }
 
-static inline bool __arch_has_wmb_pmem(void)
+static inline bool arch_has_wmb_pmem(void)
 {
 	return false;
 }
@@ -44,7 +44,7 @@ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
 /*
  * Architectures that define ARCH_HAS_PMEM_API must provide
  * implementations for arch_memremap_pmem(), arch_memcpy_to_pmem(),
- * arch_wmb_pmem(), and __arch_has_wmb_pmem().
+ * arch_wmb_pmem(), and arch_has_wmb_pmem().
  */
 
 static inline void memcpy_from_pmem(void *dst, void __pmem const *src, size_t size)
@@ -58,7 +58,7 @@ static inline void memunmap_pmem(void __pmem *addr)
 }
 
 /**
- * arch_has_wmb_pmem - true if wmb_pmem() ensures durability
+ * arch_has_pmem_api - true if wmb_pmem() ensures durability
  *
  * For a given cpu implementation within an architecture it is possible
  * that wmb_pmem() resolves to a nop.  In the case this returns
@@ -66,13 +66,6 @@ static inline void memunmap_pmem(void __pmem *addr)
  * fall back to a different data consistency model, or otherwise notify
  * the user.
  */
-static inline bool arch_has_wmb_pmem(void)
-{
-	if (IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API))
-		return __arch_has_wmb_pmem();
-	return false;
-}
-
 static inline bool arch_has_pmem_api(void)
 {
 	return IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && arch_has_wmb_pmem();
-- 
2.1.0



* [PATCH v2 4/7] pmem, x86: clean up conditional pmem includes
  2015-08-13 16:51 [PATCH v2 0/7] dax: I/O path enhancements Ross Zwisler
                   ` (2 preceding siblings ...)
  2015-08-13 16:51 ` [PATCH v2 3/7] pmem: remove layer when calling arch_has_wmb_pmem() Ross Zwisler
@ 2015-08-13 16:51 ` Ross Zwisler
  2015-08-13 16:51 ` [PATCH v2 5/7] pmem: add wb_cache_pmem() and clear_pmem() Ross Zwisler
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Ross Zwisler @ 2015-08-13 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-nvdimm, Dan Williams
  Cc: Ross Zwisler, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86

Prior to this change x86_64 used the pmem defines in
arch/x86/include/asm/pmem.h, and UM used the default stubs at the top of
include/linux/pmem.h.  Whether asm/pmem.h was pulled in by linux/pmem.h
was controlled by CONFIG_ARCH_HAS_PMEM_API, but the definitions inside
asm/pmem.h were guarded by ARCH_HAS_NOCACHE_UACCESS.

Instead, control both with CONFIG_ARCH_HAS_PMEM_API so that it's clear
they are related and we can't end up with both sets of definitions
included at once, or with neither.  Also remove a bunch of stale function
prototypes meant for UM in asm/pmem.h - these just conflicted with the
inline defaults in linux/pmem.h and gave compile errors.
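
Schematically, the mismatch being removed looks like this (a condensed
sketch, not the literal file contents):

/* before: related definitions behind two different gates */

/* include/linux/pmem.h */
#ifdef CONFIG_ARCH_HAS_PMEM_API
#include <asm/pmem.h>
#else
/* inline stubs for arch_wmb_pmem(), arch_memcpy_to_pmem(), ... */
#endif

/* arch/x86/include/asm/pmem.h */
#ifdef ARCH_HAS_NOCACHE_UACCESS		/* a different condition */
/* real x86 implementations */
#else
/* stale UM prototypes that clash with the linux/pmem.h stubs */
#endif

/* after: both files are keyed off CONFIG_ARCH_HAS_PMEM_API */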

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 arch/x86/include/asm/pmem.h | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index 1e8dbb7..7f3413f 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -18,8 +18,7 @@
 #include <asm/cpufeature.h>
 #include <asm/special_insns.h>
 
-#ifdef ARCH_HAS_NOCACHE_UACCESS
-
+#ifdef CONFIG_ARCH_HAS_PMEM_API
 /**
  * arch_memcpy_to_pmem - copy data to persistent memory
  * @dst: destination buffer for the copy
@@ -79,14 +78,6 @@ static inline bool arch_has_wmb_pmem(void)
 	return false;
 #endif
 }
-#else /* ARCH_HAS_NOCACHE_UACCESS i.e. ARCH=um */
-extern void arch_memcpy_to_pmem(void __pmem *dst, const void *src, size_t n);
-extern void arch_wmb_pmem(void);
-
-static inline bool __arch_has_wmb_pmem(void)
-{
-	return false;
-}
-#endif
+#endif /* CONFIG_ARCH_HAS_PMEM_API */
 
 #endif /* __ASM_X86_PMEM_H__ */
-- 
2.1.0



* [PATCH v2 5/7] pmem: add wb_cache_pmem() and clear_pmem()
  2015-08-13 16:51 [PATCH v2 0/7] dax: I/O path enhancements Ross Zwisler
                   ` (3 preceding siblings ...)
  2015-08-13 16:51 ` [PATCH v2 4/7] pmem, x86: clean up conditional pmem includes Ross Zwisler
@ 2015-08-13 16:51 ` Ross Zwisler
  2015-08-13 16:51 ` [PATCH v2 6/7] dax: update I/O path to do proper PMEM flushing Ross Zwisler
  2015-08-13 16:51 ` [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation Ross Zwisler
  6 siblings, 0 replies; 21+ messages in thread
From: Ross Zwisler @ 2015-08-13 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-nvdimm, Dan Williams
  Cc: Ross Zwisler, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Dave Chinner

Add support for two new PMEM APIs, wb_cache_pmem() and clear_pmem().
The first, wb_cache_pmem(), is used to write back ranges of dirtied
cache lines to media in order to make stores durable.  The contents of
the now-clean cache lines can potentially still reside in the cache
after this write back operation allowing subsequent loads to be serviced
from the cache.

The second, clear_pmem(), zeros a PMEM memory range and ensures that the
newly zeroed data is properly flushed from the processor cache to media.
This can be done either with normal writes followed by wb_cache_pmem()
calls, or by using non-temporal stores.

Stores made through both of these new APIs must be explicitly ordered
with a wmb_pmem() function call.  Because the APIs themselves are
unordered, they can be called as needed without introducing any unwanted
memory barriers.
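
For illustration, a minimal usage sketch (not part of the patch; the
function name is made up):

#include <linux/pmem.h>

/*
 * Several unordered pmem updates can be batched and then made durable
 * with a single wmb_pmem().
 */
static void pmem_batch_sketch(void __pmem *pg1, void __pmem *pg2,
			      void __pmem *rec, const void *buf, size_t len)
{
	clear_pmem(pg1, PAGE_SIZE);		/* unordered */
	clear_pmem(pg2, PAGE_SIZE);		/* unordered */

	memcpy((void __force *)rec, buf, len);	/* cached stores ... */
	wb_cache_pmem(rec, len);		/* ... written back, still unordered */

	wmb_pmem();	/* single ordering point; all of the above is now durable */
}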

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 arch/x86/include/asm/pmem.h | 40 ++++++++++++++++++++++++++++++++++
 include/linux/pmem.h        | 53 ++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index 7f3413f..89b04c0 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -66,6 +66,46 @@ static inline void arch_wmb_pmem(void)
 	pcommit_sfence();
 }
 
+/**
+ * arch_wb_cache_pmem - write back a cache range with CLWB
+ * @addr:	virtual start address
+ * @size:	number of bytes to write back
+ *
+ * Write back a cache range using the CLWB (cache line write back)
+ * instruction.  This function requires explicit ordering with an
+ * arch_wmb_pmem() function call.
+ */
+static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
+{
+	u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
+	unsigned long clflush_mask = x86_clflush_size - 1;
+	void *vend = (void __force *)addr + size;
+	void *p;
+
+	for (p = (void *)((unsigned long)addr & ~clflush_mask);
+	     p < vend; p += x86_clflush_size)
+		clwb(p);
+}
+
+/**
+ * arch_clear_pmem - zero a PMEM memory range
+ * @addr:	virtual start address
+ * @size:	number of bytes to zero
+ *
+ * Write zeros into the memory range starting at 'addr' for 'size' bytes.
+ * This function requires explicit ordering with an arch_wmb_pmem() call.
+ */
+static inline void arch_clear_pmem(void __pmem *addr, size_t size)
+{
+	/* TODO: implement the zeroing via non-temporal writes */
+	if (size == PAGE_SIZE && ((unsigned long)addr & ~PAGE_MASK) == 0)
+		clear_page((void __force *)addr);
+	else
+		memset((void __force *)addr, 0, size);
+
+	arch_wb_cache_pmem(addr, size);
+}
+
 static inline bool arch_has_wmb_pmem(void)
 {
 #ifdef CONFIG_X86_64
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 9d619d2..dd1b72c 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -39,12 +39,22 @@ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
 {
 	BUG();
 }
+
+static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
+{
+	BUG();
+}
+
+static inline void arch_clear_pmem(void __pmem *addr, size_t size)
+{
+	BUG();
+}
 #endif
 
 /*
- * Architectures that define ARCH_HAS_PMEM_API must provide
- * implementations for arch_memremap_pmem(), arch_memcpy_to_pmem(),
- * arch_wmb_pmem(), and arch_has_wmb_pmem().
+ * Architectures that define ARCH_HAS_PMEM_API must provide implementations
+ * for arch_memremap_pmem(), arch_memcpy_to_pmem(), arch_wmb_pmem(),
+ * arch_wb_cache_pmem(), arch_clear_pmem() and arch_has_wmb_pmem().
  */
 
 static inline void memcpy_from_pmem(void *dst, void __pmem const *src, size_t size)
@@ -90,6 +100,14 @@ static void __pmem *default_memremap_pmem(resource_size_t offset,
 	return (void __pmem __force *)ioremap_wt(offset, size);
 }
 
+static inline void default_clear_pmem(void __pmem *addr, size_t size)
+{
+	if (size == PAGE_SIZE && ((unsigned long)addr & ~PAGE_MASK) == 0)
+		clear_page((void __force *)addr);
+	else
+		memset((void __force *)addr, 0, size);
+}
+
 /**
  * memremap_pmem - map physical persistent memory for pmem api
  * @offset: physical address of persistent memory
@@ -142,4 +160,33 @@ static inline void wmb_pmem(void)
 	if (arch_has_pmem_api())
 		arch_wmb_pmem();
 }
+
+/**
+ * wb_cache_pmem - write back a cache range
+ * @addr:	virtual start address
+ * @size:	number of bytes to write back
+ *
+ * This function requires explicit ordering with a wmb_pmem() call.
+ */
+static inline void wb_cache_pmem(void __pmem *addr, size_t size)
+{
+	if (arch_has_pmem_api())
+		arch_wb_cache_pmem(addr, size);
+}
+
+/**
+ * clear_pmem - zero a PMEM memory range
+ * @addr:	virtual start address
+ * @size:	number of bytes to zero
+ *
+ * Write zeros into the memory range starting at 'addr' for 'size' bytes.
+ * This function requires explicit ordering with a wmb_pmem() call.
+ */
+static inline void clear_pmem(void __pmem *addr, size_t size)
+{
+	if (arch_has_pmem_api())
+		arch_clear_pmem(addr, size);
+	else
+		default_clear_pmem(addr, size);
+}
 #endif /* __PMEM_H__ */
-- 
2.1.0



* [PATCH v2 6/7] dax: update I/O path to do proper PMEM flushing
  2015-08-13 16:51 [PATCH v2 0/7] dax: I/O path enhancements Ross Zwisler
                   ` (4 preceding siblings ...)
  2015-08-13 16:51 ` [PATCH v2 5/7] pmem: add wb_cache_pmem() and clear_pmem() Ross Zwisler
@ 2015-08-13 16:51 ` Ross Zwisler
  2015-08-13 21:11   ` Dan Williams
  2015-08-13 16:51 ` [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation Ross Zwisler
  6 siblings, 1 reply; 21+ messages in thread
From: Ross Zwisler @ 2015-08-13 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-nvdimm, Dan Williams
  Cc: Ross Zwisler, Matthew Wilcox, Alexander Viro, linux-fsdevel,
	Dave Chinner

Update the DAX I/O path so that all operations that store data (I/O
writes, zeroing blocks, punching holes, etc.) properly synchronize the
stores to media using the PMEM API.  This ensures that the data DAX is
writing is durable on media before the operation completes.
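
Condensed, the write side of the new dax_io() has roughly this shape (a
sketch with a made-up helper name; the real patch batches the final
wmb_pmem() behind a need_wmb flag, as the hunks below show):

#include <linux/pmem.h>
#include <linux/uio.h>

static size_t dax_write_segment_sketch(void __pmem *addr, size_t len,
				       struct iov_iter *iter)
{
	size_t copied = copy_from_iter_nocache((void __force *)addr, len, iter);

	/*
	 * copy_from_iter_nocache() only uses non-temporal stores for iovec
	 * iterators, so explicitly write back the cache range otherwise.
	 */
	if (!iter_is_iovec(iter))
		wb_cache_pmem(addr, copied);

	wmb_pmem();	/* issued once per dax_io() in the real code */
	return copied;
}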

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 fs/dax.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index b6769ce..ea1b2c8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -17,12 +17,14 @@
 #include <linux/atomic.h>
 #include <linux/blkdev.h>
 #include <linux/buffer_head.h>
+#include <linux/dax.h>
 #include <linux/fs.h>
 #include <linux/genhd.h>
 #include <linux/highmem.h>
 #include <linux/memcontrol.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/pmem.h>
 #include <linux/sched.h>
 #include <linux/uio.h>
 #include <linux/vmstat.h>
@@ -46,10 +48,7 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 			unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
 			if (pgsz > count)
 				pgsz = count;
-			if (pgsz < PAGE_SIZE)
-				memset(addr, 0, pgsz);
-			else
-				clear_page(addr);
+			clear_pmem((void __pmem *)addr, pgsz);
 			addr += pgsz;
 			size -= pgsz;
 			count -= pgsz;
@@ -59,6 +58,7 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 		}
 	} while (size);
 
+	wmb_pmem();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(dax_clear_blocks);
@@ -70,15 +70,16 @@ static long dax_get_addr(struct buffer_head *bh, void **addr, unsigned blkbits)
 	return bdev_direct_access(bh->b_bdev, sector, addr, &pfn, bh->b_size);
 }
 
+/* the clear_pmem() calls are ordered by a wmb_pmem() in the caller */
 static void dax_new_buf(void *addr, unsigned size, unsigned first, loff_t pos,
 			loff_t end)
 {
 	loff_t final = end - pos + first; /* The final byte of the buffer */
 
 	if (first > 0)
-		memset(addr, 0, first);
+		clear_pmem((void __pmem *)addr, first);
 	if (final < size)
-		memset(addr + final, 0, size - final);
+		clear_pmem((void __pmem *)addr + final, size - final);
 }
 
 static bool buffer_written(struct buffer_head *bh)
@@ -108,6 +109,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 	loff_t bh_max = start;
 	void *addr;
 	bool hole = false;
+	bool need_wmb = false;
 
 	if (iov_iter_rw(iter) != WRITE)
 		end = min(end, i_size_read(inode));
@@ -145,18 +147,27 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 				retval = dax_get_addr(bh, &addr, blkbits);
 				if (retval < 0)
 					break;
-				if (buffer_unwritten(bh) || buffer_new(bh))
+				if (buffer_unwritten(bh) || buffer_new(bh)) {
 					dax_new_buf(addr, retval, first, pos,
 									end);
+					need_wmb = true;
+				}
 				addr += first;
 				size = retval - first;
 			}
 			max = min(pos + size, end);
 		}
 
-		if (iov_iter_rw(iter) == WRITE)
+		if (iov_iter_rw(iter) == WRITE) {
 			len = copy_from_iter_nocache(addr, max - pos, iter);
-		else if (!hole)
+			/*
+			 * copy_from_iter_nocache() uses non-temporal stores
+			 * for iovec iterators so we can skip the write back.
+			 */
+			if (!iter_is_iovec(iter))
+				wb_cache_pmem((void __pmem *)addr, max - pos);
+			need_wmb = true;
+		} else if (!hole)
 			len = copy_to_iter(addr, max - pos, iter);
 		else
 			len = iov_iter_zero(max - pos, iter);
@@ -168,6 +179,9 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 		addr += len;
 	}
 
+	if (need_wmb)
+		wmb_pmem();
+
 	return (pos == start) ? retval : pos - start;
 }
 
@@ -300,8 +314,10 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 		goto out;
 	}
 
-	if (buffer_unwritten(bh) || buffer_new(bh))
-		clear_page(addr);
+	if (buffer_unwritten(bh) || buffer_new(bh)) {
+		clear_pmem((void __pmem *)addr, PAGE_SIZE);
+		wmb_pmem();
+	}
 
 	error = vm_insert_mixed(vma, vaddr, pfn);
 
@@ -608,7 +624,9 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		if (buffer_unwritten(&bh) || buffer_new(&bh)) {
 			int i;
 			for (i = 0; i < PTRS_PER_PMD; i++)
-				clear_page(kaddr + i * PAGE_SIZE);
+				clear_pmem((void __pmem *)kaddr + i*PAGE_SIZE,
+						PAGE_SIZE);
+			wmb_pmem();
 			count_vm_event(PGMAJFAULT);
 			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
 			result |= VM_FAULT_MAJOR;
@@ -720,7 +738,8 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
 		err = dax_get_addr(&bh, &addr, inode->i_blkbits);
 		if (err < 0)
 			return err;
-		memset(addr + offset, 0, length);
+		clear_pmem((void __pmem *)addr + offset, length);
+		wmb_pmem();
 	}
 
 	return 0;
-- 
2.1.0



* [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-13 16:51 [PATCH v2 0/7] dax: I/O path enhancements Ross Zwisler
                   ` (5 preceding siblings ...)
  2015-08-13 16:51 ` [PATCH v2 6/7] dax: update I/O path to do proper PMEM flushing Ross Zwisler
@ 2015-08-13 16:51 ` Ross Zwisler
  2015-08-13 21:20   ` Dan Williams
  2015-08-15  9:19   ` Christoph Hellwig
  6 siblings, 2 replies; 21+ messages in thread
From: Ross Zwisler @ 2015-08-13 16:51 UTC (permalink / raw)
  To: linux-kernel, linux-nvdimm, Dan Williams
  Cc: Ross Zwisler, Jonathan Corbet, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Jens Axboe, Martin Schwidefsky,
	Heiko Carstens, linux390, Alexander Viro, Matthew Wilcox,
	Jeff Layton, Christoph Hellwig, Andrew Morton, Omar Sandoval,
	Boaz Harrosh, Miklos Szeredi, Jan Kara, Wolfram Sang,
	Uwe Kleine-König, Gerald Schaefer, Ameen Ali,
	Martin K. Petersen, Sagi Grimberg, Mike Snitzer, Tejun Heo,
	Shaohua Li, Ming Lei, linux-doc, linuxppc-dev, linux-s390,
	linux-fsdevel, Dave Chinner

Update the annotation for the kaddr pointer returned by direct_access()
so that it is a __pmem pointer.  This is consistent with the PMEM driver
and with how this direct_access() pointer is used in the DAX code.
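
For illustration, a call site after this change looks roughly like the
following (sketch only; the function name is made up and a real caller
would also check the returned length against len):

#include <linux/blkdev.h>
#include <linux/pmem.h>

static long dax_read_block_sketch(struct block_device *bdev, sector_t sector,
				  void *buf, long len)
{
	void __pmem *kaddr;	/* sparse now flags plain dereferences and casts */
	unsigned long pfn;
	long avail;

	avail = bdev_direct_access(bdev, sector, &kaddr, &pfn, len);
	if (avail < 0)
		return avail;

	memcpy_from_pmem(buf, kaddr, len);	/* __pmem-aware helper, no cast */
	return len;
}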

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 Documentation/filesystems/Locking |  3 ++-
 arch/powerpc/sysdev/axonram.c     |  7 ++++---
 drivers/block/brd.c               |  4 ++--
 drivers/nvdimm/pmem.c             |  4 ++--
 drivers/s390/block/dcssblk.c      | 10 +++++----
 fs/block_dev.c                    |  2 +-
 fs/dax.c                          | 44 +++++++++++++++++++++------------------
 include/linux/blkdev.h            |  8 +++----
 8 files changed, 45 insertions(+), 37 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 6a34a0f..06d4434 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -397,7 +397,8 @@ prototypes:
 	int (*release) (struct gendisk *, fmode_t);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	int (*direct_access) (struct block_device *, sector_t, void **, unsigned long *);
+	int (*direct_access) (struct block_device *, sector_t, void __pmem **,
+				unsigned long *);
 	int (*media_changed) (struct gendisk *);
 	void (*unlock_native_capacity) (struct gendisk *);
 	int (*revalidate_disk) (struct gendisk *);
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ee90db1..a2be2a6 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -141,13 +141,14 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
  */
 static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-		       void **kaddr, unsigned long *pfn, long size)
+		       void __pmem **kaddr, unsigned long *pfn, long size)
 {
 	struct axon_ram_bank *bank = device->bd_disk->private_data;
 	loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
+	void *addr = (void *)(bank->ph_addr + offset);
 
-	*kaddr = (void *)(bank->ph_addr + offset);
-	*pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
+	*kaddr = (void __pmem *)addr;
+	*pfn = virt_to_phys(addr) >> PAGE_SHIFT;
 
 	return bank->size - offset;
 }
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 5750b39..2691bb6 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -371,7 +371,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 static long brd_direct_access(struct block_device *bdev, sector_t sector,
-			void **kaddr, unsigned long *pfn, long size)
+			void __pmem **kaddr, unsigned long *pfn, long size)
 {
 	struct brd_device *brd = bdev->bd_disk->private_data;
 	struct page *page;
@@ -381,7 +381,7 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
 	page = brd_insert_page(brd, sector);
 	if (!page)
 		return -ENOSPC;
-	*kaddr = page_address(page);
+	*kaddr = (void __pmem *)page_address(page);
 	*pfn = page_to_pfn(page);
 
 	/*
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ade9eb9..68f6a6a 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -92,7 +92,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 static long pmem_direct_access(struct block_device *bdev, sector_t sector,
-			      void **kaddr, unsigned long *pfn, long size)
+		      void __pmem **kaddr, unsigned long *pfn, long size)
 {
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	size_t offset = sector << 9;
@@ -101,7 +101,7 @@ static long pmem_direct_access(struct block_device *bdev, sector_t sector,
 		return -ENODEV;
 
 	/* FIXME convert DAX to comprehend that this mapping has a lifetime */
-	*kaddr = (void __force *) pmem->virt_addr + offset;
+	*kaddr = pmem->virt_addr + offset;
 	*pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
 
 	return pmem->size - offset;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index da21281..2c5a397 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -29,7 +29,7 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static void dcssblk_make_request(struct request_queue *q, struct bio *bio);
 static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
-				 void **kaddr, unsigned long *pfn, long size);
+			 void __pmem **kaddr, unsigned long *pfn, long size);
 
 static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
 
@@ -879,18 +879,20 @@ fail:
 
 static long
 dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
-			void **kaddr, unsigned long *pfn, long size)
+			void __pmem **kaddr, unsigned long *pfn, long size)
 {
 	struct dcssblk_dev_info *dev_info;
 	unsigned long offset, dev_sz;
+	void *addr;
 
 	dev_info = bdev->bd_disk->private_data;
 	if (!dev_info)
 		return -ENODEV;
 	dev_sz = dev_info->end - dev_info->start;
 	offset = secnum * 512;
-	*kaddr = (void *) (dev_info->start + offset);
-	*pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
+	addr = (void *) (dev_info->start + offset);
+	*pfn = virt_to_phys(addr) >> PAGE_SHIFT;
+	*kaddr = (void __pmem *) addr;
 
 	return dev_sz - offset;
 }
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9be2d7e..4ab366d 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -442,7 +442,7 @@ EXPORT_SYMBOL_GPL(bdev_write_page);
  * accessible at this address.
  */
 long bdev_direct_access(struct block_device *bdev, sector_t sector,
-			void **addr, unsigned long *pfn, long size)
+			void __pmem **addr, unsigned long *pfn, long size)
 {
 	long avail;
 	const struct block_device_operations *ops = bdev->bd_disk->fops;
diff --git a/fs/dax.c b/fs/dax.c
index ea1b2c8..633a1ba 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -36,7 +36,7 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 
 	might_sleep();
 	do {
-		void *addr;
+		void __pmem *addr;
 		unsigned long pfn;
 		long count;
 
@@ -48,7 +48,7 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 			unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
 			if (pgsz > count)
 				pgsz = count;
-			clear_pmem((void __pmem *)addr, pgsz);
+			clear_pmem(addr, pgsz);
 			addr += pgsz;
 			size -= pgsz;
 			count -= pgsz;
@@ -63,7 +63,8 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 }
 EXPORT_SYMBOL_GPL(dax_clear_blocks);
 
-static long dax_get_addr(struct buffer_head *bh, void **addr, unsigned blkbits)
+static long dax_get_addr(struct buffer_head *bh, void __pmem **addr,
+		unsigned blkbits)
 {
 	unsigned long pfn;
 	sector_t sector = bh->b_blocknr << (blkbits - 9);
@@ -71,15 +72,15 @@ static long dax_get_addr(struct buffer_head *bh, void **addr, unsigned blkbits)
 }
 
 /* the clear_pmem() calls are ordered by a wmb_pmem() in the caller */
-static void dax_new_buf(void *addr, unsigned size, unsigned first, loff_t pos,
-			loff_t end)
+static void dax_new_buf(void __pmem *addr, unsigned size, unsigned first,
+		loff_t pos, loff_t end)
 {
 	loff_t final = end - pos + first; /* The final byte of the buffer */
 
 	if (first > 0)
-		clear_pmem((void __pmem *)addr, first);
+		clear_pmem(addr, first);
 	if (final < size)
-		clear_pmem((void __pmem *)addr + final, size - final);
+		clear_pmem(addr + final, size - final);
 }
 
 static bool buffer_written(struct buffer_head *bh)
@@ -107,7 +108,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 	loff_t pos = start;
 	loff_t max = start;
 	loff_t bh_max = start;
-	void *addr;
+	void __pmem *addr;
 	bool hole = false;
 	bool need_wmb = false;
 
@@ -159,16 +160,18 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 		}
 
 		if (iov_iter_rw(iter) == WRITE) {
-			len = copy_from_iter_nocache(addr, max - pos, iter);
+			len = copy_from_iter_nocache((void __force *)addr,
+					max - pos, iter);
 			/*
 			 * copy_from_iter_nocache() uses non-temporal stores
 			 * for iovec iterators so we can skip the write back.
 			 */
 			if (!iter_is_iovec(iter))
-				wb_cache_pmem((void __pmem *)addr, max - pos);
+				wb_cache_pmem(addr, max - pos);
 			need_wmb = true;
 		} else if (!hole)
-			len = copy_to_iter(addr, max - pos, iter);
+			len = copy_to_iter((void __force *)addr, max - pos,
+					iter);
 		else
 			len = iov_iter_zero(max - pos, iter);
 
@@ -274,11 +277,13 @@ static int dax_load_hole(struct address_space *mapping, struct page *page,
 static int copy_user_bh(struct page *to, struct buffer_head *bh,
 			unsigned blkbits, unsigned long vaddr)
 {
-	void *vfrom, *vto;
+	void __pmem *vfrom;
+	void *vto;
+
 	if (dax_get_addr(bh, &vfrom, blkbits) < 0)
 		return -EIO;
 	vto = kmap_atomic(to);
-	copy_user_page(vto, vfrom, vaddr, to);
+	copy_user_page(vto, (void __force *)vfrom, vaddr, to);
 	kunmap_atomic(vto);
 	return 0;
 }
@@ -288,7 +293,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 {
 	sector_t sector = bh->b_blocknr << (inode->i_blkbits - 9);
 	unsigned long vaddr = (unsigned long)vmf->virtual_address;
-	void *addr;
+	void __pmem *addr;
 	unsigned long pfn;
 	pgoff_t size;
 	int error;
@@ -315,7 +320,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	}
 
 	if (buffer_unwritten(bh) || buffer_new(bh)) {
-		clear_pmem((void __pmem *)addr, PAGE_SIZE);
+		clear_pmem(addr, PAGE_SIZE);
 		wmb_pmem();
 	}
 
@@ -530,7 +535,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	unsigned long pmd_addr = address & PMD_MASK;
 	bool write = flags & FAULT_FLAG_WRITE;
 	long length;
-	void *kaddr;
+	void __pmem *kaddr;
 	pgoff_t size, pgoff;
 	sector_t block, sector;
 	unsigned long pfn;
@@ -624,8 +629,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		if (buffer_unwritten(&bh) || buffer_new(&bh)) {
 			int i;
 			for (i = 0; i < PTRS_PER_PMD; i++)
-				clear_pmem((void __pmem *)kaddr + i*PAGE_SIZE,
-						PAGE_SIZE);
+				clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
 			wmb_pmem();
 			count_vm_event(PGMAJFAULT);
 			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
@@ -734,11 +738,11 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
 	if (err < 0)
 		return err;
 	if (buffer_written(&bh)) {
-		void *addr;
+		void __pmem *addr;
 		err = dax_get_addr(&bh, &addr, inode->i_blkbits);
 		if (err < 0)
 			return err;
-		clear_pmem((void __pmem *)addr + offset, length);
+		clear_pmem(addr + offset, length);
 		wmb_pmem();
 	}
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d4068c1..c401ecd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1555,8 +1555,8 @@ struct block_device_operations {
 	int (*rw_page)(struct block_device *, sector_t, struct page *, int rw);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	long (*direct_access)(struct block_device *, sector_t,
-					void **, unsigned long *pfn, long size);
+	long (*direct_access)(struct block_device *, sector_t, void __pmem **,
+			unsigned long *pfn, long size);
 	unsigned int (*check_events) (struct gendisk *disk,
 				      unsigned int clearing);
 	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
@@ -1574,8 +1574,8 @@ extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
 extern int bdev_read_page(struct block_device *, sector_t, struct page *);
 extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 						struct writeback_control *);
-extern long bdev_direct_access(struct block_device *, sector_t, void **addr,
-						unsigned long *pfn, long size);
+extern long bdev_direct_access(struct block_device *, sector_t,
+		void __pmem **addr, unsigned long *pfn, long size);
 #else /* CONFIG_BLOCK */
 
 struct block_device;
-- 
2.1.0



* Re: [PATCH v2 6/7] dax: update I/O path to do proper PMEM flushing
  2015-08-13 16:51 ` [PATCH v2 6/7] dax: update I/O path to do proper PMEM flushing Ross Zwisler
@ 2015-08-13 21:11   ` Dan Williams
  2015-08-14 16:48     ` Ross Zwisler
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Williams @ 2015-08-13 21:11 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, linux-nvdimm@lists.01.org, Matthew Wilcox,
	Alexander Viro, linux-fsdevel, Dave Chinner

On Thu, Aug 13, 2015 at 9:51 AM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> Update the DAX I/O path so that all operations that store data (I/O
> writes, zeroing blocks, punching holes, etc.) properly synchronize the
> stores to media using the PMEM API.  This ensures that the data DAX is
> writing is durable on media before the operation completes.
>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
[..]
> @@ -145,18 +147,27 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
>                                 retval = dax_get_addr(bh, &addr, blkbits);
>                                 if (retval < 0)
>                                         break;
> -                               if (buffer_unwritten(bh) || buffer_new(bh))
> +                               if (buffer_unwritten(bh) || buffer_new(bh)) {
>                                         dax_new_buf(addr, retval, first, pos,
>                                                                         end);
> +                                       need_wmb = true;
> +                               }
>                                 addr += first;
>                                 size = retval - first;
>                         }
>                         max = min(pos + size, end);
>                 }
>
> -               if (iov_iter_rw(iter) == WRITE)
> +               if (iov_iter_rw(iter) == WRITE) {
>                         len = copy_from_iter_nocache(addr, max - pos, iter);
> -               else if (!hole)
> +                       /*
> +                        * copy_from_iter_nocache() uses non-temporal stores
> +                        * for iovec iterators so we can skip the write back.
> +                        */
> +                       if (!iter_is_iovec(iter))
> +                               wb_cache_pmem((void __pmem *)addr, max - pos);
> +                       need_wmb = true;

I think this should become copy_from_iter_pmem() and hide the
wb_cache_pmem() as an internal arch detail.  I.e. wb_cache_pmem()
should not be a global api when its usage is architecture specific.
Otherwise are you asserting that all architecture implementations of
copy_from_iter_nocache() are pmem safe?
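
For illustration, one possible shape of such a wrapper (hypothetical
sketch only; names, placement and details would be up to the actual
patch):

/* arch/x86/include/asm/pmem.h (hypothetical) */
static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
					      struct iov_iter *i)
{
	size_t copied = copy_from_iter_nocache((void __force *)addr, bytes, i);

	/* iovec iterators already used non-temporal stores on x86 */
	if (!iter_is_iovec(i))
		arch_wb_cache_pmem(addr, copied);
	return copied;
}

/* include/linux/pmem.h (hypothetical) */
static inline size_t copy_from_iter_pmem(void __pmem *addr, size_t bytes,
					 struct iov_iter *i)
{
	return arch_copy_from_iter_pmem(addr, bytes, i);
}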


* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-13 16:51 ` [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation Ross Zwisler
@ 2015-08-13 21:20   ` Dan Williams
  2015-08-14 16:55     ` Ross Zwisler
  2015-08-15  9:19   ` Christoph Hellwig
  1 sibling, 1 reply; 21+ messages in thread
From: Dan Williams @ 2015-08-13 21:20 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, linux-nvdimm@lists.01.org, Jonathan Corbet,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Jens Axboe, Martin Schwidefsky, Heiko Carstens, linux390,
	Alexander Viro, Matthew Wilcox, Jeff Layton, Christoph Hellwig,
	Andrew Morton, Omar Sandoval, Boaz Harrosh, Miklos Szeredi,
	Jan Kara, Wolfram Sang, Uwe Kleine-König, Gerald Schaefer,
	Ameen Ali, Martin K. Petersen, Sagi Grimberg, Mike Snitzer,
	Tejun Heo, Shaohua Li, Ming Lei, linux-doc, linuxppc-dev,
	linux-s390, linux-fsdevel, Dave Chinner

On Thu, Aug 13, 2015 at 9:51 AM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> Update the annotation for the kaddr pointer returned by direct_access()
> so that it is a __pmem pointer.  This is consistent with the PMEM driver
> and with how this direct_access() pointer is used in the DAX code.
>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---
>  Documentation/filesystems/Locking |  3 ++-
>  arch/powerpc/sysdev/axonram.c     |  7 ++++---
>  drivers/block/brd.c               |  4 ++--
>  drivers/nvdimm/pmem.c             |  4 ++--
>  drivers/s390/block/dcssblk.c      | 10 +++++----
>  fs/block_dev.c                    |  2 +-
>  fs/dax.c                          | 44 +++++++++++++++++++++------------------
>  include/linux/blkdev.h            |  8 +++----
>  8 files changed, 45 insertions(+), 37 deletions(-)
>
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index 6a34a0f..06d4434 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -397,7 +397,8 @@ prototypes:
>         int (*release) (struct gendisk *, fmode_t);
>         int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
>         int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
> -       int (*direct_access) (struct block_device *, sector_t, void **, unsigned long *);
> +       int (*direct_access) (struct block_device *, sector_t, void __pmem **,
> +                               unsigned long *);

So this collides with the __pfn_t work.  I think we have a reasonable
chance of getting that into 4.3, so I'd wait to see if we hit any major
roadblocks with that set [1] before merging these.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-August/001803.html


* Re: [PATCH v2 6/7] dax: update I/O path to do proper PMEM flushing
  2015-08-13 21:11   ` Dan Williams
@ 2015-08-14 16:48     ` Ross Zwisler
  0 siblings, 0 replies; 21+ messages in thread
From: Ross Zwisler @ 2015-08-14 16:48 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-kernel, linux-nvdimm@lists.01.org, Matthew Wilcox,
	Alexander Viro, linux-fsdevel, Dave Chinner

On Thu, 2015-08-13 at 14:11 -0700, Dan Williams wrote:
> On Thu, Aug 13, 2015 at 9:51 AM, Ross Zwisler
> <ross.zwisler@linux.intel.com> wrote:
> > Update the DAX I/O path so that all operations that store data (I/O
> > writes, zeroing blocks, punching holes, etc.) properly synchronize the
> > stores to media using the PMEM API.  This ensures that the data DAX is
> > writing is durable on media before the operation completes.
> >
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> [..]
> > @@ -145,18 +147,27 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
> >                                 retval = dax_get_addr(bh, &addr, blkbits);
> >                                 if (retval < 0)
> >                                         break;
> > -                               if (buffer_unwritten(bh) || buffer_new(bh))
> > +                               if (buffer_unwritten(bh) || buffer_new(bh)) {
> >                                         dax_new_buf(addr, retval, first, pos,
> >                                                                         end);
> > +                                       need_wmb = true;
> > +                               }
> >                                 addr += first;
> >                                 size = retval - first;
> >                         }
> >                         max = min(pos + size, end);
> >                 }
> >
> > -               if (iov_iter_rw(iter) == WRITE)
> > +               if (iov_iter_rw(iter) == WRITE) {
> >                         len = copy_from_iter_nocache(addr, max - pos, iter);
> > -               else if (!hole)
> > +                       /*
> > +                        * copy_from_iter_nocache() uses non-temporal stores
> > +                        * for iovec iterators so we can skip the write back.
> > +                        */
> > +                       if (!iter_is_iovec(iter))
> > +                               wb_cache_pmem((void __pmem *)addr, max - pos);
> > +                       need_wmb = true;
> 
> I think this should become copy_from_iter_pmem() and hide the
> wb_cache_pmem() as an internal arch detail.  I.e. wb_cache_pmem()
> should not be a global api when its usage is architecture specific.
> Otherwise are you asserting that all architecture implementations of
> copy_from_iter_nocache() are pmem safe?

Great point.  Nope, copy_from_iter_nocache() uses __copy_from_user_nocache(),
which just defaults to __copy_from_user() on non-x86.  Dang, the PMEM API just
keeps growing...  :(
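
For reference, the generic fallback in include/linux/uaccess.h is roughly:

#ifndef ARCH_HAS_NOCACHE_UACCESS
static inline unsigned long __copy_from_user_nocache(void *to,
		const void __user *from, unsigned long n)
{
	return __copy_from_user(to, from, n);	/* cached stores on most arches */
}
#endif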




* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-13 21:20   ` Dan Williams
@ 2015-08-14 16:55     ` Ross Zwisler
  2015-08-14 16:58       ` Dan Williams
  0 siblings, 1 reply; 21+ messages in thread
From: Ross Zwisler @ 2015-08-14 16:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-kernel, linux-nvdimm@lists.01.org, Jonathan Corbet,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Jens Axboe, Martin Schwidefsky, Heiko Carstens, linux390,
	Alexander Viro, Matthew Wilcox, Jeff Layton, Christoph Hellwig,
	Andrew Morton, Omar Sandoval, Boaz Harrosh, Miklos Szeredi,
	Jan Kara, Wolfram Sang, Uwe Kleine-König, Gerald Schaefer,
	Ameen Ali, Martin K. Petersen, Sagi Grimberg, Mike Snitzer,
	Tejun Heo, Shaohua Li, Ming Lei, linux-doc, linuxppc-dev,
	linux-s390, linux-fsdevel, Dave Chinner

On Thu, 2015-08-13 at 14:20 -0700, Dan Williams wrote:
> On Thu, Aug 13, 2015 at 9:51 AM, Ross Zwisler
> <ross.zwisler@linux.intel.com> wrote:
> > Update the annotation for the kaddr pointer returned by direct_access()
> > so that it is a __pmem pointer.  This is consistent with the PMEM driver
> > and with how this direct_access() pointer is used in the DAX code.
> >
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > ---
> >  Documentation/filesystems/Locking |  3 ++-
> >  arch/powerpc/sysdev/axonram.c     |  7 ++++---
> >  drivers/block/brd.c               |  4 ++--
> >  drivers/nvdimm/pmem.c             |  4 ++--
> >  drivers/s390/block/dcssblk.c      | 10 +++++----
> >  fs/block_dev.c                    |  2 +-
> >  fs/dax.c                          | 44 +++++++++++++++++++++------------------
> >  include/linux/blkdev.h            |  8 +++----
> >  8 files changed, 45 insertions(+), 37 deletions(-)
> >
> > diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> > index 6a34a0f..06d4434 100644
> > --- a/Documentation/filesystems/Locking
> > +++ b/Documentation/filesystems/Locking
> > @@ -397,7 +397,8 @@ prototypes:
> >         int (*release) (struct gendisk *, fmode_t);
> >         int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
> >         int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
> > -       int (*direct_access) (struct block_device *, sector_t, void **, unsigned long *);
> > +       int (*direct_access) (struct block_device *, sector_t, void __pmem **,
> > +                               unsigned long *);
> 
> So this collides with the __pfn_t work.  I think we have a reasonable
> chance of getting that into 4.3, so I'd wait to see if we hit any major
> roadblocks with that set [1] before merging these.
> 
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-August/001803.html

Fair enough.  Yea, I hadn't merged with that series yet a) because I didn't
know when its review cycle would settle down and b) because that series hadn't
pulled in changes from Matthew for PMD support, which I was originally using
as a baseline.

I'll merge with your code for v3.




* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-14 16:55     ` Ross Zwisler
@ 2015-08-14 16:58       ` Dan Williams
  2015-08-15  9:11         ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Williams @ 2015-08-14 16:58 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, linux-nvdimm@lists.01.org, Jonathan Corbet,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Jens Axboe, Martin Schwidefsky, Heiko Carstens, linux390,
	Alexander Viro, Matthew Wilcox, Jeff Layton, Christoph Hellwig,
	Andrew Morton, Omar Sandoval, Boaz Harrosh, Miklos Szeredi,
	Jan Kara, Wolfram Sang, Uwe Kleine-König, Gerald Schaefer,
	Ameen Ali, Martin K. Petersen, Sagi Grimberg, Mike Snitzer,
	Tejun Heo, Shaohua Li, Ming Lei, linux-doc, linuxppc-dev,
	linux-s390, linux-fsdevel, Dave Chinner

On Fri, Aug 14, 2015 at 9:55 AM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> On Thu, 2015-08-13 at 14:20 -0700, Dan Williams wrote:
>> On Thu, Aug 13, 2015 at 9:51 AM, Ross Zwisler
>> <ross.zwisler@linux.intel.com> wrote:
>> > Update the annotation for the kaddr pointer returned by direct_access()
>> > so that it is a __pmem pointer.  This is consistent with the PMEM driver
>> > and with how this direct_access() pointer is used in the DAX code.
>> >
>> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
>> > ---
>> >  Documentation/filesystems/Locking |  3 ++-
>> >  arch/powerpc/sysdev/axonram.c     |  7 ++++---
>> >  drivers/block/brd.c               |  4 ++--
>> >  drivers/nvdimm/pmem.c             |  4 ++--
>> >  drivers/s390/block/dcssblk.c      | 10 +++++----
>> >  fs/block_dev.c                    |  2 +-
>> >  fs/dax.c                          | 44 +++++++++++++++++++++------------------
>> >  include/linux/blkdev.h            |  8 +++----
>> >  8 files changed, 45 insertions(+), 37 deletions(-)
>> >
>> > diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
>> > index 6a34a0f..06d4434 100644
>> > --- a/Documentation/filesystems/Locking
>> > +++ b/Documentation/filesystems/Locking
>> > @@ -397,7 +397,8 @@ prototypes:
>> >         int (*release) (struct gendisk *, fmode_t);
>> >         int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
>> >         int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
>> > -       int (*direct_access) (struct block_device *, sector_t, void **, unsigned long *);
>> > +       int (*direct_access) (struct block_device *, sector_t, void __pmem **,
>> > +                               unsigned long *);
>>
>> So this collides with the __pfn_t work.  I think we have a reasonable
>> chance of getting that into 4.3, so I'd wait to see if we hit any major
>> roadblocks with that set [1] before merging these.
>>
>> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-August/001803.html
>
> Fair enough.  Yea, I hadn't merged with that series yet a) because I didn't
> know when its review cycle would settle down and b) because that series hadn't
> pulled in changes from Matthew for PMD support, which I was originally using
> as a baseline.
>
> I'll merge with your code for v3.

Sounds good, let me go rebase the __pfn_t patches on -mm so we're all
lined up and collision free.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/7] pmem, x86: move x86 PMEM API to new pmem.h header
  2015-08-13 16:51 ` [PATCH v2 2/7] pmem, x86: move x86 PMEM API to new pmem.h header Ross Zwisler
@ 2015-08-15  8:58   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2015-08-15  8:58 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, linux-nvdimm, Dan Williams, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, Andrew Morton, Arnd Bergmann,
	Mauro Carvalho Chehab, Greg KH, David S. Miller, Joe Perches,
	Jiri Slaby, Tejun Heo, Borislav Petkov, Juergen Gross,
	Toshi Kani, Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-14 16:58       ` Dan Williams
@ 2015-08-15  9:11         ` Christoph Hellwig
  2015-08-15 15:49           ` Dan Williams
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2015-08-15  9:11 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ross Zwisler, linux-kernel, linux-nvdimm@lists.01.org,
	Jonathan Corbet, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Jens Axboe, Martin Schwidefsky, Heiko Carstens,
	linux390, Alexander Viro, Matthew Wilcox, Jeff Layton,
	Christoph Hellwig, Andrew Morton, Omar Sandoval, Boaz Harrosh,
	Miklos Szeredi, Jan Kara, Wolfram Sang, Uwe Kleine-König,
	Gerald Schaefer, Ameen Ali, Martin K. Petersen, Sagi Grimberg,
	Mike Snitzer, Tejun Heo, Shaohua Li, Ming Lei, linux-doc,
	linuxppc-dev, linux-s390, linux-fsdevel, Dave Chinner

On Fri, Aug 14, 2015 at 09:58:16AM -0700, Dan Williams wrote:
> > I'll merge with your code for v3.
> 
> Sounds good, let me go rebase the __pfn_t patches on -mm so we're all
> lined up and collision free.

I doubt that we'll have PFN mapping ready for 4.3.  I'd rather see
Ross's series go first, and move the patch to remove the size argument
from ->direct_access [1] over to this series as well.

[1] https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=8e15e69fb9e61ac563c5a7ffd9dd9a7b545cced3

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-13 16:51 ` [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation Ross Zwisler
  2015-08-13 21:20   ` Dan Williams
@ 2015-08-15  9:19   ` Christoph Hellwig
  2015-08-15 15:44     ` Dan Williams
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2015-08-15  9:19 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, linux-nvdimm, Dan Williams, Jonathan Corbet,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Jens Axboe, Martin Schwidefsky, Heiko Carstens, linux390,
	Alexander Viro, Matthew Wilcox, Jeff Layton, Christoph Hellwig,
	Andrew Morton, Omar Sandoval, Boaz Harrosh, Miklos Szeredi,
	Jan Kara, Wolfram Sang, Uwe Kleine-König, Gerald Schaefer,
	Ameen Ali, Martin K. Petersen, Sagi Grimberg, Mike Snitzer,
	Tejun Heo, Shaohua Li, Ming Lei, linux-doc, linuxppc-dev,
	linux-s390, linux-fsdevel, Dave Chinner

On Thu, Aug 13, 2015 at 10:51:11AM -0600, Ross Zwisler wrote:
> Update the annotation for the kaddr pointer returned by direct_access()
> so that it is a __pmem pointer.  This is consistent with the PMEM driver
> and with how this direct_access() pointer is used in the DAX code.

IFF we stick to the __pmem annotations this looks good.

That being said, I'm starting to really dislike them.  We don't need
special accessors to read/write from pmem, we just need to explicitly
commit it if we want to make it persistent.  So I really don't see the
need to treat it specially and require all the force casts to and from
the attribute.
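
For concreteness, the force casts in question look roughly like the
sketch below.  This is a standalone illustration, not the kernel's
actual definitions: the address-space number and the two helpers are
made up for the example.

/*
 * Sketch of the sparse machinery behind __pmem: under sparse
 * (__CHECKER__) the attribute marks a separate, non-dereferenceable
 * address space, so every conversion to or from a plain pointer needs
 * an explicit __force cast.
 */
#ifdef __CHECKER__
# define __pmem		__attribute__((noderef, address_space(5)))
# define __force	__attribute__((force))
#else
# define __pmem
# define __force
#endif

/* A driver handing out a pmem mapping casts *into* the address space... */
static inline void __pmem *to_pmem(void *virt, unsigned long offset)
{
	return (void __pmem __force *)(virt + offset);
}

/* ...and anything that wants an ordinary pointer has to cast it back out. */
static inline void *pmem_raw(void __pmem *addr)
{
	return (void __force *)addr;
}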

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-15  9:19   ` Christoph Hellwig
@ 2015-08-15 15:44     ` Dan Williams
  2015-08-15 16:00       ` Christoph Hellwig
  2015-08-17 20:07       ` Ross Zwisler
  0 siblings, 2 replies; 21+ messages in thread
From: Dan Williams @ 2015-08-15 15:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ross Zwisler, linux-kernel, linux-nvdimm@lists.01.org,
	Jonathan Corbet, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Jens Axboe, Martin Schwidefsky, Heiko Carstens,
	linux390, Alexander Viro, Matthew Wilcox, Jeff Layton,
	Andrew Morton, Omar Sandoval, Boaz Harrosh, Miklos Szeredi,
	Jan Kara, Wolfram Sang, Uwe Kleine-König, Gerald Schaefer,
	Ameen Ali, Martin K. Petersen, Sagi Grimberg, Mike Snitzer,
	Tejun Heo, Shaohua Li, Ming Lei, linux-doc, linuxppc-dev,
	linux-s390, linux-fsdevel, Dave Chinner

On Sat, Aug 15, 2015 at 2:19 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Aug 13, 2015 at 10:51:11AM -0600, Ross Zwisler wrote:
>> Update the annotation for the kaddr pointer returned by direct_access()
>> so that it is a __pmem pointer.  This is consistent with the PMEM driver
>> and with how this direct_access() pointer is used in the DAX code.
>
> IFF we stick to the __pmem annotations this looks good.
>
> That being said, I'm starting to really dislike them.  We don't need
> special accessors to read/write from pmem, we just need to explicitly
> commit it if we want to make it persistent.  So I really don't see the
> need to treat it specially and require all the force casts to and from
> the attribute.

I'm not going to put up much of a fight if it's really getting in the way....

That said, while we don't need special accessors we do need guarantees
that anything that has written to a persistent memory address has done
so in a way that wmb_pmem() is able to flush it.  It's more of a "I've
audited this code path for wmb_pmem() compatibility so use this api to
write to pmem."

Perhaps a better way to statically check for missed flushes might be
to have acquire_pmem_for_write() + release() annotations, where the
final release does a wmb_pmem(), but as far as I can tell the sparse
acquire/release annotations don't stack.
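
Roughly, the pattern the annotation is meant to enforce looks like the
sketch below.  The pmem helper names follow include/linux/pmem.h as
used in this series; the wrapper function itself is hypothetical and
error handling is omitted.

/*
 * Hypothetical wrapper illustrating an "audited" pmem write path: every
 * store to the __pmem destination goes through a pmem helper, and the
 * path ends with wmb_pmem() so the data is durable on media.
 */
static void copy_and_zero_pmem(void __pmem *dst, const void *src,
			       size_t len, size_t total)
{
	memcpy_to_pmem(dst, src, len);			/* cache-bypassing copy */
	if (len < total)
		clear_pmem(dst + len, total - len);	/* zero the tail, flushed */
	wmb_pmem();					/* order/drain: now durable */
}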

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-15  9:11         ` Christoph Hellwig
@ 2015-08-15 15:49           ` Dan Williams
  0 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2015-08-15 15:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ross Zwisler, linux-kernel, linux-nvdimm@lists.01.org,
	Jonathan Corbet, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Jens Axboe, Martin Schwidefsky, Heiko Carstens,
	linux390, Alexander Viro, Matthew Wilcox, Jeff Layton,
	Andrew Morton, Omar Sandoval, Boaz Harrosh, Miklos Szeredi,
	Jan Kara, Wolfram Sang, Uwe Kleine-König, Gerald Schaefer,
	Ameen Ali, Martin K. Petersen, Sagi Grimberg, Mike Snitzer,
	Tejun Heo, Shaohua Li, Ming Lei, linux-doc, linuxppc-dev,
	linux-s390, linux-fsdevel, Dave Chinner

On Sat, Aug 15, 2015 at 2:11 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Fri, Aug 14, 2015 at 09:58:16AM -0700, Dan Williams wrote:
>> > I'll merge with your code for v3.
>>
>> Sounds, let me go rebase the __pfn_t patches on -mm so we'all lined up
>> and collision free.
>
> I doubt that we'll have PFN mapping ready for 4.3.  I'd rather see
> Ross's series go first, and move the patch to remove the size argument
> from ->direct_access [1] over to this series as well.
>
> [1] https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=8e15e69fb9e61ac563c5a7ffd9dd9a7b545cced3

Yes, let's do it.  The need for DAX persistence guarantees is a higher
priority than solving the DAX vs PMEM unbind bug.  Especially if we
can defer __pfn_t and get some basic struct page mapping into 4.3 as
well.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-15 15:44     ` Dan Williams
@ 2015-08-15 16:00       ` Christoph Hellwig
  2015-08-15 18:05         ` Dan Williams
  2015-08-17 20:07       ` Ross Zwisler
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2015-08-15 16:00 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Ross Zwisler, linux-kernel,
	linux-nvdimm@lists.01.org, Jonathan Corbet,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Jens Axboe, Martin Schwidefsky, Heiko Carstens, linux390,
	Alexander Viro, Matthew Wilcox, Jeff Layton, Andrew Morton,
	Omar Sandoval, Boaz Harrosh, Miklos Szeredi, Jan Kara,
	Wolfram Sang, Uwe Kleine-König, Gerald Schaefer, Ameen Ali,
	Martin K. Petersen, Sagi Grimberg, Mike Snitzer, Tejun Heo,
	Shaohua Li, Ming Lei, linux-doc, linuxppc-dev, linux-s390,
	linux-fsdevel, Dave Chinner

On Sat, Aug 15, 2015 at 08:44:27AM -0700, Dan Williams wrote:
> That said, while we don't need special accessors we do need guarantees
> that anything that has written to a persistent memory address has done
> so in a way that wmb_pmem() is able to flush it.  It's more of a "I've
> audited this code path for wmb_pmem() compatibility so use this api to
> write to pmem."

I'm more worried about things where we don't just do plain loads and
stores to a pmem region but DMA, which will end up as a nightmare of
casts.

But we can wait and see how that evolves in the end..

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-15 16:00       ` Christoph Hellwig
@ 2015-08-15 18:05         ` Dan Williams
  0 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2015-08-15 18:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ross Zwisler, linux-kernel, linux-nvdimm@lists.01.org,
	Jonathan Corbet, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Jens Axboe, Martin Schwidefsky, Heiko Carstens,
	linux390, Alexander Viro, Matthew Wilcox, Jeff Layton,
	Andrew Morton, Omar Sandoval, Boaz Harrosh, Miklos Szeredi,
	Jan Kara, Wolfram Sang, Uwe Kleine-König, Gerald Schaefer,
	Ameen Ali, Martin K. Petersen, Sagi Grimberg, Mike Snitzer,
	Tejun Heo, Shaohua Li, Ming Lei, linux-doc, linuxppc-dev,
	linux-s390, linux-fsdevel, Dave Chinner

On Sat, Aug 15, 2015 at 9:00 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Sat, Aug 15, 2015 at 08:44:27AM -0700, Dan Williams wrote:
>> That said, while we don't need special accessors we do need guarantees
>> that anything that has written to a persistent memory address has done
>> so in a way that wmb_pmem() is able to flush it.  It's more of a "I've
>> audited this code path for wmb_pmem() compatibility so use this api to
>> write to pmem."
>
> I'm more worried about things where we don't just do plain loads and
> stores to a pmem region but DMA, which will end up as a nightmare of
> casts.
>
> But we can wait and see how that evolves in the end..

It's already not possible to do something like dma_map_single() on an
ioremap()'d address, so there currently aren't any __iomem/DMA
collisions.  As long as DMA setup is relative to a physical address
resource I think we're ok.

Making sure a DMA is both complete and persistent though is a different problem.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation
  2015-08-15 15:44     ` Dan Williams
  2015-08-15 16:00       ` Christoph Hellwig
@ 2015-08-17 20:07       ` Ross Zwisler
  1 sibling, 0 replies; 21+ messages in thread
From: Ross Zwisler @ 2015-08-17 20:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, linux-kernel, linux-nvdimm@lists.01.org,
	Jonathan Corbet, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Jens Axboe, Martin Schwidefsky, Heiko Carstens,
	linux390, Alexander Viro, Matthew Wilcox, Jeff Layton,
	Andrew Morton, Omar Sandoval, Boaz Harrosh, Miklos Szeredi,
	Jan Kara, Wolfram Sang, Uwe Kleine-König, Gerald Schaefer,
	Ameen Ali, Martin K. Petersen, Sagi Grimberg, Mike Snitzer,
	Tejun Heo, Shaohua Li, Ming Lei, linux-doc, linuxppc-dev,
	linux-s390, linux-fsdevel, Dave Chinner

On Sat, 2015-08-15 at 08:44 -0700, Dan Williams wrote:
> On Sat, Aug 15, 2015 at 2:19 AM, Christoph Hellwig <hch@lst.de> wrote:
> > On Thu, Aug 13, 2015 at 10:51:11AM -0600, Ross Zwisler wrote:
> >> Update the annotation for the kaddr pointer returned by direct_access()
> >> so that it is a __pmem pointer.  This is consistent with the PMEM driver
> >> and with how this direct_access() pointer is used in the DAX code.
> >
> > IFF we stick to the __pmem annotations this looks good.
> >
> > That being said, I'm starting to really dislike them.  We don't need
> > special accessors to read/write from pmem, we just need to explicitly
> > commit it if we want to make it persistent.  So I really don't see the
> > need to treat it specially and require all the force casts to and from
> > the attribute.
> 
> I'm not going to put up much of a fight if it's really getting in the way....
> 
> That said, while we don't need special accessors we do need guarantees
> that anything that has written to a persistent memory address has done
> so in a way that wmb_pmem() is able to flush it.  It's more of a "I've
> audited this code path for wmb_pmem() compatibility so use this api to
> write to pmem."
> 
> Perhaps a better way to statically check for missed flushes might be
> to have acquire_pmem_for_write() + release() annotations, where the
> final release does a wmb_pmem(), but as far as I can tell the sparse
> acquire/release annotations don't stack.

FWIW I've been on the fence about the __pmem annotations, but my current
thought is that we really do need a way of saying that stores to these
pointers need special care for wmb_pmem() to do its thing and that __pmem does
a reasonably good job of that.  If we can figure out a cooler way, such as the
write() + release() flow Dan is talking about, great.  But I think we need
something to keep us from making errors by storing to PMEM pointers and
leaving data in the processor cache.
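
The failure mode in question, sketched with a hypothetical helper (the
exact sparse diagnostic is approximate):

/*
 * Hypothetical example of the bug class __pmem is meant to catch: a
 * plain store to pmem that can linger in the CPU cache, which
 * wmb_pmem() alone will not make durable.
 */
static void set_flag_in_pmem(void __pmem *addr)
{
	u32 flag = 1;

	/*
	 * Unaudited: sparse warns that the cast strips the __pmem address
	 * space, and the store may sit in the CPU cache indefinitely.
	 */
	*(u32 *)addr = flag;

	/* Audited: flushed copy, then wmb_pmem() to make it durable. */
	memcpy_to_pmem(addr, &flag, sizeof(flag));
	wmb_pmem();
}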



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-08-17 20:07 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-13 16:51 [PATCH v2 0/7] dax: I/O path enhancements Ross Zwisler
2015-08-13 16:51 ` [PATCH v2 1/7] brd: make rd_size static Ross Zwisler
2015-08-13 16:51 ` [PATCH v2 2/7] pmem, x86: move x86 PMEM API to new pmem.h header Ross Zwisler
2015-08-15  8:58   ` Christoph Hellwig
2015-08-13 16:51 ` [PATCH v2 3/7] pmem: remove layer when calling arch_has_wmb_pmem() Ross Zwisler
2015-08-13 16:51 ` [PATCH v2 4/7] pmem, x86: clean up conditional pmem includes Ross Zwisler
2015-08-13 16:51 ` [PATCH v2 5/7] pmem: add wb_cache_pmem() and clear_pmem() Ross Zwisler
2015-08-13 16:51 ` [PATCH v2 6/7] dax: update I/O path to do proper PMEM flushing Ross Zwisler
2015-08-13 21:11   ` Dan Williams
2015-08-14 16:48     ` Ross Zwisler
2015-08-13 16:51 ` [PATCH v2 7/7] pmem, dax: have direct_access use __pmem annotation Ross Zwisler
2015-08-13 21:20   ` Dan Williams
2015-08-14 16:55     ` Ross Zwisler
2015-08-14 16:58       ` Dan Williams
2015-08-15  9:11         ` Christoph Hellwig
2015-08-15 15:49           ` Dan Williams
2015-08-15  9:19   ` Christoph Hellwig
2015-08-15 15:44     ` Dan Williams
2015-08-15 16:00       ` Christoph Hellwig
2015-08-15 18:05         ` Dan Williams
2015-08-17 20:07       ` Ross Zwisler
