linux-mm.kvack.org archive mirror
* [PATCH v2 0/4] pmem api, generic ioremap_cache, and memremap
@ 2015-05-30 18:59 Dan Williams
  2015-05-30 18:59 ` [PATCH v2 1/4] arch/*/asm/io.h: add ioremap_cache() to all architectures Dan Williams
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Dan Williams @ 2015-05-30 18:59 UTC (permalink / raw)
  To: arnd, mingo, bp, hpa, tglx, ross.zwisler, akpm
  Cc: jgross, x86, toshi.kani, linux-nvdimm, mcgrof, konrad.wilk,
	linux-kernel, stefan.bader, luto, linux-mm, geert, hmh, tj, hch

The pmem api is responsible for shepherding data out to persistent
media.  The pmem driver uses this api, when available, to assert that
data is durable by the time bio_endio() is invoked.  When an
architecture or cpu cannot make persistence guarantees, the driver
warns and falls back to a "best effort" implementation.

Changes since v1 [1]:

1/ Rebase on tip/master + Toshi's ioremap_wt() patches and enable
   ioremap_cache() to be used generically in drivers.  Fix
   devm_ioremap_resource() in the process.

2/ Rather than add yet another instance of "force cast away __iomem for
   non-io-memory" take the opportunity to introduce memremap() for this use
   case and fix up the current users that botch their handling of the
   __iomem annotation.

3/ Mandate that consumers of the pmem api handle the case when archs,
   or cpus within an arch, are not able to make durability guarantees
   for writes to persistent memory.  See pmem_ops in drivers/block/pmem.c.

4/ Drop the persistent_flush() api as there are no users until the BLK
   driver is introduced, and even then it is not a "flush to persistence"
   but an invalidation of a previous mmio aperture setting
   (io_flush_cache_range()).

5/ Add persistent_remap() to the pmem api for the arch to pick its
   desired memory type that corresponds to the assumptions of
   persistent_copy() and persistent_sync().

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000929.html

This boots and processes pmem writes on x86, cross-compile 0day results
are still pending.

---

Dan Williams (3):
      arch/*/asm/io.h: add ioremap_cache() to all architectures
      devm: fix ioremap_cache() usage
      arch: introduce memremap()

Ross Zwisler (1):
      arch, x86: cache management apis for persistent memory


 arch/arc/include/asm/io.h            |    1 
 arch/arm/include/asm/io.h            |    2 +
 arch/arm64/include/asm/io.h          |    2 +
 arch/arm64/kernel/efi.c              |    4 +
 arch/arm64/kernel/smp_spin_table.c   |   10 ++--
 arch/avr32/include/asm/io.h          |    1 
 arch/frv/include/asm/io.h            |    6 ++
 arch/m32r/include/asm/io.h           |    1 
 arch/m68k/include/asm/io_mm.h        |    7 +++
 arch/m68k/include/asm/io_no.h        |    5 ++
 arch/metag/include/asm/io.h          |    5 ++
 arch/microblaze/include/asm/io.h     |    1 
 arch/mn10300/include/asm/io.h        |    1 
 arch/nios2/include/asm/io.h          |    1 
 arch/s390/include/asm/io.h           |    1 
 arch/sparc/include/asm/io_32.h       |    1 
 arch/sparc/include/asm/io_64.h       |    1 
 arch/tile/include/asm/io.h           |    1 
 arch/x86/Kconfig                     |    1 
 arch/x86/include/asm/cacheflush.h    |   24 +++++++++
 arch/x86/include/asm/io.h            |    7 +++
 arch/x86/kernel/crash_dump_64.c      |    6 +-
 arch/x86/kernel/kdebugfs.c           |    8 +--
 arch/x86/kernel/ksysfs.c             |   28 +++++-----
 arch/x86/mm/ioremap.c                |   10 +---
 arch/xtensa/include/asm/io.h         |    3 +
 drivers/acpi/apei/einj.c             |    8 +--
 drivers/acpi/apei/erst.c             |   14 +++--
 drivers/block/pmem.c                 |   62 +++++++++++++++++++++--
 drivers/firmware/google/memconsole.c |    4 +
 include/asm-generic/io.h             |    8 +++
 include/asm-generic/iomap.h          |    4 +
 include/linux/device.h               |    5 ++
 include/linux/io.h                   |   38 ++++++++++++++
 include/linux/pmem.h                 |   93 ++++++++++++++++++++++++++++++++++
 lib/Kconfig                          |    3 +
 lib/devres.c                         |   48 ++++++++----------
 37 files changed, 347 insertions(+), 78 deletions(-)
 create mode 100644 include/linux/pmem.h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org


* [PATCH v2 1/4] arch/*/asm/io.h: add ioremap_cache() to all architectures
  2015-05-30 18:59 [PATCH v2 0/4] pmem api, generic ioremap_cache, and memremap Dan Williams
@ 2015-05-30 18:59 ` Dan Williams
  2015-06-01 22:36   ` Toshi Kani
  2015-05-30 18:59 ` [PATCH v2 2/4] devm: fix ioremap_cache() usage Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2015-05-30 18:59 UTC (permalink / raw)
  To: arnd, mingo, bp, hpa, tglx, ross.zwisler, akpm
  Cc: jgross, x86, toshi.kani, linux-nvdimm, mcgrof, konrad.wilk,
	linux-kernel, stefan.bader, luto, linux-mm, geert, hmh, tj, hch

Similar to ioremap_wc(), let architecture implementations optionally
provide ioremap_cache().  As-is, current ioremap_cache() users have
architecture dependencies that prevent them from compiling on archs
without ioremap_cache().  In some cases the architectures that have a
cached ioremap() capability expose it under an identifier other than
"ioremap_cache".

Allow drivers to compile with ioremap_cache() support and fall back to
a safe / uncached ioremap otherwise.

Cc: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/arc/include/asm/io.h        |    1 +
 arch/arm/include/asm/io.h        |    2 ++
 arch/arm64/include/asm/io.h      |    2 ++
 arch/avr32/include/asm/io.h      |    1 +
 arch/frv/include/asm/io.h        |    6 ++++++
 arch/m32r/include/asm/io.h       |    1 +
 arch/m68k/include/asm/io_mm.h    |    7 +++++++
 arch/m68k/include/asm/io_no.h    |    5 +++++
 arch/metag/include/asm/io.h      |    5 +++++
 arch/microblaze/include/asm/io.h |    1 +
 arch/mn10300/include/asm/io.h    |    1 +
 arch/nios2/include/asm/io.h      |    1 +
 arch/s390/include/asm/io.h       |    1 +
 arch/sparc/include/asm/io_32.h   |    1 +
 arch/sparc/include/asm/io_64.h   |    1 +
 arch/tile/include/asm/io.h       |    1 +
 arch/x86/include/asm/io.h        |    1 +
 arch/xtensa/include/asm/io.h     |    3 +++
 include/asm-generic/io.h         |    8 ++++++++
 include/asm-generic/iomap.h      |    4 ++++
 20 files changed, 53 insertions(+)

diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
index 7cc4ced5dbf4..6b6f5a47acec 100644
--- a/arch/arc/include/asm/io.h
+++ b/arch/arc/include/asm/io.h
@@ -19,6 +19,7 @@ extern void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
 extern void iounmap(const void __iomem *addr);
 
 #define ioremap_nocache(phy, sz)	ioremap(phy, sz)
+#define ioremap_cache(phy, sz)		ioremap(phy, sz)
 #define ioremap_wc(phy, sz)		ioremap(phy, sz)
 #define ioremap_wt(phy, sz)		ioremap(phy, sz)
 
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 1b7677d1e5e1..5e2c5cbdffdc 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -23,6 +23,8 @@
 
 #ifdef __KERNEL__
 
+#define ARCH_HAS_IOREMAP_CACHE
+
 #include <linux/types.h>
 #include <linux/blk_types.h>
 #include <asm/byteorder.h>
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 7116d3973058..598fd5c22240 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -21,6 +21,8 @@
 
 #ifdef __KERNEL__
 
+#define ARCH_HAS_IOREMAP_CACHE
+
 #include <linux/types.h>
 #include <linux/blk_types.h>
 
diff --git a/arch/avr32/include/asm/io.h b/arch/avr32/include/asm/io.h
index e998ff5d8e1a..c6994d880dbd 100644
--- a/arch/avr32/include/asm/io.h
+++ b/arch/avr32/include/asm/io.h
@@ -297,6 +297,7 @@ extern void __iounmap(void __iomem *addr);
 
 #define ioremap_wc ioremap_nocache
 #define ioremap_wt ioremap_nocache
+#define ioremap_cache ioremap_nocache
 
 #define cached(addr) P1SEGADDR(addr)
 #define uncached(addr) P2SEGADDR(addr)
diff --git a/arch/frv/include/asm/io.h b/arch/frv/include/asm/io.h
index a31b63ec4930..cd841f852af3 100644
--- a/arch/frv/include/asm/io.h
+++ b/arch/frv/include/asm/io.h
@@ -18,6 +18,7 @@
 #ifdef __KERNEL__
 
 #define ARCH_HAS_IOREMAP_WT
+#define ARCH_HAS_IOREMAP_CACHE
 
 #include <linux/types.h>
 #include <asm/virtconvert.h>
@@ -277,6 +278,11 @@ static inline void __iomem *ioremap_fullcache(unsigned long physaddr, unsigned l
 	return __ioremap(physaddr, size, IOMAP_FULL_CACHING);
 }
 
+static inline void __iomem *ioremap_cache(unsigned long physaddr, unsigned long size)
+{
+	return __ioremap(physaddr, size, IOMAP_FULL_CACHING);
+}
+
 #define ioremap_wc ioremap_nocache
 
 extern void iounmap(void volatile __iomem *addr);
diff --git a/arch/m32r/include/asm/io.h b/arch/m32r/include/asm/io.h
index 0c3f25ee3381..f3eceeac25c8 100644
--- a/arch/m32r/include/asm/io.h
+++ b/arch/m32r/include/asm/io.h
@@ -67,6 +67,7 @@ static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
 
 extern void iounmap(volatile void __iomem *addr);
 #define ioremap_nocache(off,size) ioremap(off,size)
+#define ioremap_cache ioremap_nocache
 #define ioremap_wc ioremap_nocache
 #define ioremap_wt ioremap_nocache
 
diff --git a/arch/m68k/include/asm/io_mm.h b/arch/m68k/include/asm/io_mm.h
index 618c85d3c786..aaf1009f2f94 100644
--- a/arch/m68k/include/asm/io_mm.h
+++ b/arch/m68k/include/asm/io_mm.h
@@ -21,6 +21,7 @@
 #ifdef __KERNEL__
 
 #define ARCH_HAS_IOREMAP_WT
+#define ARCH_HAS_IOREMAP_CACHE
 
 #include <linux/compiler.h>
 #include <asm/raw_io.h>
@@ -478,6 +479,12 @@ static inline void __iomem *ioremap_fullcache(unsigned long physaddr,
 	return __ioremap(physaddr, size, IOMAP_FULL_CACHING);
 }
 
+static inline void __iomem *ioremap_cache(unsigned long physaddr,
+		unsigned long size)
+{
+	return __ioremap(physaddr, size, IOMAP_FULL_CACHING);
+}
+
 static inline void memset_io(volatile void __iomem *addr, unsigned char val, int count)
 {
 	__builtin_memset((void __force *) addr, val, count);
diff --git a/arch/m68k/include/asm/io_no.h b/arch/m68k/include/asm/io_no.h
index ad7bd40e6742..020483566b73 100644
--- a/arch/m68k/include/asm/io_no.h
+++ b/arch/m68k/include/asm/io_no.h
@@ -4,6 +4,7 @@
 #ifdef __KERNEL__
 
 #define ARCH_HAS_IOREMAP_WT
+#define ARCH_HAS_IOREMAP_CACHE
 
 #include <asm/virtconvert.h>
 #include <asm-generic/iomap.h>
@@ -163,6 +164,10 @@ static inline void *ioremap_fullcache(unsigned long physaddr, unsigned long size
 {
 	return __ioremap(physaddr, size, IOMAP_FULL_CACHING);
 }
+static inline void *ioremap_cache(unsigned long physaddr, unsigned long size)
+{
+	return __ioremap(physaddr, size, IOMAP_FULL_CACHING);
+}
 
 #define	iounmap(addr)	do { } while(0)
 
diff --git a/arch/metag/include/asm/io.h b/arch/metag/include/asm/io.h
index 9890f21eadbe..d9b2873e3ca8 100644
--- a/arch/metag/include/asm/io.h
+++ b/arch/metag/include/asm/io.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_METAG_IO_H
 #define _ASM_METAG_IO_H
 
+#define ARCH_HAS_IOREMAP_CACHE
+
 #include <linux/types.h>
 #include <asm/pgtable-bits.h>
 
@@ -157,6 +159,9 @@ extern void __iounmap(void __iomem *addr);
 #define ioremap_cached(offset, size)            \
 	__ioremap((offset), (size), _PAGE_CACHEABLE)
 
+#define ioremap_cache(offset, size)            \
+	__ioremap((offset), (size), _PAGE_CACHEABLE)
+
 #define ioremap_wc(offset, size)                \
 	__ioremap((offset), (size), _PAGE_WR_COMBINE)
 
diff --git a/arch/microblaze/include/asm/io.h b/arch/microblaze/include/asm/io.h
index 39b6315db82e..986cc0c9e67f 100644
--- a/arch/microblaze/include/asm/io.h
+++ b/arch/microblaze/include/asm/io.h
@@ -43,6 +43,7 @@ extern void __iomem *ioremap(phys_addr_t address, unsigned long size);
 #define ioremap_fullcache(addr, size)		ioremap((addr), (size))
 #define ioremap_wc(addr, size)			ioremap((addr), (size))
 #define ioremap_wt(addr, size)			ioremap((addr), (size))
+#define ioremap_cache(addr, size)		ioremap((addr), (size))
 
 #endif /* CONFIG_MMU */
 
diff --git a/arch/mn10300/include/asm/io.h b/arch/mn10300/include/asm/io.h
index 07c5b4a3903b..dcab414f40df 100644
--- a/arch/mn10300/include/asm/io.h
+++ b/arch/mn10300/include/asm/io.h
@@ -283,6 +283,7 @@ static inline void __iomem *ioremap_nocache(unsigned long offset, unsigned long
 
 #define ioremap_wc ioremap_nocache
 #define ioremap_wt ioremap_nocache
+#define ioremap_cache ioremap_nocache
 
 static inline void iounmap(void __iomem *addr)
 {
diff --git a/arch/nios2/include/asm/io.h b/arch/nios2/include/asm/io.h
index c5a62da22cd2..367e2ea7663a 100644
--- a/arch/nios2/include/asm/io.h
+++ b/arch/nios2/include/asm/io.h
@@ -47,6 +47,7 @@ static inline void iounmap(void __iomem *addr)
 
 #define ioremap_wc ioremap_nocache
 #define ioremap_wt ioremap_nocache
+#define ioremap_cache ioremap_nocache
 
 /* Pages to physical address... */
 #define page_to_phys(page)	virt_to_phys(page_to_virt(page))
diff --git a/arch/s390/include/asm/io.h b/arch/s390/include/asm/io.h
index cb5fdf3a78fc..6824c3daa2a1 100644
--- a/arch/s390/include/asm/io.h
+++ b/arch/s390/include/asm/io.h
@@ -30,6 +30,7 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);
 #define ioremap_nocache(addr, size)	ioremap(addr, size)
 #define ioremap_wc			ioremap_nocache
 #define ioremap_wt			ioremap_nocache
+#define ioremap_cache			ioremap_nocache
 
 static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
 {
diff --git a/arch/sparc/include/asm/io_32.h b/arch/sparc/include/asm/io_32.h
index 57f26c398dc9..b9a734caf57d 100644
--- a/arch/sparc/include/asm/io_32.h
+++ b/arch/sparc/include/asm/io_32.h
@@ -128,6 +128,7 @@ static inline void sbus_memcpy_toio(volatile void __iomem *dst,
  */
 void __iomem *ioremap(unsigned long offset, unsigned long size);
 #define ioremap_nocache(X,Y)	ioremap((X),(Y))
+#define ioremap_cache(X,Y)	ioremap((X),(Y))
 #define ioremap_wc(X,Y)		ioremap((X),(Y))
 #define ioremap_wt(X,Y)		ioremap((X),(Y))
 void iounmap(volatile void __iomem *addr);
diff --git a/arch/sparc/include/asm/io_64.h b/arch/sparc/include/asm/io_64.h
index c32fa3f752c8..0a0899d505b4 100644
--- a/arch/sparc/include/asm/io_64.h
+++ b/arch/sparc/include/asm/io_64.h
@@ -401,6 +401,7 @@ static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
 }
 
 #define ioremap_nocache(X,Y)		ioremap((X),(Y))
+#define ioremap_cache(X,Y)		ioremap((X),(Y))
 #define ioremap_wc(X,Y)			ioremap((X),(Y))
 #define ioremap_wt(X,Y)			ioremap((X),(Y))
 
diff --git a/arch/tile/include/asm/io.h b/arch/tile/include/asm/io.h
index dc61de15c1f9..fe853a135e25 100644
--- a/arch/tile/include/asm/io.h
+++ b/arch/tile/include/asm/io.h
@@ -53,6 +53,7 @@ extern void iounmap(volatile void __iomem *addr);
 #endif
 
 #define ioremap_nocache(physaddr, size)		ioremap(physaddr, size)
+#define ioremap_cache(physaddr, size)		ioremap(physaddr, size)
 #define ioremap_wc(physaddr, size)		ioremap(physaddr, size)
 #define ioremap_wt(physaddr, size)		ioremap(physaddr, size)
 #define ioremap_fullcache(physaddr, size)	ioremap(physaddr, size)
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6c3a130de503..956f2768bdc1 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -36,6 +36,7 @@
 
 #define ARCH_HAS_IOREMAP_WC
 #define ARCH_HAS_IOREMAP_WT
+#define ARCH_HAS_IOREMAP_CACHE
 
 #include <linux/string.h>
 #include <linux/compiler.h>
diff --git a/arch/xtensa/include/asm/io.h b/arch/xtensa/include/asm/io.h
index c39bb6e61911..f91a8a99aa29 100644
--- a/arch/xtensa/include/asm/io.h
+++ b/arch/xtensa/include/asm/io.h
@@ -12,6 +12,9 @@
 #define _XTENSA_IO_H
 
 #ifdef __KERNEL__
+
+#define ARCH_HAS_IOREMAP_CACHE
+
 #include <asm/byteorder.h>
 #include <asm/page.h>
 #include <asm/vectors.h>
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index f56094cfdeff..a0665dfcab47 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -793,6 +793,14 @@ static inline void __iomem *ioremap_wt(phys_addr_t offset, size_t size)
 }
 #endif
 
+#ifndef ioremap_cache
+#define ioremap_cache ioremap_cache
+static inline void __iomem *ioremap_cache(phys_addr_t offset, size_t size)
+{
+	return ioremap_nocache(offset, size);
+}
+#endif
+
 #ifndef iounmap
 #define iounmap iounmap
 
diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index d8f8622fa044..f0f30464cecd 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -70,6 +70,10 @@ extern void ioport_unmap(void __iomem *);
 #define ioremap_wt ioremap_nocache
 #endif
 
+#ifndef ARCH_HAS_IOREMAP_CACHE
+#define ioremap_cache ioremap_nocache
+#endif
+
 #ifdef CONFIG_PCI
 /* Destroy a virtual mapping cookie for a PCI BAR (memory or IO) */
 struct pci_dev;


* [PATCH v2 2/4] devm: fix ioremap_cache() usage
  2015-05-30 18:59 [PATCH v2 0/4] pmem api, generic ioremap_cache, and memremap Dan Williams
  2015-05-30 18:59 ` [PATCH v2 1/4] arch/*/asm/io.h: add ioremap_cache() to all architectures Dan Williams
@ 2015-05-30 18:59 ` Dan Williams
  2015-05-30 20:52   ` Arnd Bergmann
  2015-05-30 18:59 ` [PATCH v2 3/4] arch: introduce memremap() Dan Williams
  2015-05-30 18:59 ` [PATCH v2 4/4] arch, x86: cache management apis for persistent memory Dan Williams
  3 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2015-05-30 18:59 UTC (permalink / raw)
  To: arnd, mingo, bp, hpa, tglx, ross.zwisler, akpm
  Cc: jgross, x86, toshi.kani, linux-nvdimm, mcgrof, konrad.wilk,
	linux-kernel, stefan.bader, luto, linux-mm, geert, hmh, tj, hch

Provide devm_ioremap_cache() and fix up devm_ioremap_resource() to
actually provide cacheable mappings.  Even on archs that implement
ioremap_cache(), devm_ioremap_resource() silently falls back to an
uncached mapping when IORESOURCE_CACHEABLE is specified.

Cc: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/io.h |    2 ++
 lib/devres.c       |   48 +++++++++++++++++++++---------------------------
 2 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/include/linux/io.h b/include/linux/io.h
index 04cce4da3685..1c9ad4c6d485 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -70,6 +70,8 @@ static inline void devm_ioport_unmap(struct device *dev, void __iomem *addr)
 
 void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
 			   resource_size_t size);
+void __iomem *devm_ioremap_cache(struct device *dev, resource_size_t offset,
+			   resource_size_t size);
 void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
 				   resource_size_t size);
 void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
diff --git a/lib/devres.c b/lib/devres.c
index fbe2aac522e6..c8e75cdaf816 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -14,6 +14,8 @@ static int devm_ioremap_match(struct device *dev, void *res, void *match_data)
 	return *(void **)res == match_data;
 }
 
+typedef void __iomem *(*ioremap_fn)(resource_size_t offset, unsigned long size);
+
 /**
  * devm_ioremap - Managed ioremap()
  * @dev: Generic device to remap IO address for
@@ -22,8 +24,9 @@ static int devm_ioremap_match(struct device *dev, void *res, void *match_data)
  *
  * Managed ioremap().  Map is automatically unmapped on driver detach.
  */
-void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
-			   resource_size_t size)
+static void __iomem *devm_ioremap_type(struct device *dev,
+		resource_size_t offset, resource_size_t size,
+		ioremap_fn ioremap_type)
 {
 	void __iomem **ptr, *addr;
 
@@ -31,7 +34,7 @@ void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
 	if (!ptr)
 		return NULL;
 
-	addr = ioremap(offset, size);
+	addr = ioremap_type(offset, size);
 	if (addr) {
 		*ptr = addr;
 		devres_add(dev, ptr);
@@ -40,34 +43,25 @@ void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
 
 	return addr;
 }
+
+void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
+			   resource_size_t size)
+{
+	return devm_ioremap_type(dev, offset, size, ioremap);
+}
 EXPORT_SYMBOL(devm_ioremap);
 
-/**
- * devm_ioremap_nocache - Managed ioremap_nocache()
- * @dev: Generic device to remap IO address for
- * @offset: BUS offset to map
- * @size: Size of map
- *
- * Managed ioremap_nocache().  Map is automatically unmapped on driver
- * detach.
- */
+void __iomem *devm_ioremap_cache(struct device *dev, resource_size_t offset,
+			   resource_size_t size)
+{
+	return devm_ioremap_type(dev, offset, size, ioremap_cache);
+}
+EXPORT_SYMBOL(devm_ioremap_cache);
+
 void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
 				   resource_size_t size)
 {
-	void __iomem **ptr, *addr;
-
-	ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL);
-	if (!ptr)
-		return NULL;
-
-	addr = ioremap_nocache(offset, size);
-	if (addr) {
-		*ptr = addr;
-		devres_add(dev, ptr);
-	} else
-		devres_free(ptr);
-
-	return addr;
+	return devm_ioremap_type(dev, offset, size, ioremap_nocache);
 }
 EXPORT_SYMBOL(devm_ioremap_nocache);
 
@@ -154,7 +148,7 @@ void __iomem *devm_ioremap_resource(struct device *dev, struct resource *res)
 	}
 
 	if (res->flags & IORESOURCE_CACHEABLE)
-		dest_ptr = devm_ioremap(dev, res->start, size);
+		dest_ptr = devm_ioremap_cache(dev, res->start, size);
 	else
 		dest_ptr = devm_ioremap_nocache(dev, res->start, size);
 


* [PATCH v2 3/4] arch: introduce memremap()
  2015-05-30 18:59 [PATCH v2 0/4] pmem api, generic ioremap_cache, and memremap Dan Williams
  2015-05-30 18:59 ` [PATCH v2 1/4] arch/*/asm/io.h: add ioremap_cache() to all architectures Dan Williams
  2015-05-30 18:59 ` [PATCH v2 2/4] devm: fix ioremap_cache() usage Dan Williams
@ 2015-05-30 18:59 ` Dan Williams
  2015-05-30 21:00   ` Arnd Bergmann
  2015-05-30 18:59 ` [PATCH v2 4/4] arch, x86: cache management apis for persistent memory Dan Williams
  3 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2015-05-30 18:59 UTC (permalink / raw)
  To: arnd, mingo, bp, hpa, tglx, ross.zwisler, akpm
  Cc: jgross, x86, toshi.kani, linux-nvdimm, mcgrof, konrad.wilk,
	linux-kernel, stefan.bader, luto, linux-mm, geert, hmh, tj, hch

Many existing users of ioremap_cache(), and some select
ioremap_nocache() users, are mapping memory that is known in advance to
have no i/o side effects.  These users are forced to cast away the
__iomem annotation, or otherwise neglect to fix the sparse errors
thrown when dereferencing pointers to this memory.  Provide memremap()
as a non-__iomem-annotated ioremap().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/arm64/kernel/efi.c              |    4 ++--
 arch/arm64/kernel/smp_spin_table.c   |   10 +++++----
 arch/x86/kernel/crash_dump_64.c      |    6 +++---
 arch/x86/kernel/kdebugfs.c           |    8 ++++----
 arch/x86/kernel/ksysfs.c             |   28 +++++++++++++-------------
 arch/x86/mm/ioremap.c                |   10 ++++-----
 drivers/acpi/apei/einj.c             |    8 ++++----
 drivers/acpi/apei/erst.c             |   14 +++++++------
 drivers/block/pmem.c                 |    6 +++---
 drivers/firmware/google/memconsole.c |    4 ++--
 include/linux/device.h               |    5 +++++
 include/linux/io.h                   |   36 ++++++++++++++++++++++++++++++++++
 12 files changed, 89 insertions(+), 50 deletions(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index ab21e0d58278..b672ef33f08b 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -289,7 +289,7 @@ static int __init arm64_enable_runtime_services(void)
 	pr_info("Remapping and enabling EFI services.\n");
 
 	mapsize = memmap.map_end - memmap.map;
-	memmap.map = (__force void *)ioremap_cache((phys_addr_t)memmap.phys_map,
+	memmap.map = memremap_cache((phys_addr_t)memmap.phys_map,
 						   mapsize);
 	if (!memmap.map) {
 		pr_err("Failed to remap EFI memory map\n");
@@ -298,7 +298,7 @@ static int __init arm64_enable_runtime_services(void)
 	memmap.map_end = memmap.map + mapsize;
 	efi.memmap = &memmap;
 
-	efi.systab = (__force void *)ioremap_cache(efi_system_table,
+	efi.systab = memremap_cache(efi_system_table,
 						   sizeof(efi_system_table_t));
 	if (!efi.systab) {
 		pr_err("Failed to remap EFI System Table\n");
diff --git a/arch/arm64/kernel/smp_spin_table.c b/arch/arm64/kernel/smp_spin_table.c
index 14944e5b28da..893c8586e20f 100644
--- a/arch/arm64/kernel/smp_spin_table.c
+++ b/arch/arm64/kernel/smp_spin_table.c
@@ -67,18 +67,18 @@ static int smp_spin_table_cpu_init(struct device_node *dn, unsigned int cpu)
 
 static int smp_spin_table_cpu_prepare(unsigned int cpu)
 {
-	__le64 __iomem *release_addr;
+	__le64 *release_addr;
 
 	if (!cpu_release_addr[cpu])
 		return -ENODEV;
 
 	/*
 	 * The cpu-release-addr may or may not be inside the linear mapping.
-	 * As ioremap_cache will either give us a new mapping or reuse the
+	 * As memremap_cache will either give us a new mapping or reuse the
 	 * existing linear mapping, we can use it to cover both cases. In
 	 * either case the memory will be MT_NORMAL.
 	 */
-	release_addr = ioremap_cache(cpu_release_addr[cpu],
+	release_addr = memremap_cache(cpu_release_addr[cpu],
 				     sizeof(*release_addr));
 	if (!release_addr)
 		return -ENOMEM;
@@ -91,7 +91,7 @@ static int smp_spin_table_cpu_prepare(unsigned int cpu)
 	 * the boot protocol.
 	 */
 	writeq_relaxed(__pa(secondary_holding_pen), release_addr);
-	__flush_dcache_area((__force void *)release_addr,
+	__flush_dcache_area(release_addr,
 			    sizeof(*release_addr));
 
 	/*
@@ -99,7 +99,7 @@ static int smp_spin_table_cpu_prepare(unsigned int cpu)
 	 */
 	sev();
 
-	iounmap(release_addr);
+	memunmap(release_addr);
 
 	return 0;
 }
diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index afa64adb75ee..8e04011665fd 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -31,19 +31,19 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
 	if (!csize)
 		return 0;
 
-	vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
+	vaddr = memremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
 	if (!vaddr)
 		return -ENOMEM;
 
 	if (userbuf) {
 		if (copy_to_user(buf, vaddr + offset, csize)) {
-			iounmap(vaddr);
+			memunmap(vaddr);
 			return -EFAULT;
 		}
 	} else
 		memcpy(buf, vaddr + offset, csize);
 
 	set_iounmap_nonlazy();
-	iounmap(vaddr);
+	memunmap(vaddr);
 	return csize;
 }
diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
index dc1404bf8e4b..731b10e2814f 100644
--- a/arch/x86/kernel/kdebugfs.c
+++ b/arch/x86/kernel/kdebugfs.c
@@ -49,7 +49,7 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
 	pa = node->paddr + sizeof(struct setup_data) + pos;
 	pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
 	if (PageHighMem(pg)) {
-		p = ioremap_cache(pa, count);
+		p = memremap_cache(pa, count);
 		if (!p)
 			return -ENXIO;
 	} else
@@ -58,7 +58,7 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
 	remain = copy_to_user(user_buf, p, count);
 
 	if (PageHighMem(pg))
-		iounmap(p);
+		memunmap(p);
 
 	if (remain)
 		return -EFAULT;
@@ -128,7 +128,7 @@ static int __init create_setup_data_nodes(struct dentry *parent)
 
 		pg = pfn_to_page((pa_data+sizeof(*data)-1) >> PAGE_SHIFT);
 		if (PageHighMem(pg)) {
-			data = ioremap_cache(pa_data, sizeof(*data));
+			data = memremap_cache(pa_data, sizeof(*data));
 			if (!data) {
 				kfree(node);
 				error = -ENXIO;
@@ -144,7 +144,7 @@ static int __init create_setup_data_nodes(struct dentry *parent)
 		pa_data = data->next;
 
 		if (PageHighMem(pg))
-			iounmap(data);
+			memunmap(data);
 		if (error)
 			goto err_dir;
 		no++;
diff --git a/arch/x86/kernel/ksysfs.c b/arch/x86/kernel/ksysfs.c
index c2bedaea11f7..2fbc62886eae 100644
--- a/arch/x86/kernel/ksysfs.c
+++ b/arch/x86/kernel/ksysfs.c
@@ -16,8 +16,8 @@
 #include <linux/stat.h>
 #include <linux/slab.h>
 #include <linux/mm.h>
+#include <linux/io.h>
 
-#include <asm/io.h>
 #include <asm/setup.h>
 
 static ssize_t version_show(struct kobject *kobj,
@@ -79,12 +79,12 @@ static int get_setup_data_paddr(int nr, u64 *paddr)
 			*paddr = pa_data;
 			return 0;
 		}
-		data = ioremap_cache(pa_data, sizeof(*data));
+		data = memremap_cache(pa_data, sizeof(*data));
 		if (!data)
 			return -ENOMEM;
 
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 		i++;
 	}
 	return -EINVAL;
@@ -97,17 +97,17 @@ static int __init get_setup_data_size(int nr, size_t *size)
 	u64 pa_data = boot_params.hdr.setup_data;
 
 	while (pa_data) {
-		data = ioremap_cache(pa_data, sizeof(*data));
+		data = memremap_cache(pa_data, sizeof(*data));
 		if (!data)
 			return -ENOMEM;
 		if (nr == i) {
 			*size = data->len;
-			iounmap(data);
+			memunmap(data);
 			return 0;
 		}
 
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 		i++;
 	}
 	return -EINVAL;
@@ -127,12 +127,12 @@ static ssize_t type_show(struct kobject *kobj,
 	ret = get_setup_data_paddr(nr, &paddr);
 	if (ret)
 		return ret;
-	data = ioremap_cache(paddr, sizeof(*data));
+	data = memremap_cache(paddr, sizeof(*data));
 	if (!data)
 		return -ENOMEM;
 
 	ret = sprintf(buf, "0x%x\n", data->type);
-	iounmap(data);
+	memunmap(data);
 	return ret;
 }
 
@@ -154,7 +154,7 @@ static ssize_t setup_data_data_read(struct file *fp,
 	ret = get_setup_data_paddr(nr, &paddr);
 	if (ret)
 		return ret;
-	data = ioremap_cache(paddr, sizeof(*data));
+	data = memremap_cache(paddr, sizeof(*data));
 	if (!data)
 		return -ENOMEM;
 
@@ -170,15 +170,15 @@ static ssize_t setup_data_data_read(struct file *fp,
 		goto out;
 
 	ret = count;
-	p = ioremap_cache(paddr + sizeof(*data), data->len);
+	p = memremap_cache(paddr + sizeof(*data), data->len);
 	if (!p) {
 		ret = -ENOMEM;
 		goto out;
 	}
 	memcpy(buf, p + off, count);
-	iounmap(p);
+	memunmap(p);
 out:
-	iounmap(data);
+	memunmap(data);
 	return ret;
 }
 
@@ -250,13 +250,13 @@ static int __init get_setup_data_total_num(u64 pa_data, int *nr)
 	*nr = 0;
 	while (pa_data) {
 		*nr += 1;
-		data = ioremap_cache(pa_data, sizeof(*data));
+		data = memremap_cache(pa_data, sizeof(*data));
 		if (!data) {
 			ret = -ENOMEM;
 			goto out;
 		}
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 	}
 
 out:
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index cc5ccc415cc0..f48c137560df 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -414,12 +414,10 @@ void *xlate_dev_mem_ptr(phys_addr_t phys)
 	if (page_is_ram(start >> PAGE_SHIFT))
 		return __va(phys);
 
-	vaddr = ioremap_cache(start, PAGE_SIZE);
-	/* Only add the offset on success and return NULL if the ioremap() failed: */
+	vaddr = memremap_cache(start, PAGE_SIZE);
 	if (vaddr)
-		vaddr += offset;
-
-	return vaddr;
+		return vaddr + offset;
+	return NULL;
 }
 
 void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
@@ -427,7 +425,7 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
 	if (page_is_ram(phys >> PAGE_SHIFT))
 		return;
 
-	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
+	memunmap((void *) ((unsigned long)addr & PAGE_MASK));
 }
 
 static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index a095d4f858da..2ec9006cfb6c 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -318,7 +318,7 @@ static int __einj_error_trigger(u64 trigger_paddr, u32 type,
 			    sizeof(*trigger_tab) - 1);
 		goto out;
 	}
-	trigger_tab = ioremap_cache(trigger_paddr, sizeof(*trigger_tab));
+	trigger_tab = memremap_cache(trigger_paddr, sizeof(*trigger_tab));
 	if (!trigger_tab) {
 		pr_err(EINJ_PFX "Failed to map trigger table!\n");
 		goto out_rel_header;
@@ -346,8 +346,8 @@ static int __einj_error_trigger(u64 trigger_paddr, u32 type,
 		       (unsigned long long)trigger_paddr + table_size - 1);
 		goto out_rel_header;
 	}
-	iounmap(trigger_tab);
-	trigger_tab = ioremap_cache(trigger_paddr, table_size);
+	memunmap(trigger_tab);
+	trigger_tab = memremap_cache(trigger_paddr, table_size);
 	if (!trigger_tab) {
 		pr_err(EINJ_PFX "Failed to map trigger table!\n");
 		goto out_rel_entry;
@@ -409,7 +409,7 @@ out_rel_header:
 	release_mem_region(trigger_paddr, sizeof(*trigger_tab));
 out:
 	if (trigger_tab)
-		iounmap(trigger_tab);
+		memunmap(trigger_tab);
 
 	return rc;
 }
diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index ed65e9c4b5b0..4f8b62404db5 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -76,7 +76,7 @@ static struct acpi_table_erst *erst_tab;
 static struct erst_erange {
 	u64 base;
 	u64 size;
-	void __iomem *vaddr;
+	void *vaddr;
 	u32 attr;
 } erst_erange;
 
@@ -279,19 +279,19 @@ static int erst_exec_move_data(struct apei_exec_context *ctx,
 	if (rc)
 		return rc;
 
-	src = ioremap(ctx->src_base + offset, ctx->var2);
+	src = memremap(ctx->src_base + offset, ctx->var2);
 	if (!src)
 		return -ENOMEM;
-	dst = ioremap(ctx->dst_base + offset, ctx->var2);
+	dst = memremap(ctx->dst_base + offset, ctx->var2);
 	if (!dst) {
-		iounmap(src);
+		memunmap(src);
 		return -ENOMEM;
 	}
 
 	memmove(dst, src, ctx->var2);
 
-	iounmap(src);
-	iounmap(dst);
+	memunmap(src);
+	memunmap(dst);
 
 	return 0;
 }
@@ -1184,7 +1184,7 @@ static int __init erst_init(void)
 		goto err_unmap_reg;
 	}
 	rc = -ENOMEM;
-	erst_erange.vaddr = ioremap_cache(erst_erange.base,
+	erst_erange.vaddr = memremap_cache(erst_erange.base,
 					  erst_erange.size);
 	if (!erst_erange.vaddr)
 		goto err_release_erange;
diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
index 095dfaadcaa5..799acff6bd7c 100644
--- a/drivers/block/pmem.c
+++ b/drivers/block/pmem.c
@@ -143,7 +143,7 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
 	 * of the CPU caches in case of a crash.
 	 */
 	err = -ENOMEM;
-	pmem->virt_addr = ioremap_wt(pmem->phys_addr, pmem->size);
+	pmem->virt_addr = memremap_wt(pmem->phys_addr, pmem->size);
 	if (!pmem->virt_addr)
 		goto out_release_region;
 
@@ -179,7 +179,7 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
 out_free_queue:
 	blk_cleanup_queue(pmem->pmem_queue);
 out_unmap:
-	iounmap(pmem->virt_addr);
+	memunmap(pmem->virt_addr);
 out_release_region:
 	release_mem_region(pmem->phys_addr, pmem->size);
 out_free_dev:
@@ -193,7 +193,7 @@ static void pmem_free(struct pmem_device *pmem)
 	del_gendisk(pmem->pmem_disk);
 	put_disk(pmem->pmem_disk);
 	blk_cleanup_queue(pmem->pmem_queue);
-	iounmap(pmem->virt_addr);
+	memunmap(pmem->virt_addr);
 	release_mem_region(pmem->phys_addr, pmem->size);
 	kfree(pmem);
 }
diff --git a/drivers/firmware/google/memconsole.c b/drivers/firmware/google/memconsole.c
index 2f569aaed4c7..877433dc8297 100644
--- a/drivers/firmware/google/memconsole.c
+++ b/drivers/firmware/google/memconsole.c
@@ -52,14 +52,14 @@ static ssize_t memconsole_read(struct file *filp, struct kobject *kobp,
 	char *memconsole;
 	ssize_t ret;
 
-	memconsole = ioremap_cache(memconsole_baseaddr, memconsole_length);
+	memconsole = memremap_cache(memconsole_baseaddr, memconsole_length);
 	if (!memconsole) {
 		pr_err("memconsole: ioremap_cache failed\n");
 		return -ENOMEM;
 	}
 	ret = memory_read_from_buffer(buf, count, &pos, memconsole,
 				      memconsole_length);
-	iounmap(memconsole);
+	memunmap(memconsole);
 	return ret;
 }
 
diff --git a/include/linux/device.h b/include/linux/device.h
index 6558af90c8fe..518f49c5d596 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -638,6 +638,11 @@ extern void devm_free_pages(struct device *dev, unsigned long addr);
 
 void __iomem *devm_ioremap_resource(struct device *dev, struct resource *res);
 
+static inline void *devm_memremap_resource(struct device *dev, struct resource *res)
+{
+	return (void __force *) devm_ioremap_resource(dev, res);
+}
+
 /* allows to add/remove a custom action to devres stack */
 int devm_add_action(struct device *dev, void (*action)(void *), void *data);
 void devm_remove_action(struct device *dev, void (*action)(void *), void *data);
diff --git a/include/linux/io.h b/include/linux/io.h
index 1c9ad4c6d485..86c636ce4c43 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -122,4 +122,40 @@ static inline int arch_phys_wc_index(int handle)
 #endif
 #endif
 
+/*
+ * memremap() is "ioremap" for cases where it is known that the resource
+ * being mapped does not have i/o side effects and the __iomem
+ * annotation is not applicable.
+ */
+
+static inline void *memremap(resource_size_t offset, size_t size)
+{
+	return (void __force *) ioremap(offset, size);
+}
+
+static inline void *memremap_nocache(resource_size_t offset, size_t size)
+{
+	return (void __force *) ioremap_nocache(offset, size);
+}
+
+static inline void *memremap_cache(resource_size_t offset, size_t size)
+{
+	return (void __force *) ioremap_cache(offset, size);
+}
+
+static inline void *memremap_wc(resource_size_t offset, size_t size)
+{
+	return (void __force *) ioremap_wc(offset, size);
+}
+
+static inline void *memremap_wt(resource_size_t offset, size_t size)
+{
+	return (void __force *) ioremap_wt(offset, size);
+}
+
+static inline void memunmap(void *addr)
+{
+	return iounmap((void __iomem __force *) addr);
+}
+
 #endif /* _LINUX_IO_H */
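A userspace model of the wrapper pattern the hunk above introduces may help. Under sparse, `__iomem` and `__force` are address-space annotations; in a normal build they expand to nothing, which is what makes the plain cast legal. `fake_ioremap()` below is an assumption of this sketch, standing in for the real arch-provided `ioremap_cache()`:

```c
#include <stdlib.h>

/* sparse would define these as address-space attributes; empty here */
#define __iomem
#define __force

typedef unsigned long long resource_size_t;

/* stand-in for ioremap_cache(): hand back ordinary memory */
static void __iomem *fake_ioremap(resource_size_t offset, size_t size)
{
	(void)offset;
	return malloc(size);
}

/* the pattern from the patch: strip __iomem for non-i/o memory */
static void *memremap_cache(resource_size_t offset, size_t size)
{
	return (void __force *) fake_ioremap(offset, size);
}

static void memunmap(void *addr)
{
	free((void __iomem __force *) addr);
}
```

The point of the `__force` cast is that a `memremap()`'d pointer may then be dereferenced like ordinary memory, with no `readb()`/`writeb()` accessors and no sparse warnings.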

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 4/4] arch, x86: cache management apis for persistent memory
  2015-05-30 18:59 [PATCH v2 0/4] pmem api, generic ioremap_cache, and memremap Dan Williams
                   ` (2 preceding siblings ...)
  2015-05-30 18:59 ` [PATCH v2 3/4] arch: introduce memremap() Dan Williams
@ 2015-05-30 18:59 ` Dan Williams
  2015-06-01  9:19   ` Paul Bolle
  2015-06-01 11:39   ` Boaz Harrosh
  3 siblings, 2 replies; 19+ messages in thread
From: Dan Williams @ 2015-05-30 18:59 UTC (permalink / raw)
  To: arnd, mingo, bp, hpa, tglx, ross.zwisler, akpm
  Cc: jgross, x86, toshi.kani, linux-nvdimm, mcgrof, konrad.wilk,
	linux-kernel, stefan.bader, luto, linux-mm, geert, hmh, tj, hch

From: Ross Zwisler <ross.zwisler@linux.intel.com>

Based on an original patch by Ross Zwisler [1].

Writes to persistent memory have the potential to be posted to cpu
cache, cpu write buffers, and platform write buffers (memory controller)
before being committed to persistent media.  Provide apis
(persistent_copy() and persistent_sync()) to copy data and assert that
it is durable in PMEM (a persistent linear address range).

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
[djbw: s/arch_persistent_flush()/io_flush_cache_range()/]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/Kconfig                  |    1 
 arch/x86/include/asm/cacheflush.h |   24 ++++++++++
 arch/x86/include/asm/io.h         |    6 ++
 drivers/block/pmem.c              |   58 ++++++++++++++++++++++-
 include/linux/pmem.h              |   93 +++++++++++++++++++++++++++++++++++++
 lib/Kconfig                       |    3 +
 6 files changed, 183 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/pmem.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 73a4d0330ad0..6412d92e6f1e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
 	select ARCH_HAS_SG_CHAIN
+	select ARCH_HAS_PMEM_API
 	select CLKEVT_I8253
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select GENERIC_IOMAP
diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index b6f7457d12e4..6b8bd5c43bf6 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -4,6 +4,7 @@
 /* Caches aren't brain-dead on the intel. */
 #include <asm-generic/cacheflush.h>
 #include <asm/special_insns.h>
+#include <asm/uaccess.h>
 
 /*
  * The set_memory_* API can be used to change various attributes of a virtual
@@ -108,4 +109,27 @@ static inline int rodata_test(void)
 }
 #endif
 
+static inline void arch_persistent_copy(void *dst, const void *src, size_t n)
+{
+	/*
+	 * We are copying between two kernel buffers, if
+	 * __copy_from_user_inatomic_nocache() returns an error (page
+	 * fault) we would have already taken an unhandled fault before
+	 * the BUG_ON.  The BUG_ON is simply here to satisfy
+	 * __must_check and allow reuse of the common non-temporal store
+	 * implementation for persistent_copy().
+	 */
+	BUG_ON(__copy_from_user_inatomic_nocache(dst, src, n));
+}
+
+static inline void arch_persistent_sync(void)
+{
+	wmb();
+	pcommit_sfence();
+}
+
+static inline bool __arch_has_persistent_sync(void)
+{
+	return boot_cpu_has(X86_FEATURE_PCOMMIT);
+}
 #endif /* _ASM_X86_CACHEFLUSH_H */
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 956f2768bdc1..f3c32bb207cf 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -250,6 +250,12 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+static inline void *arch_persistent_remap(resource_size_t offset,
+	unsigned long size)
+{
+	return (void __force *) ioremap_cache(offset, size);
+}
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
index 799acff6bd7c..10cbe557165c 100644
--- a/drivers/block/pmem.c
+++ b/drivers/block/pmem.c
@@ -23,9 +23,16 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/slab.h>
+#include <linux/pmem.h>
 
 #define PMEM_MINORS		16
 
+struct pmem_ops {
+	void *(*remap)(resource_size_t offset, unsigned long size);
+	void (*copy)(void *dst, const void *src, size_t size);
+	void (*sync)(void);
+};
+
 struct pmem_device {
 	struct request_queue	*pmem_queue;
 	struct gendisk		*pmem_disk;
@@ -34,11 +41,54 @@ struct pmem_device {
 	phys_addr_t		phys_addr;
 	void			*virt_addr;
 	size_t			size;
+	struct pmem_ops		ops;
 };
 
 static int pmem_major;
 static atomic_t pmem_index;
 
+static void default_pmem_sync(void)
+{
+	wmb();
+}
+
+static void default_pmem_copy(void *dst, const void *src, size_t size)
+{
+	memcpy(dst, src, size);
+}
+
+static void pmem_ops_default_init(struct pmem_device *pmem)
+{
+	/*
+	 * These defaults seek to offer decent performance and minimize
+	 * the window between i/o completion and writes being durable on
+	 * media.  However, it is undefined / architecture specific
+	 * whether acknowledged data may be lost in transit if a power
+	 * fail occurs after bio_endio().
+	 */
+	pmem->ops.remap = memremap_wt;
+	pmem->ops.copy = default_pmem_copy;
+	pmem->ops.sync = default_pmem_sync;
+}
+
+static bool pmem_ops_init(struct pmem_device *pmem)
+{
+	if (IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) &&
+			arch_has_persistent_sync()) {
+		/*
+		 * This arch + cpu guarantees that bio_endio() == data
+		 * durable on media.
+		 */
+		pmem->ops.remap = persistent_remap;
+		pmem->ops.copy = persistent_copy;
+		pmem->ops.sync = persistent_sync;
+		return true;
+	}
+
+	pmem_ops_default_init(pmem);
+	return false;
+}
+
 static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 			unsigned int len, unsigned int off, int rw,
 			sector_t sector)
@@ -51,7 +101,7 @@ static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 		flush_dcache_page(page);
 	} else {
 		flush_dcache_page(page);
-		memcpy(pmem->virt_addr + pmem_off, mem + off, len);
+		pmem->ops.copy(pmem->virt_addr + pmem_off, mem + off, len);
 	}
 
 	kunmap_atomic(mem);
@@ -82,6 +132,8 @@ static void pmem_make_request(struct request_queue *q, struct bio *bio)
 		sector += bvec.bv_len >> 9;
 	}
 
+	if (rw)
+		pmem->ops.sync();
 out:
 	bio_endio(bio, err);
 }
@@ -131,6 +183,8 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
 
 	pmem->phys_addr = res->start;
 	pmem->size = resource_size(res);
+	if (!pmem_ops_init(pmem))
+		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	err = -EINVAL;
 	if (!request_mem_region(pmem->phys_addr, pmem->size, "pmem")) {
@@ -143,7 +197,7 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
 	 * of the CPU caches in case of a crash.
 	 */
 	err = -ENOMEM;
-	pmem->virt_addr = memremap_wt(pmem->phys_addr, pmem->size);
+	pmem->virt_addr = pmem->ops.remap(pmem->phys_addr, pmem->size);
 	if (!pmem->virt_addr)
 		goto out_release_region;
 
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
new file mode 100644
index 000000000000..e9a63ee1d361
--- /dev/null
+++ b/include/linux/pmem.h
@@ -0,0 +1,93 @@
+/*
+ * Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#ifndef __PMEM_H__
+#define __PMEM_H__
+
+#include <linux/io.h>
+#include <asm/cacheflush.h>
+
+/*
+ * Architectures that define ARCH_HAS_PMEM_API must provide
+ * implementations for arch_persistent_remap(), arch_persistent_copy(),
+ * arch_persistent_sync(), and __arch_has_persistent_sync().
+ */
+
+#ifdef CONFIG_ARCH_HAS_PMEM_API
+/**
+ * persistent_remap - map physical persistent memory for pmem api
+ * @offset: physical address of persistent memory
+ * @size: size of the mapping
+ *
+ * Establish a mapping of the architecture specific memory type expected
+ * by persistent_copy() and persistent_sync().  For example, it may be
+ * the case that an uncacheable or writethrough mapping is sufficient,
+ * or a writeback mapping provided persistent_copy() and
+ * persistent_sync() arrange for the data to be written through the
+ * cache to persistent media.
+ */
+static inline void *persistent_remap(resource_size_t offset, unsigned long size)
+{
+	return arch_persistent_remap(offset, size);
+}
+
+/**
+ * persistent_copy - copy data to persistent memory
+ * @dst: destination buffer for the copy
+ * @src: source buffer for the copy
+ * @n: length of the copy in bytes
+ *
+ * Perform a memory copy that results in the destination of the copy
+ * being effectively evicted from, or never written to, the processor
+ * cache hierarchy after the copy completes.  After persistent_copy()
+ * data may still reside in cpu or platform buffers, so this operation
+ * must be followed by a persistent_sync().
+ */
+static inline void persistent_copy(void *dst, const void *src, size_t n)
+{
+	arch_persistent_copy(dst, src, n);
+}
+
+/**
+ * persistent_sync - synchronize writes to persistent memory
+ *
+ * After a series of persistent_copy() operations this drains data from
+ * cpu write buffers and any platform (memory controller) buffers to
+ * ensure that written data is durable on persistent memory media.
+ */
+static inline void persistent_sync(void)
+{
+	arch_persistent_sync();
+}
+
+/**
+ * arch_has_persistent_sync - true if persistent_sync() ensures durability
+ *
+ * For a given cpu implementation within an architecture it is possible
+ * that persistent_sync() resolves to a nop.  When this returns
+ * false, pmem api users are unable to ensure durability and may want to
+ * fall back to a different data consistency model, or otherwise notify
+ * the user.
+ */
+static inline bool arch_has_persistent_sync(void)
+{
+	return __arch_has_persistent_sync();
+}
+#else
+/* undefined symbols */
+extern void *persistent_remap(resource_size_t offset, unsigned long size);
+extern void persistent_copy(void *dst, const void *src, size_t n);
+extern void persistent_sync(void);
+extern bool arch_has_persistent_sync(void);
+#endif /* CONFIG_ARCH_HAS_PMEM_API */
+
+#endif /* __PMEM_H__ */
diff --git a/lib/Kconfig b/lib/Kconfig
index 601965a948e8..e6a3c892d514 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -522,4 +522,7 @@ source "lib/fonts/Kconfig"
 config ARCH_HAS_SG_CHAIN
 	def_bool n
 
+config ARCH_HAS_PMEM_API
+	def_bool n
+
 endmenu

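The durability contract the patch describes (writes are posted to cpu/platform buffers and only become durable after a sync) can be modeled in userspace. The `media` and `wc_buffer` arrays below are assumptions of this sketch, standing in for persistent media and the intervening write buffers; the real implementations are the `arch_*` routines above:

```c
#include <stdbool.h>
#include <string.h>

#define PMEM_SIZE 64

static char media[PMEM_SIZE];     /* durable copy */
static char wc_buffer[PMEM_SIZE]; /* posted, not yet durable */
static bool dirty;

/* model of persistent_copy(): data lands in write buffers first */
static void persistent_copy(size_t dst_off, const void *src, size_t n)
{
	memcpy(wc_buffer + dst_off, src, n);
	dirty = true;
}

/* model of persistent_sync(): drain buffers to media */
static void persistent_sync(void)
{
	memcpy(media, wc_buffer, PMEM_SIZE);
	dirty = false;
}

/* "bio_endio() == data durable" only holds once sync has run */
static bool is_durable(size_t off, const void *expect, size_t n)
{
	return !dirty && memcmp(media + off, expect, n) == 0;
}
```

This is why pmem_make_request() issues `pmem->ops.sync()` once after the bvec loop rather than per copy: a series of persistent_copy() calls needs only one drain before completion is reported.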

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/4] devm: fix ioremap_cache() usage
  2015-05-30 18:59 ` [PATCH v2 2/4] devm: fix ioremap_cache() usage Dan Williams
@ 2015-05-30 20:52   ` Arnd Bergmann
  2015-05-30 21:16     ` Dan Williams
  0 siblings, 1 reply; 19+ messages in thread
From: Arnd Bergmann @ 2015-05-30 20:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: mingo, bp, hpa, tglx, ross.zwisler, akpm, jgross, x86,
	toshi.kani, linux-nvdimm, mcgrof, konrad.wilk, linux-kernel,
	stefan.bader, luto, linux-mm, geert, hmh, tj, hch

On Saturday 30 May 2015, Dan Williams wrote:
> @@ -154,7 +148,7 @@ void __iomem *devm_ioremap_resource(struct device *dev, struct resource *res)
>         }
>  
>         if (res->flags & IORESOURCE_CACHEABLE)
> -               dest_ptr = devm_ioremap(dev, res->start, size);
> +               dest_ptr = devm_ioremap_cache(dev, res->start, size);
>         else
>                 dest_ptr = devm_ioremap_nocache(dev, res->start, size);

I think the existing uses of IORESOURCE_CACHEABLE are mostly bugs, so changing
the behavior here may cause more problems than it solves.

	Arnd


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/4] arch: introduce memremap()
  2015-05-30 18:59 ` [PATCH v2 3/4] arch: introduce memremap() Dan Williams
@ 2015-05-30 21:00   ` Arnd Bergmann
  2015-05-30 21:39     ` Dan Williams
  0 siblings, 1 reply; 19+ messages in thread
From: Arnd Bergmann @ 2015-05-30 21:00 UTC (permalink / raw)
  To: Dan Williams
  Cc: mingo, bp, hpa, tglx, ross.zwisler, akpm, jgross, x86,
	toshi.kani, linux-nvdimm, mcgrof, konrad.wilk, linux-kernel,
	stefan.bader, luto, linux-mm, geert, hmh, tj, hch

On Saturday 30 May 2015, Dan Williams wrote:
> 
> +/*
> + * memremap() is "ioremap" for cases where it is known that the resource
> + * being mapped does not have i/o side effects and the __iomem
> + * annotation is not applicable.
> + */
> +
> +static inline void *memremap(resource_size_t offset, size_t size)
> +{
> +       return (void __force *) ioremap(offset, size);
> +}
> +
> +static inline void *memremap_nocache(resource_size_t offset, size_t size)
> +{
> +       return (void __force *) ioremap_nocache(offset, size);
> +}
> +
> +static inline void *memremap_cache(resource_size_t offset, size_t size)
> +{
> +       return (void __force *) ioremap_cache(offset, size);
> +}
> +

There are architectures on which the result of ioremap is not necessarily
a pointer, but instead indicates that the access is to be done through
some other indirect access, or require special instructions. I think implementing
the memremap() interfaces is generally helpful, but don't rely on the
ioremap implementation.

Adding both cached and uncached versions is also dangerous, because you
typically get either undefined behavior or a system checkstop when a
single page is mapped both cached and uncached at the same time. This
means that doing memremap() or memremap_nocache() on something that
may be part of the linear kernel mapping is a bug, and we should probably
check for that here.

We can probably avoid having both memremap() and memremap_nocache(),
as all architectures define ioremap() and ioremap_nocache() to be the
same thing.
	
	Arnd


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/4] devm: fix ioremap_cache() usage
  2015-05-30 20:52   ` Arnd Bergmann
@ 2015-05-30 21:16     ` Dan Williams
  2015-06-01 14:26       ` Arnd Bergmann
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2015-05-30 21:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, Thomas Gleixner,
	Ross Zwisler, Andrew Morton, Juergen Gross, X86 ML, Kani,
	Toshimitsu, linux-nvdimm, Luis Rodriguez, Konrad Rzeszutek Wilk,
	linux-kernel, Stefan Bader, Andy Lutomirski, linux-mm, geert,
	Henrique de Moraes Holschuh, Tejun Heo, Christoph Hellwig

On Sat, May 30, 2015 at 1:52 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Saturday 30 May 2015, Dan Williams wrote:
>> @@ -154,7 +148,7 @@ void __iomem *devm_ioremap_resource(struct device *dev, struct resource *res)
>>         }
>>
>>         if (res->flags & IORESOURCE_CACHEABLE)
>> -               dest_ptr = devm_ioremap(dev, res->start, size);
>> +               dest_ptr = devm_ioremap_cache(dev, res->start, size);
>>         else
>>                 dest_ptr = devm_ioremap_nocache(dev, res->start, size);
>
> I think the existing uses of IORESOURCE_CACHEABLE are mostly bugs, so changing
> the behavior here may cause more problems than it solves.
>

Ok, but that effectively makes devm_ioremap_resource() unusable for
the cached case.  How about introducing devm_ioremap_cache_resource(),
and cleaning up devm_ioremap_resource() to stop pretending that it is
honoring the memory type of the resource?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/4] arch: introduce memremap()
  2015-05-30 21:00   ` Arnd Bergmann
@ 2015-05-30 21:39     ` Dan Williams
  2015-06-01 14:29       ` Arnd Bergmann
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2015-05-30 21:39 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, Thomas Gleixner,
	Ross Zwisler, Andrew Morton, Juergen Gross, X86 ML, Kani,
	Toshimitsu, linux-nvdimm, Luis Rodriguez, Konrad Rzeszutek Wilk,
	linux-kernel, Stefan Bader, Andy Lutomirski, linux-mm, geert,
	Henrique de Moraes Holschuh, Tejun Heo, Christoph Hellwig

On Sat, May 30, 2015 at 2:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Saturday 30 May 2015, Dan Williams wrote:
>>
>> +/*
>> + * memremap() is "ioremap" for cases where it is known that the resource
>> + * being mapped does not have i/o side effects and the __iomem
>> + * annotation is not applicable.
>> + */
>> +
>> +static inline void *memremap(resource_size_t offset, size_t size)
>> +{
>> +       return (void __force *) ioremap(offset, size);
>> +}
>> +
>> +static inline void *memremap_nocache(resource_size_t offset, size_t size)
>> +{
>> +       return (void __force *) ioremap_nocache(offset, size);
>> +}
>> +
>> +static inline void *memremap_cache(resource_size_t offset, size_t size)
>> +{
>> +       return (void __force *) ioremap_cache(offset, size);
>> +}
>> +
>
> There are architectures on which the result of ioremap is not necessarily
> a pointer, but instead indicates that the access is to be done through
> some other indirect access, or require special instructions. I think implementing
> the memremap() interfaces is generally helpful, but don't rely on the
> ioremap implementation.

Is it enough to detect the archs where ioremap() does return an
otherwise usable pointer and set ARCH_HAS_MEMREMAP, in the first take
of this introduction?  Regardless, it seems that drivers should have
Kconfig dependency checks for archs where ioremap can not be used in
this manner.

> Adding both cached and uncached versions is also dangerous, because you
> typically get either undefined behavior or a system checkstop when a
> single page is mapped both cached and uncached at the same time. This
> means that doing memremap() or memremap_nocache() on something that
> may be part of the linear kernel mapping is a bug, and we should probably
> check for that here.

Part of the reason for relying on ioremap() was to borrow its internal
checks to fail attempts that try to remap ranges that are already in
the kernel linear map.  Hmm, that's a guarantee x86 ioremap gives, but
maybe that's not universal?

> We can probably avoid having both memremap() and memremap_nocache(),
> as all architectures define ioremap() and ioremap_nocache() to be the
> same thing.
>

Ok


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/4] arch, x86: cache management apis for persistent memory
  2015-05-30 18:59 ` [PATCH v2 4/4] arch, x86: cache management apis for persistent memory Dan Williams
@ 2015-06-01  9:19   ` Paul Bolle
  2015-06-01 11:39   ` Boaz Harrosh
  1 sibling, 0 replies; 19+ messages in thread
From: Paul Bolle @ 2015-06-01  9:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: arnd, mingo, bp, hpa, tglx, ross.zwisler, akpm, jgross, x86,
	toshi.kani, linux-nvdimm, mcgrof, konrad.wilk, linux-kernel,
	stefan.bader, luto, linux-mm, geert, hmh, tj, hch

On Sat, 2015-05-30 at 14:59 -0400, Dan Williams wrote:
> --- a/lib/Kconfig
> +++ b/lib/Kconfig

> +config ARCH_HAS_PMEM_API
> +	def_bool n

'n' is the default anyway, so I think 

config ARCH_HAS_PMEM_API
	bool

should work just as well.


Paul Bolle


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/4] arch, x86: cache management apis for persistent memory
  2015-05-30 18:59 ` [PATCH v2 4/4] arch, x86: cache management apis for persistent memory Dan Williams
  2015-06-01  9:19   ` Paul Bolle
@ 2015-06-01 11:39   ` Boaz Harrosh
  2015-06-01 11:44     ` Boaz Harrosh
  2015-06-01 16:22     ` Dan Williams
  1 sibling, 2 replies; 19+ messages in thread
From: Boaz Harrosh @ 2015-06-01 11:39 UTC (permalink / raw)
  To: Dan Williams, arnd, mingo, bp, hpa, tglx, ross.zwisler, akpm
  Cc: jgross, konrad.wilk, mcgrof, x86, linux-kernel, stefan.bader,
	luto, linux-mm, linux-nvdimm, geert, hmh, tj, hch

On 05/30/2015 09:59 PM, Dan Williams wrote:
> From: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> Based on an original patch by Ross Zwisler [1].
> 
> Writes to persistent memory have the potential to be posted to cpu
> cache, cpu write buffers, and platform write buffers (memory controller)
> before being committed to persistent media.  Provide apis
> (persistent_copy() and persistent_sync()) to copy data and assert that
> it is durable in PMEM (a persistent linear address range).
> 
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> [djbw: s/arch_persistent_flush()/io_flush_cache_range()/]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/Kconfig                  |    1 
>  arch/x86/include/asm/cacheflush.h |   24 ++++++++++
>  arch/x86/include/asm/io.h         |    6 ++
>  drivers/block/pmem.c              |   58 ++++++++++++++++++++++-
>  include/linux/pmem.h              |   93 +++++++++++++++++++++++++++++++++++++
>  lib/Kconfig                       |    3 +
>  6 files changed, 183 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/pmem.h
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 73a4d0330ad0..6412d92e6f1e 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -102,6 +102,7 @@ config X86
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
>  	select ARCH_HAS_SG_CHAIN
> +	select ARCH_HAS_PMEM_API
>  	select CLKEVT_I8253
>  	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	select GENERIC_IOMAP
> diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> index b6f7457d12e4..6b8bd5c43bf6 100644
> --- a/arch/x86/include/asm/cacheflush.h
> +++ b/arch/x86/include/asm/cacheflush.h
> @@ -4,6 +4,7 @@
>  /* Caches aren't brain-dead on the intel. */
>  #include <asm-generic/cacheflush.h>
>  #include <asm/special_insns.h>
> +#include <asm/uaccess.h>
>  
>  /*
>   * The set_memory_* API can be used to change various attributes of a virtual
> @@ -108,4 +109,27 @@ static inline int rodata_test(void)
>  }
>  #endif
>  
> +static inline void arch_persistent_copy(void *dst, const void *src, size_t n)
> +{
> +	/*
> +	 * We are copying between two kernel buffers, if
> +	 * __copy_from_user_inatomic_nocache() returns an error (page
> +	 * fault) we would have already taken an unhandled fault before
> +	 * the BUG_ON.  The BUG_ON is simply here to satisfy
> +	 * __must_check and allow reuse of the common non-temporal store
> +	 * implementation for persistent_copy().
> +	 */
> +	BUG_ON(__copy_from_user_inatomic_nocache(dst, src, n));
> +}
> +
> +static inline void arch_persistent_sync(void)
> +{
> +	wmb();
> +	pcommit_sfence();
> +}
> +
> +static inline bool __arch_has_persistent_sync(void)
> +{
> +	return boot_cpu_has(X86_FEATURE_PCOMMIT);
> +}
>  #endif /* _ASM_X86_CACHEFLUSH_H */
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index 956f2768bdc1..f3c32bb207cf 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -250,6 +250,12 @@ static inline void flush_write_buffers(void)
>  #endif
>  }
>  
> +static inline void *arch_persistent_remap(resource_size_t offset,
> +	unsigned long size)
> +{
> +	return (void __force *) ioremap_cache(offset, size);
> +}
> +
>  #endif /* __KERNEL__ */
>  
>  extern void native_io_delay(void);
> diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
> index 799acff6bd7c..10cbe557165c 100644
> --- a/drivers/block/pmem.c
> +++ b/drivers/block/pmem.c
> @@ -23,9 +23,16 @@
>  #include <linux/module.h>
>  #include <linux/moduleparam.h>
>  #include <linux/slab.h>
> +#include <linux/pmem.h>
>  
>  #define PMEM_MINORS		16
>  
> +struct pmem_ops {
> +	void *(*remap)(resource_size_t offset, unsigned long size);
> +	void (*copy)(void *dst, const void *src, size_t size);
> +	void (*sync)(void);

What? Why the ops at all? See below ...

> +};
> +
>  struct pmem_device {
>  	struct request_queue	*pmem_queue;
>  	struct gendisk		*pmem_disk;
> @@ -34,11 +41,54 @@ struct pmem_device {
>  	phys_addr_t		phys_addr;
>  	void			*virt_addr;
>  	size_t			size;
> +	struct pmem_ops		ops;
>  };
>  
>  static int pmem_major;
>  static atomic_t pmem_index;
>  
> +static void default_pmem_sync(void)
> +{
> +	wmb();
> +}
> +
> +static void default_pmem_copy(void *dst, const void *src, size_t size)
> +{
> +	memcpy(dst, src, size);
> +}
> +
> +static void pmem_ops_default_init(struct pmem_device *pmem)
> +{
> +	/*
> +	 * These defaults seek to offer decent performance and minimize
> +	 * the window between i/o completion and writes being durable on
> +	 * media.  However, it is undefined / architecture specific
> +	 * whether acknowledged data may be lost in transit if a power
> +	 * fail occurs after bio_endio().
> +	 */
> +	pmem->ops.remap = memremap_wt;
> +	pmem->ops.copy = default_pmem_copy;
> +	pmem->ops.sync = default_pmem_sync;
> +}
> +
> +static bool pmem_ops_init(struct pmem_device *pmem)
> +{
> +	if (IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) &&
> +			arch_has_persistent_sync()) {

I must be slow and stupid, but it looks to me like:

if arch_has_persistent_sync() == false then persistent_sync
can just be the default wmb above, and
	if (something_always_off())
		do
will always be faster than a function pointer. This
if can live in the generic implementation of persistent_sync
and be done with.

Then persistent_copy() can just have an inline generic
implementation of your memcpy above, and the WARN_ON_ONCE
or whatever you want.

And there is no need for any ops vector or function-pointer call out.

And also, for me the whole arch_has_persistent_sync() is moot;
the arches where only some members of the family support a
feature can do the if (always_false_or_true) thing and not
pollute the global name space. All we need is a single
switch as above:
#ifdef CONFIG_ARCH_HAS_PMEM_API
	=> arch_persistent_sync
#else
	=> wmb
#endif

> +		/*
> +		 * This arch + cpu guarantees that bio_endio() == data
> +		 * durable on media.
> +		 */
> +		pmem->ops.remap = persistent_remap;
> +		pmem->ops.copy = persistent_copy;
> +		pmem->ops.sync = persistent_sync;
> +		return true;
> +	}
> +
> +	pmem_ops_default_init(pmem);
> +	return false;
> +}
> +
>  static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
>  			unsigned int len, unsigned int off, int rw,
>  			sector_t sector)
> @@ -51,7 +101,7 @@ static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
>  		flush_dcache_page(page);
>  	} else {
>  		flush_dcache_page(page);
> -		memcpy(pmem->virt_addr + pmem_off, mem + off, len);
> +		pmem->ops.copy(pmem->virt_addr + pmem_off, mem + off, len);
>  	}
>  
>  	kunmap_atomic(mem);
> @@ -82,6 +132,8 @@ static void pmem_make_request(struct request_queue *q, struct bio *bio)
>  		sector += bvec.bv_len >> 9;
>  	}
>  
> +	if (rw)
> +		pmem->ops.sync();
>  out:
>  	bio_endio(bio, err);
>  }
> @@ -131,6 +183,8 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
>  
>  	pmem->phys_addr = res->start;
>  	pmem->size = resource_size(res);
> +	if (!pmem_ops_init(pmem))

Fine, just #ifndef CONFIG_ARCH_HAS_PMEM_API

> +		dev_warn(dev, "unable to guarantee persistence of writes\n");

But I wouldn't even bother; I'd just put a WARN_ON_ONCE inside
persistent_copy() on any real use, pmem or not, and be done with it.

>  
>  	err = -EINVAL;
>  	if (!request_mem_region(pmem->phys_addr, pmem->size, "pmem")) {
> @@ -143,7 +197,7 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
>  	 * of the CPU caches in case of a crash.
>  	 */
>  	err = -ENOMEM;
> -	pmem->virt_addr = memremap_wt(pmem->phys_addr, pmem->size);
> +	pmem->virt_addr = pmem->ops.remap(pmem->phys_addr, pmem->size);
>  	if (!pmem->virt_addr)
>  		goto out_release_region;
>  
> diff --git a/include/linux/pmem.h b/include/linux/pmem.h
> new file mode 100644
> index 000000000000..e9a63ee1d361
> --- /dev/null
> +++ b/include/linux/pmem.h
> @@ -0,0 +1,93 @@
> +/*
> + * Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + */
> +#ifndef __PMEM_H__
> +#define __PMEM_H__
> +
> +#include <linux/io.h>
> +#include <asm/cacheflush.h>
> +
> +/*
> + * Architectures that define ARCH_HAS_PMEM_API must provide
> + * implementations for arch_persistent_remap(), arch_persistent_copy(),
> + * arch_persistent_sync(), and __arch_has_persistent_sync().
> + */
> +
> +#ifdef CONFIG_ARCH_HAS_PMEM_API
> +/**
> + * persistent_remap - map physical persistent memory for pmem api
> + * @offset: physical address of persistent memory
> + * @size: size of the mapping
> + *
> + * Establish a mapping of the architecture specific memory type expected
> + * by persistent_copy() and persistent_sync().  For example, it may be
> + * the case that an uncacheable or writethrough mapping is sufficient,
> + * or a writeback mapping provided persistent_copy() and
> + * persistent_sync() arrange for the data to be written through the
> + * cache to persistent media.
> + */
> +static inline void *persistent_remap(resource_size_t offset, unsigned long size)
> +{
> +	return arch_persistent_remap(offset, size);
> +}
> +
> +/**
> + * persistent_copy - copy data to persistent memory
> + * @dst: destination buffer for the copy
> + * @src: source buffer for the copy
> + * @n: length of the copy in bytes
> + *
> + * Perform a memory copy that results in the destination of the copy
> + * being effectively evicted from, or never written to, the processor
> + * cache hierarchy after the copy completes.  After persistent_copy()
> + * data may still reside in cpu or platform buffers, so this operation
> + * must be followed by a persistent_sync().
> + */
> +static inline void persistent_copy(void *dst, const void *src, size_t n)
> +{
> +	arch_persistent_copy(dst, src, n);
> +}
> +
> +/**
> + * persistent_sync - synchronize writes to persistent memory
> + *
> + * After a series of persistent_copy() operations this drains data from
> + * cpu write buffers and any platform (memory controller) buffers to
> + * ensure that written data is durable on persistent memory media.
> + */
> +static inline void persistent_sync(void)
> +{
> +	arch_persistent_sync();
> +}
> +
> +/**
> + * arch_has_persistent_sync - true if persistent_sync() ensures durability
> + *
> + * For a given cpu implementation within an architecture it is possible
> + * that persistent_sync() resolves to a nop.  In the case this returns
> + * false, pmem api users are unable to ensure durability and may want to
> + * fall back to a different data consistency model, or otherwise notify
> + * the user.
> + */
> +static inline bool arch_has_persistent_sync(void)

Again, this can just go inside the arch in question.
Arches without a choice need not bother.

> +{
> +	return __arch_has_persistent_sync();
> +}
> +#else
> +/* undefined symbols */
> +extern void *persistent_remap(resource_size_t offset, unsigned long size);
> +extern void persistent_copy(void *dst, const void *src, size_t n);
> +extern void persistent_sync(void);

Define these to the generic implementation you have in pmem.c (memcpy && wmb).
After all, drivers/block/pmem.c is just as generic a place as this one.

> +extern bool arch_has_persistent_sync(void);
> +#endif /* CONFIG_ARCH_HAS_PMEM_API */
> +
> +#endif /* __PMEM_H__ */
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 601965a948e8..e6a3c892d514 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -522,4 +522,7 @@ source "lib/fonts/Kconfig"
>  config ARCH_HAS_SG_CHAIN
>  	def_bool n
>  
> +config ARCH_HAS_PMEM_API
> +	def_bool n
> +
>  endmenu
> 

Cheers
Boaz


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/4] arch, x86: cache management apis for persistent memory
  2015-06-01 11:39   ` Boaz Harrosh
@ 2015-06-01 11:44     ` Boaz Harrosh
  2015-06-01 16:07       ` Dan Williams
  2015-06-01 16:22     ` Dan Williams
  1 sibling, 1 reply; 19+ messages in thread
From: Boaz Harrosh @ 2015-06-01 11:44 UTC (permalink / raw)
  To: Dan Williams, arnd, mingo, bp, hpa, tglx, ross.zwisler, akpm
  Cc: jgross, konrad.wilk, mcgrof, x86, linux-kernel, stefan.bader,
	luto, linux-mm, linux-nvdimm, geert, hmh, tj, hch

Forgot one thing

On 06/01/2015 02:39 PM, Boaz Harrosh wrote:
>> +static inline void persistent_copy(void *dst, const void *src, size_t n)

Could we please make this
memcpy_persistent

Same as:
copy_from_user_nocache

The generic name of what it does comes first, then the special override:
copy_from_user_XXX is the same as copy_from_user but with XXX applied.

Same here: exactly memcpy_ but with persistent applied.

<>

Thanks
Boaz


* Re: [PATCH v2 2/4] devm: fix ioremap_cache() usage
  2015-05-30 21:16     ` Dan Williams
@ 2015-06-01 14:26       ` Arnd Bergmann
  0 siblings, 0 replies; 19+ messages in thread
From: Arnd Bergmann @ 2015-06-01 14:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, Thomas Gleixner,
	Ross Zwisler, Andrew Morton, Juergen Gross, X86 ML, Kani,
	Toshimitsu, linux-nvdimm, Luis Rodriguez, Konrad Rzeszutek Wilk,
	linux-kernel, Stefan Bader, Andy Lutomirski, linux-mm, geert,
	Henrique de Moraes Holschuh, Tejun Heo, Christoph Hellwig

On Saturday 30 May 2015 14:16:28 Dan Williams wrote:
> On Sat, May 30, 2015 at 1:52 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Saturday 30 May 2015, Dan Williams wrote:
> >> @@ -154,7 +148,7 @@ void __iomem *devm_ioremap_resource(struct device *dev, struct resource *res)
> >>         }
> >>
> >>         if (res->flags & IORESOURCE_CACHEABLE)
> >> -               dest_ptr = devm_ioremap(dev, res->start, size);
> >> +               dest_ptr = devm_ioremap_cache(dev, res->start, size);
> >>         else
> >>                 dest_ptr = devm_ioremap_nocache(dev, res->start, size);
> >
> > I think the existing uses of IORESOURCE_CACHEABLE are mostly bugs, so changing
> > the behavior here may cause more problems than it solves.
> >
> 
> Ok, but that effectively makes devm_ioremap_resource() unusable for
> the cached case.  How about introducing devm_ioremap_cache_resource(),
> and cleaning up devm_ioremap_resource() to stop pretending that it is
> honoring the memory type of the resource?

I was thinking of the opposite approach: basically removing all uses
of IORESOURCE_CACHEABLE from the kernel. There are only a handful of
them, and we can probably replace them all with hardcoded ioremap_cached()
calls in the cases where they are actually useful.

	Arnd


* Re: [PATCH v2 3/4] arch: introduce memremap()
  2015-05-30 21:39     ` Dan Williams
@ 2015-06-01 14:29       ` Arnd Bergmann
  0 siblings, 0 replies; 19+ messages in thread
From: Arnd Bergmann @ 2015-06-01 14:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, Thomas Gleixner,
	Ross Zwisler, Andrew Morton, Juergen Gross, X86 ML, Kani,
	Toshimitsu, linux-nvdimm, Luis Rodriguez, Konrad Rzeszutek Wilk,
	linux-kernel, Stefan Bader, Andy Lutomirski, linux-mm, geert,
	Henrique de Moraes Holschuh, Tejun Heo, Christoph Hellwig

On Saturday 30 May 2015 14:39:48 Dan Williams wrote:
> On Sat, May 30, 2015 at 2:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Saturday 30 May 2015, Dan Williams wrote:
> >>
> >> +/*
> >> + * memremap() is "ioremap" for cases where it is known that the resource
> >> + * being mapped does not have i/o side effects and the __iomem
> >> + * annotation is not applicable.
> >> + */
> >> +
> >> +static inline void *memremap(resource_size_t offset, size_t size)
> >> +{
> >> +       return (void __force *) ioremap(offset, size);
> >> +}
> >> +
> >> +static inline void *memremap_nocache(resource_size_t offset, size_t size)
> >> +{
> >> +       return (void __force *) ioremap_nocache(offset, size);
> >> +}
> >> +
> >> +static inline void *memremap_cache(resource_size_t offset, size_t size)
> >> +{
> >> +       return (void __force *) ioremap_cache(offset, size);
> >> +}
> >> +
> >
> > There are architectures on which the result of ioremap is not necessarily
> > a pointer, but instead indicates that the access is to be done through
> > some other indirect access, or require special instructions. I think implementing
> > the memremap() interfaces is generally helpful, but don't rely on the
> > ioremap implementation.
> 
> Is it enough to detect the archs where ioremap() does return an
> otherwise usable pointer and set ARCH_HAS_MEMREMAP, in the first take
> of this introduction?  Regardless, it seems that drivers should have
> Kconfig dependency checks for archs where ioremap can not be used in
> this manner.

Yes, that should work.

> > Adding both cached an uncached versions is also dangerous, because you
> > typically get either undefined behavior or a system checkstop when a
> > single page is mapped both cached and uncached at the same time. This
> > means that doing memremap() or memremap_nocache() on something that
> > may be part of the linear kernel mapping is a bug, and we should probably
> > check for that here.
> 
> Part of the reason for relying on ioremap() was to borrow its internal
> checks to fail attempts that try to remap ranges that are already in
> the kernel linear map.  Hmm, that's a guarantee x86 ioremap gives, but
> maybe that's not universal?

I haven't seen that check elsewhere. IIRC what ioremap() guarantees on ARM
is that if there is an existing boot-time mapping (similar to x86 fixmap,
but more commonly used), we use the same flags in the new ioremap and
override the ones that are provided by the caller.

	Arnd


* Re: [PATCH v2 4/4] arch, x86: cache management apis for persistent memory
  2015-06-01 11:44     ` Boaz Harrosh
@ 2015-06-01 16:07       ` Dan Williams
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Williams @ 2015-06-01 16:07 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Arnd Bergmann, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Thomas Gleixner, Ross Zwisler, Andrew Morton, Juergen Gross,
	Konrad Rzeszutek Wilk, Luis Rodriguez, X86 ML, linux-kernel,
	Stefan Bader, Andy Lutomirski, linux-mm, linux-nvdimm, geert,
	Henrique de Moraes Holschuh, Tejun Heo, Christoph Hellwig

On Mon, Jun 1, 2015 at 4:44 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
> Forgot one thing
>
> On 06/01/2015 02:39 PM, Boaz Harrosh wrote:
>>> +static inline void persistent_copy(void *dst, const void *src, size_t n)
>
> Could we please make this
> memcpy_persistent
>
> Same as:
> copy_from_user_nocache
>
> The generic name of what it does comes first, then the special override:
> copy_from_user_XXX is the same as copy_from_user but with XXX applied.
>
> Same here: exactly memcpy_ but with persistent applied.

Ok, makes sense.


* Re: [PATCH v2 4/4] arch, x86: cache management apis for persistent memory
  2015-06-01 11:39   ` Boaz Harrosh
  2015-06-01 11:44     ` Boaz Harrosh
@ 2015-06-01 16:22     ` Dan Williams
  1 sibling, 0 replies; 19+ messages in thread
From: Dan Williams @ 2015-06-01 16:22 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Arnd Bergmann, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Thomas Gleixner, Ross Zwisler, Andrew Morton, Juergen Gross,
	Konrad Rzeszutek Wilk, Luis Rodriguez, X86 ML, linux-kernel,
	Stefan Bader, Andy Lutomirski, linux-mm, linux-nvdimm, geert,
	Henrique de Moraes Holschuh, Tejun Heo, Christoph Hellwig

On Mon, Jun 1, 2015 at 4:39 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
> On 05/30/2015 09:59 PM, Dan Williams wrote:
>> From: Ross Zwisler <ross.zwisler@linux.intel.com>
>>
>> Based on an original patch by Ross Zwisler [1].
>>
>> Writes to persistent memory have the potential to be posted to cpu
>> cache, cpu write buffers, and platform write buffers (memory controller)
>> before being committed to persistent media.  Provide apis
>> (persistent_copy() and persistent_sync()) to copy data and assert that
>> it is durable in PMEM (a persistent linear address range).
>>
>> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
>> [djbw: s/arch_persistent_flush()/io_flush_cache_range()/]
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  arch/x86/Kconfig                  |    1
>>  arch/x86/include/asm/cacheflush.h |   24 ++++++++++
>>  arch/x86/include/asm/io.h         |    6 ++
>>  drivers/block/pmem.c              |   58 ++++++++++++++++++++++-
>>  include/linux/pmem.h              |   93 +++++++++++++++++++++++++++++++++++++
>>  lib/Kconfig                       |    3 +
>>  6 files changed, 183 insertions(+), 2 deletions(-)
>>  create mode 100644 include/linux/pmem.h
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 73a4d0330ad0..6412d92e6f1e 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -102,6 +102,7 @@ config X86
>>       select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>>       select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
>>       select ARCH_HAS_SG_CHAIN
>> +     select ARCH_HAS_PMEM_API
>>       select CLKEVT_I8253
>>       select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>       select GENERIC_IOMAP
>> diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
>> index b6f7457d12e4..6b8bd5c43bf6 100644
>> --- a/arch/x86/include/asm/cacheflush.h
>> +++ b/arch/x86/include/asm/cacheflush.h
>> @@ -4,6 +4,7 @@
>>  /* Caches aren't brain-dead on the intel. */
>>  #include <asm-generic/cacheflush.h>
>>  #include <asm/special_insns.h>
>> +#include <asm/uaccess.h>
>>
>>  /*
>>   * The set_memory_* API can be used to change various attributes of a virtual
>> @@ -108,4 +109,27 @@ static inline int rodata_test(void)
>>  }
>>  #endif
>>
>> +static inline void arch_persistent_copy(void *dst, const void *src, size_t n)
>> +{
>> +     /*
>> +      * We are copying between two kernel buffers, if
>> +      * __copy_from_user_inatomic_nocache() returns an error (page
>> +      * fault) we would have already taken an unhandled fault before
>> +      * the BUG_ON.  The BUG_ON is simply here to satisfy
>> +      * __must_check and allow reuse of the common non-temporal store
>> +      * implementation for persistent_copy().
>> +      */
>> +     BUG_ON(__copy_from_user_inatomic_nocache(dst, src, n));
>> +}
>> +
>> +static inline void arch_persistent_sync(void)
>> +{
>> +     wmb();
>> +     pcommit_sfence();
>> +}
>> +
>> +static inline bool __arch_has_persistent_sync(void)
>> +{
>> +     return boot_cpu_has(X86_FEATURE_PCOMMIT);
>> +}
>>  #endif /* _ASM_X86_CACHEFLUSH_H */
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index 956f2768bdc1..f3c32bb207cf 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -250,6 +250,12 @@ static inline void flush_write_buffers(void)
>>  #endif
>>  }
>>
>> +static inline void *arch_persistent_remap(resource_size_t offset,
>> +     unsigned long size)
>> +{
>> +     return (void __force *) ioremap_cache(offset, size);
>> +}
>> +
>>  #endif /* __KERNEL__ */
>>
>>  extern void native_io_delay(void);
>> diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
>> index 799acff6bd7c..10cbe557165c 100644
>> --- a/drivers/block/pmem.c
>> +++ b/drivers/block/pmem.c
>> @@ -23,9 +23,16 @@
>>  #include <linux/module.h>
>>  #include <linux/moduleparam.h>
>>  #include <linux/slab.h>
>> +#include <linux/pmem.h>
>>
>>  #define PMEM_MINORS          16
>>
>> +struct pmem_ops {
>> +     void *(*remap)(resource_size_t offset, unsigned long size);
>> +     void (*copy)(void *dst, const void *src, size_t size);
>> +     void (*sync)(void);
>
> What? Why the ops at all? See below ...
>
>> +};
>> +
>>  struct pmem_device {
>>       struct request_queue    *pmem_queue;
>>       struct gendisk          *pmem_disk;
>> @@ -34,11 +41,54 @@ struct pmem_device {
>>       phys_addr_t             phys_addr;
>>       void                    *virt_addr;
>>       size_t                  size;
>> +     struct pmem_ops         ops;
>>  };
>>
>>  static int pmem_major;
>>  static atomic_t pmem_index;
>>
>> +static void default_pmem_sync(void)
>> +{
>> +     wmb();
>> +}
>> +
>> +static void default_pmem_copy(void *dst, const void *src, size_t size)
>> +{
>> +     memcpy(dst, src, size);
>> +}
>> +
>> +static void pmem_ops_default_init(struct pmem_device *pmem)
>> +{
>> +     /*
>> +      * These defaults seek to offer decent performance and minimize
>> +      * the window between i/o completion and writes being durable on
>> +      * media.  However, it is undefined / architecture specific
>> +      * whether acknowledged data may be lost in transit if a power
>> +      * fail occurs after bio_endio().
>> +      */
>> +     pmem->ops.remap = memremap_wt;
>> +     pmem->ops.copy = default_pmem_copy;
>> +     pmem->ops.sync = default_pmem_sync;
>> +}
>> +
>> +static bool pmem_ops_init(struct pmem_device *pmem)
>> +{
>> +     if (IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) &&
>> +                     arch_has_persistent_sync()) {
>
> I must be slow and stupid, but it looks to me like:
>
> if arch_has_persistent_sync() == false then persistent_sync
> can just be the default wmb above, and
>         if (something_always_off())
>                 do
> will always be faster than a function pointer. This
> if can live in the generic implementation of persistent_sync
> and be done with.

A few things... that "something_always_off()" check is a cpuid call on
x86.  There's no need to keep asking; we can resolve the answer once at
init.  If you can get this indirect branch to show up in a meaningful
way on a profile I'd reconsider, but branch-target-buffers are fairly
good these days.  The wmb() is neither necessary nor sufficient on x86,
and it's simply a guess that it helps on other archs.

> Then persistent_copy() can just have an inline generic
> implementation of your memcpy above, and the WARN_ON_ONCE
> or whatever you want.

No, persistent_copy() does not have the proper context to tell you
which device is encountering the problem.  And again, we'd be taking a
conditional branch every i/o for this determination that could be made
once at init.

> And there is no need for any ops vector or function-pointer call out.

There's a third case to consider, driver local persistence guarantees.
ADR platforms need not call pcommit_sfence() to guarantee persistence.
In that case I would expect ADR platform enabling developers to extend
pmem_ops_init() to turn off the warning with ops that are specific to
their platform.  We also have directed flush support through registers
that could be turned into another op option.

> And also, for me the whole arch_has_persistent_sync() is moot;
> the arches where only some members of the family support a
> feature can do the if (always_false_or_true) thing and not
> pollute the global name space. All we need is a single
> switch as above:
> #ifdef CONFIG_ARCH_HAS_PMEM_API
>         => arch_persistent_sync
> #else
>         => wmb

wmb() is not a substitute for pcommit; I think it is dangerous to
imply that you can silently fall back to wmb().

> #endif
>
>> +             /*
>> +              * This arch + cpu guarantees that bio_endio() == data
>> +              * durable on media.
>> +              */
>> +             pmem->ops.remap = persistent_remap;
>> +             pmem->ops.copy = persistent_copy;
>> +             pmem->ops.sync = persistent_sync;
>> +             return true;
>> +     }
>> +
>> +     pmem_ops_default_init(pmem);
>> +     return false;
>> +}
>> +
>>  static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
>>                       unsigned int len, unsigned int off, int rw,
>>                       sector_t sector)
>> @@ -51,7 +101,7 @@ static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
>>               flush_dcache_page(page);
>>       } else {
>>               flush_dcache_page(page);
>> -             memcpy(pmem->virt_addr + pmem_off, mem + off, len);
>> +             pmem->ops.copy(pmem->virt_addr + pmem_off, mem + off, len);
>>       }
>>
>>       kunmap_atomic(mem);
>> @@ -82,6 +132,8 @@ static void pmem_make_request(struct request_queue *q, struct bio *bio)
>>               sector += bvec.bv_len >> 9;
>>       }
>>
>> +     if (rw)
>> +             pmem->ops.sync();
>>  out:
>>       bio_endio(bio, err);
>>  }
>> @@ -131,6 +183,8 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
>>
>>       pmem->phys_addr = res->start;
>>       pmem->size = resource_size(res);
>> +     if (!pmem_ops_init(pmem))
>
> fine just #ifndef CONFIG_ARCH_HAS_PMEM_API

No, because the presence of the API is not sufficient to determine
whether durability can be guaranteed.

>> +             dev_warn(dev, "unable to guarantee persistence of writes\n");
>
> But I wouldn't even bother; I'd just put a WARN_ON_ONCE inside
> persistent_copy() on any real use, pmem or not, and be done with it.

Again, we can make this determination once, and not forever for every
future usage.


* Re: [PATCH v2 1/4] arch/*/asm/io.h: add ioremap_cache() to all architectures
  2015-05-30 18:59 ` [PATCH v2 1/4] arch/*/asm/io.h: add ioremap_cache() to all architectures Dan Williams
@ 2015-06-01 22:36   ` Toshi Kani
  2015-06-02  8:20     ` Arnd Bergmann
  0 siblings, 1 reply; 19+ messages in thread
From: Toshi Kani @ 2015-06-01 22:36 UTC (permalink / raw)
  To: Dan Williams
  Cc: arnd, mingo, bp, hpa, tglx, ross.zwisler, akpm, jgross, x86,
	linux-nvdimm, mcgrof, konrad.wilk, linux-kernel, stefan.bader,
	luto, linux-mm, geert, hmh, tj, hch

On Sat, 2015-05-30 at 14:59 -0400, Dan Williams wrote:
> Similar to ioremap_wc() let architecture implementations optionally
> provide ioremap_cache().  As is, current ioremap_cache() users have
> architecture dependencies that prevent them from compiling on archs
> without ioremap_cache().  In some cases the architectures that have a
> cached ioremap() capability have an identifier other than
> "ioremap_cache".
> 
> Allow drivers to compile with ioremap_cache() support and fallback to a
> safe / uncached ioremap otherwise.
 :
> diff --git a/arch/mn10300/include/asm/io.h b/arch/mn10300/include/asm/io.h
> index 07c5b4a3903b..dcab414f40df 100644
> --- a/arch/mn10300/include/asm/io.h
> +++ b/arch/mn10300/include/asm/io.h
> @@ -283,6 +283,7 @@ static inline void __iomem *ioremap_nocache(unsigned long offset, unsigned long
>  
>  #define ioremap_wc ioremap_nocache
>  #define ioremap_wt ioremap_nocache
> +#define ioremap_cache ioremap_nocache

From the comment in ioremap_nocache(), ioremap() may be cacheable in
this arch.

> diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
> index f56094cfdeff..a0665dfcab47 100644
> --- a/include/asm-generic/io.h
> +++ b/include/asm-generic/io.h
> @@ -793,6 +793,14 @@ static inline void __iomem *ioremap_wt(phys_addr_t offset, size_t size)
>  }
>  #endif
>  
> +#ifndef ioremap_cache
> +#define ioremap_cache ioremap_cache
> +static inline void __iomem *ioremap_cache(phys_addr_t offset, size_t size)
> +{
> +	return ioremap_nocache(offset, size);

Should this be defined as ioremap()?

> +}
> +#endif
> +
>  #ifndef iounmap
>  #define iounmap iounmap
>  
> diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
> index d8f8622fa044..f0f30464cecd 100644
> --- a/include/asm-generic/iomap.h
> +++ b/include/asm-generic/iomap.h
> @@ -70,6 +70,10 @@ extern void ioport_unmap(void __iomem *);
>  #define ioremap_wt ioremap_nocache
>  #endif
>  
> +#ifndef ARCH_HAS_IOREMAP_CACHE
> +#define ioremap_cache ioremap_nocache

Ditto.


Also, it'd be nice to remove ioremap_cached() and ioremap_fullcache()
in a separate patch while we're at it.

Thanks,
-Toshi


* Re: [PATCH v2 1/4] arch/*/asm/io.h: add ioremap_cache() to all architectures
  2015-06-01 22:36   ` Toshi Kani
@ 2015-06-02  8:20     ` Arnd Bergmann
  2015-06-02  8:38       ` Geert Uytterhoeven
  0 siblings, 1 reply; 19+ messages in thread
From: Arnd Bergmann @ 2015-06-02  8:20 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Dan Williams, mingo, bp, hpa, tglx, ross.zwisler, akpm, jgross,
	x86, linux-nvdimm, mcgrof, konrad.wilk, linux-kernel,
	stefan.bader, luto, linux-mm, geert, hmh, tj, hch, dhowells

On Monday 01 June 2015 16:36:06 Toshi Kani wrote:
> On Sat, 2015-05-30 at 14:59 -0400, Dan Williams wrote:
> > Similar to ioremap_wc() let architecture implementations optionally
> > provide ioremap_cache().  As is, current ioremap_cache() users have
> > architecture dependencies that prevent them from compiling on archs
> > without ioremap_cache().  In some cases the architectures that have a
> > cached ioremap() capability have an identifier other than
> > "ioremap_cache".
> > 
> > Allow drivers to compile with ioremap_cache() support and fallback to a
> > safe / uncached ioremap otherwise.
>  :
> > diff --git a/arch/mn10300/include/asm/io.h b/arch/mn10300/include/asm/io.h
> > index 07c5b4a3903b..dcab414f40df 100644
> > --- a/arch/mn10300/include/asm/io.h
> > +++ b/arch/mn10300/include/asm/io.h
> > @@ -283,6 +283,7 @@ static inline void __iomem *ioremap_nocache(unsigned long offset, unsigned long
> >  
> >  #define ioremap_wc ioremap_nocache
> >  #define ioremap_wt ioremap_nocache
> > +#define ioremap_cache ioremap_nocache
> 
> From the comment in ioremap_nocache(), ioremap() may be cacheable in
> this arch.  

Right, and I guess that would be a bug. ;-)

mn10300 decides caching based on the address, so presumably all addresses passed
into ioremap here already have the uncached bit set. I've checked all the resource
definitions for mn10300, and they are all between 0xA0000000 and 0xBFFFFFFF,
which is non-cacheable.

> > diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
> > index f56094cfdeff..a0665dfcab47 100644
> > --- a/include/asm-generic/io.h
> > +++ b/include/asm-generic/io.h
> > @@ -793,6 +793,14 @@ static inline void __iomem *ioremap_wt(phys_addr_t offset, size_t size)
> >  }
> >  #endif
> >  
> > +#ifndef ioremap_cache
> > +#define ioremap_cache ioremap_cache
> > +static inline void __iomem *ioremap_cache(phys_addr_t offset, size_t size)
> > +{
> > +	return ioremap_nocache(offset, size);
> 
> Should this be defined as ioremap()?

I would leave it like this, for clarity. All architectures at the moment
need to define ioremap_nocache and ioremap to be the same thing anyway,
but this definition makes it clearer that it's not actually cached.
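The define-guard idiom in the asm-generic/io.h hunk quoted above can be illustrated with a small user-space sketch (toy code, not kernel code): defining the macro to expand to its own name is what lets a later `#ifndef ioremap_cache` detect that an implementation already exists, so the uncached fallback compiles out on architectures that provide a real cached mapping. The `ioremap_nocache()` stub below is a stand-in invented for this example.

```c
#include <stddef.h>

/* Stand-in for the real arch function: pretend the physical offset
 * maps 1:1 into the uncached virtual address space. */
static void *ioremap_nocache(unsigned long offset, unsigned long size)
{
	(void)size;
	return (void *)offset;
}

/* The asm-generic pattern: if no arch header has already #defined
 * ioremap_cache, provide a safe uncached fallback and #define the
 * symbol to itself so any later #ifndef sees it as present.
 * (A self-referential macro is expanded only once, so this does
 * not recurse.) */
#ifndef ioremap_cache
#define ioremap_cache ioremap_cache
static void *ioremap_cache(unsigned long offset, unsigned long size)
{
	/* No cached mapping available: fall back to uncached. */
	return ioremap_nocache(offset, size);
}
#endif
```

With this in place a driver can call ioremap_cache() unconditionally and still compile on every architecture, which is the point of the patch.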

> > diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
> > index d8f8622fa044..f0f30464cecd 100644
> > --- a/include/asm-generic/iomap.h
> > +++ b/include/asm-generic/iomap.h
> > @@ -70,6 +70,10 @@ extern void ioport_unmap(void __iomem *);
> >  #define ioremap_wt ioremap_nocache
> >  #endif
> >  
> > +#ifndef ARCH_HAS_IOREMAP_CACHE
> > +#define ioremap_cache ioremap_nocache
> 
> Ditto.
> 
> 
> Also, it'd be nice to remove ioremap_cached() and ioremap_fullcache()
> with a separate patch in this opportunity.

Agreed.

	Arnd

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


* Re: [PATCH v2 1/4] arch/*/asm/io.h: add ioremap_cache() to all architectures
  2015-06-02  8:20     ` Arnd Bergmann
@ 2015-06-02  8:38       ` Geert Uytterhoeven
  0 siblings, 0 replies; 19+ messages in thread
From: Geert Uytterhoeven @ 2015-06-02  8:38 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Toshi Kani, Dan Williams, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Thomas Gleixner, ross.zwisler, Andrew Morton,
	jgross, the arch/x86 maintainers, linux-nvdimm,
	Luis R. Rodriguez, Konrad Rzeszutek Wilk, linux-kernel,
	stefan.bader, Andy Lutomirski, Linux MM,
	Henrique de Moraes Holschuh, Tejun Heo, Christoph Hellwig,
	David Howells

On Tue, Jun 2, 2015 at 10:20 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > --- a/arch/mn10300/include/asm/io.h
>> > +++ b/arch/mn10300/include/asm/io.h
>> > @@ -283,6 +283,7 @@ static inline void __iomem *ioremap_nocache(unsigned long offset, unsigned long
>> >
>> >  #define ioremap_wc ioremap_nocache
>> >  #define ioremap_wt ioremap_nocache
>> > +#define ioremap_cache ioremap_nocache
>>
>> From the comment in ioremap_nocache(), ioremap() may be cacheable in
>> this arch.
>
> Right, and I guess that would be a bug. ;-)
>
> mn10300 decides caching on the address, so presumably all arguments passed into

Aha, like MIPS...

> ioremap here already have that bit set. I've checked all the resource
> definitions for mn10300, and they are all between 0xA0000000 and 0xBFFFFFFF,
> which is non-cacheable.

But ioremap() clears that bit again:

static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
{
        return (void __iomem *)(offset & ~0x20000000);
}
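Geert's point can be checked with a little arithmetic. Assuming mn10300's address map mirrors the MIPS KSEG0/KSEG1 convention the thread alludes to (a cached window at 0x80000000-0x9FFFFFFF with an uncached alias at 0xA0000000-0xBFFFFFFF; these bounds are an illustration, not taken from the mn10300 manual), clearing bit 0x20000000 moves every resource address from the uncached alias back into the cached window:

```c
/* Hypothetical window bounds, mirroring the MIPS KSEG convention. */
#define CACHED_BASE	0x80000000UL	/* cached window start */
#define UNCACHED_BASE	0xA0000000UL	/* uncached alias start */
#define ALIAS_BIT	0x20000000UL	/* the bit the quoted ioremap() clears */

/* Same mask as the ioremap() quoted above, modeled in user space. */
static unsigned long mn10300_ioremap(unsigned long offset)
{
	return offset & ~ALIAS_BIT;
}

/* Does an address fall in the (assumed) cached window? */
static int in_cached_window(unsigned long addr)
{
	return addr >= CACHED_BASE && addr < UNCACHED_BASE;
}
```

So even though every mn10300 resource sits in the non-cacheable 0xA0000000-0xBFFFFFFF range, the value ioremap() hands back lands in the cached window, which is exactly the bug under discussion.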

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds



end of thread, other threads:[~2015-06-02  8:38 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-30 18:59 [PATCH v2 0/4] pmem api, generic ioremap_cache, and memremap Dan Williams
2015-05-30 18:59 ` [PATCH v2 1/4] arch/*/asm/io.h: add ioremap_cache() to all architectures Dan Williams
2015-06-01 22:36   ` Toshi Kani
2015-06-02  8:20     ` Arnd Bergmann
2015-06-02  8:38       ` Geert Uytterhoeven
2015-05-30 18:59 ` [PATCH v2 2/4] devm: fix ioremap_cache() usage Dan Williams
2015-05-30 20:52   ` Arnd Bergmann
2015-05-30 21:16     ` Dan Williams
2015-06-01 14:26       ` Arnd Bergmann
2015-05-30 18:59 ` [PATCH v2 3/4] arch: introduce memremap() Dan Williams
2015-05-30 21:00   ` Arnd Bergmann
2015-05-30 21:39     ` Dan Williams
2015-06-01 14:29       ` Arnd Bergmann
2015-05-30 18:59 ` [PATCH v2 4/4] arch, x86: cache management apis for persistent memory Dan Williams
2015-06-01  9:19   ` Paul Bolle
2015-06-01 11:39   ` Boaz Harrosh
2015-06-01 11:44     ` Boaz Harrosh
2015-06-01 16:07       ` Dan Williams
2015-06-01 16:22     ` Dan Williams
