linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/17] replace pcommit with ADR or directed flushing
@ 2016-07-10  3:24 Dan Williams
  2016-07-10  3:24 ` [PATCH v2 01/17] nfit: always associate flush hints Dan Williams
                   ` (16 more replies)
  0 siblings, 17 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:24 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Xiao Guangrong, linux-acpi, Peter Zijlstra, linux-kernel, x86,
	Adrian Hunter, Arnaldo Carvalho de Melo, hch, Alexander Shishkin,
	Ingo Molnar, Andy Lutomirski, Josh Poimboeuf, linux-fsdevel,
	Paolo Bonzini, Thomas Gleixner, Borislav Petkov, H. Peter Anvin,
	Ross Zwisler

Changes since v1 [1]:

1/ Move flush address data from nvdimm_drvdata to nd_region_data (Greg,
   Toshi)

2/ Add more detail to cover letter and patch descriptions (Linda, Jeff)

3/ Account for s/REQ_FLUSH/REQ_PREFLUSH/ rename pending in -next.

4/ Add a directed flush at pmem ->remove() and ->shutdown() time.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-June/005897.html

---

The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the expectation is that platforms implement either
ADR, or provide one or more flush addresses per nvdimm. ADR
(Asynchronous DRAM Refresh) flushes data in posted write buffers to the
memory controller on a power-fail event. Flush addresses are defined in
ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure:
"Flush Hint Address Structure". A flush hint is an MMIO address that,
when written and fenced, ensures that all previously posted writes
targeting a given DIMM have been flushed to media.
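
As a rough illustration of "written and fenced", here is a minimal
userspace sketch in C11. It is a model only, not the kernel code from
this series: struct flush_hint_model and flush_hint_trigger() are
hypothetical names, the value written to the hint is an assumption, and
atomic_thread_fence() merely stands in for a kernel write barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of one DIMM's flush hint; not a kernel structure. */
struct flush_hint_model {
	volatile uint64_t *hint;	/* stands in for the mapped MMIO address */
	unsigned int flushes;		/* bookkeeping for the model only */
};

/* Write the hint, then fence, mirroring "written and fenced". */
static void flush_hint_trigger(struct flush_hint_model *m)
{
	assert(m && m->hint);
	*m->hint = 1;	/* assumption: the value written is arbitrary here */
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for a write barrier */
	m->flushes++;
}
```

A caller would point hint at the mapped address and call
flush_hint_trigger() once the writes it cares about have been issued.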

Code paths that previously called wmb_pmem() must instead arrange for a
flush request to be sent to the pmem driver. To this end, the pmem
driver is converted to advertise itself as having a write cache to
indicate to a filesystem that a flush request must occur before writes
are guaranteed to be on media.  See "[PATCH v2 08/17] libnvdimm:
introduce nvdimm_flush() and nvdimm_has_flush()" for details.
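
The contract described above can be pictured with a small userspace
model in C. This is a sketch under stated assumptions, not the pmem
driver: struct pmem_model and the model_*() helpers are hypothetical and
only track counts of writes, treating a write as durable once a flush
has been processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device that advertises a write cache. */
struct pmem_model {
	bool write_cache;	/* advertised to upper layers */
	size_t posted;		/* writes accepted, possibly still in buffers */
	size_t durable;		/* writes known to be on media */
};

/* A write is accepted but not yet guaranteed to be on media. */
static void model_write(struct pmem_model *p)
{
	assert(p);
	p->posted++;
}

/* A flush request drains everything posted so far. */
static void model_flush(struct pmem_model *p)
{
	assert(p);
	p->durable += p->posted;
	p->posted = 0;
}

/* Upper layers must flush when a write cache is advertised. */
static bool model_needs_flush(const struct pmem_model *p)
{
	return p->write_cache && p->posted > 0;
}
```

In this model a filesystem would check model_needs_flush() before
treating earlier writes as persistent.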

---

Dan Williams (17):
      nfit: always associate flush hints
      nfit: don't override return value of nfit_mem_init
      libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users
      libnvdimm, nfit: remove nfit_spa_map() infrastructure
      libnvdimm, nfit: move flush hint mapping to region-device driver-data
      tools/testing/nvdimm: simulate multiple flush hints per-dimm
      libnvdimm: keep region data alive over namespace removal
      libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
      libnvdimm: cycle flush hints
      libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
      libnvdimm, pmem: flush posted-write queues on shutdown
      fs/dax: remove wmb_pmem()
      libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
      pmem: kill wmb_pmem()
      Revert "KVM: x86: add pcommit support"
      x86/insn: remove pcommit
      pmem: kill __pmem address space


 Documentation/filesystems/Locking                  |    2 
 arch/powerpc/sysdev/axonram.c                      |    4 
 arch/x86/include/asm/cpufeatures.h                 |    1 
 arch/x86/include/asm/pmem.h                        |   77 ++-----
 arch/x86/include/asm/special_insns.h               |   46 ----
 arch/x86/include/asm/vmx.h                         |    1 
 arch/x86/include/uapi/asm/vmx.h                    |    4 
 arch/x86/kvm/cpuid.c                               |    2 
 arch/x86/kvm/cpuid.h                               |    8 -
 arch/x86/kvm/vmx.c                                 |   32 ---
 arch/x86/lib/x86-opcode-map.txt                    |    2 
 drivers/acpi/nfit.c                                |  230 +++-----------------
 drivers/acpi/nfit.h                                |   25 --
 drivers/block/brd.c                                |    4 
 drivers/nvdimm/bus.c                               |   16 +
 drivers/nvdimm/claim.c                             |    2 
 drivers/nvdimm/core.c                              |  122 +++++++++++
 drivers/nvdimm/dimm_devs.c                         |    5 
 drivers/nvdimm/nd-core.h                           |    4 
 drivers/nvdimm/nd.h                                |   10 +
 drivers/nvdimm/pmem.c                              |   59 ++++-
 drivers/nvdimm/pmem.h                              |    4 
 drivers/nvdimm/region.c                            |   19 +-
 drivers/nvdimm/region_devs.c                       |  148 ++++++++++++-
 drivers/s390/block/dcssblk.c                       |    6 -
 fs/dax.c                                           |   13 -
 include/linux/blkdev.h                             |    6 -
 include/linux/compiler.h                           |    2 
 include/linux/libnvdimm.h                          |   16 +
 include/linux/nd.h                                 |    3 
 include/linux/pmem.h                               |  117 ++--------
 scripts/checkpatch.pl                              |    1 
 tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 
 tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 
 tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 
 tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 
 .../perf/util/intel-pt-decoder/x86-opcode-map.txt  |    2 
 tools/testing/nvdimm/pmem-dax.c                    |    2 
 tools/testing/nvdimm/test/nfit.c                   |   55 +++--
 39 files changed, 505 insertions(+), 555 deletions(-)


* [PATCH v2 01/17] nfit: always associate flush hints
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
@ 2016-07-10  3:24 ` Dan Williams
  2016-07-10  3:24 ` [PATCH v2 02/17] nfit: don't override return value of nfit_mem_init Dan Williams
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:24 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, Ross Zwisler, hch, linux-kernel

Before enabling use of flush hints for pmem regions, we need to make
sure they are always associated.  Move the initialization of nfit_flush
out of the block-window specific init path to the general init path.
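
The association itself is a lookup by device handle. A minimal
standalone sketch in C, with hypothetical types that only mirror the
fields used in the diff below (it is not the nfit driver code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one flush hint table entry. */
struct flush_entry {
	uint32_t device_handle;
	int hint_count;
};

/* Return the first entry whose handle matches, or NULL if none does. */
static const struct flush_entry *find_flush(const struct flush_entry *tbl,
		size_t n, uint32_t handle)
{
	size_t i;

	assert(tbl || n == 0);
	for (i = 0; i < n; i++)
		if (tbl[i].device_handle == handle)
			return &tbl[i];
	return NULL;
}
```

Running this lookup for every memory device, rather than only for
block-window capable ones, is the point of the move.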

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 3e54157f02cc..d79837b9d07e 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -614,7 +614,6 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc *acpi_desc,
 {
 	u16 dcr = __to_nfit_memdev(nfit_mem)->region_index;
 	struct nfit_memdev *nfit_memdev;
-	struct nfit_flush *nfit_flush;
 	struct nfit_bdw *nfit_bdw;
 	struct nfit_idt *nfit_idt;
 	u16 idt_idx, range_index;
@@ -647,14 +646,6 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc *acpi_desc,
 			nfit_mem->idt_bdw = nfit_idt->idt;
 			break;
 		}
-
-		list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) {
-			if (nfit_flush->flush->device_handle !=
-					nfit_memdev->memdev->device_handle)
-				continue;
-			nfit_mem->nfit_flush = nfit_flush;
-			break;
-		}
 		break;
 	}
 }
@@ -675,6 +666,7 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
 	}
 
 	list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
+		struct nfit_flush *nfit_flush;
 		struct nfit_dcr *nfit_dcr;
 		u32 device_handle;
 		u16 dcr;
@@ -721,6 +713,13 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
 			break;
 		}
 
+		list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) {
+			if (nfit_flush->flush->device_handle != device_handle)
+				continue;
+			nfit_mem->nfit_flush = nfit_flush;
+			break;
+		}
+
 		if (dcr && !nfit_mem->dcr) {
 			dev_err(acpi_desc->dev, "SPA %d missing DCR %d\n",
 					spa->range_index, dcr);


* [PATCH v2 02/17] nfit: don't override return value of nfit_mem_init
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
  2016-07-10  3:24 ` [PATCH v2 01/17] nfit: always associate flush hints Dan Williams
@ 2016-07-10  3:24 ` Dan Williams
  2016-07-10  3:24 ` [PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users Dan Williams
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:24 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, hch, linux-kernel

We were needlessly converting nfit_mem_init() errors to -ENOMEM.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index d79837b9d07e..f8c1a850effc 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -2422,10 +2422,9 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
 	if (rc)
 		goto out_unlock;
 
-	if (nfit_mem_init(acpi_desc) != 0) {
-		rc = -ENOMEM;
+	rc = nfit_mem_init(acpi_desc);
+	if (rc)
 		goto out_unlock;
-	}
 
 	acpi_nfit_init_dsms(acpi_desc);
 


* [PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
  2016-07-10  3:24 ` [PATCH v2 01/17] nfit: always associate flush hints Dan Williams
  2016-07-10  3:24 ` [PATCH v2 02/17] nfit: don't override return value of nfit_mem_init Dan Williams
@ 2016-07-10  3:24 ` Dan Williams
  2016-07-10  5:30   ` kbuild test robot
  2016-07-12 22:22   ` [PATCH v3] " Dan Williams
  2016-07-10  3:24 ` [PATCH v2 04/17] libnvdimm, nfit: remove nfit_spa_map() infrastructure Dan Williams
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:24 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, hch, linux-kernel

In preparation for generically mapping flush hint addresses for both the
BLK and PMEM use cases, provide a generic, reference-counted mapping
API.  Because a DIMM may belong to multiple regions (PMEM and BLK), the
flush hint addresses must remain valid as long as any region associated
with the DIMM is active.  This is similar to the existing BLK-region
case where multiple BLK-regions may share an aperture mapping.  Up-level
this shared, reference-counted mapping capability from the nfit driver
to a core nvdimm capability.

This eliminates the need for the nd_blk_region.disable() callback.  Note
that the removal of nfit_spa_map() and related infrastructure is
deferred to a later patch.
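
The find-or-allocate, reference-counted pattern can be sketched in plain
userspace C as follows. This is an illustrative model, not the code in
the diff below: struct map_model, struct bus_model, map_get() and
map_put() are hypothetical names, the real locking is omitted, and
calloc() stands in for the actual mapping step. Note that the model
checks for allocation failure before handing the mapping to a caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of one shared mapping, keyed by physical offset. */
struct map_model {
	struct map_model *next;
	uint64_t offset;
	size_t size;
	unsigned int refs;
};

/* Hypothetical model of the bus that owns the list of mappings. */
struct bus_model {
	struct map_model *maps;
};

/* Take a reference on an existing mapping, or create one with refs == 1. */
static struct map_model *map_get(struct bus_model *bus, uint64_t offset,
		size_t size)
{
	struct map_model *m;

	for (m = bus->maps; m; m = m->next)
		if (m->offset == offset) {
			m->refs++;
			return m;
		}

	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;	/* callers must check before using the result */
	m->offset = offset;
	m->size = size;
	m->refs = 1;
	m->next = bus->maps;
	bus->maps = m;
	return m;
}

/* Drop a reference; unlink and free the mapping when the last user goes. */
static void map_put(struct bus_model *bus, struct map_model *m)
{
	struct map_model **pp;

	assert(m && m->refs > 0);
	if (--m->refs)
		return;
	for (pp = &bus->maps; *pp; pp = &(*pp)->next)
		if (*pp == m) {
			*pp = m->next;
			break;
		}
	free(m);
}
```

Two regions that map the same offset share one entry, and the entry
disappears only when both have dropped their reference.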

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit.c       |   14 +++--
 drivers/nvdimm/core.c     |  122 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvdimm/nd-core.h  |    1 
 include/linux/libnvdimm.h |    9 +++
 4 files changed, 139 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index f8c1a850effc..b047dbe13bed 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1616,7 +1616,8 @@ static void __iomem *__nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
  * when all region devices referencing the same mapping are disabled /
  * unbound.
  */
-static void __iomem *nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
+static __maybe_unused void __iomem *nfit_spa_map(
+		struct acpi_nfit_desc *acpi_desc,
 		struct acpi_nfit_system_address *spa, enum spa_map_type type)
 {
 	void __iomem *iomem;
@@ -1669,7 +1670,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 		struct device *dev)
 {
 	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
-	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
 	struct nd_blk_region *ndbr = to_nd_blk_region(dev);
 	struct nfit_flush *nfit_flush;
 	struct nfit_blk_mmio *mmio;
@@ -1697,8 +1697,8 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 	/* map block aperture memory */
 	nfit_blk->bdw_offset = nfit_mem->bdw->offset;
 	mmio = &nfit_blk->mmio[BDW];
-	mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_bdw,
-			SPA_MAP_APERTURE);
+	mmio->addr.base = devm_nvdimm_memremap(dev, nfit_mem->spa_bdw->address,
+			nfit_mem->spa_bdw->length, ARCH_MEMREMAP_PMEM);
 	if (!mmio->addr.base) {
 		dev_dbg(dev, "%s: %s failed to map bdw\n", __func__,
 				nvdimm_name(nvdimm));
@@ -1720,8 +1720,8 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 	nfit_blk->cmd_offset = nfit_mem->dcr->command_offset;
 	nfit_blk->stat_offset = nfit_mem->dcr->status_offset;
 	mmio = &nfit_blk->mmio[DCR];
-	mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_dcr,
-			SPA_MAP_CONTROL);
+	mmio->addr.base = devm_nvdimm_ioremap(dev, nfit_mem->spa_dcr->address,
+			nfit_mem->spa_dcr->length);
 	if (!mmio->addr.base) {
 		dev_dbg(dev, "%s: %s failed to map dcr\n", __func__,
 				nvdimm_name(nvdimm));
@@ -1748,7 +1748,7 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 
 	nfit_flush = nfit_mem->nfit_flush;
 	if (nfit_flush && nfit_flush->flush->hint_count != 0) {
-		nfit_blk->nvdimm_flush = devm_ioremap_nocache(dev,
+		nfit_blk->nvdimm_flush = devm_nvdimm_ioremap(dev,
 				nfit_flush->flush->hint_address[0], 8);
 		if (!nfit_blk->nvdimm_flush)
 			return -ENOMEM;
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 32e4fe2f6274..f9686297ff79 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -57,6 +57,127 @@ bool is_nvdimm_bus_locked(struct device *dev)
 }
 EXPORT_SYMBOL(is_nvdimm_bus_locked);
 
+struct nvdimm_map {
+	struct nvdimm_bus *nvdimm_bus;
+	struct list_head list;
+	resource_size_t offset;
+	unsigned long flags;
+	size_t size;
+	union {
+		void *mem;
+		void __iomem *iomem;
+	};
+	struct kref kref;
+};
+
+static struct nvdimm_map *find_nvdimm_map(struct device *dev,
+		resource_size_t offset)
+{
+	struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+	struct nvdimm_map *nvdimm_map;
+
+	list_for_each_entry(nvdimm_map, &nvdimm_bus->mapping_list, list)
+		if (nvdimm_map->offset == offset)
+			return nvdimm_map;
+	return NULL;
+}
+
+static struct nvdimm_map *alloc_nvdimm_map(struct device *dev,
+		resource_size_t offset, size_t size, unsigned long flags)
+{
+	struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+	struct nvdimm_map *nvdimm_map;
+
+	nvdimm_map = kzalloc(sizeof(*nvdimm_map), GFP_KERNEL);
+	if (!nvdimm_map)
+		return NULL;
+
+	INIT_LIST_HEAD(&nvdimm_map->list);
+	nvdimm_map->nvdimm_bus = nvdimm_bus;
+	nvdimm_map->offset = offset;
+	nvdimm_map->flags = flags;
+	nvdimm_map->size = size;
+	kref_init(&nvdimm_map->kref);
+
+	if (!request_mem_region(offset, size, dev_name(&nvdimm_bus->dev)))
+		goto err_request_region;
+
+	if (flags)
+		nvdimm_map->mem = memremap(offset, size, flags);
+	else
+		nvdimm_map->iomem = ioremap(offset, size);
+
+	if (!nvdimm_map->mem)
+		goto err_map;
+
+	dev_WARN_ONCE(dev, !is_nvdimm_bus_locked(dev), "%s: bus unlocked!",
+			__func__);
+	list_add(&nvdimm_map->list, &nvdimm_bus->mapping_list);
+
+	return nvdimm_map;
+
+ err_map:
+	release_mem_region(offset, size);
+ err_request_region:
+	kfree(nvdimm_map);
+	return NULL;
+}
+
+static void nvdimm_map_release(struct kref *kref)
+{
+	struct nvdimm_bus *nvdimm_bus;
+	struct nvdimm_map *nvdimm_map;
+
+	nvdimm_map = container_of(kref, struct nvdimm_map, kref);
+	nvdimm_bus = nvdimm_map->nvdimm_bus;
+
+	dev_dbg(&nvdimm_bus->dev, "%s: %pa\n", __func__, &nvdimm_map->offset);
+	list_del(&nvdimm_map->list);
+	if (nvdimm_map->flags)
+		memunmap(nvdimm_map->mem);
+	else
+		iounmap(nvdimm_map->iomem);
+	release_mem_region(nvdimm_map->offset, nvdimm_map->size);
+	kfree(nvdimm_map);
+}
+
+static void nvdimm_map_put(void *data)
+{
+	struct nvdimm_map *nvdimm_map = data;
+	struct nvdimm_bus *nvdimm_bus = nvdimm_map->nvdimm_bus;
+
+	nvdimm_bus_lock(&nvdimm_bus->dev);
+	kref_put(&nvdimm_map->kref, nvdimm_map_release);
+	nvdimm_bus_unlock(&nvdimm_bus->dev);
+}
+
+/**
+ * devm_nvdimm_memremap - map a resource that is shared across regions
+ * @dev: device that will own a reference to the shared mapping
+ * @offset: physical base address of the mapping
+ * @size: mapping size
+ * @flags: memremap flags, or, if zero, perform an ioremap instead
+ */
+void *devm_nvdimm_memremap(struct device *dev, resource_size_t offset,
+		size_t size, unsigned long flags)
+{
+	struct nvdimm_map *nvdimm_map;
+
+	nvdimm_bus_lock(dev);
+	nvdimm_map = find_nvdimm_map(dev, offset);
+	if (!nvdimm_map)
+		nvdimm_map = alloc_nvdimm_map(dev, offset, size, flags);
+	else
+		kref_get(&nvdimm_map->kref);
+	nvdimm_bus_unlock(dev);
+
+	if (devm_add_action_or_reset(dev, nvdimm_map_put, nvdimm_map))
+		return NULL;
+
+	return nvdimm_map->mem;
+}
+EXPORT_SYMBOL_GPL(devm_nvdimm_memremap);
+
 u64 nd_fletcher64(void *addr, size_t len, bool le)
 {
 	u32 *buf = addr;
@@ -335,6 +456,7 @@ struct nvdimm_bus *__nvdimm_bus_register(struct device *parent,
 	if (!nvdimm_bus)
 		return NULL;
 	INIT_LIST_HEAD(&nvdimm_bus->list);
+	INIT_LIST_HEAD(&nvdimm_bus->mapping_list);
 	INIT_LIST_HEAD(&nvdimm_bus->poison_list);
 	init_waitqueue_head(&nvdimm_bus->probe_wait);
 	nvdimm_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 284cdaa268cf..790b62cc81ed 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -31,6 +31,7 @@ struct nvdimm_bus {
 	struct device dev;
 	int id, probe_active;
 	struct list_head poison_list;
+	struct list_head mapping_list;
 	struct mutex reconfig_mutex;
 };
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 0c3c30cbbea5..18c3cc48a970 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -99,6 +99,15 @@ struct nd_region_desc {
 	unsigned long flags;
 };
 
+struct device;
+void *devm_nvdimm_memremap(struct device *dev, resource_size_t offset,
+		size_t size, unsigned long flags);
+static inline void __iomem *devm_nvdimm_ioremap(struct device *dev,
+		resource_size_t offset, size_t size)
+{
+	return (void __iomem *) devm_nvdimm_memremap(dev, offset, size, 0);
+}
+
 struct nvdimm_bus;
 struct module;
 struct device;


* [PATCH v2 04/17] libnvdimm, nfit: remove nfit_spa_map() infrastructure
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (2 preceding siblings ...)
  2016-07-10  3:24 ` [PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users Dan Williams
@ 2016-07-10  3:24 ` Dan Williams
  2016-07-10  3:24 ` [PATCH v2 05/17] libnvdimm, nfit: move flush hint mapping to region-device driver-data Dan Williams
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:24 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, hch, linux-kernel

Now that all shared mappings are handled by devm_nvdimm_memremap() we no
longer need nfit_spa_map() nor do we need to trigger a callback to the
bus provider at region disable time.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit.c          |  146 ------------------------------------------
 drivers/acpi/nfit.h          |   21 ------
 drivers/nvdimm/nd.h          |    1 
 drivers/nvdimm/region_devs.c |    3 -
 include/linux/libnvdimm.h    |    1 
 5 files changed, 172 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index b047dbe13bed..b76c95981547 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1509,126 +1509,6 @@ static int acpi_nfit_blk_region_do_io(struct nd_blk_region *ndbr,
 	return rc;
 }
 
-static void nfit_spa_mapping_release(struct kref *kref)
-{
-	struct nfit_spa_mapping *spa_map = to_spa_map(kref);
-	struct acpi_nfit_system_address *spa = spa_map->spa;
-	struct acpi_nfit_desc *acpi_desc = spa_map->acpi_desc;
-
-	WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex));
-	dev_dbg(acpi_desc->dev, "%s: SPA%d\n", __func__, spa->range_index);
-	if (spa_map->type == SPA_MAP_APERTURE)
-		memunmap((void __force *)spa_map->addr.aperture);
-	else
-		iounmap(spa_map->addr.base);
-	release_mem_region(spa->address, spa->length);
-	list_del(&spa_map->list);
-	kfree(spa_map);
-}
-
-static struct nfit_spa_mapping *find_spa_mapping(
-		struct acpi_nfit_desc *acpi_desc,
-		struct acpi_nfit_system_address *spa)
-{
-	struct nfit_spa_mapping *spa_map;
-
-	WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex));
-	list_for_each_entry(spa_map, &acpi_desc->spa_maps, list)
-		if (spa_map->spa == spa)
-			return spa_map;
-
-	return NULL;
-}
-
-static void nfit_spa_unmap(struct acpi_nfit_desc *acpi_desc,
-		struct acpi_nfit_system_address *spa)
-{
-	struct nfit_spa_mapping *spa_map;
-
-	mutex_lock(&acpi_desc->spa_map_mutex);
-	spa_map = find_spa_mapping(acpi_desc, spa);
-
-	if (spa_map)
-		kref_put(&spa_map->kref, nfit_spa_mapping_release);
-	mutex_unlock(&acpi_desc->spa_map_mutex);
-}
-
-static void __iomem *__nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
-		struct acpi_nfit_system_address *spa, enum spa_map_type type)
-{
-	resource_size_t start = spa->address;
-	resource_size_t n = spa->length;
-	struct nfit_spa_mapping *spa_map;
-	struct resource *res;
-
-	WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex));
-
-	spa_map = find_spa_mapping(acpi_desc, spa);
-	if (spa_map) {
-		kref_get(&spa_map->kref);
-		return spa_map->addr.base;
-	}
-
-	spa_map = kzalloc(sizeof(*spa_map), GFP_KERNEL);
-	if (!spa_map)
-		return NULL;
-
-	INIT_LIST_HEAD(&spa_map->list);
-	spa_map->spa = spa;
-	kref_init(&spa_map->kref);
-	spa_map->acpi_desc = acpi_desc;
-
-	res = request_mem_region(start, n, dev_name(acpi_desc->dev));
-	if (!res)
-		goto err_mem;
-
-	spa_map->type = type;
-	if (type == SPA_MAP_APERTURE)
-		spa_map->addr.aperture = (void __pmem *)memremap(start, n,
-							ARCH_MEMREMAP_PMEM);
-	else
-		spa_map->addr.base = ioremap_nocache(start, n);
-
-
-	if (!spa_map->addr.base)
-		goto err_map;
-
-	list_add_tail(&spa_map->list, &acpi_desc->spa_maps);
-	return spa_map->addr.base;
-
- err_map:
-	release_mem_region(start, n);
- err_mem:
-	kfree(spa_map);
-	return NULL;
-}
-
-/**
- * nfit_spa_map - interleave-aware managed-mappings of acpi_nfit_system_address ranges
- * @nvdimm_bus: NFIT-bus that provided the spa table entry
- * @nfit_spa: spa table to map
- * @type: aperture or control region
- *
- * In the case where block-data-window apertures and
- * dimm-control-regions are interleaved they will end up sharing a
- * single request_mem_region() + ioremap() for the address range.  In
- * the style of devm nfit_spa_map() mappings are automatically dropped
- * when all region devices referencing the same mapping are disabled /
- * unbound.
- */
-static __maybe_unused void __iomem *nfit_spa_map(
-		struct acpi_nfit_desc *acpi_desc,
-		struct acpi_nfit_system_address *spa, enum spa_map_type type)
-{
-	void __iomem *iomem;
-
-	mutex_lock(&acpi_desc->spa_map_mutex);
-	iomem = __nfit_spa_map(acpi_desc, spa, type);
-	mutex_unlock(&acpi_desc->spa_map_mutex);
-
-	return iomem;
-}
-
 static int nfit_blk_init_interleave(struct nfit_blk_mmio *mmio,
 		struct acpi_nfit_interleave *idt, u16 interleave_ways)
 {
@@ -1773,29 +1653,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 	return 0;
 }
 
-static void acpi_nfit_blk_region_disable(struct nvdimm_bus *nvdimm_bus,
-		struct device *dev)
-{
-	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
-	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
-	struct nd_blk_region *ndbr = to_nd_blk_region(dev);
-	struct nfit_blk *nfit_blk = nd_blk_region_provider_data(ndbr);
-	int i;
-
-	if (!nfit_blk)
-		return; /* never enabled */
-
-	/* auto-free BLK spa mappings */
-	for (i = 0; i < 2; i++) {
-		struct nfit_blk_mmio *mmio = &nfit_blk->mmio[i];
-
-		if (mmio->addr.base)
-			nfit_spa_unmap(acpi_desc, mmio->spa);
-	}
-	nd_blk_region_set_provider_data(ndbr, NULL);
-	/* devm will free nfit_blk */
-}
-
 static int ars_get_cap(struct acpi_nfit_desc *acpi_desc,
 		struct nd_cmd_ars_cap *cmd, struct nfit_spa *nfit_spa)
 {
@@ -1969,7 +1826,6 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
 		ndr_desc->num_mappings = blk_valid;
 		ndbr_desc = to_blk_region_desc(ndr_desc);
 		ndbr_desc->enable = acpi_nfit_blk_region_enable;
-		ndbr_desc->disable = acpi_nfit_blk_region_disable;
 		ndbr_desc->do_io = acpi_desc->blk_do_io;
 		nfit_spa->nd_region = nvdimm_blk_region_create(acpi_desc->nvdimm_bus,
 				ndr_desc);
@@ -2509,7 +2365,6 @@ void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 	nd_desc->clear_to_send = acpi_nfit_clear_to_send;
 	nd_desc->attr_groups = acpi_nfit_attribute_groups;
 
-	INIT_LIST_HEAD(&acpi_desc->spa_maps);
 	INIT_LIST_HEAD(&acpi_desc->spas);
 	INIT_LIST_HEAD(&acpi_desc->dcrs);
 	INIT_LIST_HEAD(&acpi_desc->bdws);
@@ -2517,7 +2372,6 @@ void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 	INIT_LIST_HEAD(&acpi_desc->flushes);
 	INIT_LIST_HEAD(&acpi_desc->memdevs);
 	INIT_LIST_HEAD(&acpi_desc->dimms);
-	mutex_init(&acpi_desc->spa_map_mutex);
 	mutex_init(&acpi_desc->init_mutex);
 	INIT_WORK(&acpi_desc->work, acpi_nfit_scrub);
 }
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index f06fa91c5abf..52078475d969 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -135,9 +135,7 @@ struct acpi_nfit_desc {
 	struct nvdimm_bus_descriptor nd_desc;
 	struct acpi_table_header acpi_header;
 	struct acpi_nfit_header *nfit;
-	struct mutex spa_map_mutex;
 	struct mutex init_mutex;
-	struct list_head spa_maps;
 	struct list_head memdevs;
 	struct list_head flushes;
 	struct list_head dimms;
@@ -188,25 +186,6 @@ struct nfit_blk {
 	u32 dimm_flags;
 };
 
-enum spa_map_type {
-	SPA_MAP_CONTROL,
-	SPA_MAP_APERTURE,
-};
-
-struct nfit_spa_mapping {
-	struct acpi_nfit_desc *acpi_desc;
-	struct acpi_nfit_system_address *spa;
-	struct list_head list;
-	struct kref kref;
-	enum spa_map_type type;
-	struct nd_blk_addr addr;
-};
-
-static inline struct nfit_spa_mapping *to_spa_map(struct kref *kref)
-{
-	return container_of(kref, struct nfit_spa_mapping, kref);
-}
-
 static inline struct acpi_nfit_memory_map *__to_nfit_memdev(
 		struct nfit_mem *nfit_mem)
 {
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index d0ac93c31dda..2819e886dfd2 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -119,7 +119,6 @@ struct nd_region {
 
 struct nd_blk_region {
 	int (*enable)(struct nvdimm_bus *nvdimm_bus, struct device *dev);
-	void (*disable)(struct nvdimm_bus *nvdimm_bus, struct device *dev);
 	int (*do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
 			void *iobuf, u64 len, int rw);
 	void *blk_provider_data;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 40fcfea26fbb..694b21024871 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -433,8 +433,6 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
 
 		if (is_nd_pmem(dev))
 			return;
-
-		to_nd_blk_region(dev)->disable(nvdimm_bus, dev);
 	}
 	if (dev->parent && is_nd_blk(dev->parent) && probe) {
 		nd_region = to_nd_region(dev->parent);
@@ -698,7 +696,6 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 		if (ndbr) {
 			nd_region = &ndbr->nd_region;
 			ndbr->enable = ndbr_desc->enable;
-			ndbr->disable = ndbr_desc->disable;
 			ndbr->do_io = ndbr_desc->do_io;
 		}
 		region_buf = ndbr;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18c3cc48a970..1050f9aa3a3e 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -114,7 +114,6 @@ struct device;
 struct nd_blk_region;
 struct nd_blk_region_desc {
 	int (*enable)(struct nvdimm_bus *nvdimm_bus, struct device *dev);
-	void (*disable)(struct nvdimm_bus *nvdimm_bus, struct device *dev);
 	int (*do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
 			void *iobuf, u64 len, int rw);
 	struct nd_region_desc ndr_desc;


* [PATCH v2 05/17] libnvdimm, nfit: move flush hint mapping to region-device driver-data
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (3 preceding siblings ...)
  2016-07-10  3:24 ` [PATCH v2 04/17] libnvdimm, nfit: remove nfit_spa_map() infrastructure Dan Williams
@ 2016-07-10  3:24 ` Dan Williams
  2016-07-10  3:25 ` [PATCH v2 06/17] tools/testing/nvdimm: simulate multiple flush hints per-dimm Dan Williams
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:24 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, hch, linux-kernel

In preparation for triggering flushes of a DIMM's writes-posted-queue
(WPQ) via the pmem driver, move the mapping of flush hint addresses to
the region driver.  Since this uses devm_nvdimm_memremap(), the flush
addresses will remain mapped while any region to which the DIMM belongs
is active.

We need to communicate more information to the nvdimm core to facilitate
this mapping; namely, each DIMM object now carries an array of flush
hint address resources.
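
The per-DIMM resource array can be pictured with this small userspace
sketch in C. The names struct res_model and build_flush_resources() are
hypothetical; only the 8-byte extent per hint (end = start + 8 - 1) is
taken from the diff below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inclusive [start, end] address range. */
struct res_model {
	uint64_t start;
	uint64_t end;
};

/*
 * Build one 8-byte range per flush hint address.  Returns NULL when
 * there are no hints or the allocation fails; the caller frees the
 * result with free().
 */
static struct res_model *build_flush_resources(const uint64_t *hint_address,
		unsigned int hint_count)
{
	struct res_model *res;
	unsigned int i;

	if (!hint_address || hint_count == 0)
		return NULL;
	res = calloc(hint_count, sizeof(*res));
	if (!res)
		return NULL;
	for (i = 0; i < hint_count; i++) {
		res[i].start = hint_address[i];
		res[i].end = res[i].start + 8 - 1;	/* inclusive end */
	}
	assert(res[0].end - res[0].start + 1 == 8);
	return res;
}
```

The array and its length are what a bus provider would hand to the core
when it registers the DIMM.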

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit.c          |   21 +++++++++++
 drivers/acpi/nfit.h          |    1 +
 drivers/nvdimm/dimm_devs.c   |    5 ++-
 drivers/nvdimm/nd-core.h     |    3 +-
 drivers/nvdimm/nd.h          |    8 +++-
 drivers/nvdimm/region.c      |   16 ++++-----
 drivers/nvdimm/region_devs.c |   79 ++++++++++++++++++++++++++++++++++++++++--
 include/linux/libnvdimm.h    |    4 ++
 8 files changed, 119 insertions(+), 18 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index b76c95981547..6796f780870a 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -714,9 +714,24 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
 		}
 
 		list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) {
+			struct acpi_nfit_flush_address *flush;
+			u16 i;
+
 			if (nfit_flush->flush->device_handle != device_handle)
 				continue;
 			nfit_mem->nfit_flush = nfit_flush;
+			flush = nfit_flush->flush;
+			nfit_mem->flush_wpq = devm_kzalloc(acpi_desc->dev,
+					flush->hint_count
+					* sizeof(struct resource), GFP_KERNEL);
+			if (!nfit_mem->flush_wpq)
+				return -ENOMEM;
+			for (i = 0; i < flush->hint_count; i++) {
+				struct resource *res = &nfit_mem->flush_wpq[i];
+
+				res->start = flush->hint_address[i];
+				res->end = res->start + 8 - 1;
+			}
 			break;
 		}
 
@@ -1171,6 +1186,7 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
 	int dimm_count = 0;
 
 	list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
+		struct acpi_nfit_flush_address *flush;
 		unsigned long flags = 0, cmd_mask;
 		struct nvdimm *nvdimm;
 		u32 device_handle;
@@ -1204,9 +1220,12 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
 		if (nfit_mem->family == NVDIMM_FAMILY_INTEL)
 			cmd_mask |= nfit_mem->dsm_mask;
 
+		flush = nfit_mem->nfit_flush ? nfit_mem->nfit_flush->flush
+			: NULL;
 		nvdimm = nvdimm_create(acpi_desc->nvdimm_bus, nfit_mem,
 				acpi_nfit_dimm_attribute_groups,
-				flags, cmd_mask);
+				flags, cmd_mask, flush ? flush->hint_count : 0,
+				nfit_mem->flush_wpq);
 		if (!nvdimm)
 			return -ENOMEM;
 
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 52078475d969..9282eb324dcc 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -127,6 +127,7 @@ struct nfit_mem {
 	struct list_head list;
 	struct acpi_device *adev;
 	struct acpi_nfit_desc *acpi_desc;
+	struct resource *flush_wpq;
 	unsigned long dsm_mask;
 	int family;
 };
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index bbde28d3dec5..d9bba5edd8dc 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -346,7 +346,8 @@ EXPORT_SYMBOL_GPL(nvdimm_attribute_group);
 
 struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
 		const struct attribute_group **groups, unsigned long flags,
-		unsigned long cmd_mask)
+		unsigned long cmd_mask, int num_flush,
+		struct resource *flush_wpq)
 {
 	struct nvdimm *nvdimm = kzalloc(sizeof(*nvdimm), GFP_KERNEL);
 	struct device *dev;
@@ -362,6 +363,8 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
 	nvdimm->provider_data = provider_data;
 	nvdimm->flags = flags;
 	nvdimm->cmd_mask = cmd_mask;
+	nvdimm->num_flush = num_flush;
+	nvdimm->flush_wpq = flush_wpq;
 	atomic_set(&nvdimm->busy, 0);
 	dev = &nvdimm->dev;
 	dev_set_name(dev, "nmem%d", nvdimm->id);
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 790b62cc81ed..6e961f7f43e7 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -41,7 +41,8 @@ struct nvdimm {
 	unsigned long cmd_mask;
 	struct device dev;
 	atomic_t busy;
-	int id;
+	int id, num_flush;
+	struct resource *flush_wpq;
 };
 
 bool is_nvdimm(struct device *dev);
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 2819e886dfd2..5912bd6b4234 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -49,9 +49,10 @@ struct nvdimm_drvdata {
 	struct kref kref;
 };
 
-struct nd_region_namespaces {
-	int count;
-	int active;
+struct nd_region_data {
+	int ns_count;
+	int ns_active;
+	void __iomem *flush_wpq[0][0];
 };
 
 static inline struct nd_namespace_index *to_namespace_index(
@@ -324,6 +325,7 @@ static inline void devm_nsio_disable(struct device *dev,
 }
 #endif
 int nd_blk_region_init(struct nd_region *nd_region);
+int nd_region_activate(struct nd_region *nd_region);
 void __nd_iostat_start(struct bio *bio, unsigned long *start);
 static inline bool nd_iostat_start(struct bio *bio, unsigned long *start)
 {
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index 05a912359939..333175dac8d5 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -20,7 +20,7 @@ static int nd_region_probe(struct device *dev)
 {
 	int err, rc;
 	static unsigned long once;
-	struct nd_region_namespaces *num_ns;
+	struct nd_region_data *ndrd;
 	struct nd_region *nd_region = to_nd_region(dev);
 
 	if (nd_region->num_lanes > num_online_cpus()
@@ -33,21 +33,21 @@ static int nd_region_probe(struct device *dev)
 				nd_region->num_lanes);
 	}
 
+	rc = nd_region_activate(nd_region);
+	if (rc)
+		return rc;
+
 	rc = nd_blk_region_init(nd_region);
 	if (rc)
 		return rc;
 
 	rc = nd_region_register_namespaces(nd_region, &err);
-	num_ns = devm_kzalloc(dev, sizeof(*num_ns), GFP_KERNEL);
-	if (!num_ns)
-		return -ENOMEM;
-
 	if (rc < 0)
 		return rc;
 
-	num_ns->active = rc;
-	num_ns->count = rc + err;
-	dev_set_drvdata(dev, num_ns);
+	ndrd = dev_get_drvdata(dev);
+	ndrd->ns_active = rc;
+	ndrd->ns_count = rc + err;
 
 	if (rc && err && rc == err)
 		return -ENODEV;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 694b21024871..67022f74febc 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -22,6 +22,79 @@
 
 static DEFINE_IDA(region_ida);
 
+static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm,
+		struct nd_region_data *ndrd)
+{
+	int i, j;
+
+	dev_dbg(dev, "%s: map %d flush address%s\n", nvdimm_name(nvdimm),
+			nvdimm->num_flush, nvdimm->num_flush == 1 ? "" : "es");
+	for (i = 0; i < nvdimm->num_flush; i++) {
+		struct resource *res = &nvdimm->flush_wpq[i];
+		unsigned long pfn = PHYS_PFN(res->start);
+		void __iomem *flush_page;
+
+		/* check if flush hints share a page */
+		for (j = 0; j < i; j++) {
+			struct resource *res_j = &nvdimm->flush_wpq[j];
+			unsigned long pfn_j = PHYS_PFN(res_j->start);
+
+			if (pfn == pfn_j)
+				break;
+		}
+
+		if (j < i)
+			flush_page = (void __iomem *) ((unsigned long)
+					ndrd->flush_wpq[dimm][j] & PAGE_MASK);
+		else
+			flush_page = devm_nvdimm_ioremap(dev,
+					PFN_PHYS(pfn), PAGE_SIZE);
+		if (!flush_page)
+			return -ENXIO;
+		ndrd->flush_wpq[dimm][i] = flush_page
+			+ (res->start & ~PAGE_MASK);
+	}
+
+	return 0;
+}
+
+int nd_region_activate(struct nd_region *nd_region)
+{
+	int i;
+	struct nd_region_data *ndrd;
+	struct device *dev = &nd_region->dev;
+	size_t flush_data_size = sizeof(void *);
+
+	nvdimm_bus_lock(&nd_region->dev);
+	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+		/* at least one null hint slot per-dimm for the "no-hint" case */
+		flush_data_size += sizeof(void *);
+		if (!nvdimm->num_flush)
+			continue;
+		flush_data_size += nvdimm->num_flush * sizeof(void *);
+	}
+	nvdimm_bus_unlock(&nd_region->dev);
+
+	ndrd = devm_kzalloc(dev, sizeof(*ndrd) + flush_data_size, GFP_KERNEL);
+	if (!ndrd)
+		return -ENOMEM;
+	dev_set_drvdata(dev, ndrd);
+
+	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
+		int rc = nvdimm_map_flush(&nd_region->dev, nvdimm, i, ndrd);
+
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 static void nd_region_release(struct device *dev)
 {
 	struct nd_region *nd_region = to_nd_region(dev);
@@ -242,12 +315,12 @@ static DEVICE_ATTR_RO(available_size);
 static ssize_t init_namespaces_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
-	struct nd_region_namespaces *num_ns = dev_get_drvdata(dev);
+	struct nd_region_data *ndrd = dev_get_drvdata(dev);
 	ssize_t rc;
 
 	nvdimm_bus_lock(dev);
-	if (num_ns)
-		rc = sprintf(buf, "%d/%d\n", num_ns->active, num_ns->count);
+	if (ndrd)
+		rc = sprintf(buf, "%d/%d\n", ndrd->ns_active, ndrd->ns_count);
 	else
 		rc = -ENXIO;
 	nvdimm_bus_unlock(dev);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 1050f9aa3a3e..815b9b430ead 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -52,6 +52,7 @@ typedef int (*ndctl_fn)(struct nvdimm_bus_descriptor *nd_desc,
 
 struct nd_namespace_label;
 struct nvdimm_drvdata;
+
 struct nd_mapping {
 	struct nvdimm *nvdimm;
 	struct nd_namespace_label **labels;
@@ -142,7 +143,8 @@ unsigned long nvdimm_cmd_mask(struct nvdimm *nvdimm);
 void *nvdimm_provider_data(struct nvdimm *nvdimm);
 struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
 		const struct attribute_group **groups, unsigned long flags,
-		unsigned long cmd_mask);
+		unsigned long cmd_mask, int num_flush,
+		struct resource *flush_wpq);
 const struct nd_cmd_desc *nd_cmd_dimm_desc(int cmd);
 const struct nd_cmd_desc *nd_cmd_bus_desc(int cmd);
 u32 nd_cmd_in_size(struct nvdimm *nvdimm, int cmd,


* [PATCH v2 06/17] tools/testing/nvdimm: simulate multiple flush hints per-dimm
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (4 preceding siblings ...)
  2016-07-10  3:24 ` [PATCH v2 05/17] libnvdimm, nfit: move flush hint mapping to region-device driver-data Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-10  3:25 ` [PATCH v2 07/17] libnvdimm: keep region data alive over namespace removal Dan Williams
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, hch, linux-kernel

Sample nfit data to test the kernel's handling of the multiple
flush-hint case.
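
As a rough userspace illustration (not part of this patch), the sample
data places the hints for one dimm in consecutive 8-byte slots of a
single allocation.  The helper name fill_hint_addresses() and the use of
uint64_t in place of the kernel's u64 are assumptions made for this
sketch only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_HINTS = 8 };

/*
 * Hypothetical helper: lay out 'count' flush hint addresses as
 * consecutive 8-byte slots starting at 'base', mirroring the
 * "base + i * sizeof(u64)" pattern used for the test data.
 */
static void fill_hint_addresses(uint64_t base, uint64_t *hints, size_t count)
{
	size_t i;

	assert(hints != NULL);
	for (i = 0; i < count; i++)
		hints[i] = base + i * sizeof(uint64_t);
}
```

With this layout all hints of a dimm land in the same page whenever the
base allocation does not cross a page boundary, which exercises the
shared-page case in the mapping code.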

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/nvdimm/test/nfit.c |   55 +++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 4fdd139f6e6c..ff09a28890ed 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -98,6 +98,7 @@
 enum {
 	NUM_PM  = 3,
 	NUM_DCR = 5,
+	NUM_HINTS = 8,
 	NUM_BDW = NUM_DCR,
 	NUM_SPA = NUM_PM + NUM_DCR + NUM_BDW,
 	NUM_MEM = NUM_DCR + NUM_BDW + 2 /* spa0 iset */ + 4 /* spa1 iset */,
@@ -569,7 +570,8 @@ static int nfit_test0_alloc(struct nfit_test *t)
 			+ offsetof(struct acpi_nfit_control_region,
 					window_size) * NUM_DCR
 			+ sizeof(struct acpi_nfit_data_region) * NUM_BDW
-			+ sizeof(struct acpi_nfit_flush_address) * NUM_DCR;
+			+ (sizeof(struct acpi_nfit_flush_address)
+					+ sizeof(u64) * NUM_HINTS) * NUM_DCR;
 	int i;
 
 	t->nfit_buf = test_alloc(t, nfit_size, &t->nfit_dma);
@@ -599,7 +601,8 @@ static int nfit_test0_alloc(struct nfit_test *t)
 			return -ENOMEM;
 		sprintf(t->label[i], "label%d", i);
 
-		t->flush[i] = test_alloc(t, 8, &t->flush_dma[i]);
+		t->flush[i] = test_alloc(t, sizeof(u64) * NUM_HINTS,
+				&t->flush_dma[i]);
 		if (!t->flush[i])
 			return -ENOMEM;
 	}
@@ -633,6 +636,8 @@ static int nfit_test1_alloc(struct nfit_test *t)
 
 static void nfit_test0_setup(struct nfit_test *t)
 {
+	const int flush_hint_size = sizeof(struct acpi_nfit_flush_address)
+		+ (sizeof(u64) * NUM_HINTS);
 	struct acpi_nfit_desc *acpi_desc;
 	struct acpi_nfit_memory_map *memdev;
 	void *nfit_buf = t->nfit_buf;
@@ -640,7 +645,7 @@ static void nfit_test0_setup(struct nfit_test *t)
 	struct acpi_nfit_control_region *dcr;
 	struct acpi_nfit_data_region *bdw;
 	struct acpi_nfit_flush_address *flush;
-	unsigned int offset;
+	unsigned int offset, i;
 
 	/*
 	 * spa0 (interleave first half of dimm0 and dimm1, note storage
@@ -1126,37 +1131,41 @@ static void nfit_test0_setup(struct nfit_test *t)
 	/* flush0 (dimm0) */
 	flush = nfit_buf + offset;
 	flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-	flush->header.length = sizeof(struct acpi_nfit_flush_address);
+	flush->header.length = flush_hint_size;
 	flush->device_handle = handle[0];
-	flush->hint_count = 1;
-	flush->hint_address[0] = t->flush_dma[0];
+	flush->hint_count = NUM_HINTS;
+	for (i = 0; i < NUM_HINTS; i++)
+		flush->hint_address[i] = t->flush_dma[0] + i * sizeof(u64);
 
 	/* flush1 (dimm1) */
-	flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 1;
+	flush = nfit_buf + offset + flush_hint_size * 1;
 	flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-	flush->header.length = sizeof(struct acpi_nfit_flush_address);
+	flush->header.length = flush_hint_size;
 	flush->device_handle = handle[1];
-	flush->hint_count = 1;
-	flush->hint_address[0] = t->flush_dma[1];
+	flush->hint_count = NUM_HINTS;
+	for (i = 0; i < NUM_HINTS; i++)
+		flush->hint_address[i] = t->flush_dma[1] + i * sizeof(u64);
 
 	/* flush2 (dimm2) */
-	flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 2;
+	flush = nfit_buf + offset + flush_hint_size * 2;
 	flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-	flush->header.length = sizeof(struct acpi_nfit_flush_address);
+	flush->header.length = flush_hint_size;
 	flush->device_handle = handle[2];
-	flush->hint_count = 1;
-	flush->hint_address[0] = t->flush_dma[2];
+	flush->hint_count = NUM_HINTS;
+	for (i = 0; i < NUM_HINTS; i++)
+		flush->hint_address[i] = t->flush_dma[2] + i * sizeof(u64);
 
 	/* flush3 (dimm3) */
-	flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 3;
+	flush = nfit_buf + offset + flush_hint_size * 3;
 	flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-	flush->header.length = sizeof(struct acpi_nfit_flush_address);
+	flush->header.length = flush_hint_size;
 	flush->device_handle = handle[3];
-	flush->hint_count = 1;
-	flush->hint_address[0] = t->flush_dma[3];
+	flush->hint_count = NUM_HINTS;
+	for (i = 0; i < NUM_HINTS; i++)
+		flush->hint_address[i] = t->flush_dma[3] + i * sizeof(u64);
 
 	if (t->setup_hotplug) {
-		offset = offset + sizeof(struct acpi_nfit_flush_address) * 4;
+		offset = offset + flush_hint_size * 4;
 		/* dcr-descriptor4: blk */
 		dcr = nfit_buf + offset;
 		dcr->header.type = ACPI_NFIT_TYPE_CONTROL_REGION;
@@ -1285,10 +1294,12 @@ static void nfit_test0_setup(struct nfit_test *t)
 		/* flush3 (dimm4) */
 		flush = nfit_buf + offset;
 		flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-		flush->header.length = sizeof(struct acpi_nfit_flush_address);
+		flush->header.length = flush_hint_size;
 		flush->device_handle = handle[4];
-		flush->hint_count = 1;
-		flush->hint_address[0] = t->flush_dma[4];
+		flush->hint_count = NUM_HINTS;
+		for (i = 0; i < NUM_HINTS; i++)
+			flush->hint_address[i] = t->flush_dma[4]
+				+ i * sizeof(u64);
 	}
 
 	post_ars_status(&t->ars_state, t->spa_set_dma[0], SPA0_SIZE);


* [PATCH v2 07/17] libnvdimm: keep region data alive over namespace removal
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (5 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 06/17] tools/testing/nvdimm: simulate multiple flush hints per-dimm Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-10  3:25 ` [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() Dan Williams
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, hch, linux-kernel

nd_region device driver data will be used in the namespace i/o path.
Re-order nd_region_remove() to ensure this data stays live across
namespace device removal.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/region.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index 333175dac8d5..8f241772ec0b 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -82,6 +82,8 @@ static int nd_region_remove(struct device *dev)
 {
 	struct nd_region *nd_region = to_nd_region(dev);
 
+	device_for_each_child(dev, NULL, child_unregister);
+
 	/* flush attribute readers and disable */
 	nvdimm_bus_lock(dev);
 	nd_region->ns_seed = NULL;
@@ -91,7 +93,6 @@ static int nd_region_remove(struct device *dev)
 	dev_set_drvdata(dev, NULL);
 	nvdimm_bus_unlock(dev);
 
-	device_for_each_child(dev, NULL, child_unregister);
 	return 0;
 }
 


* [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (6 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 07/17] libnvdimm: keep region data alive over namespace removal Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-10  4:47   ` kbuild test robot
  2016-07-12 22:25   ` [PATCH v3] " Dan Williams
  2016-07-10  3:25 ` [PATCH v2 09/17] libnvdimm: cycle flush hints Dan Williams
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, Ross Zwisler, hch, linux-kernel

nvdimm_flush() is a replacement for the x86 'pcommit' instruction.  It is
an optional write flushing mechanism that an nvdimm bus can provide for
the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider,
nvdimm_flush() is implemented as a series of flush-hint-address [1]
writes to each dimm in the interleave set (region) that backs the
namespace.

The nvdimm_has_flush() routine relies on platform firmware to describe
the flushing capabilities of a platform.  It uses the heuristic of
whether an nvdimm bus provider provides flush address data to return a
ternary result:

      1: flush addresses defined
      0: dimm topology described without flush addresses (assume ADR)
 -errno: no topology information, unable to determine flush mechanism

The pmem driver is expected to take the following actions on this ternary
result:

      1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
      0: do not set WC or FUA on the queue, take no further action
 -errno: warn and then operate as if nvdimm_has_flush() returned '0'
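
The two tables above can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions: model_has_flush(),
model_pmem_policy() and struct model_queue_policy are hypothetical names
that do not exist in the kernel, and a per-dimm hint count stands in for
the mapped flush addresses:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: hint_count[i] is the number of hints for dimm i. */
static int model_has_flush(const int *hint_count, size_t num_dimms)
{
	size_t i;

	/* no dimm topology, flushing capability unknown */
	if (num_dimms == 0)
		return -ENXIO;

	for (i = 0; i < num_dimms; i++)
		if (hint_count[i] > 0)
			return 1;	/* flush addresses defined */

	/* dimms described without hints, assume ADR */
	return 0;
}

struct model_queue_policy {
	bool write_cache;	/* advertise WC/FUA and flush on request */
	bool warn;		/* persistence can not be guaranteed */
};

/* Hypothetical model of how the pmem driver reacts to the result. */
static struct model_queue_policy model_pmem_policy(int has_flush)
{
	struct model_queue_policy p = { false, false };

	if (has_flush < 0)
		p.warn = true;		/* then behave as if '0' */
	else if (has_flush > 0)
		p.write_cache = true;
	return p;
}
```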

The caveat of this heuristic is that it cannot distinguish the "dimm
does not have a flush address" case from the "platform firmware is
broken and failed to describe a flush address" case.  Given we are
already explicitly trusting the NFIT there's not much more we can do
beyond blacklisting broken firmware if it is ever encountered.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit.c          |   33 ++-----------------------
 drivers/acpi/nfit.h          |    1 -
 drivers/nvdimm/pmem.c        |   27 ++++++++++++++++-----
 drivers/nvdimm/region_devs.c |   55 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/libnvdimm.h    |    2 ++
 5 files changed, 81 insertions(+), 37 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 6796f780870a..0497175ee6cb 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1393,24 +1393,6 @@ static u64 to_interleave_offset(u64 offset, struct nfit_blk_mmio *mmio)
 	return mmio->base_offset + line_offset + table_offset + sub_line_offset;
 }
 
-static void wmb_blk(struct nfit_blk *nfit_blk)
-{
-
-	if (nfit_blk->nvdimm_flush) {
-		/*
-		 * The first wmb() is needed to 'sfence' all previous writes
-		 * such that they are architecturally visible for the platform
-		 * buffer flush.  Note that we've already arranged for pmem
-		 * writes to avoid the cache via arch_memcpy_to_pmem().  The
-		 * final wmb() ensures ordering for the NVDIMM flush write.
-		 */
-		wmb();
-		writeq(1, nfit_blk->nvdimm_flush);
-		wmb();
-	} else
-		wmb_pmem();
-}
-
 static u32 read_blk_stat(struct nfit_blk *nfit_blk, unsigned int bw)
 {
 	struct nfit_blk_mmio *mmio = &nfit_blk->mmio[DCR];
@@ -1445,7 +1427,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	wmb_blk(nfit_blk);
+	nvdimm_flush(nfit_blk->nd_region);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -1496,7 +1478,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		wmb_blk(nfit_blk);
+		nvdimm_flush(nfit_blk->nd_region);
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
@@ -1570,7 +1552,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 {
 	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
 	struct nd_blk_region *ndbr = to_nd_blk_region(dev);
-	struct nfit_flush *nfit_flush;
 	struct nfit_blk_mmio *mmio;
 	struct nfit_blk *nfit_blk;
 	struct nfit_mem *nfit_mem;
@@ -1645,15 +1626,7 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 		return rc;
 	}
 
-	nfit_flush = nfit_mem->nfit_flush;
-	if (nfit_flush && nfit_flush->flush->hint_count != 0) {
-		nfit_blk->nvdimm_flush = devm_nvdimm_ioremap(dev,
-				nfit_flush->flush->hint_address[0], 8);
-		if (!nfit_blk->nvdimm_flush)
-			return -ENOMEM;
-	}
-
-	if (!arch_has_wmb_pmem() && !nfit_blk->nvdimm_flush)
+	if (nvdimm_has_flush(nfit_blk->nd_region) < 0)
 		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (mmio->line_size == 0)
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 9282eb324dcc..9fda77cf81da 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -183,7 +183,6 @@ struct nfit_blk {
 	u64 bdw_offset; /* post interleave offset */
 	u64 stat_offset;
 	u64 cmd_offset;
-	void __iomem *nvdimm_flush;
 	u32 dimm_flags;
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index b6fcb97a601c..e303655f243e 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -33,10 +33,24 @@
 #include "pfn.h"
 #include "nd.h"
 
+static struct device *to_dev(struct pmem_device *pmem)
+{
+	/*
+	 * nvdimm bus services need a 'dev' parameter, and we record the device
+	 * at init in bb.dev.
+	 */
+	return pmem->bb.dev;
+}
+
+static struct nd_region *to_region(struct pmem_device *pmem)
+{
+	return to_nd_region(to_dev(pmem)->parent);
+}
+
 static void pmem_clear_poison(struct pmem_device *pmem, phys_addr_t offset,
 		unsigned int len)
 {
-	struct device *dev = pmem->bb.dev;
+	struct device *dev = to_dev(pmem);
 	sector_t sector;
 	long cleared;
 
@@ -122,7 +136,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio_data_dir(bio))
-		wmb_pmem();
+		nvdimm_flush(to_region(pmem));
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -136,7 +150,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 
 	rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, rw, sector);
 	if (rw & WRITE)
-		wmb_pmem();
+		nvdimm_flush(to_region(pmem));
 
 	/*
 	 * The ->rw_page interface is subtle and tricky.  The core
@@ -193,6 +207,7 @@ static int pmem_attach_disk(struct device *dev,
 		struct nd_namespace_common *ndns)
 {
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
+	struct nd_region *nd_region = to_nd_region(dev->parent);
 	struct vmem_altmap __altmap, *altmap = NULL;
 	struct resource *res = &nsio->res;
 	struct nd_pfn *nd_pfn = NULL;
@@ -222,7 +237,7 @@ static int pmem_attach_disk(struct device *dev,
 	dev_set_drvdata(dev, pmem);
 	pmem->phys_addr = res->start;
 	pmem->size = resource_size(res);
-	if (!arch_has_wmb_pmem())
+	if (nvdimm_has_flush(nd_region) < 0)
 		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (!devm_request_mem_region(dev, res->start, resource_size(res),
@@ -284,7 +299,7 @@ static int pmem_attach_disk(struct device *dev,
 			/ 512);
 	if (devm_init_badblocks(dev, &pmem->bb))
 		return -ENOMEM;
-	nvdimm_badblocks_populate(to_nd_region(dev->parent), &pmem->bb, res);
+	nvdimm_badblocks_populate(nd_region, &pmem->bb, res);
 	disk->bb = &pmem->bb;
 	add_disk(disk);
 
@@ -331,8 +346,8 @@ static int nd_pmem_remove(struct device *dev)
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 {
-	struct nd_region *nd_region = to_nd_region(dev->parent);
 	struct pmem_device *pmem = dev_get_drvdata(dev);
+	struct nd_region *nd_region = to_region(pmem);
 	resource_size_t offset = 0, end_trunc = 0;
 	struct nd_namespace_common *ndns;
 	struct nd_namespace_io *nsio;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 67022f74febc..46b6e2f7d5f0 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -14,6 +14,7 @@
 #include <linux/highmem.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/pmem.h>
 #include <linux/sort.h>
 #include <linux/io.h>
 #include <linux/nd.h>
@@ -864,6 +865,60 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+/**
+ * nvdimm_flush - flush any posted write queues between the cpu and pmem media
+ * @nd_region: blk or interleaved pmem region
+ */
+void nvdimm_flush(struct nd_region *nd_region)
+{
+	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
+	int i;
+
+	/*
+	 * The first wmb() is needed to 'sfence' all previous writes
+	 * such that they are architecturally visible for the platform
+	 * buffer flush.  Note that we've already arranged for pmem
+	 * writes to avoid the cache via arch_memcpy_to_pmem().  The
+	 * final wmb() ensures ordering for the NVDIMM flush write.
+	 */
+	wmb();
+	for (i = 0; i < nd_region->ndr_mappings; i++)
+		if (ndrd->flush_wpq[i][0])
+			writeq(1, ndrd->flush_wpq[i][0]);
+	wmb();
+}
+EXPORT_SYMBOL_GPL(nvdimm_flush);
+
+/**
+ * nvdimm_has_flush - determine write flushing requirements
+ * @nd_region: blk or interleaved pmem region
+ *
+ * Returns 1 if writes require flushing
+ * Returns 0 if writes do not require flushing
+ * Returns -ENXIO if flushing capability can not be determined
+ */
+int nvdimm_has_flush(struct nd_region *nd_region)
+{
+	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
+	int i;
+
+	/* no nvdimm == flushing capability unknown */
+	if (nd_region->ndr_mappings == 0)
+		return -ENXIO;
+
+	for (i = 0; i < nd_region->ndr_mappings; i++)
+		/* flush hints present, flushing required */
+		if (ndrd->flush_wpq[i][0])
+			return 1;
+
+	/*
+	 * The platform defines dimm devices without hints, assume
+	 * platform persistence mechanism like ADR
+	 */
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvdimm_has_flush);
+
 void __exit nd_region_devs_exit(void)
 {
 	ida_destroy(&region_ida);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 815b9b430ead..d37fda6dd64c 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -166,4 +166,6 @@ struct nvdimm *nd_blk_region_to_dimm(struct nd_blk_region *ndbr);
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
 u64 nd_fletcher64(void *addr, size_t len, bool le);
+void nvdimm_flush(struct nd_region *nd_region);
+int nvdimm_has_flush(struct nd_region *nd_region);
 #endif /* __LIBNVDIMM_H__ */


* [PATCH v2 09/17] libnvdimm: cycle flush hints
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (7 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-10  3:25 ` [PATCH v2 10/17] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() Dan Williams
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, Ross Zwisler, hch, linux-kernel

When the NFIT provides multiple flush hint addresses per-dimm it is
expressing that the platform is capable of processing multiple flush
requests in parallel.  There is some fixed cost per flush request, so
let that cost be paid in parallel across multiple cpus.

Since there may not be enough flush hint addresses for each cpu to have
one, keep a per-cpu index of the last used hint, hash it with the current
pid, and assume that access pattern and scheduler randomness will keep
the flush-hint usage somewhat staggered across cpus.
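
The index arithmetic can be illustrated in userspace as below.  This is
a sketch only: the model_*() helpers are hypothetical stand-ins for
min_not_zero(), the ilog2()-based mask and the final masking step, and
the per-cpu counter and pid hash are left out.  The sketch asserts a
non-zero hint count and does not model a region where no dimm has hints:

```c
#include <assert.h>

/* Hypothetical stand-in for min_not_zero(): smallest non-zero value. */
static unsigned int model_min_not_zero(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Round 'n' down to a power of two and subtract one, i.e. the value
 * of (1 << ilog2(n)) - 1.  'n' must be non-zero.
 */
static unsigned int model_flush_mask(unsigned int n)
{
	unsigned int pow2 = 1;

	assert(n != 0);
	while (pow2 <= n / 2)
		pow2 *= 2;
	return pow2 - 1;
}

/* Any running index is folded into the valid hint range by the mask. */
static unsigned int model_pick_hint(unsigned int idx, unsigned int mask)
{
	return idx & mask;
}
```

Rounding down to a power of two keeps the selection a single AND, at
the cost of leaving some hints unused when the smallest per-dimm count
is not itself a power of two.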

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/nd.h          |    1 +
 drivers/nvdimm/region_devs.c |   17 ++++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 5912bd6b4234..40476399d227 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -52,6 +52,7 @@ struct nvdimm_drvdata {
 struct nd_region_data {
 	int ns_count;
 	int ns_active;
+	unsigned int flush_mask;
 	void __iomem *flush_wpq[0][0];
 };
 
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 46b6e2f7d5f0..4bcb3b6744aa 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -14,6 +14,7 @@
 #include <linux/highmem.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/hash.h>
 #include <linux/pmem.h>
 #include <linux/sort.h>
 #include <linux/io.h>
@@ -22,6 +23,7 @@
 #include "nd.h"
 
 static DEFINE_IDA(region_ida);
+static DEFINE_PER_CPU(int, flush_idx);
 
 static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm,
 		struct nd_region_data *ndrd)
@@ -61,7 +63,7 @@ static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm,
 
 int nd_region_activate(struct nd_region *nd_region)
 {
-	int i;
+	int i, num_flush = 0;
 	struct nd_region_data *ndrd;
 	struct device *dev = &nd_region->dev;
 	size_t flush_data_size = sizeof(void *);
@@ -73,6 +75,7 @@ int nd_region_activate(struct nd_region *nd_region)
 
 		/* at least one null hint slot per-dimm for the "no-hint" case */
 		flush_data_size += sizeof(void *);
+		num_flush = min_not_zero(num_flush, nvdimm->num_flush);
 		if (!nvdimm->num_flush)
 			continue;
 		flush_data_size += nvdimm->num_flush * sizeof(void *);
@@ -84,6 +87,7 @@ int nd_region_activate(struct nd_region *nd_region)
 		return -ENOMEM;
 	dev_set_drvdata(dev, ndrd);
 
+	ndrd->flush_mask = (1 << ilog2(num_flush)) - 1;
 	for (i = 0; i < nd_region->ndr_mappings; i++) {
 		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
 		struct nvdimm *nvdimm = nd_mapping->nvdimm;
@@ -872,7 +876,14 @@ EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 void nvdimm_flush(struct nd_region *nd_region)
 {
 	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
-	int i;
+	int i, idx;
+
+	/*
+	 * Try to encourage some diversity in flush hint addresses
+	 * across cpus assuming a limited number of flush hints.
+	 */
+	idx = this_cpu_read(flush_idx);
+	idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8));
 
 	/*
 	 * The first wmb() is needed to 'sfence' all previous writes
@@ -884,7 +895,7 @@ void nvdimm_flush(struct nd_region *nd_region)
 	wmb();
 	for (i = 0; i < nd_region->ndr_mappings; i++)
 		if (ndrd->flush_wpq[i][0])
-			writeq(1, ndrd->flush_wpq[i][0]);
+			writeq(1, ndrd->flush_wpq[i][idx & ndrd->flush_mask]);
 	wmb();
 }
 EXPORT_SYMBOL_GPL(nvdimm_flush);


* [PATCH v2 10/17] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (8 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 09/17] libnvdimm: cycle flush hints Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-12 22:26   ` [PATCH v3] " Dan Williams
  2016-07-10  3:25 ` [PATCH v2 11/17] libnvdimm, pmem: flush posted-write queues on shutdown Dan Williams
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, Ross Zwisler, hch, linux-kernel

Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
chasing through nd_region), and that we otherwise assume a platform has
ADR capability when flush hints are not present, move nvdimm_flush() to
REQ_FLUSH context.
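
The resulting ordering in pmem_make_request() can be sketched in
userspace as follows.  The MODEL_* flags, enum model_step and
model_make_request() are hypothetical names for illustration and are not
the block layer's definitions:

```c
#include <assert.h>
#include <stddef.h>

enum { MODEL_REQ_PREFLUSH = 1 << 0, MODEL_REQ_FUA = 1 << 1 };
enum model_step { STEP_FLUSH, STEP_DATA };

/*
 * Record the order of operations for one request into 'steps' and
 * return how many were recorded (at most three).  A preflush is
 * issued before the data is copied, a FUA flush after it.
 */
static size_t model_make_request(unsigned int flags, enum model_step steps[3])
{
	size_t n = 0;

	assert(steps != NULL);
	if (flags & MODEL_REQ_PREFLUSH)
		steps[n++] = STEP_FLUSH;	/* flush earlier writes */
	steps[n++] = STEP_DATA;			/* copy the bio segments */
	if (flags & MODEL_REQ_FUA)
		steps[n++] = STEP_FLUSH;	/* make this write durable */
	return n;
}
```

A plain write without either flag therefore no longer triggers a flush
on its own.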

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pmem.c |   24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index e303655f243e..18cd95719da0 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -113,6 +113,11 @@ static int pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 	return rc;
 }
 
+/* account for REQ_FLUSH rename, replace with REQ_PREFLUSH after v4.8-rc1 */
+#ifndef REQ_FLUSH
+#define REQ_FLUSH REQ_PREFLUSH
+#endif
+
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
 	int rc = 0;
@@ -121,6 +126,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct bio_vec bvec;
 	struct bvec_iter iter;
 	struct pmem_device *pmem = q->queuedata;
+	struct nd_region *nd_region = to_region(pmem);
+
+	if (bio->bi_rw & REQ_FLUSH)
+		nvdimm_flush(nd_region);
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -135,8 +144,8 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	if (do_acct)
 		nd_iostat_end(bio, start);
 
-	if (bio_data_dir(bio))
-		nvdimm_flush(to_region(pmem));
+	if (bio->bi_rw & REQ_FUA)
+		nvdimm_flush(nd_region);
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -149,8 +158,6 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 	int rc;
 
 	rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, rw, sector);
-	if (rw & WRITE)
-		nvdimm_flush(to_region(pmem));
 
 	/*
 	 * The ->rw_page interface is subtle and tricky.  The core
@@ -209,9 +216,9 @@ static int pmem_attach_disk(struct device *dev,
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	struct nd_region *nd_region = to_nd_region(dev->parent);
 	struct vmem_altmap __altmap, *altmap = NULL;
+	int nid = dev_to_node(dev), has_flush;
 	struct resource *res = &nsio->res;
 	struct nd_pfn *nd_pfn = NULL;
-	int nid = dev_to_node(dev);
 	struct nd_pfn_sb *pfn_sb;
 	struct pmem_device *pmem;
 	struct resource pfn_res;
@@ -237,8 +244,6 @@ static int pmem_attach_disk(struct device *dev,
 	dev_set_drvdata(dev, pmem);
 	pmem->phys_addr = res->start;
 	pmem->size = resource_size(res);
-	if (nvdimm_has_flush(nd_region) < 0)
-		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (!devm_request_mem_region(dev, res->start, resource_size(res),
 				dev_name(dev))) {
@@ -279,6 +284,11 @@ static int pmem_attach_disk(struct device *dev,
 		return PTR_ERR(addr);
 	pmem->virt_addr = (void __pmem *) addr;
 
+	has_flush = nvdimm_has_flush(nd_region);
+	if (has_flush < 0)
+		dev_warn(dev, "unable to guarantee persistence of writes\n");
+	else if (has_flush > 0)
+		blk_queue_write_cache(q, true, true);
 	blk_queue_make_request(q, pmem_make_request);
 	blk_queue_physical_block_size(q, PAGE_SIZE);
 	blk_queue_max_hw_sectors(q, UINT_MAX);


* [PATCH v2 11/17] libnvdimm, pmem: flush posted-write queues on shutdown
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (9 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 10/17] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-10  3:25 ` [PATCH v2 12/17] fs/dax: remove wmb_pmem() Dan Williams
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, hch, linux-kernel

Commit writes to media on system shutdown or pmem driver unload.
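
The dispatch added to the bus can be pictured with the standalone
sketch below: the bus-level hook forwards to a per-driver shutdown
callback only when a driver is bound and the callback is set.  The
sk_* types and names are hypothetical and only mimic the shape of
nd_device_driver; they are not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

struct sk_device;

/* Hypothetical driver ops; ->shutdown is optional, as in the patch. */
struct sk_driver {
	const char *name;
	void (*shutdown)(struct sk_device *dev);
};

struct sk_device {
	struct sk_driver *driver;	/* NULL when no driver is bound */
	int flushed;			/* counts flushes issued at shutdown */
};

/* Stand-in for a pmem-style shutdown: flush the parent region. */
static void sk_pmem_shutdown(struct sk_device *dev)
{
	dev->flushed++;
}

/* Bus-level hook: returns 1 if a driver callback ran, 0 otherwise. */
static int sk_bus_shutdown(struct sk_device *dev)
{
	struct sk_driver *drv;

	assert(dev);
	drv = dev->driver;

	if (drv && drv->shutdown) {
		drv->shutdown(dev);
		return 1;
	}
	return 0;
}
```

Drivers that do not set the callback, and devices with no driver
bound, are skipped without error in this model.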

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/bus.c  |   16 ++++++++++++++++
 drivers/nvdimm/pmem.c |    8 ++++++++
 include/linux/nd.h    |    1 +
 3 files changed, 25 insertions(+)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index e4882e63bece..1cc7880320fe 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -136,6 +136,21 @@ static int nvdimm_bus_remove(struct device *dev)
 	return rc;
 }
 
+static void nvdimm_bus_shutdown(struct device *dev)
+{
+	struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+	struct nd_device_driver *nd_drv = NULL;
+
+	if (dev->driver)
+		nd_drv = to_nd_device_driver(dev->driver);
+
+	if (nd_drv && nd_drv->shutdown) {
+		nd_drv->shutdown(dev);
+		dev_dbg(&nvdimm_bus->dev, "%s.shutdown(%s)\n",
+				dev->driver->name, dev_name(dev));
+	}
+}
+
 void nd_device_notify(struct device *dev, enum nvdimm_event event)
 {
 	device_lock(dev);
@@ -214,6 +229,7 @@ static struct bus_type nvdimm_bus_type = {
 	.match = nvdimm_bus_match,
 	.probe = nvdimm_bus_probe,
 	.remove = nvdimm_bus_remove,
+	.shutdown = nvdimm_bus_shutdown,
 };
 
 static ASYNC_DOMAIN_EXCLUSIVE(nd_async_domain);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 18cd95719da0..3f3fdb9586b9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -351,9 +351,16 @@ static int nd_pmem_remove(struct device *dev)
 {
 	if (is_nd_btt(dev))
 		nvdimm_namespace_detach_btt(to_nd_btt(dev));
+	nvdimm_flush(to_nd_region(dev->parent));
+
 	return 0;
 }
 
+static void nd_pmem_shutdown(struct device *dev)
+{
+	nvdimm_flush(to_nd_region(dev->parent));
+}
+
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 {
 	struct pmem_device *pmem = dev_get_drvdata(dev);
@@ -393,6 +400,7 @@ static struct nd_device_driver nd_pmem_driver = {
 	.probe = nd_pmem_probe,
 	.remove = nd_pmem_remove,
 	.notify = nd_pmem_notify,
+	.shutdown = nd_pmem_shutdown,
 	.drv = {
 		.name = "nd_pmem",
 	},
diff --git a/include/linux/nd.h b/include/linux/nd.h
index aee2761d294c..1ecd64643512 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -26,6 +26,7 @@ struct nd_device_driver {
 	unsigned long type;
 	int (*probe)(struct device *dev);
 	int (*remove)(struct device *dev);
+	void (*shutdown)(struct device *dev);
 	void (*notify)(struct device *dev, enum nvdimm_event event);
 };
 


* [PATCH v2 12/17] fs/dax: remove wmb_pmem()
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (10 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 11/17] libnvdimm, pmem: flush posted-write queues on shutdown Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-10  3:25 ` [PATCH v2 13/17] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes Dan Williams
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, Ross Zwisler, hch, linux-kernel

Flushing posted-write queues is now deferred to REQ_FLUSH context, or
otherwise handled by an ADR event at the platform level.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 761495bf5eb9..434f421da660 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -147,7 +147,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 		      struct buffer_head *bh)
 {
 	loff_t pos = start, max = start, bh_max = start;
-	bool hole = false, need_wmb = false;
+	bool hole = false;
 	struct block_device *bdev = NULL;
 	int rw = iov_iter_rw(iter), rc;
 	long map_len = 0;
@@ -213,7 +213,6 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 
 		if (iov_iter_rw(iter) == WRITE) {
 			len = copy_from_iter_pmem(dax.addr, max - pos, iter);
-			need_wmb = true;
 		} else if (!hole)
 			len = copy_to_iter((void __force *) dax.addr, max - pos,
 					iter);
@@ -230,8 +229,6 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 			dax.addr += len;
 	}
 
-	if (need_wmb)
-		wmb_pmem();
 	dax_unmap_atomic(bdev, &dax);
 
 	return (pos == start) ? rc : pos - start;
@@ -783,7 +780,6 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 				return ret;
 		}
 	}
-	wmb_pmem();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
@@ -1227,7 +1223,6 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		if (dax_map_atomic(bdev, &dax) < 0)
 			return PTR_ERR(dax.addr);
 		clear_pmem(dax.addr + offset, length);
-		wmb_pmem();
 		dax_unmap_atomic(bdev, &dax);
 	}
 	return 0;


* [PATCH v2 13/17] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (11 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 12/17] fs/dax: remove wmb_pmem() Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-10  3:25 ` [PATCH v2 14/17] pmem: kill wmb_pmem() Dan Williams
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, Ross Zwisler, hch, linux-kernel

nsio_rw_bytes() is used to write info block metadata to the namespace,
so it should trigger a flush after every write.  Replace wmb_pmem() with
nvdimm_flush() in this path.
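
For illustration only, a userspace sketch of the intended behaviour:
every metadata write is followed by a flush, while reads never flush.
The sk_* names, the 64-byte buffer and the -1 error return are
assumptions made for the sketch, not the namespace I/O code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical namespace: a small media buffer plus a flush counter. */
struct sk_ns {
	unsigned char media[64];
	int flushes;
};

/* Read or write 'size' bytes at 'off'; writes are flushed immediately. */
static int sk_rw_bytes(struct sk_ns *ns, size_t off, void *buf,
		       size_t size, int write)
{
	assert(ns && buf);

	/* Reject ranges that fall outside the media (sketch-only check). */
	if (off > sizeof(ns->media) || size > sizeof(ns->media) - off)
		return -1;

	if (!write) {
		memcpy(buf, ns->media + off, size);
		return 0;
	}

	memcpy(ns->media + off, buf, size);
	ns->flushes++;		/* stand-in for the per-write flush */
	return 0;
}
```

Flushing per write is affordable here because info block updates are
rare compared with data I/O.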

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/claim.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 9997ff94a132..d5dc80c48b4c 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -240,7 +240,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 		return memcpy_from_pmem(buf, nsio->addr + offset, size);
 	} else {
 		memcpy_to_pmem(nsio->addr + offset, buf, size);
-		wmb_pmem();
+		nvdimm_flush(to_nd_region(ndns->dev.parent));
 	}
 
 	return 0;


* [PATCH v2 14/17] pmem: kill wmb_pmem()
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (12 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 13/17] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-10  3:25 ` [PATCH v2 15/17] Revert "KVM: x86: add pcommit support" Dan Williams
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, Ross Zwisler, hch, linux-kernel

All users have been replaced with flushing in the pmem driver.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/pmem.h |   36 ++-------------------------------
 include/linux/pmem.h        |   47 ++++---------------------------------------
 2 files changed, 6 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index fbc5e92e1ecc..a8cf2a6b14d9 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -26,8 +26,7 @@
  * @n: length of the copy in bytes
  *
  * Copy data to persistent memory media via non-temporal stores so that
- * a subsequent arch_wmb_pmem() can flush cpu and memory controller
- * write buffers to guarantee durability.
+ * a subsequent pmem driver flush operation will drain posted write queues.
  */
 static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
 		size_t n)
@@ -57,32 +56,12 @@ static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
 }
 
 /**
- * arch_wmb_pmem - synchronize writes to persistent memory
- *
- * After a series of arch_memcpy_to_pmem() operations this drains data
- * from cpu write buffers and any platform (memory controller) buffers
- * to ensure that written data is durable on persistent memory media.
- */
-static inline void arch_wmb_pmem(void)
-{
-	/*
-	 * wmb() to 'sfence' all previous writes such that they are
-	 * architecturally visible to 'pcommit'.  Note, that we've
-	 * already arranged for pmem writes to avoid the cache via
-	 * arch_memcpy_to_pmem().
-	 */
-	wmb();
-	pcommit_sfence();
-}
-
-/**
  * arch_wb_cache_pmem - write back a cache range with CLWB
  * @vaddr:	virtual start address
  * @size:	number of bytes to write back
  *
  * Write back a cache range using the CLWB (cache line write back)
- * instruction.  This function requires explicit ordering with an
- * arch_wmb_pmem() call.
+ * instruction.
  */
 static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
 {
@@ -113,7 +92,6 @@ static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
  * @i:		iterator with source data
  *
  * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
- * This function requires explicit ordering with an arch_wmb_pmem() call.
  */
 static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
 		struct iov_iter *i)
@@ -136,7 +114,6 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
  * @size:	number of bytes to zero
  *
  * Write zeros into the memory range starting at 'addr' for 'size' bytes.
- * This function requires explicit ordering with an arch_wmb_pmem() call.
  */
 static inline void arch_clear_pmem(void __pmem *addr, size_t size)
 {
@@ -150,14 +127,5 @@ static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
 {
 	clflush_cache_range((void __force *) addr, size);
 }
-
-static inline bool __arch_has_wmb_pmem(void)
-{
-	/*
-	 * We require that wmb() be an 'sfence', that is only guaranteed on
-	 * 64-bit builds
-	 */
-	return static_cpu_has(X86_FEATURE_PCOMMIT);
-}
 #endif /* CONFIG_ARCH_HAS_PMEM_API */
 #endif /* __ASM_X86_PMEM_H__ */
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 57d146fe44dd..9e3ea94b8157 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -26,16 +26,6 @@
  * calling these symbols with arch_has_pmem_api() and redirect to the
  * implementation in asm/pmem.h.
  */
-static inline bool __arch_has_wmb_pmem(void)
-{
-	return false;
-}
-
-static inline void arch_wmb_pmem(void)
-{
-	BUG();
-}
-
 static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
 		size_t n)
 {
@@ -101,20 +91,6 @@ static inline int memcpy_from_pmem(void *dst, void __pmem const *src,
 		return default_memcpy_from_pmem(dst, src, size);
 }
 
-/**
- * arch_has_wmb_pmem - true if wmb_pmem() ensures durability
- *
- * For a given cpu implementation within an architecture it is possible
- * that wmb_pmem() resolves to a nop.  In the case this returns
- * false, pmem api users are unable to ensure durability and may want to
- * fall back to a different data consistency model, or otherwise notify
- * the user.
- */
-static inline bool arch_has_wmb_pmem(void)
-{
-	return arch_has_pmem_api() && __arch_has_wmb_pmem();
-}
-
 /*
  * These defaults seek to offer decent performance and minimize the
  * window between i/o completion and writes being durable on media.
@@ -152,7 +128,7 @@ static inline void default_clear_pmem(void __pmem *addr, size_t size)
  * being effectively evicted from, or never written to, the processor
  * cache hierarchy after the copy completes.  After memcpy_to_pmem()
  * data may still reside in cpu or platform buffers, so this operation
- * must be followed by a wmb_pmem().
+ * must be followed by a blkdev_issue_flush() on the pmem block device.
  */
 static inline void memcpy_to_pmem(void __pmem *dst, const void *src, size_t n)
 {
@@ -163,28 +139,13 @@ static inline void memcpy_to_pmem(void __pmem *dst, const void *src, size_t n)
 }
 
 /**
- * wmb_pmem - synchronize writes to persistent memory
- *
- * After a series of memcpy_to_pmem() operations this drains data from
- * cpu write buffers and any platform (memory controller) buffers to
- * ensure that written data is durable on persistent memory media.
- */
-static inline void wmb_pmem(void)
-{
-	if (arch_has_wmb_pmem())
-		arch_wmb_pmem();
-	else
-		wmb();
-}
-
-/**
  * copy_from_iter_pmem - copy data from an iterator to PMEM
  * @addr:	PMEM destination address
  * @bytes:	number of bytes to copy
  * @i:		iterator with source data
  *
  * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
- * This function requires explicit ordering with a wmb_pmem() call.
+ * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
 static inline size_t copy_from_iter_pmem(void __pmem *addr, size_t bytes,
 		struct iov_iter *i)
@@ -200,7 +161,7 @@ static inline size_t copy_from_iter_pmem(void __pmem *addr, size_t bytes,
  * @size:	number of bytes to zero
  *
  * Write zeros into the memory range starting at 'addr' for 'size' bytes.
- * This function requires explicit ordering with a wmb_pmem() call.
+ * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
 static inline void clear_pmem(void __pmem *addr, size_t size)
 {
@@ -230,7 +191,7 @@ static inline void invalidate_pmem(void __pmem *addr, size_t size)
  * @size:	number of bytes to write back
  *
  * Write back the processor cache range starting at 'addr' for 'size' bytes.
- * This function requires explicit ordering with a wmb_pmem() call.
+ * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
 static inline void wb_cache_pmem(void __pmem *addr, size_t size)
 {


* [PATCH v2 15/17] Revert "KVM: x86: add pcommit support"
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (13 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 14/17] pmem: kill wmb_pmem() Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-10  3:25 ` [PATCH v2 16/17] x86/insn: remove pcommit Dan Williams
  2016-07-10  3:25 ` [PATCH v2 17/17] pmem: kill __pmem address space Dan Williams
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Xiao Guangrong, linux-kernel, linux-acpi, linux-fsdevel,
	Paolo Bonzini, Ross Zwisler, hch

This reverts commit 8b3e34e46aca9b6d349b331cd9cf71ccbdc91b2e.

Given the deprecation of the pcommit instruction, revert its usage as a
vm exit source in kvm.

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/vmx.h      |    1 -
 arch/x86/include/uapi/asm/vmx.h |    4 +---
 arch/x86/kvm/cpuid.c            |    2 +-
 arch/x86/kvm/cpuid.h            |    8 --------
 arch/x86/kvm/vmx.c              |   32 ++++----------------------------
 5 files changed, 6 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 14c63c7e8337..a002b07a7099 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -72,7 +72,6 @@
 #define SECONDARY_EXEC_SHADOW_VMCS              0x00004000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_XSAVES			0x00100000
-#define SECONDARY_EXEC_PCOMMIT			0x00200000
 #define SECONDARY_EXEC_TSC_SCALING              0x02000000
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 5b15d94a33f8..37fee272618f 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -78,7 +78,6 @@
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
-#define EXIT_REASON_PCOMMIT             65
 
 #define VMX_EXIT_REASONS \
 	{ EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
@@ -127,8 +126,7 @@
 	{ EXIT_REASON_INVVPID,               "INVVPID" }, \
 	{ EXIT_REASON_INVPCID,               "INVPCID" }, \
 	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
-	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
-	{ EXIT_REASON_PCOMMIT,               "PCOMMIT" }
+	{ EXIT_REASON_XRSTORS,               "XRSTORS" }
 
 #define VMX_ABORT_SAVE_GUEST_MSR_FAIL        1
 #define VMX_ABORT_LOAD_HOST_MSR_FAIL         4
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7597b42a8a88..643565364497 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -366,7 +366,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
 		F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
 		F(ADX) | F(SMAP) | F(AVX512F) | F(AVX512PF) | F(AVX512ER) |
-		F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(PCOMMIT);
+		F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB);
 
 	/* cpuid 0xD.1.eax */
 	const u32 kvm_cpuid_D_1_eax_x86_features =
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index e17a74b1d852..35058c2c0eea 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -144,14 +144,6 @@ static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu)
 	return best && (best->ebx & bit(X86_FEATURE_RTM));
 }
 
-static inline bool guest_cpuid_has_pcommit(struct kvm_vcpu *vcpu)
-{
-	struct kvm_cpuid_entry2 *best;
-
-	best = kvm_find_cpuid_entry(vcpu, 7, 0);
-	return best && (best->ebx & bit(X86_FEATURE_PCOMMIT));
-}
-
 static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fb93010beaa4..2e2685424fdc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2705,8 +2705,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		SECONDARY_EXEC_APIC_REGISTER_VIRT |
 		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 		SECONDARY_EXEC_WBINVD_EXITING |
-		SECONDARY_EXEC_XSAVES |
-		SECONDARY_EXEC_PCOMMIT;
+		SECONDARY_EXEC_XSAVES;
 
 	if (enable_ept) {
 		/* nested EPT: emulate EPT also to L1 */
@@ -3268,7 +3267,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_SHADOW_VMCS |
 			SECONDARY_EXEC_XSAVES |
 			SECONDARY_EXEC_ENABLE_PML |
-			SECONDARY_EXEC_PCOMMIT |
 			SECONDARY_EXEC_TSC_SCALING;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
@@ -4856,9 +4854,6 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 	if (!enable_pml)
 		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
-	/* Currently, we allow L1 guest to directly run pcommit instruction. */
-	exec_control &= ~SECONDARY_EXEC_PCOMMIT;
-
 	return exec_control;
 }
 
@@ -4902,9 +4897,10 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
 	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, vmx_exec_control(vmx));
 
-	if (cpu_has_secondary_exec_ctrls())
+	if (cpu_has_secondary_exec_ctrls()) {
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
 				vmx_secondary_exec_control(vmx));
+	}
 
 	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
 		vmcs_write64(EOI_EXIT_BITMAP0, 0);
@@ -7557,13 +7553,6 @@ static int handle_pml_full(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
-static int handle_pcommit(struct kvm_vcpu *vcpu)
-{
-	/* we never catch pcommit instruct for L1 guest. */
-	WARN_ON(1);
-	return 1;
-}
-
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -7614,7 +7603,6 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_XSAVES]                  = handle_xsaves,
 	[EXIT_REASON_XRSTORS]                 = handle_xrstors,
 	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
-	[EXIT_REASON_PCOMMIT]                 = handle_pcommit,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -7923,8 +7911,6 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		 * the XSS exit bitmap in vmcs12.
 		 */
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
-	case EXIT_REASON_PCOMMIT:
-		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
 	default:
 		return true;
 	}
@@ -9085,15 +9071,6 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 
 	if (cpu_has_secondary_exec_ctrls())
 		vmcs_set_secondary_exec_control(secondary_exec_ctl);
-
-	if (static_cpu_has(X86_FEATURE_PCOMMIT) && nested) {
-		if (guest_cpuid_has_pcommit(vcpu))
-			vmx->nested.nested_vmx_secondary_ctls_high |=
-				SECONDARY_EXEC_PCOMMIT;
-		else
-			vmx->nested.nested_vmx_secondary_ctls_high &=
-				~SECONDARY_EXEC_PCOMMIT;
-	}
 }
 
 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
@@ -9706,8 +9683,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
 				  SECONDARY_EXEC_RDTSCP |
 				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
-				  SECONDARY_EXEC_APIC_REGISTER_VIRT |
-				  SECONDARY_EXEC_PCOMMIT);
+				  SECONDARY_EXEC_APIC_REGISTER_VIRT);
 		if (nested_cpu_has(vmcs12,
 				CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
 			exec_control |= vmcs12->secondary_vm_exec_control;


* [PATCH v2 16/17] x86/insn: remove pcommit
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (14 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 15/17] Revert "KVM: x86: add pcommit support" Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  2016-07-12 14:57   ` Peter Zijlstra
  2016-07-10  3:25 ` [PATCH v2 17/17] pmem: kill __pmem address space Dan Williams
  16 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Xiao Guangrong, Josh Poimboeuf, linux-acpi, Peter Zijlstra,
	linux-kernel, x86, Adrian Hunter, Arnaldo Carvalho de Melo, hch,
	Alexander Shishkin, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
	linux-fsdevel, Thomas Gleixner, Borislav Petkov, Ross Zwisler

The pcommit instruction is being deprecated in favor of either ADR
(asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
Firmware Interface Table).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/cpufeatures.h                 |    1 
 arch/x86/include/asm/special_insns.h               |   46 --------------------
 arch/x86/lib/x86-opcode-map.txt                    |    2 -
 tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 -
 tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 -
 tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 -
 tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 --
 .../perf/util/intel-pt-decoder/x86-opcode-map.txt  |    2 -
 8 files changed, 3 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4a413485f9eb..700d97df7d28 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -225,7 +225,6 @@
 #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX		( 9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP	( 9*32+20) /* Supervisor Mode Access Prevention */
-#define X86_FEATURE_PCOMMIT	( 9*32+22) /* PCOMMIT instruction */
 #define X86_FEATURE_CLFLUSHOPT	( 9*32+23) /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB	( 9*32+24) /* CLWB instruction */
 #define X86_FEATURE_AVX512PF	( 9*32+26) /* AVX-512 Prefetch */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index d96d04377765..587d7914ea4b 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -253,52 +253,6 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
-/**
- * pcommit_sfence() - persistent commit and fence
- *
- * The PCOMMIT instruction ensures that data that has been flushed from the
- * processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to
- * memory and is durable on the DIMM.  The primary use case for this is
- * persistent memory.
- *
- * This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT
- * with appropriate fencing.
- *
- * Example:
- * void flush_and_commit_buffer(void *vaddr, unsigned int size)
- * {
- *         unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
- *         void *vend = vaddr + size;
- *         void *p;
- *
- *         for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
- *              p < vend; p += boot_cpu_data.x86_clflush_size)
- *                 clwb(p);
- *
- *         // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes
- *         // MFENCE via mb() also works
- *         wmb();
- *
- *         // PCOMMIT and the required SFENCE for ordering
- *         pcommit_sfence();
- * }
- *
- * After this function completes the data pointed to by 'vaddr' has been
- * accepted to memory and will be durable if the 'vaddr' points to persistent
- * memory.
- *
- * PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify
- * things we include both the PCOMMIT and the required SFENCE in the
- * alternatives generated by pcommit_sfence().
- */
-static inline void pcommit_sfence(void)
-{
-	alternative(ASM_NOP7,
-		    ".byte 0x66, 0x0f, 0xae, 0xf8\n\t" /* pcommit */
-		    "sfence",
-		    X86_FEATURE_PCOMMIT);
-}
-
 #define nop() asm volatile ("nop")
 
 
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index d388de72eaca..28632ee68377 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -947,7 +947,7 @@ GrpTable: Grp15
 4: XSAVE
 5: XRSTOR | lfence (11B)
 6: XSAVEOPT | clwb (66) | mfence (11B)
-7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B)
+7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
 GrpTable: Grp16
diff --git a/tools/objtool/arch/x86/insn/x86-opcode-map.txt b/tools/objtool/arch/x86/insn/x86-opcode-map.txt
index d388de72eaca..28632ee68377 100644
--- a/tools/objtool/arch/x86/insn/x86-opcode-map.txt
+++ b/tools/objtool/arch/x86/insn/x86-opcode-map.txt
@@ -947,7 +947,7 @@ GrpTable: Grp15
 4: XSAVE
 5: XRSTOR | lfence (11B)
 6: XSAVEOPT | clwb (66) | mfence (11B)
-7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B)
+7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
 GrpTable: Grp16
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-32.c b/tools/perf/arch/x86/tests/insn-x86-dat-32.c
index 3b491cfe204e..38a48daed154 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-32.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-32.c
@@ -654,5 +654,3 @@
 "0f c7 1d 78 56 34 12 \txrstors 0x12345678",},
 {{0x0f, 0xc7, 0x9c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 8, 0, "", "",
 "0f c7 9c c8 78 56 34 12 \txrstors 0x12345678(%eax,%ecx,8)",},
-{{0x66, 0x0f, 0xae, 0xf8, }, 4, 0, "", "",
-"66 0f ae f8          \tpcommit ",},
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-64.c b/tools/perf/arch/x86/tests/insn-x86-dat-64.c
index 4fe7cce179c4..1f11ea85b60f 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-64.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-64.c
@@ -764,5 +764,3 @@
 "0f c7 9c c8 78 56 34 12 \txrstors 0x12345678(%rax,%rcx,8)",},
 {{0x41, 0x0f, 0xc7, 0x9c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
 "41 0f c7 9c c8 78 56 34 12 \txrstors 0x12345678(%r8,%rcx,8)",},
-{{0x66, 0x0f, 0xae, 0xf8, }, 4, 0, "", "",
-"66 0f ae f8          \tpcommit ",},
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-src.c b/tools/perf/arch/x86/tests/insn-x86-dat-src.c
index 41b1b1c62660..033b8a6fdab9 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-src.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-src.c
@@ -866,10 +866,6 @@ int main(void)
 
 #endif /* #ifndef __x86_64__ */
 
-	/* pcommit */
-
-	asm volatile("pcommit");
-
 	/* Following line is a marker for the awk script - do not change */
 	asm volatile("rdtsc"); /* Stop here */
 
diff --git a/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
index d388de72eaca..28632ee68377 100644
--- a/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
+++ b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
@@ -947,7 +947,7 @@ GrpTable: Grp15
 4: XSAVE
 5: XRSTOR | lfence (11B)
 6: XSAVEOPT | clwb (66) | mfence (11B)
-7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B)
+7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
 GrpTable: Grp16

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 17/17] pmem: kill __pmem address space
  2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
                   ` (15 preceding siblings ...)
  2016-07-10  3:25 ` [PATCH v2 16/17] x86/insn: remove pcommit Dan Williams
@ 2016-07-10  3:25 ` Dan Williams
  16 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-10  3:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, Ross Zwisler, hch, linux-kernel

The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem().  Now that
wmb_pmem() is gone, there is little need to keep this annotation.
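
As a rough illustration of what the annotation cost in practice, here is
a minimal userspace sketch. The two macro definitions mirror the
compiler.h lines removed below; the demo_* helpers are hypothetical and
exist only for this example. Under a regular compiler the annotations
expand to nothing, so both helpers compile to the same memcpy(); only
sparse (which defines __CHECKER__) ever saw the difference, and every
access then needed a __force cast.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sparse defines __CHECKER__; a regular compiler does not, so the
 * annotations expand to nothing in a normal build.
 */
#ifdef __CHECKER__
# define __pmem  __attribute__((noderef, address_space(5)))
# define __force __attribute__((force))
#else
# define __pmem
# define __force
#endif

/* Hypothetical "before" helper: destination is tagged, so a cast is needed. */
static inline void demo_copy_annotated(void __pmem *dst, const void *src,
		size_t n)
{
	assert(dst && src);
	/* Strip the address space so memcpy() accepts the pointer. */
	memcpy((void __force *)dst, src, n);
}

/* Hypothetical "after" helper: plain pointer, no cast required. */
static inline void demo_copy_plain(void *dst, const void *src, size_t n)
{
	assert(dst && src);
	memcpy(dst, src, n);
}
```

Callers of the annotated variant also had to cast on the way in, e.g.
demo_copy_annotated((void __pmem __force *)buf, src, n), which is the
kind of noise this patch deletes.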

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/filesystems/Locking |    2 +
 arch/powerpc/sysdev/axonram.c     |    4 +-
 arch/x86/include/asm/pmem.h       |   41 +++++++++-------------
 drivers/acpi/nfit.h               |    2 +
 drivers/block/brd.c               |    4 +-
 drivers/nvdimm/pmem.c             |    6 ++-
 drivers/nvdimm/pmem.h             |    4 +-
 drivers/s390/block/dcssblk.c      |    6 ++-
 fs/dax.c                          |    6 ++-
 include/linux/blkdev.h            |    6 ++-
 include/linux/compiler.h          |    2 -
 include/linux/nd.h                |    2 +
 include/linux/pmem.h              |   70 +++++++++----------------------------
 scripts/checkpatch.pl             |    1 -
 tools/testing/nvdimm/pmem-dax.c   |    2 +
 15 files changed, 56 insertions(+), 102 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75eea7ce3d7c..d9c37ec4c760 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -395,7 +395,7 @@ prototypes:
 	int (*release) (struct gendisk *, fmode_t);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	int (*direct_access) (struct block_device *, sector_t, void __pmem **,
+	int (*direct_access) (struct block_device *, sector_t, void **,
 				unsigned long *);
 	int (*media_changed) (struct gendisk *);
 	void (*unlock_native_capacity) (struct gendisk *);
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ff75d70f7285..154cd9110c08 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -143,12 +143,12 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
  */
 static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-		       void __pmem **kaddr, pfn_t *pfn, long size)
+		       void **kaddr, pfn_t *pfn, long size)
 {
 	struct axon_ram_bank *bank = device->bd_disk->private_data;
 	loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
 
-	*kaddr = (void __pmem __force *) bank->io_addr + offset;
+	*kaddr = bank->io_addr + offset;
 	*pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
 	return bank->size - offset;
 }
diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index a8cf2a6b14d9..643eba42d620 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -28,10 +28,9 @@
  * Copy data to persistent memory media via non-temporal stores so that
  * a subsequent pmem driver flush operation will drain posted write queues.
  */
-static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
-		size_t n)
+static inline void arch_memcpy_to_pmem(void *dst, const void *src, size_t n)
 {
-	int unwritten;
+	int rem;
 
 	/*
 	 * We are copying between two kernel buffers, if
@@ -39,19 +38,17 @@ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
 	 * fault) we would have already reported a general protection fault
 	 * before the WARN+BUG.
 	 */
-	unwritten = __copy_from_user_inatomic_nocache((void __force *) dst,
-			(void __user *) src, n);
-	if (WARN(unwritten, "%s: fault copying %p <- %p unwritten: %d\n",
-				__func__, dst, src, unwritten))
+	rem = __copy_from_user_inatomic_nocache(dst, (void __user *) src, n);
+	if (WARN(rem, "%s: fault copying %p <- %p unwritten: %d\n",
+				__func__, dst, src, rem))
 		BUG();
 }
 
-static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
-		size_t n)
+static inline int arch_memcpy_from_pmem(void *dst, const void *src, size_t n)
 {
 	if (static_cpu_has(X86_FEATURE_MCE_RECOVERY))
-		return memcpy_mcsafe(dst, (void __force *) src, n);
-	memcpy(dst, (void __force *) src, n);
+		return memcpy_mcsafe(dst, src, n);
+	memcpy(dst, src, n);
 	return 0;
 }
 
@@ -63,15 +60,14 @@ static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
  * Write back a cache range using the CLWB (cache line write back)
  * instruction.
  */
-static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
+static inline void arch_wb_cache_pmem(void *addr, size_t size)
 {
 	u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
 	unsigned long clflush_mask = x86_clflush_size - 1;
-	void *vaddr = (void __force *)addr;
-	void *vend = vaddr + size;
+	void *vend = addr + size;
 	void *p;
 
-	for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
+	for (p = (void *)((unsigned long)addr & ~clflush_mask);
 	     p < vend; p += x86_clflush_size)
 		clwb(p);
 }
@@ -93,14 +89,13 @@ static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
  *
  * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
  */
-static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
+static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
 		struct iov_iter *i)
 {
-	void *vaddr = (void __force *)addr;
 	size_t len;
 
 	/* TODO: skip the write-back by always using non-temporal stores */
-	len = copy_from_iter_nocache(vaddr, bytes, i);
+	len = copy_from_iter_nocache(addr, bytes, i);
 
 	if (__iter_needs_pmem_wb(i))
 		arch_wb_cache_pmem(addr, bytes);
@@ -115,17 +110,15 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
  *
  * Write zeros into the memory range starting at 'addr' for 'size' bytes.
  */
-static inline void arch_clear_pmem(void __pmem *addr, size_t size)
+static inline void arch_clear_pmem(void *addr, size_t size)
 {
-	void *vaddr = (void __force *)addr;
-
-	memset(vaddr, 0, size);
+	memset(addr, 0, size);
 	arch_wb_cache_pmem(addr, size);
 }
 
-static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
+static inline void arch_invalidate_pmem(void *addr, size_t size)
 {
-	clflush_cache_range((void __force *) addr, size);
+	clflush_cache_range(addr, size);
 }
 #endif /* CONFIG_ARCH_HAS_PMEM_API */
 #endif /* __ASM_X86_PMEM_H__ */
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 9fda77cf81da..80fb2c0ac8bf 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -164,7 +164,7 @@ enum nd_blk_mmio_selector {
 struct nd_blk_addr {
 	union {
 		void __iomem *base;
-		void __pmem  *aperture;
+		void *aperture;
 	};
 };
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index c04bd9bc39fd..5f1fe4e6208d 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -381,7 +381,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 static long brd_direct_access(struct block_device *bdev, sector_t sector,
-			void __pmem **kaddr, pfn_t *pfn, long size)
+			void **kaddr, pfn_t *pfn, long size)
 {
 	struct brd_device *brd = bdev->bd_disk->private_data;
 	struct page *page;
@@ -391,7 +391,7 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
 	page = brd_insert_page(brd, sector);
 	if (!page)
 		return -ENOSPC;
-	*kaddr = (void __pmem *)page_address(page);
+	*kaddr = page_address(page);
 	*pfn = page_to_pfn_t(page);
 
 	return PAGE_SIZE;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 3f3fdb9586b9..ed1ec0afe66f 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -74,7 +74,7 @@ static int pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 	bool bad_pmem = false;
 	void *mem = kmap_atomic(page);
 	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
-	void __pmem *pmem_addr = pmem->virt_addr + pmem_off;
+	void *pmem_addr = pmem->virt_addr + pmem_off;
 
 	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
 		bad_pmem = true;
@@ -173,7 +173,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 
 /* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
 __weak long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		      void __pmem **kaddr, pfn_t *pfn, long size)
+		      void **kaddr, pfn_t *pfn, long size)
 {
 	struct pmem_device *pmem = bdev->bd_queue->queuedata;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
@@ -282,7 +282,7 @@ static int pmem_attach_disk(struct device *dev,
 
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
-	pmem->virt_addr = (void __pmem *) addr;
+	pmem->virt_addr = addr;
 
 	has_flush = nvdimm_has_flush(nd_region);
 	if (has_flush < 0)
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index c48d4e3aa346..b4ee4f71b4a1 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -6,7 +6,7 @@
 #include <linux/fs.h>
 
 long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		      void __pmem **kaddr, pfn_t *pfn, long size);
+		      void **kaddr, pfn_t *pfn, long size);
 /* this definition is in it's own header for tools/testing/nvdimm to consume */
 struct pmem_device {
 	/* One contiguous memory region per device */
@@ -14,7 +14,7 @@ struct pmem_device {
 	/* when non-zero this device is hosting a 'pfn' instance */
 	phys_addr_t		data_offset;
 	u64			pfn_flags;
-	void __pmem		*virt_addr;
+	void			*virt_addr;
 	/* immutable base size of the namespace */
 	size_t			size;
 	/* trim size when namespace capacity has been section aligned */
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index bed53c46dd90..023c5c975dc0 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -31,7 +31,7 @@ static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static blk_qc_t dcssblk_make_request(struct request_queue *q,
 						struct bio *bio);
 static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
-			 void __pmem **kaddr, pfn_t *pfn, long size);
+			 void **kaddr, pfn_t *pfn, long size);
 
 static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
 
@@ -884,7 +884,7 @@ fail:
 
 static long
 dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
-			void __pmem **kaddr, pfn_t *pfn, long size)
+			void **kaddr, pfn_t *pfn, long size)
 {
 	struct dcssblk_dev_info *dev_info;
 	unsigned long offset, dev_sz;
@@ -894,7 +894,7 @@ dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
 		return -ENODEV;
 	dev_sz = dev_info->end - dev_info->start;
 	offset = secnum * 512;
-	*kaddr = (void __pmem *) (dev_info->start + offset);
+	*kaddr = (void *) dev_info->start + offset;
 	*pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
 
 	return dev_sz - offset;
diff --git a/fs/dax.c b/fs/dax.c
index 434f421da660..c8312f6441bc 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -75,13 +75,13 @@ static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
 	struct request_queue *q = bdev->bd_queue;
 	long rc = -EIO;
 
-	dax->addr = (void __pmem *) ERR_PTR(-EIO);
+	dax->addr = ERR_PTR(-EIO);
 	if (blk_queue_enter(q, true) != 0)
 		return rc;
 
 	rc = bdev_direct_access(bdev, dax);
 	if (rc < 0) {
-		dax->addr = (void __pmem *) ERR_PTR(rc);
+		dax->addr = ERR_PTR(rc);
 		blk_queue_exit(q);
 		return rc;
 	}
@@ -152,7 +152,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 	int rw = iov_iter_rw(iter), rc;
 	long map_len = 0;
 	struct blk_dax_ctl dax = {
-		.addr = (void __pmem *) ERR_PTR(-EIO),
+		.addr = ERR_PTR(-EIO),
 	};
 	unsigned blkbits = inode->i_blkbits;
 	sector_t file_blks = (i_size_read(inode) + (1 << blkbits) - 1)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3d9cf326574f..fde908b2836b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1659,7 +1659,7 @@ static inline bool integrity_req_gap_front_merge(struct request *req,
  */
 struct blk_dax_ctl {
 	sector_t sector;
-	void __pmem *addr;
+	void *addr;
 	long size;
 	pfn_t pfn;
 };
@@ -1670,8 +1670,8 @@ struct block_device_operations {
 	int (*rw_page)(struct block_device *, sector_t, struct page *, int rw);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	long (*direct_access)(struct block_device *, sector_t, void __pmem **,
-			pfn_t *, long);
+	long (*direct_access)(struct block_device *, sector_t, void **, pfn_t *,
+			long);
 	unsigned int (*check_events) (struct gendisk *disk,
 				      unsigned int clearing);
 	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 793c0829e3a3..b966974938ed 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -17,7 +17,6 @@
 # define __release(x)	__context__(x,-1)
 # define __cond_lock(x,c)	((c) ? ({ __acquire(x); 1; }) : 0)
 # define __percpu	__attribute__((noderef, address_space(3)))
-# define __pmem		__attribute__((noderef, address_space(5)))
 #ifdef CONFIG_SPARSE_RCU_POINTER
 # define __rcu		__attribute__((noderef, address_space(4)))
 #else /* CONFIG_SPARSE_RCU_POINTER */
@@ -45,7 +44,6 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 # define __cond_lock(x,c) (c)
 # define __percpu
 # define __rcu
-# define __pmem
 # define __private
 # define ACCESS_PRIVATE(p, member) ((p)->member)
 #endif /* __CHECKER__ */
diff --git a/include/linux/nd.h b/include/linux/nd.h
index 1ecd64643512..f1ea426d6a5e 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -68,7 +68,7 @@ struct nd_namespace_io {
 	struct nd_namespace_common common;
 	struct resource res;
 	resource_size_t size;
-	void __pmem *addr;
+	void *addr;
 	struct badblocks bb;
 };
 
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 9e3ea94b8157..e856c2cb0fe8 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -26,37 +26,35 @@
  * calling these symbols with arch_has_pmem_api() and redirect to the
  * implementation in asm/pmem.h.
  */
-static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
-		size_t n)
+static inline void arch_memcpy_to_pmem(void *dst, const void *src, size_t n)
 {
 	BUG();
 }
 
-static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
-		size_t n)
+static inline int arch_memcpy_from_pmem(void *dst, const void *src, size_t n)
 {
 	BUG();
 	return -EFAULT;
 }
 
-static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
+static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
 		struct iov_iter *i)
 {
 	BUG();
 	return 0;
 }
 
-static inline void arch_clear_pmem(void __pmem *addr, size_t size)
+static inline void arch_clear_pmem(void *addr, size_t size)
 {
 	BUG();
 }
 
-static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
+static inline void arch_wb_cache_pmem(void *addr, size_t size)
 {
 	BUG();
 }
 
-static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
+static inline void arch_invalidate_pmem(void *addr, size_t size)
 {
 	BUG();
 }
@@ -67,13 +65,6 @@ static inline bool arch_has_pmem_api(void)
 	return IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API);
 }
 
-static inline int default_memcpy_from_pmem(void *dst, void __pmem const *src,
-		size_t size)
-{
-	memcpy(dst, (void __force *) src, size);
-	return 0;
-}
-
 /*
  * memcpy_from_pmem - read from persistent memory with error handling
  * @dst: destination buffer
@@ -82,40 +73,13 @@ static inline int default_memcpy_from_pmem(void *dst, void __pmem const *src,
  *
  * Returns 0 on success negative error code on failure.
  */
-static inline int memcpy_from_pmem(void *dst, void __pmem const *src,
-		size_t size)
+static inline int memcpy_from_pmem(void *dst, void const *src, size_t size)
 {
 	if (arch_has_pmem_api())
 		return arch_memcpy_from_pmem(dst, src, size);
 	else
-		return default_memcpy_from_pmem(dst, src, size);
-}
-
-/*
- * These defaults seek to offer decent performance and minimize the
- * window between i/o completion and writes being durable on media.
- * However, it is undefined / architecture specific whether
- * ARCH_MEMREMAP_PMEM + default_memcpy_to_pmem is sufficient for
- * making data durable relative to i/o completion.
- */
-static inline void default_memcpy_to_pmem(void __pmem *dst, const void *src,
-		size_t size)
-{
-	memcpy((void __force *) dst, src, size);
-}
-
-static inline size_t default_copy_from_iter_pmem(void __pmem *addr,
-		size_t bytes, struct iov_iter *i)
-{
-	return copy_from_iter_nocache((void __force *)addr, bytes, i);
-}
-
-static inline void default_clear_pmem(void __pmem *addr, size_t size)
-{
-	if (size == PAGE_SIZE && ((unsigned long)addr & ~PAGE_MASK) == 0)
-		clear_page((void __force *)addr);
-	else
-		memset((void __force *)addr, 0, size);
+		memcpy(dst, src, size);
+	return 0;
 }
 
 /**
@@ -130,12 +94,12 @@ static inline void default_clear_pmem(void __pmem *addr, size_t size)
  * data may still reside in cpu or platform buffers, so this operation
  * must be followed by a blkdev_issue_flush() on the pmem block device.
  */
-static inline void memcpy_to_pmem(void __pmem *dst, const void *src, size_t n)
+static inline void memcpy_to_pmem(void *dst, const void *src, size_t n)
 {
 	if (arch_has_pmem_api())
 		arch_memcpy_to_pmem(dst, src, n);
 	else
-		default_memcpy_to_pmem(dst, src, n);
+		memcpy(dst, src, n);
 }
 
 /**
@@ -147,12 +111,12 @@ static inline void memcpy_to_pmem(void __pmem *dst, const void *src, size_t n)
  * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
  * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
-static inline size_t copy_from_iter_pmem(void __pmem *addr, size_t bytes,
+static inline size_t copy_from_iter_pmem(void *addr, size_t bytes,
 		struct iov_iter *i)
 {
 	if (arch_has_pmem_api())
 		return arch_copy_from_iter_pmem(addr, bytes, i);
-	return default_copy_from_iter_pmem(addr, bytes, i);
+	return copy_from_iter_nocache(addr, bytes, i);
 }
 
 /**
@@ -163,12 +127,12 @@ static inline size_t copy_from_iter_pmem(void __pmem *addr, size_t bytes,
  * Write zeros into the memory range starting at 'addr' for 'size' bytes.
  * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
-static inline void clear_pmem(void __pmem *addr, size_t size)
+static inline void clear_pmem(void *addr, size_t size)
 {
 	if (arch_has_pmem_api())
 		arch_clear_pmem(addr, size);
 	else
-		default_clear_pmem(addr, size);
+		memset(addr, 0, size);
 }
 
 /**
@@ -179,7 +143,7 @@ static inline void clear_pmem(void __pmem *addr, size_t size)
  * For platforms that support clearing poison this flushes any poisoned
  * ranges out of the cache
  */
-static inline void invalidate_pmem(void __pmem *addr, size_t size)
+static inline void invalidate_pmem(void *addr, size_t size)
 {
 	if (arch_has_pmem_api())
 		arch_invalidate_pmem(addr, size);
@@ -193,7 +157,7 @@ static inline void invalidate_pmem(void __pmem *addr, size_t size)
  * Write back the processor cache range starting at 'addr' for 'size' bytes.
  * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
-static inline void wb_cache_pmem(void __pmem *addr, size_t size)
+static inline void wb_cache_pmem(void *addr, size_t size)
 {
 	if (arch_has_pmem_api())
 		arch_wb_cache_pmem(addr, size);
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 4904ced676d4..24a08363995a 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -313,7 +313,6 @@ our $Sparse	= qr{
 			__kernel|
 			__force|
 			__iomem|
-			__pmem|
 			__must_check|
 			__init_refok|
 			__kprobes|
diff --git a/tools/testing/nvdimm/pmem-dax.c b/tools/testing/nvdimm/pmem-dax.c
index 1e0218ce6a8b..c9b8c48f85fc 100644
--- a/tools/testing/nvdimm/pmem-dax.c
+++ b/tools/testing/nvdimm/pmem-dax.c
@@ -16,7 +16,7 @@
 #include <nd.h>
 
 long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		void __pmem **kaddr, pfn_t *pfn, long size)
+		void **kaddr, pfn_t *pfn, long size)
 {
 	struct pmem_device *pmem = bdev->bd_queue->queuedata;
 	resource_size_t offset = sector * 512 + pmem->data_offset;

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
  2016-07-10  3:25 ` [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() Dan Williams
@ 2016-07-10  4:47   ` kbuild test robot
  2016-07-10  5:01     ` Dan Williams
  2016-07-12 22:25   ` [PATCH v3] " Dan Williams
  1 sibling, 1 reply; 32+ messages in thread
From: kbuild test robot @ 2016-07-10  4:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: kbuild-all, linux-nvdimm, linux-fsdevel, linux-acpi,
	Ross Zwisler, hch, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1546 bytes --]

Hi,

[auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
[also build test ERROR on next-20160708]
[cannot apply to v4.7-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/replace-pcommit-with-ADR-or-directed-flushing/20160710-113558
base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next
config: i386-randconfig-r0-201628 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/nvdimm/region_devs.c: In function 'nvdimm_flush':
>> drivers/nvdimm/region_devs.c:887:4: error: implicit declaration of function 'writeq' [-Werror=implicit-function-declaration]
       writeq(1, ndrd->flush_wpq[i][0]);
       ^~~~~~
   cc1: some warnings being treated as errors

vim +/writeq +887 drivers/nvdimm/region_devs.c

   881		 * writes to avoid the cache via arch_memcpy_to_pmem().  The
   882		 * final wmb() ensures ordering for the NVDIMM flush write.
   883		 */
   884		wmb();
   885		for (i = 0; i < nd_region->ndr_mappings; i++)
   886			if (ndrd->flush_wpq[i][0])
 > 887				writeq(1, ndrd->flush_wpq[i][0]);
   888		wmb();
   889	}
   890	EXPORT_SYMBOL_GPL(nvdimm_flush);
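
For context on the error above: a 32-bit x86 build has no native
writeq(), so the call at line 887 has no declaration. The kernel ships
<linux/io-64-nonatomic-hi-lo.h> and <linux/io-64-nonatomic-lo-hi.h>,
which provide a non-atomic writeq() built from two 32-bit writes.
Whether either is the appropriate fix for flush hints is for the patch
author to decide. The userspace sketch below only illustrates the
hi-lo idea; every demo_* name is hypothetical and none of this is
kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64-bit register, modeled as two 32-bit
 * halves so that each half can be the target of a 32-bit store.
 */
struct demo_reg64 {
	volatile uint32_t lo;
	volatile uint32_t hi;
};

/* Hypothetical stand-in for writel(): a single 32-bit store. */
static inline void demo_writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/*
 * Emulate a 64-bit write as two 32-bit writes, high half first.
 * The two stores are not atomic with respect to each other.
 */
static inline void demo_writeq_hi_lo(uint64_t val, struct demo_reg64 *reg)
{
	assert(reg);
	demo_writel((uint32_t)(val >> 32), &reg->hi);
	demo_writel((uint32_t)val, &reg->lo);
}
```

Because the device observes two separate stores, this approach is only
suitable when the hardware tolerates a split write.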

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 28705 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
  2016-07-10  4:47   ` kbuild test robot
@ 2016-07-10  5:01     ` Dan Williams
  2016-07-11  3:48       ` Li, Philip
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2016-07-10  5:01 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, linux-nvdimm@lists.01.org, linux-fsdevel, Linux ACPI,
	Ross Zwisler, Christoph Hellwig, linux-kernel

On Sat, Jul 9, 2016 at 9:47 PM, kbuild test robot <lkp@intel.com> wrote:
> Hi,
>
> [auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
> [also build test ERROR on next-20160708]
> [cannot apply to v4.7-rc6]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Dan-Williams/replace-pcommit-with-ADR-or-directed-flushing/20160710-113558
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next
> config: i386-randconfig-r0-201628 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386

Hi kbuild team,

Can we add an "i386 allmodconfig" build to the standard "BUILD
SUCCESS" notification runs?  I had two positive build results on a
private branch prior to posting this series, but the i386 runs did not
build the nvdimm sub-system.

In any event this report is valid, so thank you for that!


>
> All errors (new ones prefixed by >>):
>
>    drivers/nvdimm/region_devs.c: In function 'nvdimm_flush':
>>> drivers/nvdimm/region_devs.c:887:4: error: implicit declaration of function 'writeq' [-Werror=implicit-function-declaration]
>        writeq(1, ndrd->flush_wpq[i][0]);
>        ^~~~~~
>    cc1: some warnings being treated as errors
>
> vim +/writeq +887 drivers/nvdimm/region_devs.c
>
>    881           * writes to avoid the cache via arch_memcpy_to_pmem().  The
>    882           * final wmb() ensures ordering for the NVDIMM flush write.
>    883           */
>    884          wmb();
>    885          for (i = 0; i < nd_region->ndr_mappings; i++)
>    886                  if (ndrd->flush_wpq[i][0])
>  > 887                          writeq(1, ndrd->flush_wpq[i][0]);
>    888          wmb();
>    889  }
>    890  EXPORT_SYMBOL_GPL(nvdimm_flush);
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users
  2016-07-10  3:24 ` [PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users Dan Williams
@ 2016-07-10  5:30   ` kbuild test robot
  2016-07-12 22:22   ` [PATCH v3] " Dan Williams
  1 sibling, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2016-07-10  5:30 UTC (permalink / raw)
  To: Dan Williams
  Cc: kbuild-all, linux-nvdimm, linux-fsdevel, linux-acpi, hch, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2885 bytes --]

Hi,

[auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
[also build test ERROR on v4.7-rc6 next-20160708]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/replace-pcommit-with-ADR-or-directed-flushing/20160710-113558
base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next
config: um-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=um 

All error/warnings (new ones prefixed by >>):

   drivers/nvdimm/core.c: In function 'alloc_nvdimm_map':
>> drivers/nvdimm/core.c:108:23: error: implicit declaration of function 'ioremap' [-Werror=implicit-function-declaration]
      nvdimm_map->iomem = ioremap(offset, size);
                          ^~~~~~~
>> drivers/nvdimm/core.c:108:21: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      nvdimm_map->iomem = ioremap(offset, size);
                        ^
   drivers/nvdimm/core.c: In function 'nvdimm_map_release':
>> drivers/nvdimm/core.c:139:3: error: implicit declaration of function 'iounmap' [-Werror=implicit-function-declaration]
      iounmap(nvdimm_map->iomem);
      ^~~~~~~
   cc1: some warnings being treated as errors

vim +/ioremap +108 drivers/nvdimm/core.c

   102		if (!request_mem_region(offset, size, dev_name(&nvdimm_bus->dev)))
   103			goto err_request_region;
   104	
   105		if (flags)
   106			nvdimm_map->mem = memremap(offset, size, flags);
   107		else
 > 108			nvdimm_map->iomem = ioremap(offset, size);
   109	
   110		if (!nvdimm_map->mem)
   111			goto err_map;
   112	
   113		dev_WARN_ONCE(dev, !is_nvdimm_bus_locked(dev), "%s: bus unlocked!",
   114				__func__);
   115		list_add(&nvdimm_map->list, &nvdimm_bus->mapping_list);
   116	
   117		return nvdimm_map;
   118	
   119	 err_map:
   120		release_mem_region(offset, size);
   121	 err_request_region:
   122		kfree(nvdimm_map);
   123		return NULL;
   124	}
   125	
   126	static void nvdimm_map_release(struct kref *kref)
   127	{
   128		struct nvdimm_bus *nvdimm_bus;
   129		struct nvdimm_map *nvdimm_map;
   130	
   131		nvdimm_map = container_of(kref, struct nvdimm_map, kref);
   132		nvdimm_bus = nvdimm_map->nvdimm_bus;
   133	
   134		dev_dbg(&nvdimm_bus->dev, "%s: %pa\n", __func__, &nvdimm_map->offset);
   135		list_del(&nvdimm_map->list);
   136		if (nvdimm_map->flags)
   137			memunmap(nvdimm_map->mem);
   138		else
 > 139			iounmap(nvdimm_map->iomem);
   140		release_mem_region(nvdimm_map->offset, nvdimm_map->size);
   141		kfree(nvdimm_map);
   142	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 18144 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
  2016-07-10  5:01     ` Dan Williams
@ 2016-07-11  3:48       ` Li, Philip
  0 siblings, 0 replies; 32+ messages in thread
From: Li, Philip @ 2016-07-11  3:48 UTC (permalink / raw)
  To: Williams, Dan J, lkp
  Cc: kbuild-all, linux-nvdimm@lists.01.org, linux-fsdevel, Linux ACPI,
	Ross Zwisler, Christoph Hellwig, linux-kernel



> -----Original Message-----
> From: Williams, Dan J
> Sent: Sunday, July 10, 2016 1:01 PM
> To: lkp <lkp@intel.com>
> Cc: kbuild-all@01.org; linux-nvdimm@lists.01.org; linux-fsdevel <linux-
> fsdevel@vger.kernel.org>; Linux ACPI <linux-acpi@vger.kernel.org>; Ross
> Zwisler <ross.zwisler@linux.intel.com>; Christoph Hellwig <hch@lst.de>; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and
> nvdimm_has_flush()
> 
> On Sat, Jul 9, 2016 at 9:47 PM, kbuild test robot <lkp@intel.com> wrote:
> > Hi,
> >
> > [auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
> > [also build test ERROR on next-20160708]
> > [cannot apply to v4.7-rc6]
> > [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system]
> >
> > url:    https://github.com/0day-ci/linux/commits/Dan-Williams/replace-
> pcommit-with-ADR-or-directed-flushing/20160710-113558
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git
> libnvdimm-for-next
> > config: i386-randconfig-r0-201628 (attached as .config)
> > compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
> > reproduce:
> >         # save the attached .config to linux build tree
> >         make ARCH=i386
> 
> Hi kbuild team,
> 
> Can we add an "i386 allmodconfig" build to the standard "BUILD
> SUCCESS" notification runs?  I had two positive build results on a

Thanks. Yes, i386 allmodconfig is currently covered for all kinds of tests, including
kbuild on registered repos and LKML patches. When a test runs on a repo's new commits,
the BUILD SUCCESS mail lists the coverage at the time the mail is sent out, for example:

m32r                       m32104ut_defconfig
m32r                     mappi3.smp_defconfig
m32r                         opsput_defconfig
m32r                           usrv_defconfig
xtensa                       common_defconfig
xtensa                          iss_defconfig
i386                             allmodconfig
mips                                   jz4740
mips                              allnoconfig

> private branch prior to posting this series, but the i386 runs did not
> build the nvdimm sub-system.
> 
> In any event this report is valid, so thank you for that!
> 
> 
> >
> > All errors (new ones prefixed by >>):
> >
> >    drivers/nvdimm/region_devs.c: In function 'nvdimm_flush':
> >>> drivers/nvdimm/region_devs.c:887:4: error: implicit declaration of function 'writeq' [-Werror=implicit-function-declaration]
> >        writeq(1, ndrd->flush_wpq[i][0]);
> >        ^~~~~~
> >    cc1: some warnings being treated as errors
> >
> > vim +/writeq +887 drivers/nvdimm/region_devs.c
> >
> >    881           * writes to avoid the cache via arch_memcpy_to_pmem().  The
> >    882           * final wmb() ensures ordering for the NVDIMM flush write.
> >    883           */
> >    884          wmb();
> >    885          for (i = 0; i < nd_region->ndr_mappings; i++)
> >    886                  if (ndrd->flush_wpq[i][0])
> >  > 887                          writeq(1, ndrd->flush_wpq[i][0]);
> >    888          wmb();
> >    889  }
> >    890  EXPORT_SYMBOL_GPL(nvdimm_flush);
> >
> > ---
> > 0-DAY kernel test infrastructure                Open Source Technology Center
> > https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* Re: [PATCH v2 16/17] x86/insn: remove pcommit
  2016-07-10  3:25 ` [PATCH v2 16/17] x86/insn: remove pcommit Dan Williams
@ 2016-07-12 14:57   ` Peter Zijlstra
  2016-07-12 22:12     ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2016-07-12 14:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, Xiao Guangrong, Josh Poimboeuf, linux-acpi,
	linux-kernel, x86, Adrian Hunter, Arnaldo Carvalho de Melo, hch,
	Alexander Shishkin, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
	linux-fsdevel, Thomas Gleixner, Borislav Petkov, Ross Zwisler

On Sat, Jul 09, 2016 at 08:25:54PM -0700, Dan Williams wrote:
> The pcommit instruction is being deprecated in favor of either ADR
> (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
> posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
> Firmware Interface Table).

>  arch/x86/include/asm/cpufeatures.h                 |    1 
>  arch/x86/include/asm/special_insns.h               |   46 --------------------
>  arch/x86/lib/x86-opcode-map.txt                    |    2 -
>  tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 -
>  tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 -
>  tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 -
>  tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 --

Just deprecated, or is it completely eradicated, removed from history,
will never ever happen and we'll reissue the opcode for something else?

Because if it's only deprecated then removing it from the instruction
decoders seems wrong, old binaries might still contain the opcode.


* Re: [PATCH v2 16/17] x86/insn: remove pcommit
  2016-07-12 14:57   ` Peter Zijlstra
@ 2016-07-12 22:12     ` Dan Williams
  2016-07-22 15:55       ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2016-07-12 22:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-nvdimm, Xiao Guangrong, Josh Poimboeuf, Linux ACPI,
	linux-kernel, X86 ML, Adrian Hunter, Arnaldo Carvalho de Melo,
	Christoph Hellwig, Alexander Shishkin, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, linux-fsdevel, Thomas Gleixner,
	Borislav Petkov, Ross Zwisler

On Tue, Jul 12, 2016 at 7:57 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, Jul 09, 2016 at 08:25:54PM -0700, Dan Williams wrote:
>> The pcommit instruction is being deprecated in favor of either ADR
>> (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
>> posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
>> Firmware Interface Table).
>
>>  arch/x86/include/asm/cpufeatures.h                 |    1
>>  arch/x86/include/asm/special_insns.h               |   46 --------------------
>>  arch/x86/lib/x86-opcode-map.txt                    |    2 -
>>  tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 -
>>  tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 -
>>  tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 -
>>  tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 --
>
> Just deprecated, or is it completely eradicated, removed from history,
> will never ever happen and we'll reissue the opcode for something else?
>
> Because if its only deprecated then removing it from the instruction
> decoders seems wrong, old binaries might still contain the opcode.

Eradicated.

"The new instructions like CLWB and CLFLUSHOPT will be rolled into the
SDM but PCOMMIT will be removed from the Extensions doc and not rolled
into the SDM." [1]

Existing binaries are already gating their usage on the presence of
the cpu id flag, that flag and the instruction opcode are reserved
going forward.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-June/005923.html


* [PATCH v3] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users
  2016-07-10  3:24 ` [PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users Dan Williams
  2016-07-10  5:30   ` kbuild test robot
@ 2016-07-12 22:22   ` Dan Williams
  1 sibling, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-12 22:22 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, hch, linux-kernel

In preparation for generically mapping flush hint addresses for both the
BLK and PMEM use case, provide a generic / reference counted mapping
api.  Given the fact that a dimm may belong to multiple regions (PMEM
and BLK), the flush hint addresses need to be held valid as long as any
region associated with the dimm is active.  This is similar to the
existing BLK-region case where multiple BLK-regions may share an
aperture mapping.  Up-level this shared / reference-counted mapping
capability from the nfit driver to a core nvdimm capability.

This eliminates the need for the nd_blk_region.disable() callback.  Note
that the removal of nfit_spa_map() and related infrastructure is
deferred to a later patch.
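
For illustration, the shared / reference-counted mapping idea can be
sketched in userspace C. This is only a model under stated assumptions:
struct shared_map, shared_map_get() and shared_map_put() are
hypothetical names, malloc() stands in for memremap()/ioremap(), and the
bus lock and devm unwinding from the patch are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the patch's struct nvdimm_map. */
struct shared_map {
	struct shared_map *next;
	unsigned long long offset;	/* lookup key: base address of the mapping */
	size_t size;
	void *mem;			/* malloc() stands in for memremap()/ioremap() */
	int refcount;
};

static struct shared_map *map_list;

static struct shared_map *find_map(unsigned long long offset)
{
	struct shared_map *m;

	for (m = map_list; m; m = m->next)
		if (m->offset == offset)
			return m;
	return NULL;
}

/* Take a reference, creating the mapping on first use; NULL on failure. */
static void *shared_map_get(unsigned long long offset, size_t size)
{
	struct shared_map *m = find_map(offset);

	if (m) {
		m->refcount++;
		return m->mem;
	}

	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;
	m->mem = malloc(size);
	if (!m->mem) {
		free(m);
		return NULL;
	}
	m->offset = offset;
	m->size = size;
	m->refcount = 1;
	m->next = map_list;
	map_list = m;
	return m->mem;
}

/*
 * Drop a reference and tear the mapping down on the last put.
 * Returns the remaining reference count, or -1 for an unknown offset.
 */
static int shared_map_put(unsigned long long offset)
{
	struct shared_map **pp, *m;

	for (pp = &map_list; (m = *pp); pp = &m->next) {
		if (m->offset != offset)
			continue;
		if (--m->refcount > 0)
			return m->refcount;
		*pp = m->next;
		free(m->mem);
		free(m);
		return 0;
	}
	return -1;
}
```

Two regions that ask for the same offset get the same pointer back, and
the mapping is only released when the second one drops its reference.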

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

Change since v2: fix ARCH=um build error

 drivers/acpi/nfit.c       |   14 +++--
 drivers/nvdimm/Kconfig    |    2 -
 drivers/nvdimm/core.c     |  123 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvdimm/nd-core.h  |    1 
 include/linux/libnvdimm.h |    9 +++
 5 files changed, 141 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index f8c1a850effc..b047dbe13bed 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1616,7 +1616,8 @@ static void __iomem *__nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
  * when all region devices referencing the same mapping are disabled /
  * unbound.
  */
-static void __iomem *nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
+static __maybe_unused void __iomem *nfit_spa_map(
+		struct acpi_nfit_desc *acpi_desc,
 		struct acpi_nfit_system_address *spa, enum spa_map_type type)
 {
 	void __iomem *iomem;
@@ -1669,7 +1670,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 		struct device *dev)
 {
 	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
-	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
 	struct nd_blk_region *ndbr = to_nd_blk_region(dev);
 	struct nfit_flush *nfit_flush;
 	struct nfit_blk_mmio *mmio;
@@ -1697,8 +1697,8 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 	/* map block aperture memory */
 	nfit_blk->bdw_offset = nfit_mem->bdw->offset;
 	mmio = &nfit_blk->mmio[BDW];
-	mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_bdw,
-			SPA_MAP_APERTURE);
+	mmio->addr.base = devm_nvdimm_memremap(dev, nfit_mem->spa_bdw->address,
+			nfit_mem->spa_bdw->length, ARCH_MEMREMAP_PMEM);
 	if (!mmio->addr.base) {
 		dev_dbg(dev, "%s: %s failed to map bdw\n", __func__,
 				nvdimm_name(nvdimm));
@@ -1720,8 +1720,8 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 	nfit_blk->cmd_offset = nfit_mem->dcr->command_offset;
 	nfit_blk->stat_offset = nfit_mem->dcr->status_offset;
 	mmio = &nfit_blk->mmio[DCR];
-	mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_dcr,
-			SPA_MAP_CONTROL);
+	mmio->addr.base = devm_nvdimm_ioremap(dev, nfit_mem->spa_dcr->address,
+			nfit_mem->spa_dcr->length);
 	if (!mmio->addr.base) {
 		dev_dbg(dev, "%s: %s failed to map dcr\n", __func__,
 				nvdimm_name(nvdimm));
@@ -1748,7 +1748,7 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 
 	nfit_flush = nfit_mem->nfit_flush;
 	if (nfit_flush && nfit_flush->flush->hint_count != 0) {
-		nfit_blk->nvdimm_flush = devm_ioremap_nocache(dev,
+		nfit_blk->nvdimm_flush = devm_nvdimm_ioremap(dev,
 				nfit_flush->flush->hint_address[0], 8);
 		if (!nfit_blk->nvdimm_flush)
 			return -ENOMEM;
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 7c8a3bf07884..124c2432ac9c 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -1,6 +1,7 @@
 menuconfig LIBNVDIMM
 	tristate "NVDIMM (Non-Volatile Memory Device) Support"
 	depends on PHYS_ADDR_T_64BIT
+	depends on HAS_IOMEM
 	depends on BLK_DEV
 	help
 	  Generic support for non-volatile memory devices including
@@ -19,7 +20,6 @@ if LIBNVDIMM
 config BLK_DEV_PMEM
 	tristate "PMEM: Persistent memory block device support"
 	default LIBNVDIMM
-	depends on HAS_IOMEM
 	select ND_BTT if BTT
 	select ND_PFN if NVDIMM_PFN
 	help
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 32e4fe2f6274..757e0cf028bf 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -20,6 +20,7 @@
 #include <linux/ndctl.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
+#include <linux/io.h>
 #include "nd-core.h"
 #include "nd.h"
 
@@ -57,6 +58,127 @@ bool is_nvdimm_bus_locked(struct device *dev)
 }
 EXPORT_SYMBOL(is_nvdimm_bus_locked);
 
+struct nvdimm_map {
+	struct nvdimm_bus *nvdimm_bus;
+	struct list_head list;
+	resource_size_t offset;
+	unsigned long flags;
+	size_t size;
+	union {
+		void *mem;
+		void __iomem *iomem;
+	};
+	struct kref kref;
+};
+
+static struct nvdimm_map *find_nvdimm_map(struct device *dev,
+		resource_size_t offset)
+{
+	struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+	struct nvdimm_map *nvdimm_map;
+
+	list_for_each_entry(nvdimm_map, &nvdimm_bus->mapping_list, list)
+		if (nvdimm_map->offset == offset)
+			return nvdimm_map;
+	return NULL;
+}
+
+static struct nvdimm_map *alloc_nvdimm_map(struct device *dev,
+		resource_size_t offset, size_t size, unsigned long flags)
+{
+	struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+	struct nvdimm_map *nvdimm_map;
+
+	nvdimm_map = kzalloc(sizeof(*nvdimm_map), GFP_KERNEL);
+	if (!nvdimm_map)
+		return NULL;
+
+	INIT_LIST_HEAD(&nvdimm_map->list);
+	nvdimm_map->nvdimm_bus = nvdimm_bus;
+	nvdimm_map->offset = offset;
+	nvdimm_map->flags = flags;
+	nvdimm_map->size = size;
+	kref_init(&nvdimm_map->kref);
+
+	if (!request_mem_region(offset, size, dev_name(&nvdimm_bus->dev)))
+		goto err_request_region;
+
+	if (flags)
+		nvdimm_map->mem = memremap(offset, size, flags);
+	else
+		nvdimm_map->iomem = ioremap(offset, size);
+
+	if (!nvdimm_map->mem)
+		goto err_map;
+
+	dev_WARN_ONCE(dev, !is_nvdimm_bus_locked(dev), "%s: bus unlocked!",
+			__func__);
+	list_add(&nvdimm_map->list, &nvdimm_bus->mapping_list);
+
+	return nvdimm_map;
+
+ err_map:
+	release_mem_region(offset, size);
+ err_request_region:
+	kfree(nvdimm_map);
+	return NULL;
+}
+
+static void nvdimm_map_release(struct kref *kref)
+{
+	struct nvdimm_bus *nvdimm_bus;
+	struct nvdimm_map *nvdimm_map;
+
+	nvdimm_map = container_of(kref, struct nvdimm_map, kref);
+	nvdimm_bus = nvdimm_map->nvdimm_bus;
+
+	dev_dbg(&nvdimm_bus->dev, "%s: %pa\n", __func__, &nvdimm_map->offset);
+	list_del(&nvdimm_map->list);
+	if (nvdimm_map->flags)
+		memunmap(nvdimm_map->mem);
+	else
+		iounmap(nvdimm_map->iomem);
+	release_mem_region(nvdimm_map->offset, nvdimm_map->size);
+	kfree(nvdimm_map);
+}
+
+static void nvdimm_map_put(void *data)
+{
+	struct nvdimm_map *nvdimm_map = data;
+	struct nvdimm_bus *nvdimm_bus = nvdimm_map->nvdimm_bus;
+
+	nvdimm_bus_lock(&nvdimm_bus->dev);
+	kref_put(&nvdimm_map->kref, nvdimm_map_release);
+	nvdimm_bus_unlock(&nvdimm_bus->dev);
+}
+
+/**
+ * devm_nvdimm_memremap - map a resource that is shared across regions
+ * @dev: device that will own a reference to the shared mapping
+ * @offset: physical base address of the mapping
+ * @size: mapping size
+ * @flags: memremap flags, or, if zero, perform an ioremap instead
+ */
+void *devm_nvdimm_memremap(struct device *dev, resource_size_t offset,
+		size_t size, unsigned long flags)
+{
+	struct nvdimm_map *nvdimm_map;
+
+	nvdimm_bus_lock(dev);
+	nvdimm_map = find_nvdimm_map(dev, offset);
+	if (!nvdimm_map)
+		nvdimm_map = alloc_nvdimm_map(dev, offset, size, flags);
+	else
+		kref_get(&nvdimm_map->kref);
+	nvdimm_bus_unlock(dev);
+
+	if (devm_add_action_or_reset(dev, nvdimm_map_put, nvdimm_map))
+		return NULL;
+
+	return nvdimm_map->mem;
+}
+EXPORT_SYMBOL_GPL(devm_nvdimm_memremap);
+
 u64 nd_fletcher64(void *addr, size_t len, bool le)
 {
 	u32 *buf = addr;
@@ -335,6 +457,7 @@ struct nvdimm_bus *__nvdimm_bus_register(struct device *parent,
 	if (!nvdimm_bus)
 		return NULL;
 	INIT_LIST_HEAD(&nvdimm_bus->list);
+	INIT_LIST_HEAD(&nvdimm_bus->mapping_list);
 	INIT_LIST_HEAD(&nvdimm_bus->poison_list);
 	init_waitqueue_head(&nvdimm_bus->probe_wait);
 	nvdimm_bus->id = ida_simple_get(&nd_ida, 0, 0, GFP_KERNEL);
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 284cdaa268cf..790b62cc81ed 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -31,6 +31,7 @@ struct nvdimm_bus {
 	struct device dev;
 	int id, probe_active;
 	struct list_head poison_list;
+	struct list_head mapping_list;
 	struct mutex reconfig_mutex;
 };
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 0c3c30cbbea5..18c3cc48a970 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -99,6 +99,15 @@ struct nd_region_desc {
 	unsigned long flags;
 };
 
+struct device;
+void *devm_nvdimm_memremap(struct device *dev, resource_size_t offset,
+		size_t size, unsigned long flags);
+static inline void __iomem *devm_nvdimm_ioremap(struct device *dev,
+		resource_size_t offset, size_t size)
+{
+	return (void __iomem *) devm_nvdimm_memremap(dev, offset, size, 0);
+}
+
 struct nvdimm_bus;
 struct module;
 struct device;


* [PATCH v3] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
  2016-07-10  3:25 ` [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() Dan Williams
  2016-07-10  4:47   ` kbuild test robot
@ 2016-07-12 22:25   ` Dan Williams
  1 sibling, 0 replies; 32+ messages in thread
From: Dan Williams @ 2016-07-12 22:25 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-fsdevel, linux-acpi, Ross Zwisler, hch, linux-kernel

nvdimm_flush() is a replacement for the x86 'pcommit' instruction.  It is
an optional write flushing mechanism that an nvdimm bus can provide for
the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
nvdimm_flush() is implemented as a series of flush-hint-address [1]
writes to each dimm in the interleave set (region) that backs the
namespace.

The nvdimm_has_flush() routine relies on platform firmware to describe
the flushing capabilities of a platform.  It uses the heuristic of
whether an nvdimm bus provider provides flush address data to return a
ternary result:

      1: flush addresses defined
      0: dimm topology described without flush addresses (assume ADR)
 -errno: no topology information, unable to determine flush mechanism

The pmem driver is expected to take the following actions on this ternary
result:

      1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
      0: do not set WC or FUA on the queue, take no further action
 -errno: warn and then operate as if nvdimm_has_flush() returned '0'

The caveat of this heuristic is that it can not distinguish the "dimm
does not have flush address" case from the "platform firmware is broken
and failed to describe a flush address".  Given we are already
explicitly trusting the NFIT there's not much more we can do beyond
blacklisting broken firmwares if they are ever encountered.
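
As a rough userspace model of the ternary result and the actions
described above (a sketch, not the driver code): struct region_model,
model_has_flush() and the flush_policy values are hypothetical names,
and a zero entry in flush_hint[] is assumed to mean "no flush hint for
this dimm".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: one flush-hint slot per dimm mapping, 0 == no hint. */
struct region_model {
	int ndr_mappings;
	const unsigned long *flush_hint;
};

/* Mirrors the ternary result: 1, 0, or -ENXIO. */
static int model_has_flush(const struct region_model *r)
{
	int i;

	/* no dimm topology: flushing capability unknown */
	if (r->ndr_mappings == 0)
		return -ENXIO;

	for (i = 0; i < r->ndr_mappings; i++)
		if (r->flush_hint[i])
			return 1;	/* at least one flush address defined */

	return 0;			/* dimms described without hints: assume ADR */
}

enum flush_policy {
	POLICY_DIRECTED_FLUSH,	/* result 1 */
	POLICY_ASSUME_ADR,	/* result 0 */
	POLICY_WARN_ASSUME_ADR,	/* result -errno: warn, then behave as for 0 */
};

static enum flush_policy model_policy(const struct region_model *r)
{
	int rc = model_has_flush(r);

	if (rc > 0)
		return POLICY_DIRECTED_FLUSH;
	if (rc == 0)
		return POLICY_ASSUME_ADR;
	return POLICY_WARN_ASSUME_ADR;
}
```

Note that the model, like the heuristic itself, cannot tell a dimm that
has no flush address apart from firmware that failed to describe one.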

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

Change since v2: fix ARCH=i386 usage of writeq
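
For readers unfamiliar with the non-atomic 64-bit accessors, a minimal
userspace sketch of a hi-lo split write follows. It assumes the 64-bit
register is modelled as two 32-bit words with the low word at index 0;
model_writeq_hi_lo() is a hypothetical name, not the kernel helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a 64-bit write issued as two 32-bit writes, high word first.
 * The result is not atomic: an observer can see the new high word with
 * the old low word in between the two stores.
 */
static void model_writeq_hi_lo(uint64_t val, volatile uint32_t *addr)
{
	addr[1] = (uint32_t)(val >> 32);	/* high half first */
	addr[0] = (uint32_t)val;		/* then low half */
}
```

For a flush hint the written value carries no meaning beyond triggering
the flush, which is presumably why the ordering of the two halves does
not matter here.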

 drivers/acpi/nfit.c          |   33 ++---------------------
 drivers/acpi/nfit.h          |    1 -
 drivers/nvdimm/pmem.c        |   27 ++++++++++++++-----
 drivers/nvdimm/region_devs.c |   61 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/libnvdimm.h    |    2 +
 5 files changed, 87 insertions(+), 37 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 6796f780870a..0497175ee6cb 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1393,24 +1393,6 @@ static u64 to_interleave_offset(u64 offset, struct nfit_blk_mmio *mmio)
 	return mmio->base_offset + line_offset + table_offset + sub_line_offset;
 }
 
-static void wmb_blk(struct nfit_blk *nfit_blk)
-{
-
-	if (nfit_blk->nvdimm_flush) {
-		/*
-		 * The first wmb() is needed to 'sfence' all previous writes
-		 * such that they are architecturally visible for the platform
-		 * buffer flush.  Note that we've already arranged for pmem
-		 * writes to avoid the cache via arch_memcpy_to_pmem().  The
-		 * final wmb() ensures ordering for the NVDIMM flush write.
-		 */
-		wmb();
-		writeq(1, nfit_blk->nvdimm_flush);
-		wmb();
-	} else
-		wmb_pmem();
-}
-
 static u32 read_blk_stat(struct nfit_blk *nfit_blk, unsigned int bw)
 {
 	struct nfit_blk_mmio *mmio = &nfit_blk->mmio[DCR];
@@ -1445,7 +1427,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	wmb_blk(nfit_blk);
+	nvdimm_flush(nfit_blk->nd_region);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -1496,7 +1478,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		wmb_blk(nfit_blk);
+		nvdimm_flush(nfit_blk->nd_region);
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
@@ -1570,7 +1552,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 {
 	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
 	struct nd_blk_region *ndbr = to_nd_blk_region(dev);
-	struct nfit_flush *nfit_flush;
 	struct nfit_blk_mmio *mmio;
 	struct nfit_blk *nfit_blk;
 	struct nfit_mem *nfit_mem;
@@ -1645,15 +1626,7 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 		return rc;
 	}
 
-	nfit_flush = nfit_mem->nfit_flush;
-	if (nfit_flush && nfit_flush->flush->hint_count != 0) {
-		nfit_blk->nvdimm_flush = devm_nvdimm_ioremap(dev,
-				nfit_flush->flush->hint_address[0], 8);
-		if (!nfit_blk->nvdimm_flush)
-			return -ENOMEM;
-	}
-
-	if (!arch_has_wmb_pmem() && !nfit_blk->nvdimm_flush)
+	if (nvdimm_has_flush(nfit_blk->nd_region) < 0)
 		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (mmio->line_size == 0)
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 9282eb324dcc..9fda77cf81da 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -183,7 +183,6 @@ struct nfit_blk {
 	u64 bdw_offset; /* post interleave offset */
 	u64 stat_offset;
 	u64 cmd_offset;
-	void __iomem *nvdimm_flush;
 	u32 dimm_flags;
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index b6fcb97a601c..e303655f243e 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -33,10 +33,24 @@
 #include "pfn.h"
 #include "nd.h"
 
+static struct device *to_dev(struct pmem_device *pmem)
+{
+	/*
+	 * nvdimm bus services need a 'dev' parameter, and we record the device
+	 * at init in bb.dev.
+	 */
+	return pmem->bb.dev;
+}
+
+static struct nd_region *to_region(struct pmem_device *pmem)
+{
+	return to_nd_region(to_dev(pmem)->parent);
+}
+
 static void pmem_clear_poison(struct pmem_device *pmem, phys_addr_t offset,
 		unsigned int len)
 {
-	struct device *dev = pmem->bb.dev;
+	struct device *dev = to_dev(pmem);
 	sector_t sector;
 	long cleared;
 
@@ -122,7 +136,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio_data_dir(bio))
-		wmb_pmem();
+		nvdimm_flush(to_region(pmem));
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -136,7 +150,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 
 	rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, rw, sector);
 	if (rw & WRITE)
-		wmb_pmem();
+		nvdimm_flush(to_region(pmem));
 
 	/*
 	 * The ->rw_page interface is subtle and tricky.  The core
@@ -193,6 +207,7 @@ static int pmem_attach_disk(struct device *dev,
 		struct nd_namespace_common *ndns)
 {
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
+	struct nd_region *nd_region = to_nd_region(dev->parent);
 	struct vmem_altmap __altmap, *altmap = NULL;
 	struct resource *res = &nsio->res;
 	struct nd_pfn *nd_pfn = NULL;
@@ -222,7 +237,7 @@ static int pmem_attach_disk(struct device *dev,
 	dev_set_drvdata(dev, pmem);
 	pmem->phys_addr = res->start;
 	pmem->size = resource_size(res);
-	if (!arch_has_wmb_pmem())
+	if (nvdimm_has_flush(nd_region) < 0)
 		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (!devm_request_mem_region(dev, res->start, resource_size(res),
@@ -284,7 +299,7 @@ static int pmem_attach_disk(struct device *dev,
 			/ 512);
 	if (devm_init_badblocks(dev, &pmem->bb))
 		return -ENOMEM;
-	nvdimm_badblocks_populate(to_nd_region(dev->parent), &pmem->bb, res);
+	nvdimm_badblocks_populate(nd_region, &pmem->bb, res);
 	disk->bb = &pmem->bb;
 	add_disk(disk);
 
@@ -331,8 +346,8 @@ static int nd_pmem_remove(struct device *dev)
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 {
-	struct nd_region *nd_region = to_nd_region(dev->parent);
 	struct pmem_device *pmem = dev_get_drvdata(dev);
+	struct nd_region *nd_region = to_region(pmem);
 	resource_size_t offset = 0, end_trunc = 0;
 	struct nd_namespace_common *ndns;
 	struct nd_namespace_io *nsio;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 67022f74febc..5d97b127b715 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -14,12 +14,19 @@
 #include <linux/highmem.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/pmem.h>
 #include <linux/sort.h>
 #include <linux/io.h>
 #include <linux/nd.h>
 #include "nd-core.h"
 #include "nd.h"
 
+/*
+ * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
+ * irrelevant.
+ */
+#include <linux/io-64-nonatomic-hi-lo.h>
+
 static DEFINE_IDA(region_ida);
 
 static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm,
@@ -864,6 +871,60 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+/**
+ * nvdimm_flush - flush any posted write queues between the cpu and pmem media
+ * @nd_region: blk or interleaved pmem region
+ */
+void nvdimm_flush(struct nd_region *nd_region)
+{
+	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
+	int i;
+
+	/*
+	 * The first wmb() is needed to 'sfence' all previous writes
+	 * such that they are architecturally visible for the platform
+	 * buffer flush.  Note that we've already arranged for pmem
+	 * writes to avoid the cache via arch_memcpy_to_pmem().  The
+	 * final wmb() ensures ordering for the NVDIMM flush write.
+	 */
+	wmb();
+	for (i = 0; i < nd_region->ndr_mappings; i++)
+		if (ndrd->flush_wpq[i][0])
+			writeq(1, ndrd->flush_wpq[i][0]);
+	wmb();
+}
+EXPORT_SYMBOL_GPL(nvdimm_flush);
+
+/**
+ * nvdimm_has_flush - determine write flushing requirements
+ * @nd_region: blk or interleaved pmem region
+ *
+ * Returns 1 if writes require flushing
+ * Returns 0 if writes do not require flushing
+ * Returns -ENXIO if flushing capability can not be determined
+ */
+int nvdimm_has_flush(struct nd_region *nd_region)
+{
+	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
+	int i;
+
+	/* no nvdimm == flushing capability unknown */
+	if (nd_region->ndr_mappings == 0)
+		return -ENXIO;
+
+	for (i = 0; i < nd_region->ndr_mappings; i++)
+		/* flush hints present, flushing required */
+		if (ndrd->flush_wpq[i][0])
+			return 1;
+
+	/*
+	 * The platform defines dimm devices without hints, assume
+	 * platform persistence mechanism like ADR
+	 */
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvdimm_has_flush);
+
 void __exit nd_region_devs_exit(void)
 {
 	ida_destroy(&region_ida);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 815b9b430ead..d37fda6dd64c 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -166,4 +166,6 @@ struct nvdimm *nd_blk_region_to_dimm(struct nd_blk_region *ndbr);
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
 u64 nd_fletcher64(void *addr, size_t len, bool le);
+void nvdimm_flush(struct nd_region *nd_region);
+int nvdimm_has_flush(struct nd_region *nd_region);
 #endif /* __LIBNVDIMM_H__ */


* [PATCH v3] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
  2016-07-10  3:25 ` [PATCH v2 10/17] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() Dan Williams
@ 2016-07-12 22:26   ` Dan Williams
  2016-07-13 19:46     ` Kani, Toshimitsu
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2016-07-12 22:26 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Toshi Kani, linux-kernel, linux-acpi, linux-fsdevel, Ross Zwisler, hch

Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
chasing through nd_region), and that we otherwise assume a platform has
ADR capability when flush hints are not present, move nvdimm_flush() to
REQ_FLUSH context.

Note that we still arrange for nvdimm_flush() to be called even in the
ADR case. We need at least one wmb() fence to push buffered writes in
the cpu out to the ADR protected domain.
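
The resulting ordering in the request path can be sketched as a small
userspace model. The MODEL_REQ_* flags, struct trace and
model_make_request() are hypothetical names; 'F' marks where
nvdimm_flush() would run and 'W' marks where the bio segments would be
copied.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the block layer request flags. */
#define MODEL_REQ_PREFLUSH	(1u << 0)
#define MODEL_REQ_FUA		(1u << 1)

/* Records the order of operations: 'F' = flush, 'W' = data transfer. */
struct trace {
	char ops[8];
	size_t n;
};

static void record(struct trace *t, char op)
{
	if (t->n < sizeof(t->ops) - 1)
		t->ops[t->n++] = op;
}

static void model_make_request(struct trace *t, unsigned int flags)
{
	/* a flush request covers writes that completed before this bio */
	if (flags & MODEL_REQ_PREFLUSH)
		record(t, 'F');

	record(t, 'W');

	/* FUA: this bio's own data must be durable before completion */
	if (flags & MODEL_REQ_FUA)
		record(t, 'F');
}
```

A plain write records only "W", so the cost of the flush is paid only
when the filesystem asks for it.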

Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

Change since v2: arrange for wmb() to be called for flushing in the ADR case

 drivers/nvdimm/pmem.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index e303655f243e..9d9c1beef020 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -113,6 +113,11 @@ static int pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 	return rc;
 }
 
+/* account for REQ_FLUSH rename, replace with REQ_PREFLUSH after v4.8-rc1 */
+#ifndef REQ_FLUSH
+#define REQ_FLUSH REQ_PREFLUSH
+#endif
+
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
 	int rc = 0;
@@ -121,6 +126,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct bio_vec bvec;
 	struct bvec_iter iter;
 	struct pmem_device *pmem = q->queuedata;
+	struct nd_region *nd_region = to_region(pmem);
+
+	if (bio->bi_rw & REQ_FLUSH)
+		nvdimm_flush(nd_region);
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -135,8 +144,8 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	if (do_acct)
 		nd_iostat_end(bio, start);
 
-	if (bio_data_dir(bio))
-		nvdimm_flush(to_region(pmem));
+	if (bio->bi_rw & REQ_FUA)
+		nvdimm_flush(nd_region);
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -149,8 +158,6 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 	int rc;
 
 	rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, rw, sector);
-	if (rw & WRITE)
-		nvdimm_flush(to_region(pmem));
 
 	/*
 	 * The ->rw_page interface is subtle and tricky.  The core
@@ -279,6 +286,7 @@ static int pmem_attach_disk(struct device *dev,
 		return PTR_ERR(addr);
 	pmem->virt_addr = (void __pmem *) addr;
 
+	blk_queue_write_cache(q, true, true);
 	blk_queue_make_request(q, pmem_make_request);
 	blk_queue_physical_block_size(q, PAGE_SIZE);
 	blk_queue_max_hw_sectors(q, UINT_MAX);


* Re: [PATCH v3] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
  2016-07-12 22:26   ` [PATCH v3] " Dan Williams
@ 2016-07-13 19:46     ` Kani, Toshimitsu
  0 siblings, 0 replies; 32+ messages in thread
From: Kani, Toshimitsu @ 2016-07-13 19:46 UTC (permalink / raw)
  To: dan.j.williams, linux-nvdimm@lists.01.org
  Cc: hch, linux-kernel, ross.zwisler, linux-acpi, linux-fsdevel

On Tue, 2016-07-12 at 15:26 -0700, Dan Williams wrote:
> Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
> chasing through nd_region), and that we otherwise assume a platform has
> ADR capability when flush hints are not present, move nvdimm_flush() to
> REQ_FLUSH context.
> 
> Note that we still arrange for nvdimm_flush() to be called even in the
> ADR case. We need at least one wmb() fence to push buffered writes in
> the cpu out to the ADR protected domain.
> 
> Cc: Toshi Kani <toshi.kani@hpe.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

This looks good to me.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

Thanks,
-Toshi


* Re: [PATCH v2 16/17] x86/insn: remove pcommit
  2016-07-12 22:12     ` Dan Williams
@ 2016-07-22 15:55       ` Dan Williams
  2016-07-22 16:52         ` Ingo Molnar
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2016-07-22 15:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-nvdimm, Xiao Guangrong, Josh Poimboeuf, Linux ACPI,
	linux-kernel, X86 ML, Adrian Hunter, Arnaldo Carvalho de Melo,
	Christoph Hellwig, Alexander Shishkin, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, linux-fsdevel, Thomas Gleixner,
	Borislav Petkov, Ross Zwisler

On Tue, Jul 12, 2016 at 3:12 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Jul 12, 2016 at 7:57 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Sat, Jul 09, 2016 at 08:25:54PM -0700, Dan Williams wrote:
>>> The pcommit instruction is being deprecated in favor of either ADR
>>> (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
>>> posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
>>> Firmware Interface Table).
>>
>>>  arch/x86/include/asm/cpufeatures.h                 |    1
>>>  arch/x86/include/asm/special_insns.h               |   46 --------------------
>>>  arch/x86/lib/x86-opcode-map.txt                    |    2 -
>>>  tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 -
>>>  tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 -
>>>  tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 -
>>>  tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 --
>>
>> Just deprecated, or is it completely eradicated, removed from history,
>> will never ever happen and we'll reissue the opcode for something else?
>>
>> Because if its only deprecated then removing it from the instruction
>> decoders seems wrong, old binaries might still contain the opcode.
>
> Eradicated.
>
> "The new instructions like CLWB and CLFLUSHOPT will be rolled into the
> SDM but PCOMMIT will be removed from the Extensions doc and not rolled
> into the SDM." [1]
>
> Existing binaries are already gating their usage on the presence of
> the cpu id flag, that flag and the instruction opcode are reserved
> going forward.
>
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-June/005923.html

x86 maintainers, I have the other patches in this series queued in
-next. Please ack this one and I'll add it for v4.8-rc1, or otherwise
let me know how you want to handle this patch.


* Re: [PATCH v2 16/17] x86/insn: remove pcommit
  2016-07-22 15:55       ` Dan Williams
@ 2016-07-22 16:52         ` Ingo Molnar
  2016-07-23  0:54           ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2016-07-22 16:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Peter Zijlstra, linux-nvdimm, Xiao Guangrong, Josh Poimboeuf,
	Linux ACPI, linux-kernel, X86 ML, Adrian Hunter,
	Arnaldo Carvalho de Melo, Christoph Hellwig, Alexander Shishkin,
	Ingo Molnar, Andy Lutomirski, H. Peter Anvin, linux-fsdevel,
	Thomas Gleixner, Borislav Petkov, Ross Zwisler


* Dan Williams <dan.j.williams@intel.com> wrote:

> On Tue, Jul 12, 2016 at 3:12 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > On Tue, Jul 12, 2016 at 7:57 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> On Sat, Jul 09, 2016 at 08:25:54PM -0700, Dan Williams wrote:
> >>> The pcommit instruction is being deprecated in favor of either ADR
> >>> (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
> >>> posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
> >>> Firmware Interface Table).
> >>
> >>>  arch/x86/include/asm/cpufeatures.h                 |    1
> >>>  arch/x86/include/asm/special_insns.h               |   46 --------------------
> >>>  arch/x86/lib/x86-opcode-map.txt                    |    2 -
> >>>  tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 -
> >>>  tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 -
> >>>  tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 -
> >>>  tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 --
> >>
> >> Just deprecated, or is it completely eradicated, removed from history,
> >> will never ever happen and we'll reissue the opcode for something else?
> >>
> >> Because if it's only deprecated then removing it from the instruction
> >> decoders seems wrong, old binaries might still contain the opcode.
> >
> > Eradicated.
> >
> > "The new instructions like CLWB and CLFLUSHOPT will be rolled into the
> > SDM but PCOMMIT will be removed from the Extensions doc and not rolled
> > into the SDM." [1]
> >
> > Existing binaries are already gating their usage on the presence of
> > the CPUID flag; that flag and the instruction opcode are reserved
> > going forward.
> >
> > [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-June/005923.html
>
> x86 maintainers, I have the other patches in this series queued in -next. Please
> ack this one and I'll add it for v4.8-rc1, or otherwise let me know how you want
> to handle this patch.

Since, AFAICS, it's just a removal that the rest of your series should not
depend on, can you submit it to the x86 tree?

Thanks,

	Ingo


* Re: [PATCH v2 16/17] x86/insn: remove pcommit
  2016-07-22 16:52         ` Ingo Molnar
@ 2016-07-23  0:54           ` Dan Williams
  2016-07-23  7:49             ` Ingo Molnar
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2016-07-23  0:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-nvdimm, Xiao Guangrong, Josh Poimboeuf,
	Linux ACPI, linux-kernel, X86 ML, Adrian Hunter,
	Arnaldo Carvalho de Melo, Christoph Hellwig, Alexander Shishkin,
	Ingo Molnar, Andy Lutomirski, H. Peter Anvin, linux-fsdevel,
	Thomas Gleixner, Borislav Petkov, Ross Zwisler

On Fri, Jul 22, 2016 at 9:52 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> On Tue, Jul 12, 2016 at 3:12 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> > On Tue, Jul 12, 2016 at 7:57 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> >> On Sat, Jul 09, 2016 at 08:25:54PM -0700, Dan Williams wrote:
>> >>> The pcommit instruction is being deprecated in favor of either ADR
>> >>> (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
>> >>> posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
>> >>> Firmware Interface Table).
>> >>
>> >>>  arch/x86/include/asm/cpufeatures.h                 |    1
>> >>>  arch/x86/include/asm/special_insns.h               |   46 --------------------
>> >>>  arch/x86/lib/x86-opcode-map.txt                    |    2 -
>> >>>  tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 -
>> >>>  tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 -
>> >>>  tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 -
>> >>>  tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 --
>> >>
>> >> Just deprecated, or is it completely eradicated, removed from history,
>> >> will never ever happen and we'll reissue the opcode for something else?
>> >>
>> >> Because if it's only deprecated then removing it from the instruction
>> >> decoders seems wrong, old binaries might still contain the opcode.
>> >
>> > Eradicated.
>> >
>> > "The new instructions like CLWB and CLFLUSHOPT will be rolled into the
>> > SDM but PCOMMIT will be removed from the Extensions doc and not rolled
>> > into the SDM." [1]
>> >
>> > Existing binaries are already gating their usage on the presence of
>> > the CPUID flag; that flag and the instruction opcode are reserved
>> > going forward.
>> >
>> > [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-June/005923.html
>>
>> x86 maintainers, I have the other patches in this series queued in -next. Please
>> ack this one and I'll add it for v4.8-rc1, or otherwise let me know how you want
>> to handle this patch.
>
> Since it's just a removal AFAICS that the rest of your series should not depend
> on, can you submit it to the x86 tree?

This patch depends on the previous patches in the series removing
calls to pcommit_sfence().


* Re: [PATCH v2 16/17] x86/insn: remove pcommit
  2016-07-23  0:54           ` Dan Williams
@ 2016-07-23  7:49             ` Ingo Molnar
  0 siblings, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2016-07-23  7:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: Peter Zijlstra, linux-nvdimm, Xiao Guangrong, Josh Poimboeuf,
	Linux ACPI, linux-kernel, X86 ML, Adrian Hunter,
	Arnaldo Carvalho de Melo, Christoph Hellwig, Alexander Shishkin,
	Ingo Molnar, Andy Lutomirski, H. Peter Anvin, linux-fsdevel,
	Thomas Gleixner, Borislav Petkov, Ross Zwisler


* Dan Williams <dan.j.williams@intel.com> wrote:

> On Fri, Jul 22, 2016 at 9:52 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Dan Williams <dan.j.williams@intel.com> wrote:
> >
> >> On Tue, Jul 12, 2016 at 3:12 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> >> > On Tue, Jul 12, 2016 at 7:57 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> >> On Sat, Jul 09, 2016 at 08:25:54PM -0700, Dan Williams wrote:
> >> >>> The pcommit instruction is being deprecated in favor of either ADR
> >> >>> (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
> >> >>> posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
> >> >>> Firmware Interface Table).
> >> >>
> >> >>>  arch/x86/include/asm/cpufeatures.h                 |    1
> >> >>>  arch/x86/include/asm/special_insns.h               |   46 --------------------
> >> >>>  arch/x86/lib/x86-opcode-map.txt                    |    2 -
> >> >>>  tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 -
> >> >>>  tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 -
> >> >>>  tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 -
> >> >>>  tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 --
> >> >>
> >> >> Just deprecated, or is it completely eradicated, removed from history,
> >> >> will never ever happen and we'll reissue the opcode for something else?
> >> >>
> >> >> Because if it's only deprecated then removing it from the instruction
> >> >> decoders seems wrong, old binaries might still contain the opcode.
> >> >
> >> > Eradicated.
> >> >
> >> > "The new instructions like CLWB and CLFLUSHOPT will be rolled into the
> >> > SDM but PCOMMIT will be removed from the Extensions doc and not rolled
> >> > into the SDM." [1]
> >> >
> >> > Existing binaries are already gating their usage on the presence of
> >> > the CPUID flag; that flag and the instruction opcode are reserved
> >> > going forward.
> >> >
> >> > [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-June/005923.html
> >>
> >> x86 maintainers, I have the other patches in this series queued in -next. Please
> >> ack this one and I'll add it for v4.8-rc1, or otherwise let me know how you want
> >> to handle this patch.
> >
> > Since it's just a removal AFAICS that the rest of your series should not depend
> > on, can you submit it to the x86 tree?
> 
> This patch depends on the previous patches in the series removing
> calls to pcommit_sfence().

Ok, and the patch looks harmless:

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo


Thread overview: 32+ messages
2016-07-10  3:24 [PATCH v2 00/17] replace pcommit with ADR or directed flushing Dan Williams
2016-07-10  3:24 ` [PATCH v2 01/17] nfit: always associate flush hints Dan Williams
2016-07-10  3:24 ` [PATCH v2 02/17] nfit: don't override return value of nfit_mem_init Dan Williams
2016-07-10  3:24 ` [PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users Dan Williams
2016-07-10  5:30   ` kbuild test robot
2016-07-12 22:22   ` [PATCH v3] " Dan Williams
2016-07-10  3:24 ` [PATCH v2 04/17] libnvdimm, nfit: remove nfit_spa_map() infrastructure Dan Williams
2016-07-10  3:24 ` [PATCH v2 05/17] libnvdimm, nfit: move flush hint mapping to region-device driver-data Dan Williams
2016-07-10  3:25 ` [PATCH v2 06/17] tools/testing/nvdimm: simulate multiple flush hints per-dimm Dan Williams
2016-07-10  3:25 ` [PATCH v2 07/17] libnvdimm: keep region data alive over namespace removal Dan Williams
2016-07-10  3:25 ` [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() Dan Williams
2016-07-10  4:47   ` kbuild test robot
2016-07-10  5:01     ` Dan Williams
2016-07-11  3:48       ` Li, Philip
2016-07-12 22:25   ` [PATCH v3] " Dan Williams
2016-07-10  3:25 ` [PATCH v2 09/17] libnvdimm: cycle flush hints Dan Williams
2016-07-10  3:25 ` [PATCH v2 10/17] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() Dan Williams
2016-07-12 22:26   ` [PATCH v3] " Dan Williams
2016-07-13 19:46     ` Kani, Toshimitsu
2016-07-10  3:25 ` [PATCH v2 11/17] libnvdimm, pmem: flush posted-write queues on shutdown Dan Williams
2016-07-10  3:25 ` [PATCH v2 12/17] fs/dax: remove wmb_pmem() Dan Williams
2016-07-10  3:25 ` [PATCH v2 13/17] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes Dan Williams
2016-07-10  3:25 ` [PATCH v2 14/17] pmem: kill wmb_pmem() Dan Williams
2016-07-10  3:25 ` [PATCH v2 15/17] Revert "KVM: x86: add pcommit support" Dan Williams
2016-07-10  3:25 ` [PATCH v2 16/17] x86/insn: remove pcommit Dan Williams
2016-07-12 14:57   ` Peter Zijlstra
2016-07-12 22:12     ` Dan Williams
2016-07-22 15:55       ` Dan Williams
2016-07-22 16:52         ` Ingo Molnar
2016-07-23  0:54           ` Dan Williams
2016-07-23  7:49             ` Ingo Molnar
2016-07-10  3:25 ` [PATCH v2 17/17] pmem: kill __pmem address space Dan Williams
