linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] deprecate pcommit
@ 2016-06-04 20:52 Dan Williams
  2016-06-04 20:52 ` [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active Dan Williams
                   ` (13 more replies)
  0 siblings, 14 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:52 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Xiao Guangrong, Peter Zijlstra, Greg Kroah-Hartman, x86, david,
	Adrian Hunter, Arnaldo Carvalho de Melo, hch, Alexander Shishkin,
	Ingo Molnar, Andy Lutomirski, Josh Poimboeuf, Paolo Bonzini,
	Thomas Gleixner, Borislav Petkov, H. Peter Anvin, Ross Zwisler,
	linux-kernel

Platforms supporting NVDIMMs are now required to provide persistence
guarantees once pmem stores are accepted by the memory subsystem.  This
is usually achieved by a platform-level feature known as ADR
(Asynchronous DRAM Refresh) that flushes any memory subsystem write
pending queues on power loss/shutdown.

The 'pcommit' instruction (which has not shipped on any product) is no
longer needed and is deprecated.

---

Dan Williams (13):
      driver core, libnvdimm: disable manual unbind of dimms while region active
      nfit: always associate flush hints
      libnvdimm: introduce nvdimm_flush()
      libnvdimm, nfit: move flush hint mapping to dimm driver
      tools/testing/nvdimm: simulate multiple flush hints per-dimm
      libnvdimm: cycle flush hints per-cpu
      libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
      fs/dax: remove wmb_pmem()
      libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
      pmem: kill wmb_pmem()
      Revert "KVM: x86: add pcommit support"
      x86/insn: remove pcommit
      pmem: kill __pmem address space


 Documentation/filesystems/Locking                  |    2 
 arch/powerpc/sysdev/axonram.c                      |    4 -
 arch/x86/include/asm/cpufeatures.h                 |    1 
 arch/x86/include/asm/pmem.h                        |   77 +++----------
 arch/x86/include/asm/special_insns.h               |   46 --------
 arch/x86/include/asm/vmx.h                         |    1 
 arch/x86/include/uapi/asm/vmx.h                    |    4 -
 arch/x86/kvm/cpuid.c                               |    2 
 arch/x86/kvm/cpuid.h                               |    8 -
 arch/x86/kvm/vmx.c                                 |   32 +----
 arch/x86/lib/x86-opcode-map.txt                    |    2 
 drivers/acpi/nfit.c                                |  106 +++++++++++-------
 drivers/acpi/nfit.h                                |    3 -
 drivers/base/base.h                                |    1 
 drivers/base/bus.c                                 |   12 ++
 drivers/base/core.c                                |    1 
 drivers/base/dd.c                                  |    2 
 drivers/block/brd.c                                |    4 -
 drivers/nvdimm/claim.c                             |    2 
 drivers/nvdimm/dimm.c                              |    7 +
 drivers/nvdimm/dimm_devs.c                         |   27 ++++-
 drivers/nvdimm/namespace_devs.c                    |    1 
 drivers/nvdimm/nd-core.h                           |    1 
 drivers/nvdimm/nd.h                                |   15 +++
 drivers/nvdimm/pmem.c                              |   44 +++++---
 drivers/nvdimm/region_devs.c                       |   74 ++++++++++++-
 drivers/s390/block/dcssblk.c                       |    6 +
 fs/dax.c                                           |   13 +-
 include/linux/blkdev.h                             |    6 +
 include/linux/compiler.h                           |    2 
 include/linux/device.h                             |   20 +++
 include/linux/libnvdimm.h                          |   10 ++
 include/linux/nd.h                                 |    2 
 include/linux/pmem.h                               |  117 ++++----------------
 scripts/checkpatch.pl                              |    1 
 tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 
 tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 
 tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 
 tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 -
 .../perf/util/intel-pt-decoder/x86-opcode-map.txt  |    2 
 tools/testing/nvdimm/test/nfit.c                   |   55 ++++++---
 41 files changed, 357 insertions(+), 366 deletions(-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
@ 2016-06-04 20:52 ` Dan Williams
  2016-06-04 21:10   ` Greg Kroah-Hartman
                     ` (2 more replies)
  2016-06-04 20:52 ` [PATCH 02/13] nfit: always associate flush hints Dan Williams
                   ` (12 subsequent siblings)
  13 siblings, 3 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:52 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Greg Kroah-Hartman, david, linux-kernel, hch

There are scenarios where we need a middle ground between disabling all
manual bind/unbind attempts (via driver->suppress_bind_attrs) and
allowing unbind at any userspace-determined time.  Pinning modules takes
away one vector for unwanted out-of-sequence device_release_driver()
invocations, this new mechanism (via device->suppress_unbind_attr) takes
away another.

The first user of this mechanism is the libnvdimm sub-system where
manual dimm disabling should be prevented while the dimm is active in
any region.  Note that there is a 1:N dimm-to-region relationship which
is why this is implemented as a disable count rather than a flag.  This
forces userspace to disable regions before dimms when manually shutting
down a bus topology.

This does not affect any of the kernel-internal initiated invocations of
device_release_driver().

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/base/base.h             |    1 +
 drivers/base/bus.c              |   12 ++++++++++--
 drivers/base/core.c             |    1 +
 drivers/base/dd.c               |    2 +-
 drivers/nvdimm/namespace_devs.c |    1 +
 drivers/nvdimm/region_devs.c    |    4 +++-
 include/linux/device.h          |   20 ++++++++++++++++++++
 7 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index e05db388bd1c..129814b17ca6 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -109,6 +109,7 @@ extern int bus_add_driver(struct device_driver *drv);
 extern void bus_remove_driver(struct device_driver *drv);
 
 extern void driver_detach(struct device_driver *drv);
+extern void __device_release_driver(struct device *dev);
 extern int driver_probe_device(struct device_driver *drv, struct device *dev);
 extern void driver_deferred_probe_del(struct device *dev);
 static inline int driver_match_device(struct device_driver *drv,
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 6470eb8088f4..b48a903f2d28 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -188,10 +188,18 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf,
 	if (dev && dev->driver == drv) {
 		if (dev->parent)	/* Needed for USB */
 			device_lock(dev->parent);
-		device_release_driver(dev);
+
+		device_lock(dev);
+		if (atomic_read(&dev->suppress_unbind_attr) > 0)
+			err = -EBUSY;
+		else {
+			__device_release_driver(dev);
+			err = count;
+		}
+		device_unlock(dev);
+
 		if (dev->parent)
 			device_unlock(dev->parent);
-		err = count;
 	}
 	put_device(dev);
 	bus_put(bus);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0a8bdade53f2..0ea0e8560e1d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -708,6 +708,7 @@ void device_initialize(struct device *dev)
 	INIT_LIST_HEAD(&dev->devres_head);
 	device_pm_init(dev);
 	set_dev_node(dev, -1);
+	atomic_set(&dev->suppress_unbind_attr, 0);
 #ifdef CONFIG_GENERIC_MSI_IRQ
 	INIT_LIST_HEAD(&dev->msi_list);
 #endif
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 16688f50729c..9e21817ca2d6 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -756,7 +756,7 @@ EXPORT_SYMBOL_GPL(driver_attach);
  * __device_release_driver() must be called with @dev lock held.
  * When called for a USB interface, @dev->parent lock must be held as well.
  */
-static void __device_release_driver(struct device *dev)
+void __device_release_driver(struct device *dev)
 {
 	struct device_driver *drv;
 
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index c5e3196c45b0..e65572b6092c 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1950,6 +1950,7 @@ static int init_active_labels(struct nd_region *nd_region)
 		}
 		nd_mapping->ndd = ndd;
 		atomic_inc(&nvdimm->busy);
+		device_disable_unbind_attr(&nvdimm->dev);
 		get_ndd(ndd);
 
 		count = nd_label_active_count(ndd);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 40fcfea26fbb..320f0f3ea367 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -427,8 +427,10 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
 			nd_mapping->labels = NULL;
 			put_ndd(ndd);
 			nd_mapping->ndd = NULL;
-			if (ndd)
+			if (ndd) {
 				atomic_dec(&nvdimm->busy);
+				device_enable_unbind_attr(&nvdimm->dev);
+			}
 		}
 
 		if (is_nd_pmem(dev))
diff --git a/include/linux/device.h b/include/linux/device.h
index 38f02814d53a..d9eaa85bb9cf 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -849,6 +849,7 @@ struct device {
 
 	void	(*release)(struct device *dev);
 	struct iommu_group	*iommu_group;
+	atomic_t		suppress_unbind_attr; /* disable manual unbind */
 
 	bool			offline_disabled:1;
 	bool			offline:1;
@@ -988,6 +989,25 @@ static inline void device_lock_assert(struct device *dev)
 	lockdep_assert_held(&dev->mutex);
 }
 
+static inline bool device_disable_unbind_attr(struct device *dev)
+{
+	bool suppressed = false;
+
+	device_lock(dev);
+	if (dev->driver) {
+		atomic_inc(&dev->suppress_unbind_attr);
+		suppressed = true;
+	}
+	device_unlock(dev);
+
+	return suppressed;
+}
+
+static inline bool device_enable_unbind_attr(struct device *dev)
+{
+	return atomic_dec_and_test(&dev->suppress_unbind_attr);
+}
+
 static inline struct device_node *dev_of_node(struct device *dev)
 {
 	if (!IS_ENABLED(CONFIG_OF))

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 02/13] nfit: always associate flush hints
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
  2016-06-04 20:52 ` [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active Dan Williams
@ 2016-06-04 20:52 ` Dan Williams
  2016-06-04 20:52 ` [PATCH 03/13] libnvdimm: introduce nvdimm_flush() Dan Williams
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:52 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Ross Zwisler, david, linux-kernel, hch

Before enabling use of flush hints for pmem regions, we need to make
sure they are always associated.  Move the initialization of nfit_flush
out of the block-window specific init path to the general init path.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 2215fc847fa9..4771872810ef 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -614,7 +614,6 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc *acpi_desc,
 {
 	u16 dcr = __to_nfit_memdev(nfit_mem)->region_index;
 	struct nfit_memdev *nfit_memdev;
-	struct nfit_flush *nfit_flush;
 	struct nfit_bdw *nfit_bdw;
 	struct nfit_idt *nfit_idt;
 	u16 idt_idx, range_index;
@@ -647,14 +646,6 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc *acpi_desc,
 			nfit_mem->idt_bdw = nfit_idt->idt;
 			break;
 		}
-
-		list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) {
-			if (nfit_flush->flush->device_handle !=
-					nfit_memdev->memdev->device_handle)
-				continue;
-			nfit_mem->nfit_flush = nfit_flush;
-			break;
-		}
 		break;
 	}
 }
@@ -675,6 +666,7 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
 	}
 
 	list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
+		struct nfit_flush *nfit_flush;
 		struct nfit_dcr *nfit_dcr;
 		u32 device_handle;
 		u16 dcr;
@@ -721,6 +713,13 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc,
 			break;
 		}
 
+		list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) {
+			if (nfit_flush->flush->device_handle != device_handle)
+				continue;
+			nfit_mem->nfit_flush = nfit_flush;
+			break;
+		}
+
 		if (dcr && !nfit_mem->dcr) {
 			dev_err(acpi_desc->dev, "SPA %d missing DCR %d\n",
 					spa->range_index, dcr);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 03/13] libnvdimm: introduce nvdimm_flush()
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
  2016-06-04 20:52 ` [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active Dan Williams
  2016-06-04 20:52 ` [PATCH 02/13] nfit: always associate flush hints Dan Williams
@ 2016-06-04 20:52 ` Dan Williams
  2016-06-06 17:45   ` Jeff Moyer
  2016-06-04 20:52 ` [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver Dan Williams
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:52 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Ross Zwisler, david, linux-kernel, hch

nvdimm_flush() is an alternative to the x86 pcommit instruction.  It is
an optional write flushing mechanism that an nvdimm bus can provide for
the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
nvdimm_flush() is implemented as a series of flush-hint-address [1]
writes to each dimm in the interleave set that backs the namespace.  For
now this implementation is just a simple replacement of wmb_pmem() /
arch_has_wmb_pmem() with nvdimm_flush() / nvdimm_has_flush().

We defer the full implementation of nvdimm_flush() until the
implementation is prepared to also handle the blk-region case.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pmem.c        |   27 +++++++++++++++++++++------
 drivers/nvdimm/region_devs.c |   35 +++++++++++++++++++++++++++++++++++
 include/linux/libnvdimm.h    |    2 ++
 3 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 608fc4464574..248a89736bbf 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -46,10 +46,24 @@ struct pmem_device {
 	struct badblocks	bb;
 };
 
+static struct device *to_dev(struct pmem_device *pmem)
+{
+	/*
+	 * nvdimm bus services need a 'dev' parameter, and we record the device
+	 * at init in bb.dev.
+	 */
+	return pmem->bb.dev;
+}
+
+static struct nd_region *to_region(struct pmem_device *pmem)
+{
+	return to_nd_region(to_dev(pmem)->parent);
+}
+
 static void pmem_clear_poison(struct pmem_device *pmem, phys_addr_t offset,
 		unsigned int len)
 {
-	struct device *dev = pmem->bb.dev;
+	struct device *dev = to_dev(pmem);
 	sector_t sector;
 	long cleared;
 
@@ -135,7 +149,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio_data_dir(bio))
-		wmb_pmem();
+		nvdimm_flush(to_region(pmem));
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -149,7 +163,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 
 	rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, rw, sector);
 	if (rw & WRITE)
-		wmb_pmem();
+		nvdimm_flush(to_region(pmem));
 
 	/*
 	 * The ->rw_page interface is subtle and tricky.  The core
@@ -205,6 +219,7 @@ static int pmem_attach_disk(struct device *dev,
 		struct nd_namespace_common *ndns)
 {
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
+	struct nd_region *nd_region = to_nd_region(dev->parent);
 	struct vmem_altmap __altmap, *altmap = NULL;
 	struct resource *res = &nsio->res;
 	struct nd_pfn *nd_pfn = NULL;
@@ -234,7 +249,7 @@ static int pmem_attach_disk(struct device *dev,
 	dev_set_drvdata(dev, pmem);
 	pmem->phys_addr = res->start;
 	pmem->size = resource_size(res);
-	if (!arch_has_wmb_pmem())
+	if (nvdimm_has_flush(nd_region) < 0)
 		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (!devm_request_mem_region(dev, res->start, resource_size(res),
@@ -302,7 +317,7 @@ static int pmem_attach_disk(struct device *dev,
 			/ 512);
 	if (devm_init_badblocks(dev, &pmem->bb))
 		return -ENOMEM;
-	nvdimm_badblocks_populate(to_nd_region(dev->parent), &pmem->bb, res);
+	nvdimm_badblocks_populate(nd_region, &pmem->bb, res);
 	disk->bb = &pmem->bb;
 	add_disk(disk);
 	revalidate_disk(disk);
@@ -345,8 +360,8 @@ static int nd_pmem_remove(struct device *dev)
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 {
-	struct nd_region *nd_region = to_nd_region(dev->parent);
 	struct pmem_device *pmem = dev_get_drvdata(dev);
+	struct nd_region *nd_region = to_region(pmem);
 	resource_size_t offset = 0, end_trunc = 0;
 	struct nd_namespace_common *ndns;
 	struct nd_namespace_io *nsio;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 320f0f3ea367..420e1a5e2250 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -14,6 +14,7 @@
 #include <linux/highmem.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/pmem.h>
 #include <linux/sort.h>
 #include <linux/io.h>
 #include <linux/nd.h>
@@ -796,6 +797,40 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+/**
+ * nvdimm_flush - flush any posted write queues between the cpu and pmem media
+ * @nd_region: blk or interleaved pmem region
+ */
+void nvdimm_flush(struct nd_region *nd_region)
+{
+	/*
+	 * TODO: replace wmb_pmem() usage with flush hint writes where
+	 * available.
+	 */
+	wmb_pmem();
+}
+EXPORT_SYMBOL_GPL(nvdimm_flush);
+
+/**
+ * nvdimm_has_flush - determine write flushing requirements
+ * @nd_region: blk or interleaved pmem region
+ *
+ * Returns 1 if writes require flushing
+ * Returns 0 if writes do not require flushing
+ * Returns -ENXIO if flushing capability can not be determined
+ */
+int nvdimm_has_flush(struct nd_region *nd_region)
+{
+	/*
+	 * TODO: return 0 / 1 for NFIT regions depending on presence of
+	 * flush hint tables
+	 */
+	if (arch_has_wmb_pmem())
+		return 1;
+	return -ENXIO;
+}
+EXPORT_SYMBOL_GPL(nvdimm_has_flush);
+
 void __exit nd_region_devs_exit(void)
 {
 	ida_destroy(&region_ida);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 0c3c30cbbea5..90eb3119c3ce 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -156,4 +156,6 @@ struct nvdimm *nd_blk_region_to_dimm(struct nd_blk_region *ndbr);
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
 u64 nd_fletcher64(void *addr, size_t len, bool le);
+void nvdimm_flush(struct nd_region *nd_region);
+int nvdimm_has_flush(struct nd_region *nd_region);
 #endif /* __LIBNVDIMM_H__ */

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (2 preceding siblings ...)
  2016-06-04 20:52 ` [PATCH 03/13] libnvdimm: introduce nvdimm_flush() Dan Williams
@ 2016-06-04 20:52 ` Dan Williams
  2016-06-04 21:29   ` kbuild test robot
                     ` (3 more replies)
  2016-06-04 20:52 ` [PATCH 05/13] tools/testing/nvdimm: simulate multiple flush hints per-dimm Dan Williams
                   ` (9 subsequent siblings)
  13 siblings, 4 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:52 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Ross Zwisler, david, linux-kernel, hch

Since flush hints are a per-dimm property and we want to start using
them outside of block-window I/O context, move their initialization to
nvdimm_probe() context.

For the future use of flush hints in the pmem driver it would be
unfortunate to call back into the bus provider just to issue a write, so
make flush hints a generic property of an nvdimm.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit.c          |   86 ++++++++++++++++++++++++++----------------
 drivers/acpi/nfit.h          |    1 
 drivers/nvdimm/dimm.c        |    7 +++
 drivers/nvdimm/dimm_devs.c   |   25 ++++++++++++
 drivers/nvdimm/nd-core.h     |    1 
 drivers/nvdimm/nd.h          |   14 +++++++
 drivers/nvdimm/region_devs.c |   47 +++++++++++++++++++----
 include/linux/libnvdimm.h    |    8 +++-
 8 files changed, 145 insertions(+), 44 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 4771872810ef..4643dd7a4284 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1104,6 +1104,47 @@ static struct nvdimm *acpi_nfit_dimm_by_handle(struct acpi_nfit_desc *acpi_desc,
 	return NULL;
 }
 
+static int acpi_nfit_populate_flush_hints(struct device *dev,
+		void __iomem *flush_wpq[])
+{
+	int i, j;
+	struct nfit_flush *nfit_flush;
+	struct acpi_nfit_flush_address *flush;
+	struct nvdimm *nvdimm = to_nvdimm(dev);
+	struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
+
+	nfit_flush = nfit_mem->nfit_flush;
+	if (!nfit_flush || !nfit_flush->flush->hint_count)
+		return 0;
+	flush = nfit_flush->flush;
+
+	for (i = 0; i < flush->hint_count; i++) {
+		unsigned long pfn = PHYS_PFN(flush->hint_address[i]);
+		void __iomem *hint_page;
+
+		/* check if flush hints share a page */
+		for (j = 0; j < i; j++) {
+			unsigned long pfn_j = PHYS_PFN(flush->hint_address[j]);
+
+			if (pfn == pfn_j)
+				break;
+		}
+
+		if (j < i)
+			hint_page = (void *) ((unsigned long) flush_wpq[j]
+					& PAGE_MASK);
+		else
+			hint_page = devm_ioremap_nocache(dev,
+					PHYS_PFN(pfn), PAGE_SIZE);
+		if (!hint_page)
+			return -ENXIO;
+		flush_wpq[i] = hint_page
+			+ (flush->hint_address[i] & ~PAGE_MASK);
+	}
+
+	return 0;
+}
+
 static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
 		struct nfit_mem *nfit_mem, u32 device_handle)
 {
@@ -1170,10 +1211,10 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
 
 	list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) {
 		unsigned long flags = 0, cmd_mask;
+		int rc, flush_hints = 0;
 		struct nvdimm *nvdimm;
 		u32 device_handle;
 		u16 mem_flags;
-		int rc;
 
 		device_handle = __to_nfit_memdev(nfit_mem)->device_handle;
 		nvdimm = acpi_nfit_dimm_by_handle(acpi_desc, device_handle);
@@ -1202,9 +1243,16 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
 		if (nfit_mem->family == NVDIMM_FAMILY_INTEL)
 			cmd_mask |= nfit_mem->dsm_mask;
 
+		if (nfit_mem->nfit_flush) {
+			struct acpi_nfit_flush_address *flush;
+
+			flush = nfit_mem->nfit_flush->flush;
+			flush_hints = flush->hint_count;
+		}
+
 		nvdimm = nvdimm_create(acpi_desc->nvdimm_bus, nfit_mem,
 				acpi_nfit_dimm_attribute_groups,
-				flags, cmd_mask);
+				flags, cmd_mask, flush_hints);
 		if (!nvdimm)
 			return -ENOMEM;
 
@@ -1372,24 +1420,6 @@ static u64 to_interleave_offset(u64 offset, struct nfit_blk_mmio *mmio)
 	return mmio->base_offset + line_offset + table_offset + sub_line_offset;
 }
 
-static void wmb_blk(struct nfit_blk *nfit_blk)
-{
-
-	if (nfit_blk->nvdimm_flush) {
-		/*
-		 * The first wmb() is needed to 'sfence' all previous writes
-		 * such that they are architecturally visible for the platform
-		 * buffer flush.  Note that we've already arranged for pmem
-		 * writes to avoid the cache via arch_memcpy_to_pmem().  The
-		 * final wmb() ensures ordering for the NVDIMM flush write.
-		 */
-		wmb();
-		writeq(1, nfit_blk->nvdimm_flush);
-		wmb();
-	} else
-		wmb_pmem();
-}
-
 static u32 read_blk_stat(struct nfit_blk *nfit_blk, unsigned int bw)
 {
 	struct nfit_blk_mmio *mmio = &nfit_blk->mmio[DCR];
@@ -1424,7 +1454,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	wmb_blk(nfit_blk);
+	nvdimm_flush(nfit_blk->nd_region);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -1475,7 +1505,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		wmb_blk(nfit_blk);
+		nvdimm_flush(nfit_blk->nd_region);
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
@@ -1669,7 +1699,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
 	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
 	struct nd_blk_region *ndbr = to_nd_blk_region(dev);
-	struct nfit_flush *nfit_flush;
 	struct nfit_blk_mmio *mmio;
 	struct nfit_blk *nfit_blk;
 	struct nfit_mem *nfit_mem;
@@ -1744,15 +1773,7 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
 		return rc;
 	}
 
-	nfit_flush = nfit_mem->nfit_flush;
-	if (nfit_flush && nfit_flush->flush->hint_count != 0) {
-		nfit_blk->nvdimm_flush = devm_ioremap_nocache(dev,
-				nfit_flush->flush->hint_address[0], 8);
-		if (!nfit_blk->nvdimm_flush)
-			return -ENOMEM;
-	}
-
-	if (!arch_has_wmb_pmem() && !nfit_blk->nvdimm_flush)
+	if (nvdimm_has_flush(nfit_blk->nd_region) < 0)
 		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (mmio->line_size == 0)
@@ -2504,6 +2525,7 @@ void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 	nd_desc = &acpi_desc->nd_desc;
 	nd_desc->provider_name = "ACPI.NFIT";
 	nd_desc->ndctl = acpi_nfit_ctl;
+	nd_desc->populate_flush_hints = acpi_nfit_populate_flush_hints;
 	nd_desc->flush_probe = acpi_nfit_flush_probe;
 	nd_desc->clear_to_send = acpi_nfit_clear_to_send;
 	nd_desc->attr_groups = acpi_nfit_attribute_groups;
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 11cb38348aef..9c8a6cf760be 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -180,7 +180,6 @@ struct nfit_blk {
 	u64 bdw_offset; /* post interleave offset */
 	u64 stat_offset;
 	u64 cmd_offset;
-	void __iomem *nvdimm_flush;
 	u32 dimm_flags;
 };
 
diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
index 71d12bb67339..642dd2c21009 100644
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -26,7 +26,7 @@ static int nvdimm_probe(struct device *dev)
 	struct nvdimm_drvdata *ndd;
 	int rc;
 
-	ndd = kzalloc(sizeof(*ndd), GFP_KERNEL);
+	ndd = nvdimm_alloc_drvdata(dev);
 	if (!ndd)
 		return -ENOMEM;
 
@@ -40,6 +40,11 @@ static int nvdimm_probe(struct device *dev)
 	get_device(dev);
 	kref_init(&ndd->kref);
 
+	/* trigger bus-provider specific probing */
+	rc = nvdimm_populate_flush_hints(dev);
+	if (rc)
+		goto err;
+
 	rc = nvdimm_init_nsarea(ndd);
 	if (rc)
 		goto err;
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index bbde28d3dec5..e58e8ba155aa 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -24,6 +24,26 @@
 
 static DEFINE_IDA(dimm_ida);
 
+struct nvdimm_drvdata *nvdimm_alloc_drvdata(struct device *dev)
+{
+	struct nvdimm *nvdimm = to_nvdimm(dev);
+
+	return kzalloc(sizeof(struct nvdimm_drvdata)
+			+ sizeof(void *) * max(1, nvdimm->flush_hints),
+			GFP_KERNEL);
+}
+
+int nvdimm_populate_flush_hints(struct device *dev)
+{
+	struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
+	struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+	struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+
+	if (nd_desc->populate_flush_hints)
+		return nd_desc->populate_flush_hints(dev, ndd->flush_wpq);
+	return 0;
+}
+
 /*
  * Retrieve bus and dimm handle and return if this bus supports
  * get_config_data commands
@@ -346,7 +366,7 @@ EXPORT_SYMBOL_GPL(nvdimm_attribute_group);
 
 struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
 		const struct attribute_group **groups, unsigned long flags,
-		unsigned long cmd_mask)
+		unsigned long cmd_mask, int flush_hints)
 {
 	struct nvdimm *nvdimm = kzalloc(sizeof(*nvdimm), GFP_KERNEL);
 	struct device *dev;
@@ -362,6 +382,7 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
 	nvdimm->provider_data = provider_data;
 	nvdimm->flags = flags;
 	nvdimm->cmd_mask = cmd_mask;
+	nvdimm->flush_hints = flush_hints;
 	atomic_set(&nvdimm->busy, 0);
 	dev = &nvdimm->dev;
 	dev_set_name(dev, "nmem%d", nvdimm->id);
@@ -370,6 +391,8 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
 	dev->devt = MKDEV(nvdimm_major, nvdimm->id);
 	dev->groups = groups;
 	nd_device_register(dev);
+	dev_dbg(dev, "%s: flush_hints: %d cmds: %#lx\n", __func__, flush_hints,
+			cmd_mask);
 
 	return nvdimm;
 }
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 284cdaa268cf..1fa36dd45093 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -39,6 +39,7 @@ struct nvdimm {
 	void *provider_data;
 	unsigned long cmd_mask;
 	struct device dev;
+	int flush_hints;
 	atomic_t busy;
 	int id;
 };
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index d0ac93c31dda..4bba7c50961d 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -47,6 +47,7 @@ struct nvdimm_drvdata {
 	int ns_current, ns_next;
 	struct resource dpa;
 	struct kref kref;
+	void __iomem *flush_wpq[0];
 };
 
 struct nd_region_namespaces {
@@ -189,12 +190,25 @@ void nvdimm_exit(void);
 void nd_region_exit(void);
 struct nvdimm;
 struct nvdimm_drvdata *to_ndd(struct nd_mapping *nd_mapping);
+
+/*
+ * ...for contexts where the dimm is guaranteed not to be disabled while
+ * the returned data is in use.
+ */
+static inline struct nvdimm_drvdata *to_ndd_unlocked(
+		struct nd_mapping *nd_mapping)
+{
+	return nd_mapping->ndd;
+}
+
 int nvdimm_init_nsarea(struct nvdimm_drvdata *ndd);
 int nvdimm_init_config_data(struct nvdimm_drvdata *ndd);
 int nvdimm_set_config_data(struct nvdimm_drvdata *ndd, size_t offset,
 		void *buf, size_t len);
 long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
 		unsigned int len);
+int nvdimm_populate_flush_hints(struct device *dev);
+struct nvdimm_drvdata *nvdimm_alloc_drvdata(struct device *dev);
 struct nd_btt *to_nd_btt(struct device *dev);
 
 struct nd_gen_sb {
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 420e1a5e2250..5b6f85d00bb5 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -803,11 +803,29 @@ EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
  */
 void nvdimm_flush(struct nd_region *nd_region)
 {
+	int i;
+
 	/*
-	 * TODO: replace wmb_pmem() usage with flush hint writes where
-	 * available.
+	 * The first wmb() is needed to 'sfence' all previous writes
+	 * such that they are architecturally visible for the platform
+	 * buffer flush.  Note that we've already arranged for pmem
+	 * writes to avoid the cache via arch_memcpy_to_pmem().  The
+	 * final wmb() ensures ordering for the NVDIMM flush write.
 	 */
-	wmb_pmem();
+	wmb();
+	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+		struct nvdimm_drvdata *ndd = to_ndd_unlocked(nd_mapping);
+
+		/*
+		 * Note, nvdimm_drvdata guaranteed to be live since we
+		 * arrange for all associated regions to be disabled
+		 * before the dimm is disabled.
+		 */
+		if (ndd->flush_wpq[0])
+			writeq(1, ndd->flush_wpq[0]);
+	}
+	wmb();
 }
 EXPORT_SYMBOL_GPL(nvdimm_flush);
 
@@ -821,13 +839,26 @@ EXPORT_SYMBOL_GPL(nvdimm_flush);
  */
 int nvdimm_has_flush(struct nd_region *nd_region)
 {
+	int i;
+
+	/* no nvdimm == flushing capability unknown */
+	if (nd_region->ndr_mappings == 0)
+		return -ENXIO;
+
+	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
+		struct nvdimm *nvdimm = nd_mapping->nvdimm;
+
+		/* flush hints present, flushing required */
+		if (nvdimm->flush_hints)
+			return 1;
+	}
+
 	/*
-	 * TODO: return 0 / 1 for NFIT regions depending on presence of
-	 * flush hint tables
+	 * The platform defines dimm devices without hints, assume
+	 * platform persistence mechanism like ADR
 	 */
-	if (arch_has_wmb_pmem())
-		return 1;
-	return -ENXIO;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_flush);
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 90eb3119c3ce..840dec0ebaa7 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -66,11 +66,17 @@ struct nd_mapping {
 	struct nvdimm_drvdata *ndd;
 };
 
+/**
+ * struct nvdimm_bus_descriptor - operations and attributes for an nvdimm bus
+ * @attr_groups: sysfs attributes for this bus
+ */
 struct nvdimm_bus_descriptor {
 	const struct attribute_group **attr_groups;
 	unsigned long cmd_mask;
 	char *provider_name;
 	ndctl_fn ndctl;
+	int (*populate_flush_hints)(struct device *dev,
+			void __iomem *flush_wpq[]);
 	int (*flush_probe)(struct nvdimm_bus_descriptor *nd_desc);
 	int (*clear_to_send)(struct nvdimm_bus_descriptor *nd_desc,
 			struct nvdimm *nvdimm, unsigned int cmd);
@@ -134,7 +140,7 @@ unsigned long nvdimm_cmd_mask(struct nvdimm *nvdimm);
 void *nvdimm_provider_data(struct nvdimm *nvdimm);
 struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
 		const struct attribute_group **groups, unsigned long flags,
-		unsigned long cmd_mask);
+		unsigned long cmd_mask, int flush_hints);
 const struct nd_cmd_desc *nd_cmd_dimm_desc(int cmd);
 const struct nd_cmd_desc *nd_cmd_bus_desc(int cmd);
 u32 nd_cmd_in_size(struct nvdimm *nvdimm, int cmd,

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 05/13] tools/testing/nvdimm: simulate multiple flush hints per-dimm
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (3 preceding siblings ...)
  2016-06-04 20:52 ` [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver Dan Williams
@ 2016-06-04 20:52 ` Dan Williams
  2016-06-04 20:53 ` [PATCH 06/13] libnvdimm: cycle flush hints per-cpu Dan Williams
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:52 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: david, linux-kernel, hch

Sample nfit data to test the kernel's handling of the multiple
flush-hint case.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/nvdimm/test/nfit.c |   55 +++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index c919866853a0..7454ab01f666 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -98,6 +98,7 @@
 enum {
 	NUM_PM  = 3,
 	NUM_DCR = 5,
+	NUM_HINTS = 8,
 	NUM_BDW = NUM_DCR,
 	NUM_SPA = NUM_PM + NUM_DCR + NUM_BDW,
 	NUM_MEM = NUM_DCR + NUM_BDW + 2 /* spa0 iset */ + 4 /* spa1 iset */,
@@ -584,7 +585,8 @@ static int nfit_test0_alloc(struct nfit_test *t)
 			+ offsetof(struct acpi_nfit_control_region,
 					window_size) * NUM_DCR
 			+ sizeof(struct acpi_nfit_data_region) * NUM_BDW
-			+ sizeof(struct acpi_nfit_flush_address) * NUM_DCR;
+			+ (sizeof(struct acpi_nfit_flush_address)
+					+ sizeof(u64) * NUM_HINTS) * NUM_DCR;
 	int i;
 
 	t->nfit_buf = test_alloc(t, nfit_size, &t->nfit_dma);
@@ -614,7 +616,8 @@ static int nfit_test0_alloc(struct nfit_test *t)
 			return -ENOMEM;
 		sprintf(t->label[i], "label%d", i);
 
-		t->flush[i] = test_alloc(t, 8, &t->flush_dma[i]);
+		t->flush[i] = test_alloc(t, sizeof(u64) * NUM_HINTS,
+				&t->flush_dma[i]);
 		if (!t->flush[i])
 			return -ENOMEM;
 	}
@@ -648,6 +651,8 @@ static int nfit_test1_alloc(struct nfit_test *t)
 
 static void nfit_test0_setup(struct nfit_test *t)
 {
+	const int flush_hint_size = sizeof(struct acpi_nfit_flush_address)
+		+ (sizeof(u64) * NUM_HINTS);
 	struct acpi_nfit_desc *acpi_desc;
 	struct acpi_nfit_memory_map *memdev;
 	void *nfit_buf = t->nfit_buf;
@@ -655,7 +660,7 @@ static void nfit_test0_setup(struct nfit_test *t)
 	struct acpi_nfit_control_region *dcr;
 	struct acpi_nfit_data_region *bdw;
 	struct acpi_nfit_flush_address *flush;
-	unsigned int offset;
+	unsigned int offset, i;
 
 	/*
 	 * spa0 (interleave first half of dimm0 and dimm1, note storage
@@ -1141,37 +1146,41 @@ static void nfit_test0_setup(struct nfit_test *t)
 	/* flush0 (dimm0) */
 	flush = nfit_buf + offset;
 	flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-	flush->header.length = sizeof(struct acpi_nfit_flush_address);
+	flush->header.length = flush_hint_size;
 	flush->device_handle = handle[0];
-	flush->hint_count = 1;
-	flush->hint_address[0] = t->flush_dma[0];
+	flush->hint_count = NUM_HINTS;
+	for (i = 0; i < NUM_HINTS; i++)
+		flush->hint_address[i] = t->flush_dma[0] + i * sizeof(u64);
 
 	/* flush1 (dimm1) */
-	flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 1;
+	flush = nfit_buf + offset + flush_hint_size * 1;
 	flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-	flush->header.length = sizeof(struct acpi_nfit_flush_address);
+	flush->header.length = flush_hint_size;
 	flush->device_handle = handle[1];
-	flush->hint_count = 1;
-	flush->hint_address[0] = t->flush_dma[1];
+	flush->hint_count = NUM_HINTS;
+	for (i = 0; i < NUM_HINTS; i++)
+		flush->hint_address[i] = t->flush_dma[1] + i * sizeof(u64);
 
 	/* flush2 (dimm2) */
-	flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 2;
+	flush = nfit_buf + offset + flush_hint_size  * 2;
 	flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-	flush->header.length = sizeof(struct acpi_nfit_flush_address);
+	flush->header.length = flush_hint_size;
 	flush->device_handle = handle[2];
-	flush->hint_count = 1;
-	flush->hint_address[0] = t->flush_dma[2];
+	flush->hint_count = NUM_HINTS;
+	for (i = 0; i < NUM_HINTS; i++)
+		flush->hint_address[i] = t->flush_dma[2] + i * sizeof(u64);
 
 	/* flush3 (dimm3) */
-	flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 3;
+	flush = nfit_buf + offset + flush_hint_size * 3;
 	flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-	flush->header.length = sizeof(struct acpi_nfit_flush_address);
+	flush->header.length = flush_hint_size;
 	flush->device_handle = handle[3];
-	flush->hint_count = 1;
-	flush->hint_address[0] = t->flush_dma[3];
+	flush->hint_count = NUM_HINTS;
+	for (i = 0; i < NUM_HINTS; i++)
+		flush->hint_address[i] = t->flush_dma[3] + i * sizeof(u64);
 
 	if (t->setup_hotplug) {
-		offset = offset + sizeof(struct acpi_nfit_flush_address) * 4;
+		offset = offset + flush_hint_size * 4;
 		/* dcr-descriptor4: blk */
 		dcr = nfit_buf + offset;
 		dcr->header.type = ACPI_NFIT_TYPE_CONTROL_REGION;
@@ -1300,10 +1309,12 @@ static void nfit_test0_setup(struct nfit_test *t)
 		/* flush3 (dimm4) */
 		flush = nfit_buf + offset;
 		flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS;
-		flush->header.length = sizeof(struct acpi_nfit_flush_address);
+		flush->header.length = flush_hint_size;
 		flush->device_handle = handle[4];
-		flush->hint_count = 1;
-		flush->hint_address[0] = t->flush_dma[4];
+		flush->hint_count = NUM_HINTS;
+		for (i = 0; i < NUM_HINTS; i++)
+			flush->hint_address[i] = t->flush_dma[4]
+				+ i * sizeof(u64);
 	}
 
 	post_ars_status(&t->ars_state, t->spa_set_dma[0], SPA0_SIZE);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 06/13] libnvdimm: cycle flush hints per-cpu
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (4 preceding siblings ...)
  2016-06-04 20:52 ` [PATCH 05/13] tools/testing/nvdimm: simulate multiple flush hints per-dimm Dan Williams
@ 2016-06-04 20:53 ` Dan Williams
  2016-06-04 20:53 ` [PATCH 07/13] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() Dan Williams
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:53 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Ross Zwisler, david, linux-kernel, hch

When the NFIT provides multiple flush hint addresses per-dimm it is
expressing that the platform is capable of processing multiple flush
requests in parallel.  There is some fixed cost per flush request, let
the cost be shared in parallel on multiple cpus.

Since there may not be enough flush hint addresses for each cpu to have
one keep a per-cpu index of the last used hint, and assume that access
pattern randomness will keep the flush-hint usage somewhat staggered.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/dimm_devs.c   |    2 ++
 drivers/nvdimm/nd.h          |    1 +
 drivers/nvdimm/region_devs.c |    8 ++++++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index e58e8ba155aa..c7061932ad40 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -35,10 +35,12 @@ struct nvdimm_drvdata *nvdimm_alloc_drvdata(struct device *dev)
 
 int nvdimm_populate_flush_hints(struct device *dev)
 {
+	struct nvdimm *nvdimm = to_nvdimm(dev);
 	struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
 	struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
 	struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
 
+	ndd->flush_mask = (1 << ilog2(nvdimm->flush_hints)) - 1;
 	if (nd_desc->populate_flush_hints)
 		return nd_desc->populate_flush_hints(dev, ndd->flush_wpq);
 	return 0;
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 4bba7c50961d..3ce169d49e64 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -47,6 +47,7 @@ struct nvdimm_drvdata {
 	int ns_current, ns_next;
 	struct resource dpa;
 	struct kref kref;
+	unsigned int flush_mask;
 	void __iomem *flush_wpq[0];
 };
 
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 5b6f85d00bb5..76ff68d432fb 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -22,6 +22,7 @@
 #include "nd.h"
 
 static DEFINE_IDA(region_ida);
+static DEFINE_PER_CPU(int, flush_idx);
 
 static void nd_region_release(struct device *dev)
 {
@@ -814,6 +815,7 @@ void nvdimm_flush(struct nd_region *nd_region)
 	 */
 	wmb();
 	for (i = 0; i < nd_region->ndr_mappings; i++) {
+		int idx;
 		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
 		struct nvdimm_drvdata *ndd = to_ndd_unlocked(nd_mapping);
 
@@ -822,8 +824,10 @@ void nvdimm_flush(struct nd_region *nd_region)
 		 * arrange for all associated regions to be disabled
 		 * before the dimm is disabled.
 		 */
-		if (ndd->flush_wpq[0])
-			writeq(1, ndd->flush_wpq[0]);
+		if (ndd->flush_wpq[0]) {
+			idx = this_cpu_inc_return(flush_idx) & ndd->flush_mask;
+			writeq(1, ndd->flush_wpq[idx]);
+		}
 	}
 	wmb();
 }

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 07/13] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (5 preceding siblings ...)
  2016-06-04 20:53 ` [PATCH 06/13] libnvdimm: cycle flush hints per-cpu Dan Williams
@ 2016-06-04 20:53 ` Dan Williams
  2016-06-04 20:53 ` [PATCH 08/13] fs/dax: remove wmb_pmem() Dan Williams
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:53 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Ross Zwisler, david, linux-kernel, hch

Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
chasing through nd_region), and that we otherwise assume a platform has
ADR capability when flush hints are not present, move nvdimm_flush() to
REQ_FLUSH context.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pmem.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 248a89736bbf..79539b6498c9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -148,7 +148,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	if (do_acct)
 		nd_iostat_end(bio, start);
 
-	if (bio_data_dir(bio))
+	if (bio->bi_rw & (REQ_FLUSH | REQ_FUA))
 		nvdimm_flush(to_region(pmem));
 
 	bio_endio(bio);
@@ -162,8 +162,6 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 	int rc;
 
 	rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, rw, sector);
-	if (rw & WRITE)
-		nvdimm_flush(to_region(pmem));
 
 	/*
 	 * The ->rw_page interface is subtle and tricky.  The core
@@ -221,9 +219,9 @@ static int pmem_attach_disk(struct device *dev,
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	struct nd_region *nd_region = to_nd_region(dev->parent);
 	struct vmem_altmap __altmap, *altmap = NULL;
+	int nid = dev_to_node(dev), has_flush;
 	struct resource *res = &nsio->res;
 	struct nd_pfn *nd_pfn = NULL;
-	int nid = dev_to_node(dev);
 	struct nd_pfn_sb *pfn_sb;
 	struct pmem_device *pmem;
 	struct resource pfn_res;
@@ -249,8 +247,6 @@ static int pmem_attach_disk(struct device *dev,
 	dev_set_drvdata(dev, pmem);
 	pmem->phys_addr = res->start;
 	pmem->size = resource_size(res);
-	if (nvdimm_has_flush(nd_region) < 0)
-		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (!devm_request_mem_region(dev, res->start, resource_size(res),
 				dev_name(dev))) {
@@ -293,6 +289,11 @@ static int pmem_attach_disk(struct device *dev,
 		return PTR_ERR(addr);
 	pmem->virt_addr = (void __pmem *) addr;
 
+	has_flush = nvdimm_has_flush(nd_region);
+	if (has_flush < 0)
+		dev_warn(dev, "unable to guarantee persistence of writes\n");
+	else if (has_flush > 0)
+		blk_queue_write_cache(q, true, true);
 	blk_queue_make_request(q, pmem_make_request);
 	blk_queue_physical_block_size(q, PAGE_SIZE);
 	blk_queue_max_hw_sectors(q, UINT_MAX);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 08/13] fs/dax: remove wmb_pmem()
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (6 preceding siblings ...)
  2016-06-04 20:53 ` [PATCH 07/13] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() Dan Williams
@ 2016-06-04 20:53 ` Dan Williams
  2016-06-04 20:53 ` [PATCH 09/13] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes Dan Williams
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:53 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Ross Zwisler, david, linux-kernel, hch

Flushing posted-write queues is now deferred to REQ_FLUSH context, or
otherwise handled by an ADR event at the platform level.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 761495bf5eb9..434f421da660 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -147,7 +147,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 		      struct buffer_head *bh)
 {
 	loff_t pos = start, max = start, bh_max = start;
-	bool hole = false, need_wmb = false;
+	bool hole = false;
 	struct block_device *bdev = NULL;
 	int rw = iov_iter_rw(iter), rc;
 	long map_len = 0;
@@ -213,7 +213,6 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 
 		if (iov_iter_rw(iter) == WRITE) {
 			len = copy_from_iter_pmem(dax.addr, max - pos, iter);
-			need_wmb = true;
 		} else if (!hole)
 			len = copy_to_iter((void __force *) dax.addr, max - pos,
 					iter);
@@ -230,8 +229,6 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 			dax.addr += len;
 	}
 
-	if (need_wmb)
-		wmb_pmem();
 	dax_unmap_atomic(bdev, &dax);
 
 	return (pos == start) ? rc : pos - start;
@@ -783,7 +780,6 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 				return ret;
 		}
 	}
-	wmb_pmem();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
@@ -1227,7 +1223,6 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		if (dax_map_atomic(bdev, &dax) < 0)
 			return PTR_ERR(dax.addr);
 		clear_pmem(dax.addr + offset, length);
-		wmb_pmem();
 		dax_unmap_atomic(bdev, &dax);
 	}
 	return 0;

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 09/13] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (7 preceding siblings ...)
  2016-06-04 20:53 ` [PATCH 08/13] fs/dax: remove wmb_pmem() Dan Williams
@ 2016-06-04 20:53 ` Dan Williams
  2016-06-04 20:53 ` [PATCH 10/13] pmem: kill wmb_pmem() Dan Williams
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:53 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Ross Zwisler, david, linux-kernel, hch

nsio_rw_bytes() is used to write info block metadata to the namespace,
so it should trigger a flush after every write.  Replace wmb_pmem() with
nvdimm_flush() in this path.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/claim.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 9997ff94a132..d5dc80c48b4c 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -240,7 +240,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 		return memcpy_from_pmem(buf, nsio->addr + offset, size);
 	} else {
 		memcpy_to_pmem(nsio->addr + offset, buf, size);
-		wmb_pmem();
+		nvdimm_flush(to_nd_region(ndns->dev.parent));
 	}
 
 	return 0;

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/13] pmem: kill wmb_pmem()
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (8 preceding siblings ...)
  2016-06-04 20:53 ` [PATCH 09/13] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes Dan Williams
@ 2016-06-04 20:53 ` Dan Williams
  2016-06-04 20:53 ` [PATCH 11/13] Revert "KVM: x86: add pcommit support" Dan Williams
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:53 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Ross Zwisler, david, linux-kernel, hch

All users have been replaced with flushing in the pmem driver.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/pmem.h |   36 ++-------------------------------
 include/linux/pmem.h        |   47 ++++---------------------------------------
 2 files changed, 6 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index fbc5e92e1ecc..a8cf2a6b14d9 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -26,8 +26,7 @@
  * @n: length of the copy in bytes
  *
  * Copy data to persistent memory media via non-temporal stores so that
- * a subsequent arch_wmb_pmem() can flush cpu and memory controller
- * write buffers to guarantee durability.
+ * a subsequent pmem driver flush operation will drain posted write queues.
  */
 static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
 		size_t n)
@@ -57,32 +56,12 @@ static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
 }
 
 /**
- * arch_wmb_pmem - synchronize writes to persistent memory
- *
- * After a series of arch_memcpy_to_pmem() operations this drains data
- * from cpu write buffers and any platform (memory controller) buffers
- * to ensure that written data is durable on persistent memory media.
- */
-static inline void arch_wmb_pmem(void)
-{
-	/*
-	 * wmb() to 'sfence' all previous writes such that they are
-	 * architecturally visible to 'pcommit'.  Note, that we've
-	 * already arranged for pmem writes to avoid the cache via
-	 * arch_memcpy_to_pmem().
-	 */
-	wmb();
-	pcommit_sfence();
-}
-
-/**
  * arch_wb_cache_pmem - write back a cache range with CLWB
  * @vaddr:	virtual start address
  * @size:	number of bytes to write back
  *
  * Write back a cache range using the CLWB (cache line write back)
- * instruction.  This function requires explicit ordering with an
- * arch_wmb_pmem() call.
+ * instruction.
  */
 static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
 {
@@ -113,7 +92,6 @@ static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
  * @i:		iterator with source data
  *
  * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
- * This function requires explicit ordering with an arch_wmb_pmem() call.
  */
 static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
 		struct iov_iter *i)
@@ -136,7 +114,6 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
  * @size:	number of bytes to zero
  *
  * Write zeros into the memory range starting at 'addr' for 'size' bytes.
- * This function requires explicit ordering with an arch_wmb_pmem() call.
  */
 static inline void arch_clear_pmem(void __pmem *addr, size_t size)
 {
@@ -150,14 +127,5 @@ static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
 {
 	clflush_cache_range((void __force *) addr, size);
 }
-
-static inline bool __arch_has_wmb_pmem(void)
-{
-	/*
-	 * We require that wmb() be an 'sfence', that is only guaranteed on
-	 * 64-bit builds
-	 */
-	return static_cpu_has(X86_FEATURE_PCOMMIT);
-}
 #endif /* CONFIG_ARCH_HAS_PMEM_API */
 #endif /* __ASM_X86_PMEM_H__ */
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 57d146fe44dd..9e3ea94b8157 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -26,16 +26,6 @@
  * calling these symbols with arch_has_pmem_api() and redirect to the
  * implementation in asm/pmem.h.
  */
-static inline bool __arch_has_wmb_pmem(void)
-{
-	return false;
-}
-
-static inline void arch_wmb_pmem(void)
-{
-	BUG();
-}
-
 static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
 		size_t n)
 {
@@ -101,20 +91,6 @@ static inline int memcpy_from_pmem(void *dst, void __pmem const *src,
 		return default_memcpy_from_pmem(dst, src, size);
 }
 
-/**
- * arch_has_wmb_pmem - true if wmb_pmem() ensures durability
- *
- * For a given cpu implementation within an architecture it is possible
- * that wmb_pmem() resolves to a nop.  In the case this returns
- * false, pmem api users are unable to ensure durability and may want to
- * fall back to a different data consistency model, or otherwise notify
- * the user.
- */
-static inline bool arch_has_wmb_pmem(void)
-{
-	return arch_has_pmem_api() && __arch_has_wmb_pmem();
-}
-
 /*
  * These defaults seek to offer decent performance and minimize the
  * window between i/o completion and writes being durable on media.
@@ -152,7 +128,7 @@ static inline void default_clear_pmem(void __pmem *addr, size_t size)
  * being effectively evicted from, or never written to, the processor
  * cache hierarchy after the copy completes.  After memcpy_to_pmem()
  * data may still reside in cpu or platform buffers, so this operation
- * must be followed by a wmb_pmem().
+ * must be followed by a blkdev_issue_flush() on the pmem block device.
  */
 static inline void memcpy_to_pmem(void __pmem *dst, const void *src, size_t n)
 {
@@ -163,28 +139,13 @@ static inline void memcpy_to_pmem(void __pmem *dst, const void *src, size_t n)
 }
 
 /**
- * wmb_pmem - synchronize writes to persistent memory
- *
- * After a series of memcpy_to_pmem() operations this drains data from
- * cpu write buffers and any platform (memory controller) buffers to
- * ensure that written data is durable on persistent memory media.
- */
-static inline void wmb_pmem(void)
-{
-	if (arch_has_wmb_pmem())
-		arch_wmb_pmem();
-	else
-		wmb();
-}
-
-/**
  * copy_from_iter_pmem - copy data from an iterator to PMEM
  * @addr:	PMEM destination address
  * @bytes:	number of bytes to copy
  * @i:		iterator with source data
  *
  * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
- * This function requires explicit ordering with a wmb_pmem() call.
+ * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
 static inline size_t copy_from_iter_pmem(void __pmem *addr, size_t bytes,
 		struct iov_iter *i)
@@ -200,7 +161,7 @@ static inline size_t copy_from_iter_pmem(void __pmem *addr, size_t bytes,
  * @size:	number of bytes to zero
  *
  * Write zeros into the memory range starting at 'addr' for 'size' bytes.
- * This function requires explicit ordering with a wmb_pmem() call.
+ * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
 static inline void clear_pmem(void __pmem *addr, size_t size)
 {
@@ -230,7 +191,7 @@ static inline void invalidate_pmem(void __pmem *addr, size_t size)
  * @size:	number of bytes to write back
  *
  * Write back the processor cache range starting at 'addr' for 'size' bytes.
- * This function requires explicit ordering with a wmb_pmem() call.
+ * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
 static inline void wb_cache_pmem(void __pmem *addr, size_t size)
 {

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 11/13] Revert "KVM: x86: add pcommit support"
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (9 preceding siblings ...)
  2016-06-04 20:53 ` [PATCH 10/13] pmem: kill wmb_pmem() Dan Williams
@ 2016-06-04 20:53 ` Dan Williams
  2016-06-06 15:14   ` Paolo Bonzini
  2016-06-04 20:53 ` [PATCH 12/13] x86/insn: remove pcommit Dan Williams
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:53 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Xiao Guangrong, david, linux-kernel, Paolo Bonzini, Ross Zwisler, hch

This reverts commit 8b3e34e46aca9b6d349b331cd9cf71ccbdc91b2e.

Given the deprecation of the pcommit instruction, revert its usage as a
vm exit source in kvm.

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/vmx.h      |    1 -
 arch/x86/include/uapi/asm/vmx.h |    4 +---
 arch/x86/kvm/cpuid.c            |    2 +-
 arch/x86/kvm/cpuid.h            |    8 --------
 arch/x86/kvm/vmx.c              |   32 ++++----------------------------
 5 files changed, 6 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 14c63c7e8337..a002b07a7099 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -72,7 +72,6 @@
 #define SECONDARY_EXEC_SHADOW_VMCS              0x00004000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_XSAVES			0x00100000
-#define SECONDARY_EXEC_PCOMMIT			0x00200000
 #define SECONDARY_EXEC_TSC_SCALING              0x02000000
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 5b15d94a33f8..37fee272618f 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -78,7 +78,6 @@
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
-#define EXIT_REASON_PCOMMIT             65
 
 #define VMX_EXIT_REASONS \
 	{ EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
@@ -127,8 +126,7 @@
 	{ EXIT_REASON_INVVPID,               "INVVPID" }, \
 	{ EXIT_REASON_INVPCID,               "INVPCID" }, \
 	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
-	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
-	{ EXIT_REASON_PCOMMIT,               "PCOMMIT" }
+	{ EXIT_REASON_XRSTORS,               "XRSTORS" }
 
 #define VMX_ABORT_SAVE_GUEST_MSR_FAIL        1
 #define VMX_ABORT_LOAD_HOST_MSR_FAIL         4
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 769af907f824..fe865563870d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -364,7 +364,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
 		F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
 		F(ADX) | F(SMAP) | F(AVX512F) | F(AVX512PF) | F(AVX512ER) |
-		F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(PCOMMIT);
+		F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB);
 
 	/* cpuid 0xD.1.eax */
 	const u32 kvm_cpuid_D_1_eax_x86_features =
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index e17a74b1d852..35058c2c0eea 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -144,14 +144,6 @@ static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu)
 	return best && (best->ebx & bit(X86_FEATURE_RTM));
 }
 
-static inline bool guest_cpuid_has_pcommit(struct kvm_vcpu *vcpu)
-{
-	struct kvm_cpuid_entry2 *best;
-
-	best = kvm_find_cpuid_entry(vcpu, 7, 0);
-	return best && (best->ebx & bit(X86_FEATURE_PCOMMIT));
-}
-
 static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fb93010beaa4..2e2685424fdc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2705,8 +2705,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		SECONDARY_EXEC_APIC_REGISTER_VIRT |
 		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 		SECONDARY_EXEC_WBINVD_EXITING |
-		SECONDARY_EXEC_XSAVES |
-		SECONDARY_EXEC_PCOMMIT;
+		SECONDARY_EXEC_XSAVES;
 
 	if (enable_ept) {
 		/* nested EPT: emulate EPT also to L1 */
@@ -3268,7 +3267,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_SHADOW_VMCS |
 			SECONDARY_EXEC_XSAVES |
 			SECONDARY_EXEC_ENABLE_PML |
-			SECONDARY_EXEC_PCOMMIT |
 			SECONDARY_EXEC_TSC_SCALING;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
@@ -4856,9 +4854,6 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 	if (!enable_pml)
 		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
-	/* Currently, we allow L1 guest to directly run pcommit instruction. */
-	exec_control &= ~SECONDARY_EXEC_PCOMMIT;
-
 	return exec_control;
 }
 
@@ -4902,9 +4897,10 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
 	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, vmx_exec_control(vmx));
 
-	if (cpu_has_secondary_exec_ctrls())
+	if (cpu_has_secondary_exec_ctrls()) {
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
 				vmx_secondary_exec_control(vmx));
+	}
 
 	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
 		vmcs_write64(EOI_EXIT_BITMAP0, 0);
@@ -7557,13 +7553,6 @@ static int handle_pml_full(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
-static int handle_pcommit(struct kvm_vcpu *vcpu)
-{
-	/* we never catch pcommit instruct for L1 guest. */
-	WARN_ON(1);
-	return 1;
-}
-
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -7614,7 +7603,6 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_XSAVES]                  = handle_xsaves,
 	[EXIT_REASON_XRSTORS]                 = handle_xrstors,
 	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
-	[EXIT_REASON_PCOMMIT]                 = handle_pcommit,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -7923,8 +7911,6 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		 * the XSS exit bitmap in vmcs12.
 		 */
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
-	case EXIT_REASON_PCOMMIT:
-		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
 	default:
 		return true;
 	}
@@ -9085,15 +9071,6 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 
 	if (cpu_has_secondary_exec_ctrls())
 		vmcs_set_secondary_exec_control(secondary_exec_ctl);
-
-	if (static_cpu_has(X86_FEATURE_PCOMMIT) && nested) {
-		if (guest_cpuid_has_pcommit(vcpu))
-			vmx->nested.nested_vmx_secondary_ctls_high |=
-				SECONDARY_EXEC_PCOMMIT;
-		else
-			vmx->nested.nested_vmx_secondary_ctls_high &=
-				~SECONDARY_EXEC_PCOMMIT;
-	}
 }
 
 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
@@ -9706,8 +9683,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
 				  SECONDARY_EXEC_RDTSCP |
 				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
-				  SECONDARY_EXEC_APIC_REGISTER_VIRT |
-				  SECONDARY_EXEC_PCOMMIT);
+				  SECONDARY_EXEC_APIC_REGISTER_VIRT);
 		if (nested_cpu_has(vmcs12,
 				CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
 			exec_control |= vmcs12->secondary_vm_exec_control;

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 12/13] x86/insn: remove pcommit
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (10 preceding siblings ...)
  2016-06-04 20:53 ` [PATCH 11/13] Revert "KVM: x86: add pcommit support" Dan Williams
@ 2016-06-04 20:53 ` Dan Williams
  2016-06-04 20:53 ` [PATCH 13/13] pmem: kill __pmem address space Dan Williams
  2016-06-05 17:41 ` [PATCH 00/13] deprecate pcommit Andy Lutomirski
  13 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:53 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Xiao Guangrong, Josh Poimboeuf, Peter Zijlstra, x86, david,
	Adrian Hunter, Arnaldo Carvalho de Melo, linux-kernel,
	Alexander Shishkin, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
	Thomas Gleixner, Borislav Petkov, hch, Ross Zwisler

The pcommit instruction is being deprecated in favor of ADR
(asynchronous DRAM refresh) flush-on-power-fail, required at the
platform level.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/cpufeatures.h                 |    1 
 arch/x86/include/asm/special_insns.h               |   46 --------------------
 arch/x86/lib/x86-opcode-map.txt                    |    2 -
 tools/objtool/arch/x86/insn/x86-opcode-map.txt     |    2 -
 tools/perf/arch/x86/tests/insn-x86-dat-32.c        |    2 -
 tools/perf/arch/x86/tests/insn-x86-dat-64.c        |    2 -
 tools/perf/arch/x86/tests/insn-x86-dat-src.c       |    4 --
 .../perf/util/intel-pt-decoder/x86-opcode-map.txt  |    2 -
 8 files changed, 3 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4a413485f9eb..700d97df7d28 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -225,7 +225,6 @@
 #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX		( 9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP	( 9*32+20) /* Supervisor Mode Access Prevention */
-#define X86_FEATURE_PCOMMIT	( 9*32+22) /* PCOMMIT instruction */
 #define X86_FEATURE_CLFLUSHOPT	( 9*32+23) /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB	( 9*32+24) /* CLWB instruction */
 #define X86_FEATURE_AVX512PF	( 9*32+26) /* AVX-512 Prefetch */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index d96d04377765..587d7914ea4b 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -253,52 +253,6 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
-/**
- * pcommit_sfence() - persistent commit and fence
- *
- * The PCOMMIT instruction ensures that data that has been flushed from the
- * processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to
- * memory and is durable on the DIMM.  The primary use case for this is
- * persistent memory.
- *
- * This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT
- * with appropriate fencing.
- *
- * Example:
- * void flush_and_commit_buffer(void *vaddr, unsigned int size)
- * {
- *         unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
- *         void *vend = vaddr + size;
- *         void *p;
- *
- *         for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
- *              p < vend; p += boot_cpu_data.x86_clflush_size)
- *                 clwb(p);
- *
- *         // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes
- *         // MFENCE via mb() also works
- *         wmb();
- *
- *         // PCOMMIT and the required SFENCE for ordering
- *         pcommit_sfence();
- * }
- *
- * After this function completes the data pointed to by 'vaddr' has been
- * accepted to memory and will be durable if the 'vaddr' points to persistent
- * memory.
- *
- * PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify
- * things we include both the PCOMMIT and the required SFENCE in the
- * alternatives generated by pcommit_sfence().
- */
-static inline void pcommit_sfence(void)
-{
-	alternative(ASM_NOP7,
-		    ".byte 0x66, 0x0f, 0xae, 0xf8\n\t" /* pcommit */
-		    "sfence",
-		    X86_FEATURE_PCOMMIT);
-}
-
 #define nop() asm volatile ("nop")
 
 
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index d388de72eaca..28632ee68377 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -947,7 +947,7 @@ GrpTable: Grp15
 4: XSAVE
 5: XRSTOR | lfence (11B)
 6: XSAVEOPT | clwb (66) | mfence (11B)
-7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B)
+7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
 GrpTable: Grp16
diff --git a/tools/objtool/arch/x86/insn/x86-opcode-map.txt b/tools/objtool/arch/x86/insn/x86-opcode-map.txt
index d388de72eaca..28632ee68377 100644
--- a/tools/objtool/arch/x86/insn/x86-opcode-map.txt
+++ b/tools/objtool/arch/x86/insn/x86-opcode-map.txt
@@ -947,7 +947,7 @@ GrpTable: Grp15
 4: XSAVE
 5: XRSTOR | lfence (11B)
 6: XSAVEOPT | clwb (66) | mfence (11B)
-7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B)
+7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
 GrpTable: Grp16
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-32.c b/tools/perf/arch/x86/tests/insn-x86-dat-32.c
index 3b491cfe204e..38a48daed154 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-32.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-32.c
@@ -654,5 +654,3 @@
 "0f c7 1d 78 56 34 12 \txrstors 0x12345678",},
 {{0x0f, 0xc7, 0x9c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 8, 0, "", "",
 "0f c7 9c c8 78 56 34 12 \txrstors 0x12345678(%eax,%ecx,8)",},
-{{0x66, 0x0f, 0xae, 0xf8, }, 4, 0, "", "",
-"66 0f ae f8          \tpcommit ",},
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-64.c b/tools/perf/arch/x86/tests/insn-x86-dat-64.c
index 4fe7cce179c4..1f11ea85b60f 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-64.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-64.c
@@ -764,5 +764,3 @@
 "0f c7 9c c8 78 56 34 12 \txrstors 0x12345678(%rax,%rcx,8)",},
 {{0x41, 0x0f, 0xc7, 0x9c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
 "41 0f c7 9c c8 78 56 34 12 \txrstors 0x12345678(%r8,%rcx,8)",},
-{{0x66, 0x0f, 0xae, 0xf8, }, 4, 0, "", "",
-"66 0f ae f8          \tpcommit ",},
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-src.c b/tools/perf/arch/x86/tests/insn-x86-dat-src.c
index 41b1b1c62660..033b8a6fdab9 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-src.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-src.c
@@ -866,10 +866,6 @@ int main(void)
 
 #endif /* #ifndef __x86_64__ */
 
-	/* pcommit */
-
-	asm volatile("pcommit");
-
 	/* Following line is a marker for the awk script - do not change */
 	asm volatile("rdtsc"); /* Stop here */
 
diff --git a/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
index d388de72eaca..28632ee68377 100644
--- a/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
+++ b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
@@ -947,7 +947,7 @@ GrpTable: Grp15
 4: XSAVE
 5: XRSTOR | lfence (11B)
 6: XSAVEOPT | clwb (66) | mfence (11B)
-7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B)
+7: clflush | clflushopt (66) | sfence (11B)
 EndTable
 
 GrpTable: Grp16

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 13/13] pmem: kill __pmem address space
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (11 preceding siblings ...)
  2016-06-04 20:53 ` [PATCH 12/13] x86/insn: remove pcommit Dan Williams
@ 2016-06-04 20:53 ` Dan Williams
  2016-06-04 22:18   ` kbuild test robot
  2016-06-05 17:41 ` [PATCH 00/13] deprecate pcommit Andy Lutomirski
  13 siblings, 1 reply; 38+ messages in thread
From: Dan Williams @ 2016-06-04 20:53 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Ross Zwisler, david, hch, linux-kernel

The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem().  Now that
wmb_pmem() is gone, there is little need to keep this annotation.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/filesystems/Locking |    2 +
 arch/powerpc/sysdev/axonram.c     |    4 +-
 arch/x86/include/asm/pmem.h       |   41 +++++++++-------------
 drivers/acpi/nfit.c               |    3 +-
 drivers/acpi/nfit.h               |    2 +
 drivers/block/brd.c               |    4 +-
 drivers/nvdimm/pmem.c             |    8 ++--
 drivers/s390/block/dcssblk.c      |    6 ++-
 fs/dax.c                          |    6 ++-
 include/linux/blkdev.h            |    6 ++-
 include/linux/compiler.h          |    2 -
 include/linux/nd.h                |    2 +
 include/linux/pmem.h              |   70 +++++++++----------------------------
 scripts/checkpatch.pl             |    1 -
 14 files changed, 55 insertions(+), 102 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75eea7ce3d7c..d9c37ec4c760 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -395,7 +395,7 @@ prototypes:
 	int (*release) (struct gendisk *, fmode_t);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	int (*direct_access) (struct block_device *, sector_t, void __pmem **,
+	int (*direct_access) (struct block_device *, sector_t, void **,
 				unsigned long *);
 	int (*media_changed) (struct gendisk *);
 	void (*unlock_native_capacity) (struct gendisk *);
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ff75d70f7285..154cd9110c08 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -143,12 +143,12 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
  */
 static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-		       void __pmem **kaddr, pfn_t *pfn, long size)
+		       void **kaddr, pfn_t *pfn, long size)
 {
 	struct axon_ram_bank *bank = device->bd_disk->private_data;
 	loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
 
-	*kaddr = (void __pmem __force *) bank->io_addr + offset;
+	*kaddr = bank->io_addr + offset;
 	*pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
 	return bank->size - offset;
 }
diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index a8cf2a6b14d9..643eba42d620 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -28,10 +28,9 @@
  * Copy data to persistent memory media via non-temporal stores so that
  * a subsequent pmem driver flush operation will drain posted write queues.
  */
-static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
-		size_t n)
+static inline void arch_memcpy_to_pmem(void *dst, const void *src, size_t n)
 {
-	int unwritten;
+	int rem;
 
 	/*
 	 * We are copying between two kernel buffers, if
@@ -39,19 +38,17 @@ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
 	 * fault) we would have already reported a general protection fault
 	 * before the WARN+BUG.
 	 */
-	unwritten = __copy_from_user_inatomic_nocache((void __force *) dst,
-			(void __user *) src, n);
-	if (WARN(unwritten, "%s: fault copying %p <- %p unwritten: %d\n",
-				__func__, dst, src, unwritten))
+	rem = __copy_from_user_inatomic_nocache(dst, (void __user *) src, n);
+	if (WARN(rem, "%s: fault copying %p <- %p unwritten: %d\n",
+				__func__, dst, src, rem))
 		BUG();
 }
 
-static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
-		size_t n)
+static inline int arch_memcpy_from_pmem(void *dst, const void *src, size_t n)
 {
 	if (static_cpu_has(X86_FEATURE_MCE_RECOVERY))
-		return memcpy_mcsafe(dst, (void __force *) src, n);
-	memcpy(dst, (void __force *) src, n);
+		return memcpy_mcsafe(dst, src, n);
+	memcpy(dst, src, n);
 	return 0;
 }
 
@@ -63,15 +60,14 @@ static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
  * Write back a cache range using the CLWB (cache line write back)
  * instruction.
  */
-static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
+static inline void arch_wb_cache_pmem(void *addr, size_t size)
 {
 	u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
 	unsigned long clflush_mask = x86_clflush_size - 1;
-	void *vaddr = (void __force *)addr;
-	void *vend = vaddr + size;
+	void *vend = addr + size;
 	void *p;
 
-	for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
+	for (p = (void *)((unsigned long)addr & ~clflush_mask);
 	     p < vend; p += x86_clflush_size)
 		clwb(p);
 }
@@ -93,14 +89,13 @@ static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
  *
  * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
  */
-static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
+static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
 		struct iov_iter *i)
 {
-	void *vaddr = (void __force *)addr;
 	size_t len;
 
 	/* TODO: skip the write-back by always using non-temporal stores */
-	len = copy_from_iter_nocache(vaddr, bytes, i);
+	len = copy_from_iter_nocache(addr, bytes, i);
 
 	if (__iter_needs_pmem_wb(i))
 		arch_wb_cache_pmem(addr, bytes);
@@ -115,17 +110,15 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
  *
  * Write zeros into the memory range starting at 'addr' for 'size' bytes.
  */
-static inline void arch_clear_pmem(void __pmem *addr, size_t size)
+static inline void arch_clear_pmem(void *addr, size_t size)
 {
-	void *vaddr = (void __force *)addr;
-
-	memset(vaddr, 0, size);
+	memset(addr, 0, size);
 	arch_wb_cache_pmem(addr, size);
 }
 
-static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
+static inline void arch_invalidate_pmem(void *addr, size_t size)
 {
-	clflush_cache_range((void __force *) addr, size);
+	clflush_cache_range(addr, size);
 }
 #endif /* CONFIG_ARCH_HAS_PMEM_API */
 #endif /* __ASM_X86_PMEM_H__ */
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 4643dd7a4284..2b556c084576 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1612,8 +1612,7 @@ static void __iomem *__nfit_spa_map(struct acpi_nfit_desc *acpi_desc,
 
 	spa_map->type = type;
 	if (type == SPA_MAP_APERTURE)
-		spa_map->addr.aperture = (void __pmem *)memremap(start, n,
-							ARCH_MEMREMAP_PMEM);
+		spa_map->addr.aperture = memremap(start, n, ARCH_MEMREMAP_PMEM);
 	else
 		spa_map->addr.base = ioremap_nocache(start, n);
 
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 9c8a6cf760be..626b107c6d6c 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -161,7 +161,7 @@ enum nd_blk_mmio_selector {
 struct nd_blk_addr {
 	union {
 		void __iomem *base;
-		void __pmem  *aperture;
+		void *aperture;
 	};
 };
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index c04bd9bc39fd..5f1fe4e6208d 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -381,7 +381,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 static long brd_direct_access(struct block_device *bdev, sector_t sector,
-			void __pmem **kaddr, pfn_t *pfn, long size)
+			void **kaddr, pfn_t *pfn, long size)
 {
 	struct brd_device *brd = bdev->bd_disk->private_data;
 	struct page *page;
@@ -391,7 +391,7 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
 	page = brd_insert_page(brd, sector);
 	if (!page)
 		return -ENOSPC;
-	*kaddr = (void __pmem *)page_address(page);
+	*kaddr = page_address(page);
 	*pfn = page_to_pfn_t(page);
 
 	return PAGE_SIZE;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 79539b6498c9..fd645585b71d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -38,7 +38,7 @@ struct pmem_device {
 	/* when non-zero this device is hosting a 'pfn' instance */
 	phys_addr_t		data_offset;
 	u64			pfn_flags;
-	void __pmem		*virt_addr;
+	void			*virt_addr;
 	/* immutable base size of the namespace */
 	size_t			size;
 	/* trim size when namespace capacity has been section aligned */
@@ -87,7 +87,7 @@ static int pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 	bool bad_pmem = false;
 	void *mem = kmap_atomic(page);
 	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
-	void __pmem *pmem_addr = pmem->virt_addr + pmem_off;
+	void *pmem_addr = pmem->virt_addr + pmem_off;
 
 	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
 		bad_pmem = true;
@@ -176,7 +176,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 static long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		      void __pmem **kaddr, pfn_t *pfn, long size)
+		      void **kaddr, pfn_t *pfn, long size)
 {
 	struct pmem_device *pmem = bdev->bd_queue->queuedata;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
@@ -287,7 +287,7 @@ static int pmem_attach_disk(struct device *dev,
 
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
-	pmem->virt_addr = (void __pmem *) addr;
+	pmem->virt_addr = addr;
 
 	has_flush = nvdimm_has_flush(nd_region);
 	if (has_flush < 0)
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index bed53c46dd90..ac0345bedd9d 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -31,7 +31,7 @@ static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static blk_qc_t dcssblk_make_request(struct request_queue *q,
 						struct bio *bio);
 static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
-			 void __pmem **kaddr, pfn_t *pfn, long size);
+			 void **kaddr, pfn_t *pfn, long size);
 
 static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
 
@@ -884,7 +884,7 @@ fail:
 
 static long
 dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
-			void __pmem **kaddr, pfn_t *pfn, long size)
+			void **kaddr, pfn_t *pfn, long size)
 {
 	struct dcssblk_dev_info *dev_info;
 	unsigned long offset, dev_sz;
@@ -894,7 +894,7 @@ dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
 		return -ENODEV;
 	dev_sz = dev_info->end - dev_info->start;
 	offset = secnum * 512;
-	*kaddr = (void __pmem *) (dev_info->start + offset);
+	*kaddr = dev_info->start + offset;
 	*pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
 
 	return dev_sz - offset;
diff --git a/fs/dax.c b/fs/dax.c
index 434f421da660..c8312f6441bc 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -75,13 +75,13 @@ static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
 	struct request_queue *q = bdev->bd_queue;
 	long rc = -EIO;
 
-	dax->addr = (void __pmem *) ERR_PTR(-EIO);
+	dax->addr = ERR_PTR(-EIO);
 	if (blk_queue_enter(q, true) != 0)
 		return rc;
 
 	rc = bdev_direct_access(bdev, dax);
 	if (rc < 0) {
-		dax->addr = (void __pmem *) ERR_PTR(rc);
+		dax->addr = ERR_PTR(rc);
 		blk_queue_exit(q);
 		return rc;
 	}
@@ -152,7 +152,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 	int rw = iov_iter_rw(iter), rc;
 	long map_len = 0;
 	struct blk_dax_ctl dax = {
-		.addr = (void __pmem *) ERR_PTR(-EIO),
+		.addr = ERR_PTR(-EIO),
 	};
 	unsigned blkbits = inode->i_blkbits;
 	sector_t file_blks = (i_size_read(inode) + (1 << blkbits) - 1)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3d9cf326574f..fde908b2836b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1659,7 +1659,7 @@ static inline bool integrity_req_gap_front_merge(struct request *req,
  */
 struct blk_dax_ctl {
 	sector_t sector;
-	void __pmem *addr;
+	void *addr;
 	long size;
 	pfn_t pfn;
 };
@@ -1670,8 +1670,8 @@ struct block_device_operations {
 	int (*rw_page)(struct block_device *, sector_t, struct page *, int rw);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	long (*direct_access)(struct block_device *, sector_t, void __pmem **,
-			pfn_t *, long);
+	long (*direct_access)(struct block_device *, sector_t, void **, pfn_t *,
+			long);
 	unsigned int (*check_events) (struct gendisk *disk,
 				      unsigned int clearing);
 	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 793c0829e3a3..b966974938ed 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -17,7 +17,6 @@
 # define __release(x)	__context__(x,-1)
 # define __cond_lock(x,c)	((c) ? ({ __acquire(x); 1; }) : 0)
 # define __percpu	__attribute__((noderef, address_space(3)))
-# define __pmem		__attribute__((noderef, address_space(5)))
 #ifdef CONFIG_SPARSE_RCU_POINTER
 # define __rcu		__attribute__((noderef, address_space(4)))
 #else /* CONFIG_SPARSE_RCU_POINTER */
@@ -45,7 +44,6 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 # define __cond_lock(x,c) (c)
 # define __percpu
 # define __rcu
-# define __pmem
 # define __private
 # define ACCESS_PRIVATE(p, member) ((p)->member)
 #endif /* __CHECKER__ */
diff --git a/include/linux/nd.h b/include/linux/nd.h
index aee2761d294c..47271d366be5 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -67,7 +67,7 @@ struct nd_namespace_io {
 	struct nd_namespace_common common;
 	struct resource res;
 	resource_size_t size;
-	void __pmem *addr;
+	void *addr;
 	struct badblocks bb;
 };
 
diff --git a/include/linux/pmem.h b/include/linux/pmem.h
index 9e3ea94b8157..e856c2cb0fe8 100644
--- a/include/linux/pmem.h
+++ b/include/linux/pmem.h
@@ -26,37 +26,35 @@
  * calling these symbols with arch_has_pmem_api() and redirect to the
  * implementation in asm/pmem.h.
  */
-static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
-		size_t n)
+static inline void arch_memcpy_to_pmem(void *dst, const void *src, size_t n)
 {
 	BUG();
 }
 
-static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src,
-		size_t n)
+static inline int arch_memcpy_from_pmem(void *dst, const void *src, size_t n)
 {
 	BUG();
 	return -EFAULT;
 }
 
-static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
+static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
 		struct iov_iter *i)
 {
 	BUG();
 	return 0;
 }
 
-static inline void arch_clear_pmem(void __pmem *addr, size_t size)
+static inline void arch_clear_pmem(void *addr, size_t size)
 {
 	BUG();
 }
 
-static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size)
+static inline void arch_wb_cache_pmem(void *addr, size_t size)
 {
 	BUG();
 }
 
-static inline void arch_invalidate_pmem(void __pmem *addr, size_t size)
+static inline void arch_invalidate_pmem(void *addr, size_t size)
 {
 	BUG();
 }
@@ -67,13 +65,6 @@ static inline bool arch_has_pmem_api(void)
 	return IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API);
 }
 
-static inline int default_memcpy_from_pmem(void *dst, void __pmem const *src,
-		size_t size)
-{
-	memcpy(dst, (void __force *) src, size);
-	return 0;
-}
-
 /*
  * memcpy_from_pmem - read from persistent memory with error handling
  * @dst: destination buffer
@@ -82,40 +73,13 @@ static inline int default_memcpy_from_pmem(void *dst, void __pmem const *src,
  *
  * Returns 0 on success negative error code on failure.
  */
-static inline int memcpy_from_pmem(void *dst, void __pmem const *src,
-		size_t size)
+static inline int memcpy_from_pmem(void *dst, void const *src, size_t size)
 {
 	if (arch_has_pmem_api())
 		return arch_memcpy_from_pmem(dst, src, size);
 	else
-		return default_memcpy_from_pmem(dst, src, size);
-}
-
-/*
- * These defaults seek to offer decent performance and minimize the
- * window between i/o completion and writes being durable on media.
- * However, it is undefined / architecture specific whether
- * ARCH_MEMREMAP_PMEM + default_memcpy_to_pmem is sufficient for
- * making data durable relative to i/o completion.
- */
-static inline void default_memcpy_to_pmem(void __pmem *dst, const void *src,
-		size_t size)
-{
-	memcpy((void __force *) dst, src, size);
-}
-
-static inline size_t default_copy_from_iter_pmem(void __pmem *addr,
-		size_t bytes, struct iov_iter *i)
-{
-	return copy_from_iter_nocache((void __force *)addr, bytes, i);
-}
-
-static inline void default_clear_pmem(void __pmem *addr, size_t size)
-{
-	if (size == PAGE_SIZE && ((unsigned long)addr & ~PAGE_MASK) == 0)
-		clear_page((void __force *)addr);
-	else
-		memset((void __force *)addr, 0, size);
+		memcpy(dst, src, size);
+	return 0;
 }
 
 /**
@@ -130,12 +94,12 @@ static inline void default_clear_pmem(void __pmem *addr, size_t size)
  * data may still reside in cpu or platform buffers, so this operation
  * must be followed by a blkdev_issue_flush() on the pmem block device.
  */
-static inline void memcpy_to_pmem(void __pmem *dst, const void *src, size_t n)
+static inline void memcpy_to_pmem(void *dst, const void *src, size_t n)
 {
 	if (arch_has_pmem_api())
 		arch_memcpy_to_pmem(dst, src, n);
 	else
-		default_memcpy_to_pmem(dst, src, n);
+		memcpy(dst, src, n);
 }
 
 /**
@@ -147,12 +111,12 @@ static inline void memcpy_to_pmem(void __pmem *dst, const void *src, size_t n)
  * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'.
  * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
-static inline size_t copy_from_iter_pmem(void __pmem *addr, size_t bytes,
+static inline size_t copy_from_iter_pmem(void *addr, size_t bytes,
 		struct iov_iter *i)
 {
 	if (arch_has_pmem_api())
 		return arch_copy_from_iter_pmem(addr, bytes, i);
-	return default_copy_from_iter_pmem(addr, bytes, i);
+	return copy_from_iter_nocache(addr, bytes, i);
 }
 
 /**
@@ -163,12 +127,12 @@ static inline size_t copy_from_iter_pmem(void __pmem *addr, size_t bytes,
  * Write zeros into the memory range starting at 'addr' for 'size' bytes.
  * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
-static inline void clear_pmem(void __pmem *addr, size_t size)
+static inline void clear_pmem(void *addr, size_t size)
 {
 	if (arch_has_pmem_api())
 		arch_clear_pmem(addr, size);
 	else
-		default_clear_pmem(addr, size);
+		memset(addr, 0, size);
 }
 
 /**
@@ -179,7 +143,7 @@ static inline void clear_pmem(void __pmem *addr, size_t size)
  * For platforms that support clearing poison this flushes any poisoned
  * ranges out of the cache
  */
-static inline void invalidate_pmem(void __pmem *addr, size_t size)
+static inline void invalidate_pmem(void *addr, size_t size)
 {
 	if (arch_has_pmem_api())
 		arch_invalidate_pmem(addr, size);
@@ -193,7 +157,7 @@ static inline void invalidate_pmem(void __pmem *addr, size_t size)
  * Write back the processor cache range starting at 'addr' for 'size' bytes.
  * See blkdev_issue_flush() note for memcpy_to_pmem().
  */
-static inline void wb_cache_pmem(void __pmem *addr, size_t size)
+static inline void wb_cache_pmem(void *addr, size_t size)
 {
 	if (arch_has_pmem_api())
 		arch_wb_cache_pmem(addr, size);
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 6750595bd7b8..32f01a1a28bb 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -313,7 +313,6 @@ our $Sparse	= qr{
 			__kernel|
 			__force|
 			__iomem|
-			__pmem|
 			__must_check|
 			__init_refok|
 			__kprobes|

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-04 20:52 ` [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active Dan Williams
@ 2016-06-04 21:10   ` Greg Kroah-Hartman
  2016-06-04 21:39     ` Dan Williams
  2016-06-04 21:50   ` kbuild test robot
  2016-06-06 19:25   ` Linda Knippers
  2 siblings, 1 reply; 38+ messages in thread
From: Greg Kroah-Hartman @ 2016-06-04 21:10 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, david, linux-kernel, hch

On Sat, Jun 04, 2016 at 01:52:38PM -0700, Dan Williams wrote:
> There are scenarios where we need a middle ground between disabling all
> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
> allowing unbind at any userspace-determined time.  Pinning modules takes
> away one vector for unwanted out-of-sequence device_release_driver()
> invocations, this new mechanism (via device->suppress_unbind_attr) takes
> away another.
> 
> The first user of this mechanism is the libnvdimm sub-system where
> manual dimm disabling should be prevented while the dimm is active in
> any region.  Note that there is a 1:N dimm-to-region relationship which
> is why this is implemented as a disable count rather than a flag.  This
> forces userspace to disable regions before dimms when manually shutting
> down a bus topology.
> 
> This does not affect any of the kernel-internal initiated invocations of
> device_release_driver().
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/base/base.h             |    1 +
>  drivers/base/bus.c              |   12 ++++++++++--
>  drivers/base/core.c             |    1 +
>  drivers/base/dd.c               |    2 +-
>  drivers/nvdimm/namespace_devs.c |    1 +
>  drivers/nvdimm/region_devs.c    |    4 +++-
>  include/linux/device.h          |   20 ++++++++++++++++++++
>  7 files changed, 37 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index e05db388bd1c..129814b17ca6 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -109,6 +109,7 @@ extern int bus_add_driver(struct device_driver *drv);
>  extern void bus_remove_driver(struct device_driver *drv);
>  
>  extern void driver_detach(struct device_driver *drv);
> +extern void __device_release_driver(struct device *dev);
>  extern int driver_probe_device(struct device_driver *drv, struct device *dev);
>  extern void driver_deferred_probe_del(struct device *dev);
>  static inline int driver_match_device(struct device_driver *drv,
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 6470eb8088f4..b48a903f2d28 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -188,10 +188,18 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf,
>  	if (dev && dev->driver == drv) {
>  		if (dev->parent)	/* Needed for USB */
>  			device_lock(dev->parent);
> -		device_release_driver(dev);
> +
> +		device_lock(dev);
> +		if (atomic_read(&dev->suppress_unbind_attr) > 0)
> +			err = -EBUSY;
> +		else {
> +			__device_release_driver(dev);
> +			err = count;
> +		}
> +		device_unlock(dev);
> +
>  		if (dev->parent)
>  			device_unlock(dev->parent);
> -		err = count;
>  	}
>  	put_device(dev);
>  	bus_put(bus);
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 0a8bdade53f2..0ea0e8560e1d 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -708,6 +708,7 @@ void device_initialize(struct device *dev)
>  	INIT_LIST_HEAD(&dev->devres_head);
>  	device_pm_init(dev);
>  	set_dev_node(dev, -1);
> +	atomic_set(&dev->suppress_unbind_attr, 0);
>  #ifdef CONFIG_GENERIC_MSI_IRQ
>  	INIT_LIST_HEAD(&dev->msi_list);
>  #endif
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 16688f50729c..9e21817ca2d6 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -756,7 +756,7 @@ EXPORT_SYMBOL_GPL(driver_attach);
>   * __device_release_driver() must be called with @dev lock held.
>   * When called for a USB interface, @dev->parent lock must be held as well.
>   */
> -static void __device_release_driver(struct device *dev)
> +void __device_release_driver(struct device *dev)
>  {
>  	struct device_driver *drv;
>  
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index c5e3196c45b0..e65572b6092c 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1950,6 +1950,7 @@ static int init_active_labels(struct nd_region *nd_region)
>  		}
>  		nd_mapping->ndd = ndd;
>  		atomic_inc(&nvdimm->busy);
> +		device_disable_unbind_attr(&nvdimm->dev);
>  		get_ndd(ndd);
>  
>  		count = nd_label_active_count(ndd);
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 40fcfea26fbb..320f0f3ea367 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -427,8 +427,10 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
>  			nd_mapping->labels = NULL;
>  			put_ndd(ndd);
>  			nd_mapping->ndd = NULL;
> -			if (ndd)
> +			if (ndd) {
>  				atomic_dec(&nvdimm->busy);
> +				device_enable_unbind_attr(&nvdimm->dev);
> +			}
>  		}
>  
>  		if (is_nd_pmem(dev))
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 38f02814d53a..d9eaa85bb9cf 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -849,6 +849,7 @@ struct device {
>  
>  	void	(*release)(struct device *dev);
>  	struct iommu_group	*iommu_group;
> +	atomic_t		suppress_unbind_attr; /* disable manual unbind */
>  
>  	bool			offline_disabled:1;
>  	bool			offline:1;
> @@ -988,6 +989,25 @@ static inline void device_lock_assert(struct device *dev)
>  	lockdep_assert_held(&dev->mutex);
>  }
>  
> +static inline bool device_disable_unbind_attr(struct device *dev)
> +{
> +	bool suppressed = false;
> +
> +	device_lock(dev);
> +	if (dev->driver) {
> +		atomic_inc(&dev->suppress_unbind_attr);
> +		suppressed = true;
> +	}
> +	device_unlock(dev);
> +
> +	return suppressed;
> +}
> +
> +static inline bool device_enable_unbind_attr(struct device *dev)
> +{
> +	return atomic_dec_and_test(&dev->suppress_unbind_attr);
> +}
> +

Ick, that's a mess.

Why not just prevent the unbinding from happening in your bus when you
need it?

And as you are grabbing a lock, why is this an atomic variable?

This is just making things _really_ complex for very limited gain for
what I can tell.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver
  2016-06-04 20:52 ` [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver Dan Williams
@ 2016-06-04 21:29   ` kbuild test robot
  2016-06-04 21:40   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2016-06-04 21:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: kbuild-all, linux-nvdimm, Ross Zwisler, david, linux-kernel, hch

[-- Attachment #1: Type: text/plain, Size: 1907 bytes --]

Hi,

[auto build test ERROR on v4.7-rc1]
[also build test ERROR on next-20160603]
[cannot apply to linux-nvdimm/libnvdimm-for-next tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/deprecate-pcommit/20160605-045935
config: x86_64-randconfig-x018-201623 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from drivers/acpi/nfit.c:14:0:
   include/linux/libnvdimm.h:78:37: warning: 'struct device' declared inside parameter list will not be visible outside of this definition or declaration
     int (*populate_flush_hints)(struct device *dev,
                                        ^~~~~~
   drivers/acpi/nfit.c: In function 'acpi_nfit_desc_init':
>> drivers/acpi/nfit.c:2528:32: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
     nd_desc->populate_flush_hints = acpi_nfit_populate_flush_hints;
                                   ^
   cc1: some warnings being treated as errors

vim +2528 drivers/acpi/nfit.c

  2522		dev_set_drvdata(dev, acpi_desc);
  2523		acpi_desc->dev = dev;
  2524		acpi_desc->blk_do_io = acpi_nfit_blk_region_do_io;
  2525		nd_desc = &acpi_desc->nd_desc;
  2526		nd_desc->provider_name = "ACPI.NFIT";
  2527		nd_desc->ndctl = acpi_nfit_ctl;
> 2528		nd_desc->populate_flush_hints = acpi_nfit_populate_flush_hints;
  2529		nd_desc->flush_probe = acpi_nfit_flush_probe;
  2530		nd_desc->clear_to_send = acpi_nfit_clear_to_send;
  2531		nd_desc->attr_groups = acpi_nfit_attribute_groups;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 29846 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-04 21:10   ` Greg Kroah-Hartman
@ 2016-06-04 21:39     ` Dan Williams
  2016-06-04 21:45       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 38+ messages in thread
From: Dan Williams @ 2016-06-04 21:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-nvdimm@lists.01.org, david, linux-kernel, Christoph Hellwig

On Sat, Jun 4, 2016 at 2:10 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Sat, Jun 04, 2016 at 01:52:38PM -0700, Dan Williams wrote:
>> There are scenarios where we need a middle ground between disabling all
>> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
>> allowing unbind at any userspace-determined time.  Pinning modules takes
>> away one vector for unwanted out-of-sequence device_release_driver()
>> invocations, this new mechanism (via device->suppress_unbind_attr) takes
>> away another.
>>
>> The first user of this mechanism is the libnvdimm sub-system where
>> manual dimm disabling should be prevented while the dimm is active in
>> any region.  Note that there is a 1:N dimm-to-region relationship which
>> is why this is implemented as a disable count rather than a flag.  This
>> forces userspace to disable regions before dimms when manually shutting
>> down a bus topology.
>>
>> This does not affect any of the kernel-internal initiated invocations of
>> device_release_driver().
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  drivers/base/base.h             |    1 +
>>  drivers/base/bus.c              |   12 ++++++++++--
>>  drivers/base/core.c             |    1 +
>>  drivers/base/dd.c               |    2 +-
>>  drivers/nvdimm/namespace_devs.c |    1 +
>>  drivers/nvdimm/region_devs.c    |    4 +++-
>>  include/linux/device.h          |   20 ++++++++++++++++++++
>>  7 files changed, 37 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/base/base.h b/drivers/base/base.h
>> index e05db388bd1c..129814b17ca6 100644
>> --- a/drivers/base/base.h
>> +++ b/drivers/base/base.h
>> @@ -109,6 +109,7 @@ extern int bus_add_driver(struct device_driver *drv);
>>  extern void bus_remove_driver(struct device_driver *drv);
>>
>>  extern void driver_detach(struct device_driver *drv);
>> +extern void __device_release_driver(struct device *dev);
>>  extern int driver_probe_device(struct device_driver *drv, struct device *dev);
>>  extern void driver_deferred_probe_del(struct device *dev);
>>  static inline int driver_match_device(struct device_driver *drv,
>> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
>> index 6470eb8088f4..b48a903f2d28 100644
>> --- a/drivers/base/bus.c
>> +++ b/drivers/base/bus.c
>> @@ -188,10 +188,18 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf,
>>       if (dev && dev->driver == drv) {
>>               if (dev->parent)        /* Needed for USB */
>>                       device_lock(dev->parent);
>> -             device_release_driver(dev);
>> +
>> +             device_lock(dev);
>> +             if (atomic_read(&dev->suppress_unbind_attr) > 0)
>> +                     err = -EBUSY;
>> +             else {
>> +                     __device_release_driver(dev);
>> +                     err = count;
>> +             }
>> +             device_unlock(dev);
>> +
>>               if (dev->parent)
>>                       device_unlock(dev->parent);
>> -             err = count;
>>       }
>>       put_device(dev);
>>       bus_put(bus);
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 0a8bdade53f2..0ea0e8560e1d 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -708,6 +708,7 @@ void device_initialize(struct device *dev)
>>       INIT_LIST_HEAD(&dev->devres_head);
>>       device_pm_init(dev);
>>       set_dev_node(dev, -1);
>> +     atomic_set(&dev->suppress_unbind_attr, 0);
>>  #ifdef CONFIG_GENERIC_MSI_IRQ
>>       INIT_LIST_HEAD(&dev->msi_list);
>>  #endif
>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index 16688f50729c..9e21817ca2d6 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -756,7 +756,7 @@ EXPORT_SYMBOL_GPL(driver_attach);
>>   * __device_release_driver() must be called with @dev lock held.
>>   * When called for a USB interface, @dev->parent lock must be held as well.
>>   */
>> -static void __device_release_driver(struct device *dev)
>> +void __device_release_driver(struct device *dev)
>>  {
>>       struct device_driver *drv;
>>
>> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
>> index c5e3196c45b0..e65572b6092c 100644
>> --- a/drivers/nvdimm/namespace_devs.c
>> +++ b/drivers/nvdimm/namespace_devs.c
>> @@ -1950,6 +1950,7 @@ static int init_active_labels(struct nd_region *nd_region)
>>               }
>>               nd_mapping->ndd = ndd;
>>               atomic_inc(&nvdimm->busy);
>> +             device_disable_unbind_attr(&nvdimm->dev);
>>               get_ndd(ndd);
>>
>>               count = nd_label_active_count(ndd);
>> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>> index 40fcfea26fbb..320f0f3ea367 100644
>> --- a/drivers/nvdimm/region_devs.c
>> +++ b/drivers/nvdimm/region_devs.c
>> @@ -427,8 +427,10 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
>>                       nd_mapping->labels = NULL;
>>                       put_ndd(ndd);
>>                       nd_mapping->ndd = NULL;
>> -                     if (ndd)
>> +                     if (ndd) {
>>                               atomic_dec(&nvdimm->busy);
>> +                             device_enable_unbind_attr(&nvdimm->dev);
>> +                     }
>>               }
>>
>>               if (is_nd_pmem(dev))
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index 38f02814d53a..d9eaa85bb9cf 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -849,6 +849,7 @@ struct device {
>>
>>       void    (*release)(struct device *dev);
>>       struct iommu_group      *iommu_group;
>> +     atomic_t                suppress_unbind_attr; /* disable manual unbind */
>>
>>       bool                    offline_disabled:1;
>>       bool                    offline:1;
>> @@ -988,6 +989,25 @@ static inline void device_lock_assert(struct device *dev)
>>       lockdep_assert_held(&dev->mutex);
>>  }
>>
>> +static inline bool device_disable_unbind_attr(struct device *dev)
>> +{
>> +     bool suppressed = false;
>> +
>> +     device_lock(dev);
>> +     if (dev->driver) {
>> +             atomic_inc(&dev->suppress_unbind_attr);
>> +             suppressed = true;
>> +     }
>> +     device_unlock(dev);
>> +
>> +     return suppressed;
>> +}
>> +
>> +static inline bool device_enable_unbind_attr(struct device *dev)
>> +{
>> +     return atomic_dec_and_test(&dev->suppress_unbind_attr);
>> +}
>> +
>
> Ick, that's a mess.
>
> Why not just prevent the unbinding from happening in your bus when you
> need it?

Because historically unbind never fails...

    void device_release_driver(struct device *dev);

> And as you are grabbing a lock, why is this an atomic variable?
>
> This is just making things _really_ complex for very limited gain for
> what I can tell.

I thought it was cleaner to have the disable action synchronize with
unbind, but the lock isn't needed to re-enable unbind.  I can move the
complexity to the caller to bounce the device_lock() and re-validate
that dev->driver is still set, but that does not seem cleaner.

If making device_release_driver() fail is an option I can take a look,
but that seems like a major change in policy.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver
  2016-06-04 20:52 ` [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver Dan Williams
  2016-06-04 21:29   ` kbuild test robot
@ 2016-06-04 21:40   ` kbuild test robot
  2016-06-04 21:49   ` kbuild test robot
  2016-06-07 18:11   ` Kani, Toshimitsu
  3 siblings, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2016-06-04 21:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: kbuild-all, linux-nvdimm, Ross Zwisler, david, linux-kernel, hch

[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]

Hi,

[auto build test ERROR on v4.7-rc1]
[also build test ERROR on next-20160603]
[cannot apply to linux-nvdimm/libnvdimm-for-next tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/deprecate-pcommit/20160605-045935
config: i386-randconfig-s1-201623 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/nvdimm/region_devs.c: In function 'nvdimm_flush':
>> drivers/nvdimm/region_devs.c:826:4: error: implicit declaration of function 'writeq' [-Werror=implicit-function-declaration]
       writeq(1, ndd->flush_wpq[0]);
       ^~~~~~
   cc1: some warnings being treated as errors

vim +/writeq +826 drivers/nvdimm/region_devs.c

   820			/*
   821			 * Note, nvdimm_drvdata guaranteed to be live since we
   822			 * arrange for all associated regions to be disabled
   823			 * before the dimm is disabled.
   824			 */
   825			if (ndd->flush_wpq[0])
 > 826				writeq(1, ndd->flush_wpq[0]);
   827		}
   828		wmb();
   829	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24622 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-04 21:39     ` Dan Williams
@ 2016-06-04 21:45       ` Greg Kroah-Hartman
  2016-06-04 21:48         ` Dan Williams
  0 siblings, 1 reply; 38+ messages in thread
From: Greg Kroah-Hartman @ 2016-06-04 21:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm@lists.01.org, david, linux-kernel, Christoph Hellwig

On Sat, Jun 04, 2016 at 02:39:02PM -0700, Dan Williams wrote:
> On Sat, Jun 4, 2016 at 2:10 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Sat, Jun 04, 2016 at 01:52:38PM -0700, Dan Williams wrote:
> >> There are scenarios where we need a middle ground between disabling all
> >> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
> >> allowing unbind at any userspace-determined time.  Pinning modules takes
> >> away one vector for unwanted out-of-sequence device_release_driver()
> >> invocations, this new mechanism (via device->suppress_unbind_attr) takes
> >> away another.
> >>
> >> The first user of this mechanism is the libnvdimm sub-system where
> >> manual dimm disabling should be prevented while the dimm is active in
> >> any region.  Note that there is a 1:N dimm-to-region relationship which
> >> is why this is implemented as a disable count rather than a flag.  This
> >> forces userspace to disable regions before dimms when manually shutting
> >> down a bus topology.
> >>
> >> This does not affect any of the kernel-internal initiated invocations of
> >> device_release_driver().
> >>
> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >> ---
> >>  drivers/base/base.h             |    1 +
> >>  drivers/base/bus.c              |   12 ++++++++++--
> >>  drivers/base/core.c             |    1 +
> >>  drivers/base/dd.c               |    2 +-
> >>  drivers/nvdimm/namespace_devs.c |    1 +
> >>  drivers/nvdimm/region_devs.c    |    4 +++-
> >>  include/linux/device.h          |   20 ++++++++++++++++++++
> >>  7 files changed, 37 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/base/base.h b/drivers/base/base.h
> >> index e05db388bd1c..129814b17ca6 100644
> >> --- a/drivers/base/base.h
> >> +++ b/drivers/base/base.h
> >> @@ -109,6 +109,7 @@ extern int bus_add_driver(struct device_driver *drv);
> >>  extern void bus_remove_driver(struct device_driver *drv);
> >>
> >>  extern void driver_detach(struct device_driver *drv);
> >> +extern void __device_release_driver(struct device *dev);
> >>  extern int driver_probe_device(struct device_driver *drv, struct device *dev);
> >>  extern void driver_deferred_probe_del(struct device *dev);
> >>  static inline int driver_match_device(struct device_driver *drv,
> >> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> >> index 6470eb8088f4..b48a903f2d28 100644
> >> --- a/drivers/base/bus.c
> >> +++ b/drivers/base/bus.c
> >> @@ -188,10 +188,18 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf,
> >>       if (dev && dev->driver == drv) {
> >>               if (dev->parent)        /* Needed for USB */
> >>                       device_lock(dev->parent);
> >> -             device_release_driver(dev);
> >> +
> >> +             device_lock(dev);
> >> +             if (atomic_read(&dev->suppress_unbind_attr) > 0)
> >> +                     err = -EBUSY;
> >> +             else {
> >> +                     __device_release_driver(dev);
> >> +                     err = count;
> >> +             }
> >> +             device_unlock(dev);
> >> +
> >>               if (dev->parent)
> >>                       device_unlock(dev->parent);
> >> -             err = count;
> >>       }
> >>       put_device(dev);
> >>       bus_put(bus);
> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> index 0a8bdade53f2..0ea0e8560e1d 100644
> >> --- a/drivers/base/core.c
> >> +++ b/drivers/base/core.c
> >> @@ -708,6 +708,7 @@ void device_initialize(struct device *dev)
> >>       INIT_LIST_HEAD(&dev->devres_head);
> >>       device_pm_init(dev);
> >>       set_dev_node(dev, -1);
> >> +     atomic_set(&dev->suppress_unbind_attr, 0);
> >>  #ifdef CONFIG_GENERIC_MSI_IRQ
> >>       INIT_LIST_HEAD(&dev->msi_list);
> >>  #endif
> >> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> >> index 16688f50729c..9e21817ca2d6 100644
> >> --- a/drivers/base/dd.c
> >> +++ b/drivers/base/dd.c
> >> @@ -756,7 +756,7 @@ EXPORT_SYMBOL_GPL(driver_attach);
> >>   * __device_release_driver() must be called with @dev lock held.
> >>   * When called for a USB interface, @dev->parent lock must be held as well.
> >>   */
> >> -static void __device_release_driver(struct device *dev)
> >> +void __device_release_driver(struct device *dev)
> >>  {
> >>       struct device_driver *drv;
> >>
> >> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> >> index c5e3196c45b0..e65572b6092c 100644
> >> --- a/drivers/nvdimm/namespace_devs.c
> >> +++ b/drivers/nvdimm/namespace_devs.c
> >> @@ -1950,6 +1950,7 @@ static int init_active_labels(struct nd_region *nd_region)
> >>               }
> >>               nd_mapping->ndd = ndd;
> >>               atomic_inc(&nvdimm->busy);
> >> +             device_disable_unbind_attr(&nvdimm->dev);
> >>               get_ndd(ndd);
> >>
> >>               count = nd_label_active_count(ndd);
> >> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> >> index 40fcfea26fbb..320f0f3ea367 100644
> >> --- a/drivers/nvdimm/region_devs.c
> >> +++ b/drivers/nvdimm/region_devs.c
> >> @@ -427,8 +427,10 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
> >>                       nd_mapping->labels = NULL;
> >>                       put_ndd(ndd);
> >>                       nd_mapping->ndd = NULL;
> >> -                     if (ndd)
> >> +                     if (ndd) {
> >>                               atomic_dec(&nvdimm->busy);
> >> +                             device_enable_unbind_attr(&nvdimm->dev);
> >> +                     }
> >>               }
> >>
> >>               if (is_nd_pmem(dev))
> >> diff --git a/include/linux/device.h b/include/linux/device.h
> >> index 38f02814d53a..d9eaa85bb9cf 100644
> >> --- a/include/linux/device.h
> >> +++ b/include/linux/device.h
> >> @@ -849,6 +849,7 @@ struct device {
> >>
> >>       void    (*release)(struct device *dev);
> >>       struct iommu_group      *iommu_group;
> >> +     atomic_t                suppress_unbind_attr; /* disable manual unbind */
> >>
> >>       bool                    offline_disabled:1;
> >>       bool                    offline:1;
> >> @@ -988,6 +989,25 @@ static inline void device_lock_assert(struct device *dev)
> >>       lockdep_assert_held(&dev->mutex);
> >>  }
> >>
> >> +static inline bool device_disable_unbind_attr(struct device *dev)
> >> +{
> >> +     bool suppressed = false;
> >> +
> >> +     device_lock(dev);
> >> +     if (dev->driver) {
> >> +             atomic_inc(&dev->suppress_unbind_attr);
> >> +             suppressed = true;
> >> +     }
> >> +     device_unlock(dev);
> >> +
> >> +     return suppressed;
> >> +}
> >> +
> >> +static inline bool device_enable_unbind_attr(struct device *dev)
> >> +{
> >> +     return atomic_dec_and_test(&dev->suppress_unbind_attr);
> >> +}
> >> +
> >
> > Ick, that's a mess.
> >
> > Why not just prevent the unbinding from happening in your bus when you
> > need it?
> 
> Because historically unbind never fails...
> 
>     void device_release_driver(struct device *dev);

Ah, yes, forgot about that.

> > And as you are grabbing a lock, why is this an atomic variable?
> >
> > This is just making things _really_ complex for very limited gain for
> > what I can tell.
> 
> I thought it was cleaner to have the disable action synchronize with
> unbind, but the lock isn't needed to re-enable unbind.  I can move the
> complexity to the caller to bounce the device_lock() and re-validate
> that dev->driver is still set, but that does not seem cleaner.

No, if the user wants to unbind, then they can unbind, don't try to
muddy up the situation by letting it work sometimes and others not at
all.

bind/unbind is a "I really really know what I am doing here" action,
it's rare, and the user gets to keep both pieces if something fails.  I
know some busses don't like it, so we allow them to opt-out.  But to
make it "sometimes yes, sometimes no" is a mess, I don't want to support
that.

I suggest you just don't allow this if you don't want it.  It's not
anything you should ever be relying on for configuration so it shouldn't
be a big deal.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-04 21:45       ` Greg Kroah-Hartman
@ 2016-06-04 21:48         ` Dan Williams
  0 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-04 21:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-nvdimm@lists.01.org, david, linux-kernel, Christoph Hellwig

On Sat, Jun 4, 2016 at 2:45 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Sat, Jun 04, 2016 at 02:39:02PM -0700, Dan Williams wrote:
>> On Sat, Jun 4, 2016 at 2:10 PM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Sat, Jun 04, 2016 at 01:52:38PM -0700, Dan Williams wrote:
>> >> There are scenarios where we need a middle ground between disabling all
>> >> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
>> >> allowing unbind at any userspace-determined time.  Pinning modules takes
>> >> away one vector for unwanted out-of-sequence device_release_driver()
>> >> invocations, this new mechanism (via device->suppress_unbind_attr) takes
>> >> away another.
>> >>
>> >> The first user of this mechanism is the libnvdimm sub-system where
>> >> manual dimm disabling should be prevented while the dimm is active in
>> >> any region.  Note that there is a 1:N dimm-to-region relationship which
>> >> is why this is implemented as a disable count rather than a flag.  This
>> >> forces userspace to disable regions before dimms when manually shutting
>> >> down a bus topology.
>> >>
>> >> This does not affect any of the kernel-internal initiated invocations of
>> >> device_release_driver().
>> >>
>> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> >> ---
>> >>  drivers/base/base.h             |    1 +
>> >>  drivers/base/bus.c              |   12 ++++++++++--
>> >>  drivers/base/core.c             |    1 +
>> >>  drivers/base/dd.c               |    2 +-
>> >>  drivers/nvdimm/namespace_devs.c |    1 +
>> >>  drivers/nvdimm/region_devs.c    |    4 +++-
>> >>  include/linux/device.h          |   20 ++++++++++++++++++++
>> >>  7 files changed, 37 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/drivers/base/base.h b/drivers/base/base.h
>> >> index e05db388bd1c..129814b17ca6 100644
>> >> --- a/drivers/base/base.h
>> >> +++ b/drivers/base/base.h
>> >> @@ -109,6 +109,7 @@ extern int bus_add_driver(struct device_driver *drv);
>> >>  extern void bus_remove_driver(struct device_driver *drv);
>> >>
>> >>  extern void driver_detach(struct device_driver *drv);
>> >> +extern void __device_release_driver(struct device *dev);
>> >>  extern int driver_probe_device(struct device_driver *drv, struct device *dev);
>> >>  extern void driver_deferred_probe_del(struct device *dev);
>> >>  static inline int driver_match_device(struct device_driver *drv,
>> >> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
>> >> index 6470eb8088f4..b48a903f2d28 100644
>> >> --- a/drivers/base/bus.c
>> >> +++ b/drivers/base/bus.c
>> >> @@ -188,10 +188,18 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf,
>> >>       if (dev && dev->driver == drv) {
>> >>               if (dev->parent)        /* Needed for USB */
>> >>                       device_lock(dev->parent);
>> >> -             device_release_driver(dev);
>> >> +
>> >> +             device_lock(dev);
>> >> +             if (atomic_read(&dev->suppress_unbind_attr) > 0)
>> >> +                     err = -EBUSY;
>> >> +             else {
>> >> +                     __device_release_driver(dev);
>> >> +                     err = count;
>> >> +             }
>> >> +             device_unlock(dev);
>> >> +
>> >>               if (dev->parent)
>> >>                       device_unlock(dev->parent);
>> >> -             err = count;
>> >>       }
>> >>       put_device(dev);
>> >>       bus_put(bus);
>> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> >> index 0a8bdade53f2..0ea0e8560e1d 100644
>> >> --- a/drivers/base/core.c
>> >> +++ b/drivers/base/core.c
>> >> @@ -708,6 +708,7 @@ void device_initialize(struct device *dev)
>> >>       INIT_LIST_HEAD(&dev->devres_head);
>> >>       device_pm_init(dev);
>> >>       set_dev_node(dev, -1);
>> >> +     atomic_set(&dev->suppress_unbind_attr, 0);
>> >>  #ifdef CONFIG_GENERIC_MSI_IRQ
>> >>       INIT_LIST_HEAD(&dev->msi_list);
>> >>  #endif
>> >> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> >> index 16688f50729c..9e21817ca2d6 100644
>> >> --- a/drivers/base/dd.c
>> >> +++ b/drivers/base/dd.c
>> >> @@ -756,7 +756,7 @@ EXPORT_SYMBOL_GPL(driver_attach);
>> >>   * __device_release_driver() must be called with @dev lock held.
>> >>   * When called for a USB interface, @dev->parent lock must be held as well.
>> >>   */
>> >> -static void __device_release_driver(struct device *dev)
>> >> +void __device_release_driver(struct device *dev)
>> >>  {
>> >>       struct device_driver *drv;
>> >>
>> >> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
>> >> index c5e3196c45b0..e65572b6092c 100644
>> >> --- a/drivers/nvdimm/namespace_devs.c
>> >> +++ b/drivers/nvdimm/namespace_devs.c
>> >> @@ -1950,6 +1950,7 @@ static int init_active_labels(struct nd_region *nd_region)
>> >>               }
>> >>               nd_mapping->ndd = ndd;
>> >>               atomic_inc(&nvdimm->busy);
>> >> +             device_disable_unbind_attr(&nvdimm->dev);
>> >>               get_ndd(ndd);
>> >>
>> >>               count = nd_label_active_count(ndd);
>> >> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>> >> index 40fcfea26fbb..320f0f3ea367 100644
>> >> --- a/drivers/nvdimm/region_devs.c
>> >> +++ b/drivers/nvdimm/region_devs.c
>> >> @@ -427,8 +427,10 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
>> >>                       nd_mapping->labels = NULL;
>> >>                       put_ndd(ndd);
>> >>                       nd_mapping->ndd = NULL;
>> >> -                     if (ndd)
>> >> +                     if (ndd) {
>> >>                               atomic_dec(&nvdimm->busy);
>> >> +                             device_enable_unbind_attr(&nvdimm->dev);
>> >> +                     }
>> >>               }
>> >>
>> >>               if (is_nd_pmem(dev))
>> >> diff --git a/include/linux/device.h b/include/linux/device.h
>> >> index 38f02814d53a..d9eaa85bb9cf 100644
>> >> --- a/include/linux/device.h
>> >> +++ b/include/linux/device.h
>> >> @@ -849,6 +849,7 @@ struct device {
>> >>
>> >>       void    (*release)(struct device *dev);
>> >>       struct iommu_group      *iommu_group;
>> >> +     atomic_t                suppress_unbind_attr; /* disable manual unbind */
>> >>
>> >>       bool                    offline_disabled:1;
>> >>       bool                    offline:1;
>> >> @@ -988,6 +989,25 @@ static inline void device_lock_assert(struct device *dev)
>> >>       lockdep_assert_held(&dev->mutex);
>> >>  }
>> >>
>> >> +static inline bool device_disable_unbind_attr(struct device *dev)
>> >> +{
>> >> +     bool suppressed = false;
>> >> +
>> >> +     device_lock(dev);
>> >> +     if (dev->driver) {
>> >> +             atomic_inc(&dev->suppress_unbind_attr);
>> >> +             suppressed = true;
>> >> +     }
>> >> +     device_unlock(dev);
>> >> +
>> >> +     return suppressed;
>> >> +}
>> >> +
>> >> +static inline bool device_enable_unbind_attr(struct device *dev)
>> >> +{
>> >> +     return atomic_dec_and_test(&dev->suppress_unbind_attr);
>> >> +}
>> >> +
>> >
>> > Ick, that's a mess.
>> >
>> > Why not just prevent the unbinding from happening in your bus when you
>> > need it?
>>
>> Because historically unbind never fails...
>>
>>     void device_release_driver(struct device *dev);
>
> Ah, yes, forgot about that.
>
>> > And as you are grabbing a lock, why is this an atomic variable?
>> >
>> > This is just making things _really_ complex for very limited gain for
>> > what I can tell.
>>
>> I thought it was cleaner to have the disable action synchronize with
>> unbind, but the lock isn't needed to re-enable unbind.  I can move the
>> complexity to the caller to bounce the device_lock() and re-validate
>> that dev->driver is still set, but that does not seem cleaner.
>
> No, if the user wants to unbind, then they can unbind, don't try to
> muddy up the situation by letting it work sometimes and others not at
> all.
>
> bind/unbind is a "I really really know what I am doing here" action,
> it's rare, and the user gets to keep both pieces if something fails.  I
> know some busses don't like it, so we allow them to opt-out.  But to
> make it "sometimes yes, sometimes no" is a mess, I don't want to support
> that.
>
> I suggest you just don't allow this if you don't want it.  It's not
> anything you should ever be relying on for configuration so it shouldn't
> be a big deal.

Fair enough, I'll push this down into the sub-system.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver
  2016-06-04 20:52 ` [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver Dan Williams
  2016-06-04 21:29   ` kbuild test robot
  2016-06-04 21:40   ` kbuild test robot
@ 2016-06-04 21:49   ` kbuild test robot
  2016-06-07 18:11   ` Kani, Toshimitsu
  3 siblings, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2016-06-04 21:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: kbuild-all, linux-nvdimm, Ross Zwisler, david, linux-kernel, hch

[-- Attachment #1: Type: text/plain, Size: 1994 bytes --]

Hi,

[auto build test WARNING on v4.7-rc1]
[also build test WARNING on next-20160603]
[cannot apply to linux-nvdimm/libnvdimm-for-next tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/deprecate-pcommit/20160605-045935
config: x86_64-randconfig-s4-06050512 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from drivers/nvdimm/core.c:13:0:
>> include/linux/libnvdimm.h:78:37: warning: 'struct device' declared inside parameter list will not be visible outside of this definition or declaration
     int (*populate_flush_hints)(struct device *dev,
                                        ^~~~~~

vim +78 include/linux/libnvdimm.h

    62		 * get_ndd() + put_ndd(), all other nd_mapping to ndd
    63		 * conversions use to_ndd() which respects enabled state of the
    64		 * nvdimm.
    65		 */
    66		struct nvdimm_drvdata *ndd;
    67	};
    68	
    69	/**
    70	 * struct nvdimm_bus_descriptor - operations and attributes for an nvdimm bus
    71	 * @attr_groups: sysfs attributes for this bus
    72	 */
    73	struct nvdimm_bus_descriptor {
    74		const struct attribute_group **attr_groups;
    75		unsigned long cmd_mask;
    76		char *provider_name;
    77		ndctl_fn ndctl;
  > 78		int (*populate_flush_hints)(struct device *dev,
    79				void __iomem *flush_wpq[]);
    80		int (*flush_probe)(struct nvdimm_bus_descriptor *nd_desc);
    81		int (*clear_to_send)(struct nvdimm_bus_descriptor *nd_desc,
    82				struct nvdimm *nvdimm, unsigned int cmd);
    83	};
    84	
    85	struct nd_cmd_desc {
    86		int in_num;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 26903 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-04 20:52 ` [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active Dan Williams
  2016-06-04 21:10   ` Greg Kroah-Hartman
@ 2016-06-04 21:50   ` kbuild test robot
  2016-06-06 19:25   ` Linda Knippers
  2 siblings, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2016-06-04 21:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: kbuild-all, linux-nvdimm, Greg Kroah-Hartman, david, linux-kernel, hch

[-- Attachment #1: Type: text/plain, Size: 3336 bytes --]

Hi,

[auto build test WARNING on v4.7-rc1]
[also build test WARNING on next-20160603]
[cannot apply to linux-nvdimm/libnvdimm-for-next tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/deprecate-pcommit/20160605-045935
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   include/linux/init.h:1: warning: no structured comments found
   kernel/sched/core.c:2079: warning: No description found for parameter 'cookie'
   kernel/sys.c:1: warning: no structured comments found
>> include/linux/device.h:856: warning: No description found for parameter 'suppress_unbind_attr'
   drivers/dma-buf/seqno-fence.c:1: warning: no structured comments found
   include/linux/fence.h:84: warning: No description found for parameter 'child_list'
   include/linux/fence.h:84: warning: No description found for parameter 'active_list'
   drivers/dma-buf/reservation.c:1: warning: no structured comments found
   include/linux/reservation.h:1: warning: no structured comments found

vim +/suppress_unbind_attr +856 include/linux/device.h

929d2fa5 Matthew Wilcox     2008-10-16  840  	dev_t			devt;	/* dev_t, creates the sysfs "dev" */
ca22e56d Kay Sievers        2011-12-14  841  	u32			id;	/* device instance */
929d2fa5 Matthew Wilcox     2008-10-16  842  
9ac7849e Tejun Heo          2007-01-20  843  	spinlock_t		devres_lock;
9ac7849e Tejun Heo          2007-01-20  844  	struct list_head	devres_head;
9ac7849e Tejun Heo          2007-01-20  845  
5a3ceb86 Tejun Heo          2008-08-25  846  	struct klist_node	knode_class;
b7a3e813 Kay Sievers        2006-10-07  847  	struct class		*class;
a4dbd674 David Brownell     2009-06-24  848  	const struct attribute_group **groups;	/* optional groups */
23681e47 Greg Kroah-Hartman 2006-06-14  849  
^1da177e Linus Torvalds     2005-04-16  850  	void	(*release)(struct device *dev);
74416e1e Alex Williamson    2012-05-30  851  	struct iommu_group	*iommu_group;
031a8908 Dan Williams       2016-06-04  852  	atomic_t		suppress_unbind_attr; /* disable manual unbind */
4f3549d7 Rafael J. Wysocki  2013-05-02  853  
4f3549d7 Rafael J. Wysocki  2013-05-02  854  	bool			offline_disabled:1;
4f3549d7 Rafael J. Wysocki  2013-05-02  855  	bool			offline:1;
^1da177e Linus Torvalds     2005-04-16 @856  };
^1da177e Linus Torvalds     2005-04-16  857  
a4232963 Lars-Peter Clausen 2012-07-03  858  static inline struct device *kobj_to_dev(struct kobject *kobj)
a4232963 Lars-Peter Clausen 2012-07-03  859  {
a4232963 Lars-Peter Clausen 2012-07-03  860  	return container_of(kobj, struct device, kobj);
a4232963 Lars-Peter Clausen 2012-07-03  861  }
a4232963 Lars-Peter Clausen 2012-07-03  862  
9a3df1f7 Alan Stern         2008-03-19  863  /* Get the wakeup routines, which depend on struct device */
9a3df1f7 Alan Stern         2008-03-19  864  #include <linux/pm_wakeup.h>

:::::: The code at line 856 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 6366 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 13/13] pmem: kill __pmem address space
  2016-06-04 20:53 ` [PATCH 13/13] pmem: kill __pmem address space Dan Williams
@ 2016-06-04 22:18   ` kbuild test robot
  0 siblings, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2016-06-04 22:18 UTC (permalink / raw)
  To: Dan Williams
  Cc: kbuild-all, linux-nvdimm, Ross Zwisler, david, hch, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]

Hi,

[auto build test WARNING on v4.7-rc1]
[also build test WARNING on next-20160603]
[cannot apply to linux-nvdimm/libnvdimm-for-next tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/deprecate-pcommit/20160605-045935
config: s390-default_defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390 

All warnings (new ones prefixed by >>):

   drivers/s390/block/dcssblk.c: In function 'dcssblk_direct_access':
>> drivers/s390/block/dcssblk.c:897:9: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     *kaddr = dev_info->start + offset;
            ^

vim +897 drivers/s390/block/dcssblk.c

   881		bio_io_error(bio);
   882		return BLK_QC_T_NONE;
   883	}
   884	
   885	static long
   886	dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
   887				void **kaddr, pfn_t *pfn, long size)
   888	{
   889		struct dcssblk_dev_info *dev_info;
   890		unsigned long offset, dev_sz;
   891	
   892		dev_info = bdev->bd_disk->private_data;
   893		if (!dev_info)
   894			return -ENODEV;
   895		dev_sz = dev_info->end - dev_info->start;
   896		offset = secnum * 512;
 > 897		*kaddr = dev_info->start + offset;
   898		*pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
   899	
   900		return dev_sz - offset;
   901	}
   902	
   903	static void
   904	dcssblk_check_params(void)
   905	{

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 16142 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/13] deprecate pcommit
  2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
                   ` (12 preceding siblings ...)
  2016-06-04 20:53 ` [PATCH 13/13] pmem: kill __pmem address space Dan Williams
@ 2016-06-05 17:41 ` Andy Lutomirski
  2016-06-05 18:48   ` Rudoff, Andy
  13 siblings, 1 reply; 38+ messages in thread
From: Andy Lutomirski @ 2016-06-05 17:41 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Paolo Bonzini, Greg Kroah-Hartman, Ingo Molnar,
	Josh Poimboeuf, linux-nvdimm, Alexander Shishkin, Dave Chinner,
	Ross Zwisler, Xiao Guangrong, Christoph Hellwig, linux-kernel,
	Borislav Petkov, H. Peter Anvin, X86 ML,
	Arnaldo Carvalho de Melo, Adrian Hunter, Peter Zijlstra

On Jun 4, 2016 1:53 PM, "Dan Williams" <dan.j.williams@intel.com> wrote:
>
> Platforms supporting NVDIMMs are now required to provide persistence
> guarantees once pmem stores are accepted by the memory subsystem.

Can you point us to a precise definition of what exactly constitutes
stores being "accepted by the memory subsystem"?  Back when pcommit
was a thing (hah!), it was precisely documented in the SDM.

> is usually achieved by a platform-level feature known as ADR
> (Asynchronous DRAM Refresh) that flushes any memory subsystem write
> pending queues on power loss/shutdown.

Blech.  Does this mean that we need some NMI handler or similar that
does wbinvd?  Will CPU caches automatically write themselves back on
power loss?  Will we lose data if something goes wrong with an SMI
handler?

>
> The 'pcommit' instruction (which has not shipped on any product) is no
> longer needed and is deprecated.

Does this mean it will never ship?

--Andy

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/13] deprecate pcommit
  2016-06-05 17:41 ` [PATCH 00/13] deprecate pcommit Andy Lutomirski
@ 2016-06-05 18:48   ` Rudoff, Andy
  0 siblings, 0 replies; 38+ messages in thread
From: Rudoff, Andy @ 2016-06-05 18:48 UTC (permalink / raw)
  To: Andy Lutomirski, Williams, Dan J
  Cc: X86 ML, Xiao Guangrong, linux-nvdimm, Alexander Shishkin,
	Greg Kroah-Hartman, H. Peter Anvin, Dave Chinner, linux-kernel,
	Arnaldo Carvalho de Melo, Christoph Hellwig, Peter Zijlstra,
	Ingo Molnar, Hunter, Adrian, Josh Poimboeuf, Paolo Bonzini,
	Thomas Gleixner, Borislav Petkov

>> Platforms supporting NVDIMMs are now required to provide persistence
>> guarantees once pmem stores are accepted by the memory subsystem.
>
>Can you point us to a precise definition of what exactly constitutes
>stores being "accepted by the memory subsystem"?  Back when pcommit
>was a thing (hah!), it was precisely documented in the SDM.

The Instruction Set Extensions document that describes PCOMMIT
says it applies to stores that were accepted by the memory subsystem
but were still potentially queued there.  So Dan’s description was
building on that same language.  A concrete example might help:

MOV to location X
CLWB X
SFENCE

is the sequence for flushing a store to pmem.  Before PCOMMIT
was deprecated, software needed to understand if the platform
had ADR and add:

PCOMMIT       ; the old way
SFENCE            ; the old way

to the above sequence if ADR wasn’t present on the platform.  Now
that ADR is required for all persistent memory, as it always has
been for NVDIMM-N devices available today, the simpler programming
model already in use for NVDIMM-N is used for all pmem.

>> is usually achieved by a platform-level feature known as ADR
>> (Asynchronous DRAM Refresh) that flushes any memory subsystem write
>> pending queues on power loss/shutdown.
>
>Blech.  Does this mean that we need some NMI handler or similar that
>does wbinvd?  Will CPU caches automatically write themselves back on
>power loss?  Will we lose data if something goes wrong with an SMI
>handler?

No, ADR is a platform-level feature that flushes any buffers in the memory
subsystem on power loss.  No software is involved and the CPU caches are
not touched by ADR.  So any stores flushed by CLFLUSH, CLFLUSHOPT, CLWB,
WBINVD, or non-temporal stores that go around the CPU caches can now be
considered persistent once those operations are complete.

>> The 'pcommit' instruction (which has not shipped on any product) is no
>> longer needed and is deprecated.
>
>Does this mean it will never ship?

Yes.  The new instructions like CLWB and CLFLUSHOPT will be rolled
into the SDM but PCOMMIT will be removed from the Extensions doc
and not rolled into the SDM.

-andy

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 11/13] Revert "KVM: x86: add pcommit support"
  2016-06-04 20:53 ` [PATCH 11/13] Revert "KVM: x86: add pcommit support" Dan Williams
@ 2016-06-06 15:14   ` Paolo Bonzini
  2016-06-06 16:14     ` Dan Williams
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2016-06-06 15:14 UTC (permalink / raw)
  To: Dan Williams, linux-nvdimm
  Cc: Xiao Guangrong, david, linux-kernel, Ross Zwisler, hch



On 04/06/2016 22:53, Dan Williams wrote:
> This reverts commit 8b3e34e46aca9b6d349b331cd9cf71ccbdc91b2e.
> 
> Given the deprecation of the pcommit instruction, revert its usage as a
> vm exit source in kvm.
> 
> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

This is not what that patch does.  It passes down the pcommit
instruction to the guest if desired, and lets a nested hypervisor trap
it (also if desired).  Finally, it lets the guest see the PCOMMIT bit in
CPUID if the host has it.

If the PCOMMIT cpuid feature is going to stay despite the deprecation,
I'd prefer to keep this patch.  Otherwise please change the commit
message to

Given the deprecation of the pcommit instruction, the relevant VMX
features and CPUID bits are not going to be rolled into the SDM.  Remove
their usage from KVM.

Thanks,

Paolo

> ---
>  arch/x86/include/asm/vmx.h      |    1 -
>  arch/x86/include/uapi/asm/vmx.h |    4 +---
>  arch/x86/kvm/cpuid.c            |    2 +-
>  arch/x86/kvm/cpuid.h            |    8 --------
>  arch/x86/kvm/vmx.c              |   32 ++++----------------------------
>  5 files changed, 6 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 14c63c7e8337..a002b07a7099 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -72,7 +72,6 @@
>  #define SECONDARY_EXEC_SHADOW_VMCS              0x00004000
>  #define SECONDARY_EXEC_ENABLE_PML               0x00020000
>  #define SECONDARY_EXEC_XSAVES			0x00100000
> -#define SECONDARY_EXEC_PCOMMIT			0x00200000
>  #define SECONDARY_EXEC_TSC_SCALING              0x02000000
>  
>  #define PIN_BASED_EXT_INTR_MASK                 0x00000001
> diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
> index 5b15d94a33f8..37fee272618f 100644
> --- a/arch/x86/include/uapi/asm/vmx.h
> +++ b/arch/x86/include/uapi/asm/vmx.h
> @@ -78,7 +78,6 @@
>  #define EXIT_REASON_PML_FULL            62
>  #define EXIT_REASON_XSAVES              63
>  #define EXIT_REASON_XRSTORS             64
> -#define EXIT_REASON_PCOMMIT             65
>  
>  #define VMX_EXIT_REASONS \
>  	{ EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
> @@ -127,8 +126,7 @@
>  	{ EXIT_REASON_INVVPID,               "INVVPID" }, \
>  	{ EXIT_REASON_INVPCID,               "INVPCID" }, \
>  	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
> -	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
> -	{ EXIT_REASON_PCOMMIT,               "PCOMMIT" }
> +	{ EXIT_REASON_XRSTORS,               "XRSTORS" }
>  
>  #define VMX_ABORT_SAVE_GUEST_MSR_FAIL        1
>  #define VMX_ABORT_LOAD_HOST_MSR_FAIL         4
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 769af907f824..fe865563870d 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -364,7 +364,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
>  		F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
>  		F(ADX) | F(SMAP) | F(AVX512F) | F(AVX512PF) | F(AVX512ER) |
> -		F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(PCOMMIT);
> +		F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB);
>  
>  	/* cpuid 0xD.1.eax */
>  	const u32 kvm_cpuid_D_1_eax_x86_features =
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index e17a74b1d852..35058c2c0eea 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -144,14 +144,6 @@ static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu)
>  	return best && (best->ebx & bit(X86_FEATURE_RTM));
>  }
>  
> -static inline bool guest_cpuid_has_pcommit(struct kvm_vcpu *vcpu)
> -{
> -	struct kvm_cpuid_entry2 *best;
> -
> -	best = kvm_find_cpuid_entry(vcpu, 7, 0);
> -	return best && (best->ebx & bit(X86_FEATURE_PCOMMIT));
> -}
> -
>  static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index fb93010beaa4..2e2685424fdc 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2705,8 +2705,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>  		SECONDARY_EXEC_APIC_REGISTER_VIRT |
>  		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>  		SECONDARY_EXEC_WBINVD_EXITING |
> -		SECONDARY_EXEC_XSAVES |
> -		SECONDARY_EXEC_PCOMMIT;
> +		SECONDARY_EXEC_XSAVES;
>  
>  	if (enable_ept) {
>  		/* nested EPT: emulate EPT also to L1 */
> @@ -3268,7 +3267,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  			SECONDARY_EXEC_SHADOW_VMCS |
>  			SECONDARY_EXEC_XSAVES |
>  			SECONDARY_EXEC_ENABLE_PML |
> -			SECONDARY_EXEC_PCOMMIT |
>  			SECONDARY_EXEC_TSC_SCALING;
>  		if (adjust_vmx_controls(min2, opt2,
>  					MSR_IA32_VMX_PROCBASED_CTLS2,
> @@ -4856,9 +4854,6 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>  	if (!enable_pml)
>  		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>  
> -	/* Currently, we allow L1 guest to directly run pcommit instruction. */
> -	exec_control &= ~SECONDARY_EXEC_PCOMMIT;
> -
>  	return exec_control;
>  }
>  
> @@ -4902,9 +4897,10 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  
>  	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, vmx_exec_control(vmx));
>  
> -	if (cpu_has_secondary_exec_ctrls())
> +	if (cpu_has_secondary_exec_ctrls()) {
>  		vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
>  				vmx_secondary_exec_control(vmx));
> +	}
>  
>  	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
>  		vmcs_write64(EOI_EXIT_BITMAP0, 0);
> @@ -7557,13 +7553,6 @@ static int handle_pml_full(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> -static int handle_pcommit(struct kvm_vcpu *vcpu)
> -{
> -	/* we never catch pcommit instruct for L1 guest. */
> -	WARN_ON(1);
> -	return 1;
> -}
> -
>  /*
>   * The exit handlers return 1 if the exit was handled fully and guest execution
>   * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
> @@ -7614,7 +7603,6 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[EXIT_REASON_XSAVES]                  = handle_xsaves,
>  	[EXIT_REASON_XRSTORS]                 = handle_xrstors,
>  	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
> -	[EXIT_REASON_PCOMMIT]                 = handle_pcommit,
>  };
>  
>  static const int kvm_vmx_max_exit_handlers =
> @@ -7923,8 +7911,6 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>  		 * the XSS exit bitmap in vmcs12.
>  		 */
>  		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
> -	case EXIT_REASON_PCOMMIT:
> -		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
>  	default:
>  		return true;
>  	}
> @@ -9085,15 +9071,6 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
>  
>  	if (cpu_has_secondary_exec_ctrls())
>  		vmcs_set_secondary_exec_control(secondary_exec_ctl);
> -
> -	if (static_cpu_has(X86_FEATURE_PCOMMIT) && nested) {
> -		if (guest_cpuid_has_pcommit(vcpu))
> -			vmx->nested.nested_vmx_secondary_ctls_high |=
> -				SECONDARY_EXEC_PCOMMIT;
> -		else
> -			vmx->nested.nested_vmx_secondary_ctls_high &=
> -				~SECONDARY_EXEC_PCOMMIT;
> -	}
>  }
>  
>  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> @@ -9706,8 +9683,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  		exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>  				  SECONDARY_EXEC_RDTSCP |
>  				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> -				  SECONDARY_EXEC_APIC_REGISTER_VIRT |
> -				  SECONDARY_EXEC_PCOMMIT);
> +				  SECONDARY_EXEC_APIC_REGISTER_VIRT);
>  		if (nested_cpu_has(vmcs12,
>  				CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
>  			exec_control |= vmcs12->secondary_vm_exec_control;
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 11/13] Revert "KVM: x86: add pcommit support"
  2016-06-06 15:14   ` Paolo Bonzini
@ 2016-06-06 16:14     ` Dan Williams
  0 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-06 16:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-nvdimm, Xiao Guangrong, david, linux-kernel, Ross Zwisler,
	Christoph Hellwig

On Mon, Jun 6, 2016 at 8:14 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 04/06/2016 22:53, Dan Williams wrote:
>> This reverts commit 8b3e34e46aca9b6d349b331cd9cf71ccbdc91b2e.
>>
>> Given the deprecation of the pcommit instruction, revert its usage as a
>> vm exit source in kvm.
>>
>> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> This is not what that patch does.  It passes down the pcommit
> instruction to the guest if desired, and lets a nested hypervisor trap
> it (also if desired).  Finally, it lets the guest see the PCOMMIT bit in
> CPUID if the host has it.
>
> If the PCOMMIT cpuid feature is going to stay despite the deprecation,
> I'd prefer to keep this patch.

No, it will be marked as reserved.

> Otherwise please change the commit
> message to
>
> Given the deprecation of the pcommit instruction, the relevant VMX
> features and CPUID bits are not going to be rolled into the SDM.  Remove
> their usage from KVM.

Ok, will do.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 03/13] libnvdimm: introduce nvdimm_flush()
  2016-06-04 20:52 ` [PATCH 03/13] libnvdimm: introduce nvdimm_flush() Dan Williams
@ 2016-06-06 17:45   ` Jeff Moyer
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff Moyer @ 2016-06-06 17:45 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, david, linux-kernel, hch

Dan Williams <dan.j.williams@intel.com> writes:

> nvdimm_flush() is an alternative to the x86 pcommit instruction.  It is
> an optional write flushing mechanism that an nvdimm bus can provide for
> the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
> nvdimm_flush() is implemented as a series of flush-hint-address [1]
> writes to each dimm in the interleave set that backs the namespace.  For
> now this implementation is just a simple replacement of wmb_pmem() /
> arch_has_wmb_pmem() with nvdimm_flush() / nvdimm_has_flush().

Dan, this needs a whole heck of a lot more explanation.  As I understand
it, you're talking about flushing write pending queues (not the CPU
cache).  And given that the write pending queues are part of the
persistence domain on systems with ADR (now required for pmem), they
don't need to be flushed for normal pmem, only for block window
accesses.  And so this confuses me:

> We defer the full implementation of nvdimm_flush() until the
>implementation is prepared to also handle the blk-region case.

The one thing they're required to address is not addressed?  I
understand that you may want to push data out as far as possible to
limit the exposure to hardware failures, but that's not how you're
positioning this patch, and so it's very confusing.

Please also make the documentation above nvdimm_flush and
nvdimm_has_flush much more idiot-proof.

> @@ -234,7 +249,7 @@ static int pmem_attach_disk(struct device *dev,
>  	dev_set_drvdata(dev, pmem);
>  	pmem->phys_addr = res->start;
>  	pmem->size = resource_size(res);
> -	if (!arch_has_wmb_pmem())
> +	if (nvdimm_has_flush(nd_region) < 0)
>  		dev_warn(dev, "unable to guarantee persistence of writes\n");

And this doesn't make sense to me, either.  NVDIMM-N's do not have to
have flush hint addresses in order to guarantee persistence.  I guess
that's no different than what we have now, but you're sure not
addressing this bogus warning message.  And yes, I know there's no way
to query the platform to see if ADR is available.  But how many people
do you think are sticking NVDIMM-Ns in systems that do not have ADR?
Just get rid of the message already.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-04 20:52 ` [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active Dan Williams
  2016-06-04 21:10   ` Greg Kroah-Hartman
  2016-06-04 21:50   ` kbuild test robot
@ 2016-06-06 19:25   ` Linda Knippers
  2016-06-06 19:31     ` Dan Williams
  2 siblings, 1 reply; 38+ messages in thread
From: Linda Knippers @ 2016-06-06 19:25 UTC (permalink / raw)
  To: Dan Williams, linux-nvdimm; +Cc: Greg Kroah-Hartman, david, linux-kernel, hch

On 6/4/2016 4:52 PM, Dan Williams wrote:
> There are scenarios where we need a middle ground between disabling all
> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
> allowing unbind at any userspace-determined time.  Pinning modules takes
> away one vector for unwanted out-of-sequence device_release_driver()
> invocations, this new mechanism (via device->suppress_unbind_attr) takes
> away another.
> 
> The first user of this mechanism is the libnvdimm sub-system where
> manual dimm disabling should be prevented while the dimm is active in
> any region.  Note that there is a 1:N dimm-to-region relationship which
> is why this is implemented as a disable count rather than a flag.  This
> forces userspace to disable regions before dimms when manually shutting
> down a bus topology.

How is this related to deprecating pcommit?

-- ljk
> 
> This does not affect any of the kernel-internal initiated invocations of
> device_release_driver().
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/base/base.h             |    1 +
>  drivers/base/bus.c              |   12 ++++++++++--
>  drivers/base/core.c             |    1 +
>  drivers/base/dd.c               |    2 +-
>  drivers/nvdimm/namespace_devs.c |    1 +
>  drivers/nvdimm/region_devs.c    |    4 +++-
>  include/linux/device.h          |   20 ++++++++++++++++++++
>  7 files changed, 37 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index e05db388bd1c..129814b17ca6 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -109,6 +109,7 @@ extern int bus_add_driver(struct device_driver *drv);
>  extern void bus_remove_driver(struct device_driver *drv);
>  
>  extern void driver_detach(struct device_driver *drv);
> +extern void __device_release_driver(struct device *dev);
>  extern int driver_probe_device(struct device_driver *drv, struct device *dev);
>  extern void driver_deferred_probe_del(struct device *dev);
>  static inline int driver_match_device(struct device_driver *drv,
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 6470eb8088f4..b48a903f2d28 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -188,10 +188,18 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf,
>  	if (dev && dev->driver == drv) {
>  		if (dev->parent)	/* Needed for USB */
>  			device_lock(dev->parent);
> -		device_release_driver(dev);
> +
> +		device_lock(dev);
> +		if (atomic_read(&dev->suppress_unbind_attr) > 0)
> +			err = -EBUSY;
> +		else {
> +			__device_release_driver(dev);
> +			err = count;
> +		}
> +		device_unlock(dev);
> +
>  		if (dev->parent)
>  			device_unlock(dev->parent);
> -		err = count;
>  	}
>  	put_device(dev);
>  	bus_put(bus);
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 0a8bdade53f2..0ea0e8560e1d 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -708,6 +708,7 @@ void device_initialize(struct device *dev)
>  	INIT_LIST_HEAD(&dev->devres_head);
>  	device_pm_init(dev);
>  	set_dev_node(dev, -1);
> +	atomic_set(&dev->suppress_unbind_attr, 0);
>  #ifdef CONFIG_GENERIC_MSI_IRQ
>  	INIT_LIST_HEAD(&dev->msi_list);
>  #endif
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 16688f50729c..9e21817ca2d6 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -756,7 +756,7 @@ EXPORT_SYMBOL_GPL(driver_attach);
>   * __device_release_driver() must be called with @dev lock held.
>   * When called for a USB interface, @dev->parent lock must be held as well.
>   */
> -static void __device_release_driver(struct device *dev)
> +void __device_release_driver(struct device *dev)
>  {
>  	struct device_driver *drv;
>  
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index c5e3196c45b0..e65572b6092c 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1950,6 +1950,7 @@ static int init_active_labels(struct nd_region *nd_region)
>  		}
>  		nd_mapping->ndd = ndd;
>  		atomic_inc(&nvdimm->busy);
> +		device_disable_unbind_attr(&nvdimm->dev);
>  		get_ndd(ndd);
>  
>  		count = nd_label_active_count(ndd);
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 40fcfea26fbb..320f0f3ea367 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -427,8 +427,10 @@ static void nd_region_notify_driver_action(struct nvdimm_bus *nvdimm_bus,
>  			nd_mapping->labels = NULL;
>  			put_ndd(ndd);
>  			nd_mapping->ndd = NULL;
> -			if (ndd)
> +			if (ndd) {
>  				atomic_dec(&nvdimm->busy);
> +				device_enable_unbind_attr(&nvdimm->dev);
> +			}
>  		}
>  
>  		if (is_nd_pmem(dev))
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 38f02814d53a..d9eaa85bb9cf 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -849,6 +849,7 @@ struct device {
>  
>  	void	(*release)(struct device *dev);
>  	struct iommu_group	*iommu_group;
> +	atomic_t		suppress_unbind_attr; /* disable manual unbind */
>  
>  	bool			offline_disabled:1;
>  	bool			offline:1;
> @@ -988,6 +989,25 @@ static inline void device_lock_assert(struct device *dev)
>  	lockdep_assert_held(&dev->mutex);
>  }
>  
> +static inline bool device_disable_unbind_attr(struct device *dev)
> +{
> +	bool suppressed = false;
> +
> +	device_lock(dev);
> +	if (dev->driver) {
> +		atomic_inc(&dev->suppress_unbind_attr);
> +		suppressed = true;
> +	}
> +	device_unlock(dev);
> +
> +	return suppressed;
> +}
> +
> +static inline bool device_enable_unbind_attr(struct device *dev)
> +{
> +	return atomic_dec_and_test(&dev->suppress_unbind_attr);
> +}
> +
>  static inline struct device_node *dev_of_node(struct device *dev)
>  {
>  	if (!IS_ENABLED(CONFIG_OF))
> 
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-06 19:25   ` Linda Knippers
@ 2016-06-06 19:31     ` Dan Williams
  2016-06-06 19:36       ` Dan Williams
  2016-06-06 19:36       ` Linda Knippers
  0 siblings, 2 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-06 19:31 UTC (permalink / raw)
  To: Linda Knippers
  Cc: linux-nvdimm@lists.01.org, Greg Kroah-Hartman, david,
	linux-kernel, Christoph Hellwig

On Mon, Jun 6, 2016 at 12:25 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
> On 6/4/2016 4:52 PM, Dan Williams wrote:
>> There are scenarios where we need a middle ground between disabling all
>> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
>> allowing unbind at any userspace-determined time.  Pinning modules takes
>> away one vector for unwanted out-of-sequence device_release_driver()
>> invocations, this new mechanism (via device->suppress_unbind_attr) takes
>> away another.
>>
>> The first user of this mechanism is the libnvdimm sub-system where
>> manual dimm disabling should be prevented while the dimm is active in
>> any region.  Note that there is a 1:N dimm-to-region relationship which
>> is why this is implemented as a disable count rather than a flag.  This
>> forces userspace to disable regions before dimms when manually shutting
>> down a bus topology.
>
> How is this related to deprecating pcommit?

We need guarantees that the flush hint mappings are valid for the
duration of a pmem namespace being enabled.  I am going to move the
mapping of the flush hint region from per-dimm to per-region.  However
since multiple regions may reference the same dimm the mapping needs
to be reference counted and shared across regions.  This will be
similar to the arrangement we have for BLK-regions that share a
control region mapping.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-06 19:31     ` Dan Williams
@ 2016-06-06 19:36       ` Dan Williams
  2016-06-06 19:36       ` Linda Knippers
  1 sibling, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-06 19:36 UTC (permalink / raw)
  To: Linda Knippers
  Cc: linux-nvdimm@lists.01.org, Greg Kroah-Hartman, david,
	linux-kernel, Christoph Hellwig

On Mon, Jun 6, 2016 at 12:31 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Mon, Jun 6, 2016 at 12:25 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>> On 6/4/2016 4:52 PM, Dan Williams wrote:
>>> There are scenarios where we need a middle ground between disabling all
>>> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
>>> allowing unbind at any userspace-determined time.  Pinning modules takes
>>> away one vector for unwanted out-of-sequence device_release_driver()
>>> invocations, this new mechanism (via device->suppress_unbind_attr) takes
>>> away another.
>>>
>>> The first user of this mechanism is the libnvdimm sub-system where
>>> manual dimm disabling should be prevented while the dimm is active in
>>> any region.  Note that there is a 1:N dimm-to-region relationship which
>>> is why this is implemented as a disable count rather than a flag.  This
>>> forces userspace to disable regions before dimms when manually shutting
>>> down a bus topology.
>>
>> How is this related to deprecating pcommit?
>
> We need guarantees that the flush hint mappings are valid for the
> duration of a pmem namespace being enabled.  I am going to move the
> mapping of the flush hint region from per-dimm to per-region.  However
> since multiple regions may reference the same dimm the mapping needs
> to be reference counted and shared across regions.  This will be
> similar to the arrangement we have for BLK-regions that share a
> control region mapping.

The usage of the word "region" is multiplexed too many times above.

per-region => per PMEM-region-device / BLK-region-device
flush hint region => per-dimm flush hint address range
control region => address range where block window mmio control registers reside

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-06 19:31     ` Dan Williams
  2016-06-06 19:36       ` Dan Williams
@ 2016-06-06 19:36       ` Linda Knippers
  2016-06-06 19:46         ` Dan Williams
  1 sibling, 1 reply; 38+ messages in thread
From: Linda Knippers @ 2016-06-06 19:36 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm@lists.01.org, Greg Kroah-Hartman, david,
	linux-kernel, Christoph Hellwig



On 6/6/2016 3:31 PM, Dan Williams wrote:
> On Mon, Jun 6, 2016 at 12:25 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>> On 6/4/2016 4:52 PM, Dan Williams wrote:
>>> There are scenarios where we need a middle ground between disabling all
>>> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
>>> allowing unbind at any userspace-determined time.  Pinning modules takes
>>> away one vector for unwanted out-of-sequence device_release_driver()
>>> invocations, this new mechanism (via device->suppress_unbind_attr) takes
>>> away another.
>>>
>>> The first user of this mechanism is the libnvdimm sub-system where
>>> manual dimm disabling should be prevented while the dimm is active in
>>> any region.  Note that there is a 1:N dimm-to-region relationship which
>>> is why this is implemented as a disable count rather than a flag.  This
>>> forces userspace to disable regions before dimms when manually shutting
>>> down a bus topology.
>>
>> How is this related to deprecating pcommit?
> 
> We need guarantees that the flush hint mappings are valid for the
> duration of a pmem namespace being enabled.  I am going to move the
> mapping of the flush hint region from per-dimm to per-region.  However
> since multiple regions may reference the same dimm the mapping needs
> to be reference counted and shared across regions.  This will be
> similar to the arrangement we have for BLK-regions that share a
> control region mapping.

Why are things moving around?  Aren't flush hints defined per NFIT device
handle, making them an optional per-dimm thing?

I don't understand a lot of this patch series and had the same questions
as Jeff.  How does deprecating pcommit, because it's not necessary with ADR,
change so much?

-- ljk
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-06 19:36       ` Linda Knippers
@ 2016-06-06 19:46         ` Dan Williams
  2016-06-06 20:20           ` Linda Knippers
  0 siblings, 1 reply; 38+ messages in thread
From: Dan Williams @ 2016-06-06 19:46 UTC (permalink / raw)
  To: Linda Knippers
  Cc: linux-nvdimm@lists.01.org, Greg Kroah-Hartman, david,
	linux-kernel, Christoph Hellwig

On Mon, Jun 6, 2016 at 12:36 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>
>
> On 6/6/2016 3:31 PM, Dan Williams wrote:
>> On Mon, Jun 6, 2016 at 12:25 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>>> On 6/4/2016 4:52 PM, Dan Williams wrote:
>>>> There are scenarios where we need a middle ground between disabling all
>>>> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
>>>> allowing unbind at any userspace-determined time.  Pinning modules takes
>>>> away one vector for unwanted out-of-sequence device_release_driver()
>>>> invocations, this new mechanism (via device->suppress_unbind_attr) takes
>>>> away another.
>>>>
>>>> The first user of this mechanism is the libnvdimm sub-system where
>>>> manual dimm disabling should be prevented while the dimm is active in
>>>> any region.  Note that there is a 1:N dimm-to-region relationship which
>>>> is why this is implemented as a disable count rather than a flag.  This
>>>> forces userspace to disable regions before dimms when manually shutting
>>>> down a bus topology.
>>>
>>> How is this related to deprecating pcommit?
>>
>> We need guarantees that the flush hint mappings are valid for the
>> duration of a pmem namespace being enabled.  I am going to move the
>> mapping of the flush hint region from per-dimm to per-region.  However
>> since multiple regions may reference the same dimm the mapping needs
>> to be reference counted and shared across regions.  This will be
>> similar to the arrangement we have for BLK-regions that share a
>> control region mapping.
>
> Why are things moving around?  Aren't flush hints defined per NFIT device
> handle, making them an optional per-dimm thing?
>
> I don't understand a lot of this patch series and had the same questions
> as Jeff.  How does deprecating pcommit, because it's not necessary with ADR,
> change so much?

This patch set deprecates pcommit, and introduces the usage of flush
hints into the pmem path.  The introduction patch used the word
"usually", I should have fleshed that out to say "usually ADR, or
explicit use of flush hints".

A solution to the posted-write-queue flushing needs to be available
and a platform can choose to use flush hints or ADR.  If the NFIT
defines an NVDIMM device without hints we assume the platform must
have ADR.  If the platform NFIT neglects to define an NVDIMM to
physical address range mapping, we warn about a potentially broken
BIOS.  Hopefully we can make this clearer in future versions of the
spec.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-06 19:46         ` Dan Williams
@ 2016-06-06 20:20           ` Linda Knippers
  2016-06-06 20:36             ` Dan Williams
  0 siblings, 1 reply; 38+ messages in thread
From: Linda Knippers @ 2016-06-06 20:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm@lists.01.org, Greg Kroah-Hartman, david,
	linux-kernel, Christoph Hellwig



On 6/6/2016 3:46 PM, Dan Williams wrote:
> On Mon, Jun 6, 2016 at 12:36 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>>
>>
>> On 6/6/2016 3:31 PM, Dan Williams wrote:
>>> On Mon, Jun 6, 2016 at 12:25 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>>>> On 6/4/2016 4:52 PM, Dan Williams wrote:
>>>>> There are scenarios where we need a middle ground between disabling all
>>>>> manual bind/unbind attempts (via driver->suppress_bind_attrs) and
>>>>> allowing unbind at any userspace-determined time.  Pinning modules takes
>>>>> away one vector for unwanted out-of-sequence device_release_driver()
>>>>> invocations, this new mechanism (via device->suppress_unbind_attr) takes
>>>>> away another.
>>>>>
>>>>> The first user of this mechanism is the libnvdimm sub-system where
>>>>> manual dimm disabling should be prevented while the dimm is active in
>>>>> any region.  Note that there is a 1:N dimm-to-region relationship which
>>>>> is why this is implemented as a disable count rather than a flag.  This
>>>>> forces userspace to disable regions before dimms when manually shutting
>>>>> down a bus topology.
>>>>
>>>> How is this related to deprecating pcommit?
>>>
>>> We need guarantees that the flush hint mappings are valid for the
>>> duration of a pmem namespace being enabled.  I am going to move the
>>> mapping of the flush hint region from per-dimm to per-region.  However
>>> since multiple regions may reference the same dimm the mapping needs
>>> to be reference counted and shared across regions.  This will be
>>> similar to the arrangement we have for BLK-regions that share a
>>> control region mapping.
>>
>> Why are things moving around?  Aren't flush hints defined per NFIT device
>> handle, making them an optional per-dimm thing?
>>
>> I don't understand a lot of this patch series and had the same questions
>> as Jeff.  How does deprecating pcommit, because it's not necessary with ADR,
>> change so much?
> 
> This patch set deprecates pcommit, and introduces the usage of flush
> hints into the pmem path.  The introduction patch used the word
> "usually", I should have fleshed that out to say "usually ADR, or
> explicit use of flush hints".
> 
> A solution to the posted-write-queue flushing needs to be available
> and a platform can choose to use flush hints or ADR.  If the NFIT
> defines an NVDIMM device without hints we assume the platform must
> have ADR.  If the platform NFIT neglects to define an NVDIMM to
> physical address range mapping, we warn about a potentially broken
> BIOS.  Hopefully we can make this clearer in future versions of the
> spec.

You lost me on those last 2 sentences.  An NVDIMM doesn't have to have
an SPA range, but that seems to be unrelated to pcommit or flushes.

-- ljk

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-06 20:20           ` Linda Knippers
@ 2016-06-06 20:36             ` Dan Williams
  2016-06-06 21:15               ` Linda Knippers
  0 siblings, 1 reply; 38+ messages in thread
From: Dan Williams @ 2016-06-06 20:36 UTC (permalink / raw)
  To: Linda Knippers
  Cc: linux-nvdimm@lists.01.org, Greg Kroah-Hartman, david,
	linux-kernel, Christoph Hellwig

On Mon, Jun 6, 2016 at 1:20 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
[..]
>> A solution to the posted-write-queue flushing needs to be available
>> and a platform can choose to use flush hints or ADR.  If the NFIT
>> defines an NVDIMM device without hints we assume the platform must
>> have ADR.  If the platform NFIT neglects to define an NVDIMM to
>> physical address range mapping, we warn about a potentially broken
>> BIOS.  Hopefully we can make this clearer in future versions of the
>> spec.
>
> You lost me on those last 2 sentences.  An NVDIMM doesn't have to have
> an SPA range, but that seems to be unrelated to pcommit or flushes.

Ok, now you've lost me...

We need a SPA range to be able to do I/O whether that SPA range is
direct access to media or a block-window aperture.  If an NFIT
inculdes a "System Physical Address (SPA) Range Structure", but
neglects to include a corresponding "NVDIMM Region Mapping Structure"
then the kernel has no idea what actual dimm device(s) back that
memory region.  Without a memory device mapping it is undefined
whether I/O to a SPA range requires flushing or not.

This patch set silences the warning about "not being able to guarantee
persistence" when the BIOS provides a "NVDIMM Region Mapping
Structure".  When that structure is present the kernel uses flush
hints when provided, but ADR otherwise.  See the implementation of
nvdimm_flush() in patch 4.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active
  2016-06-06 20:36             ` Dan Williams
@ 2016-06-06 21:15               ` Linda Knippers
  0 siblings, 0 replies; 38+ messages in thread
From: Linda Knippers @ 2016-06-06 21:15 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm@lists.01.org, Greg Kroah-Hartman, david,
	linux-kernel, Christoph Hellwig



On 6/6/2016 4:36 PM, Dan Williams wrote:
> On Mon, Jun 6, 2016 at 1:20 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
> [..]
>>> A solution to the posted-write-queue flushing needs to be available
>>> and a platform can choose to use flush hints or ADR.  If the NFIT
>>> defines an NVDIMM device without hints we assume the platform must
>>> have ADR.  If the platform NFIT neglects to define an NVDIMM to
>>> physical address range mapping, we warn about a potentially broken
>>> BIOS.  Hopefully we can make this clearer in future versions of the
>>> spec.
>>
>> You lost me on those last 2 sentences.  An NVDIMM doesn't have to have
>> an SPA range, but that seems to be unrelated to pcommit or flushes.
> 
> Ok, now you've lost me...
> 
> We need a SPA range to be able to do I/O whether that SPA range is
> direct access to media or a block-window aperture.  If an NFIT
> inculdes a "System Physical Address (SPA) Range Structure", but
> neglects to include a corresponding "NVDIMM Region Mapping Structure"
> then the kernel has no idea what actual dimm device(s) back that
> memory region.  Without a memory device mapping it is undefined
> whether I/O to a SPA range requires flushing or not.

Right, if you have an SPA Range Structure, you need an NVDIMM Region
Mapping Structure referencing it (although now I'm wondering if that's
true if it's one of the other GUID types, like a virtual CD...).  But the
reverse isn't true.  You can have a Region mapping Structure without an
SPA range, and that's what I thought you were flagging as an error in
your previous sentence.

> This patch set silences the warning about "not being able to guarantee
> persistence" when the BIOS provides a "NVDIMM Region Mapping
> Structure".  When that structure is present the kernel uses flush
> hints when provided, but ADR otherwise.  See the implementation of
> nvdimm_flush() in patch 4.

Ok, I see that now.

Patch 0 and the commit messages could have used a bit more info.
The big picture was a bit difficult to see and patch 4 doesn't say
anything about finishing in the TODOs from the previous patch.

Thanks,

-- ljk

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver
  2016-06-04 20:52 ` [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver Dan Williams
                     ` (2 preceding siblings ...)
  2016-06-04 21:49   ` kbuild test robot
@ 2016-06-07 18:11   ` Kani, Toshimitsu
  2016-06-07 18:15     ` Dan Williams
  3 siblings, 1 reply; 38+ messages in thread
From: Kani, Toshimitsu @ 2016-06-07 18:11 UTC (permalink / raw)
  To: dan.j.williams, linux-nvdimm@lists.01.org; +Cc: hch, linux-kernel, david

On Sat, 2016-06-04 at 13:52 -0700, Dan Williams wrote:
 :
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 420e1a5e2250..5b6f85d00bb5 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -803,11 +803,29 @@ EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
>   */
>  void nvdimm_flush(struct nd_region *nd_region)
>  {
> +	int i;
> +
>  	/*
> -	 * TODO: replace wmb_pmem() usage with flush hint writes where
> -	 * available.
> +	 * The first wmb() is needed to 'sfence' all previous writes
> +	 * such that they are architecturally visible for the platform
> +	 * buffer flush.  Note that we've already arranged for pmem
> +	 * writes to avoid the cache via arch_memcpy_to_pmem().  The
> +	 * final wmb() ensures ordering for the NVDIMM flush write.
>  	 */
> -	wmb_pmem();
> +	wmb();
> +	for (i = 0; i < nd_region->ndr_mappings; i++) {
> +		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
> +		struct nvdimm_drvdata *ndd = to_ndd_unlocked(nd_mapping);
> +
> +		/*
> +		 * Note, nvdimm_drvdata guaranteed to be live since we
> +		 * arrange for all associated regions to be disabled
> +		 * before the dimm is disabled.
> +		 */
> +		if (ndd->flush_wpq[0])

Hi Dan,

ndd is NULL on our system since nvdimm_probe() always fails
in nvdimm_init_nsarea(), which then calls put_ndd() and
nvdimm_drvdata_release().

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver
  2016-06-07 18:11   ` Kani, Toshimitsu
@ 2016-06-07 18:15     ` Dan Williams
  0 siblings, 0 replies; 38+ messages in thread
From: Dan Williams @ 2016-06-07 18:15 UTC (permalink / raw)
  To: Kani, Toshimitsu; +Cc: linux-nvdimm@lists.01.org, hch, linux-kernel, david

On Tue, Jun 7, 2016 at 11:11 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Sat, 2016-06-04 at 13:52 -0700, Dan Williams wrote:
>  :
>> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>> index 420e1a5e2250..5b6f85d00bb5 100644
>> --- a/drivers/nvdimm/region_devs.c
>> +++ b/drivers/nvdimm/region_devs.c
>> @@ -803,11 +803,29 @@ EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
>>   */
>>  void nvdimm_flush(struct nd_region *nd_region)
>>  {
>> +int i;
>> +
>>  /*
>> - * TODO: replace wmb_pmem() usage with flush hint writes where
>> - * available.
>> + * The first wmb() is needed to 'sfence' all previous writes
>> + * such that they are architecturally visible for the platform
>> + * buffer flush.  Note that we've already arranged for pmem
>> + * writes to avoid the cache via arch_memcpy_to_pmem().  The
>> + * final wmb() ensures ordering for the NVDIMM flush write.
>>   */
>> -wmb_pmem();
>> +wmb();
>> +for (i = 0; i < nd_region->ndr_mappings; i++) {
>> +struct nd_mapping *nd_mapping = &nd_region->mapping[i];
>> +struct nvdimm_drvdata *ndd = to_ndd_unlocked(nd_mapping);
>> +
>> +/*
>> + * Note, nvdimm_drvdata guaranteed to be live since we
>> + * arrange for all associated regions to be disabled
>> + * before the dimm is disabled.
>> + */
>> +if (ndd->flush_wpq[0])
>
> Hi Dan,
>
> ndd is NULL on our system since nvdimm_probe() always fails
> in nvdimm_init_nsarea(), which then calls put_ndd() and
> nvdimm_drvdata_release().
>

Ah, thanks for the report!  I'll fix this.  We should only care about
labels being present when the DIMM has aliased BLK and PMEM capacity.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2016-06-07 18:45 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-04 20:52 [PATCH 00/13] deprecate pcommit Dan Williams
2016-06-04 20:52 ` [PATCH 01/13] driver core, libnvdimm: disable manual unbind of dimms while region active Dan Williams
2016-06-04 21:10   ` Greg Kroah-Hartman
2016-06-04 21:39     ` Dan Williams
2016-06-04 21:45       ` Greg Kroah-Hartman
2016-06-04 21:48         ` Dan Williams
2016-06-04 21:50   ` kbuild test robot
2016-06-06 19:25   ` Linda Knippers
2016-06-06 19:31     ` Dan Williams
2016-06-06 19:36       ` Dan Williams
2016-06-06 19:36       ` Linda Knippers
2016-06-06 19:46         ` Dan Williams
2016-06-06 20:20           ` Linda Knippers
2016-06-06 20:36             ` Dan Williams
2016-06-06 21:15               ` Linda Knippers
2016-06-04 20:52 ` [PATCH 02/13] nfit: always associate flush hints Dan Williams
2016-06-04 20:52 ` [PATCH 03/13] libnvdimm: introduce nvdimm_flush() Dan Williams
2016-06-06 17:45   ` Jeff Moyer
2016-06-04 20:52 ` [PATCH 04/13] libnvdimm, nfit: move flush hint mapping to dimm driver Dan Williams
2016-06-04 21:29   ` kbuild test robot
2016-06-04 21:40   ` kbuild test robot
2016-06-04 21:49   ` kbuild test robot
2016-06-07 18:11   ` Kani, Toshimitsu
2016-06-07 18:15     ` Dan Williams
2016-06-04 20:52 ` [PATCH 05/13] tools/testing/nvdimm: simulate multiple flush hints per-dimm Dan Williams
2016-06-04 20:53 ` [PATCH 06/13] libnvdimm: cycle flush hints per-cpu Dan Williams
2016-06-04 20:53 ` [PATCH 07/13] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() Dan Williams
2016-06-04 20:53 ` [PATCH 08/13] fs/dax: remove wmb_pmem() Dan Williams
2016-06-04 20:53 ` [PATCH 09/13] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes Dan Williams
2016-06-04 20:53 ` [PATCH 10/13] pmem: kill wmb_pmem() Dan Williams
2016-06-04 20:53 ` [PATCH 11/13] Revert "KVM: x86: add pcommit support" Dan Williams
2016-06-06 15:14   ` Paolo Bonzini
2016-06-06 16:14     ` Dan Williams
2016-06-04 20:53 ` [PATCH 12/13] x86/insn: remove pcommit Dan Williams
2016-06-04 20:53 ` [PATCH 13/13] pmem: kill __pmem address space Dan Williams
2016-06-04 22:18   ` kbuild test robot
2016-06-05 17:41 ` [PATCH 00/13] deprecate pcommit Andy Lutomirski
2016-06-05 18:48   ` Rudoff, Andy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).