* [PATCH v7 0/9] libnvdimm: add DMA supported blk-mq pmem driver
From: Dave Jiang @ 2017-08-30 20:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

v7:
- Per Dan's suggestions:
  - Moved all common code from attach_disk to pmem_core as helper functions.
  - Fixed up Kconfig dependencies
  - Cleaned up header file inclusions
  - Removed module parameters
  - Split pmem_core refactor into own patch
  - Removed REQ_FLUSH define

v6:
- Put all common code for pmem drivers in pmem_core per Dan's suggestion.
- Added support code to get number of available DMA chans
- Fixed up Kconfig so that pmem_dma is not offered when pmem is built into
  the kernel.

v5:
- Added support to report descriptor transfer capability limit from dmaengine.
- Fixed up scatterlist support for dma_unmap_data per Dan's comments.
- Made the driver a separate pmem blk driver per Christoph's suggestion
  and also fixed up all the issues pointed out by Christoph.
- Added pmem badblock checking/handling per Robert and also made DMA op to
  be used by all buffer sizes.

v4: 
- Addressed kbuild test bot issues. Passed kbuild test bot, 179 configs.

v3:
- Added patch to rename DMA_SG to DMA_SG_SG to make it explicit
- Added DMA_MEMCPY_SG transaction type to dmaengine
- Misc patch to add verification of DMA_MEMSET_SG that was missing
- Addressed all nd_pmem driver comments from Ross.

v2:
- Make dma_prep_memcpy_* into one function per Dan.
- Addressed various comments from Ross with code formatting and etc.
- Replaced open code with offset_in_page() macro per Johannes.

The following series implements a DMA-driven blk-mq pmem driver and adds
infrastructure code to ioatdma and dmaengine to support copying to and from
a scatterlist when processing block requests submitted through blk-mq.
Using the DMA engines available on certain platforms drastically reduces
CPU utilization while maintaining acceptable performance. Experiments on a
DRAM-backed pmem block device showed that using the DMA engine is
beneficial. By default nd_pmem.ko will be loaded; this can be overridden
through module blacklisting in order to load nd_pmem_dma.ko instead.
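
For reference, a minimal sketch of that override (the modprobe.d path and
file name below are illustrative only, not prescribed by this series):

  # /etc/modprobe.d/pmem-dma.conf
  # keep the default CPU-copy pmem driver from auto-loading
  blacklist nd_pmem
  # ...then load the DMA-backed driver explicitly, e.g.:
  #   modprobe nd_pmem_dma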

---

Dave Jiang (9):
      dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels
      dmaengine: Add DMA_MEMCPY_SG transaction op
      dmaengine: ioatdma: dma_prep_memcpy_sg support
      dmaengine: add function to provide per descriptor xfercap for dma engine
      dmaengine: add SG support to dmaengine_unmap
      dmaengine: provide number of available channels
      libnvdimm: remove definition of REQ_FLUSH
      libnvdimm: move common function for pmem to pmem_core
      libnvdimm: Add DMA based blk-mq pmem driver


 Documentation/dmaengine/provider.txt |    3 
 drivers/dma/dmaengine.c              |   72 +++++
 drivers/dma/ioat/dma.h               |    4 
 drivers/dma/ioat/init.c              |    6 
 drivers/dma/ioat/prep.c              |   57 ++++
 drivers/nvdimm/Kconfig               |   28 ++
 drivers/nvdimm/Makefile              |    6 
 drivers/nvdimm/pmem.c                |  408 +----------------------------
 drivers/nvdimm/pmem.h                |   55 ++++
 drivers/nvdimm/pmem_core.c           |  451 ++++++++++++++++++++++++++++++++
 drivers/nvdimm/pmem_dma.c            |  475 ++++++++++++++++++++++++++++++++++
 include/linux/dmaengine.h            |   49 +++-
 12 files changed, 1213 insertions(+), 401 deletions(-)
 create mode 100644 drivers/nvdimm/pmem_core.c
 create mode 100644 drivers/nvdimm/pmem_dma.c

--
Signature

* [PATCH v7 1/9] dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels
From: Dave Jiang @ 2017-08-30 20:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

Commit 7618d0359c16 ("dmaengine: ioatdma: Set non RAID channels to be
private capable") marks all non-RAID ioatdma channels as private so they
can be requested by dma_request_channel(). With PQ capability support
going away for ioatdma, this would make all channels private. To support
the blk-mq implementation of pmem we need to share as many channels as
possible in order to perform well. Thus, revert the patch.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/ioat/init.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index ed8ed11..1b881fb 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -1153,9 +1153,6 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca)
 		}
 	}
 
-	if (!(ioat_dma->cap & (IOAT_CAP_XOR | IOAT_CAP_PQ)))
-		dma_cap_set(DMA_PRIVATE, dma->cap_mask);
-
 	err = ioat_probe(ioat_dma);
 	if (err)
 		return err;

* [PATCH v7 2/9] dmaengine: Add DMA_MEMCPY_SG transaction op
From: Dave Jiang @ 2017-08-30 20:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

Add a dmaengine transaction operation that allows copying between a
scatterlist and a flat, physically contiguous buffer.
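
As a rough illustration of the intended client-side usage (not part of this
patch; channel selection, DMA mapping and error handling are assumed to be
done by the caller, and the helper name is made up for the sketch):

  #include <linux/dmaengine.h>

  /* Submit one sg <-> flat-buffer copy on a DMA_MEMCPY_SG capable channel.
   * 'sg'/'sg_nents' and 'buf_dma' must already be dma-mapped; 'cb'/'ctx'
   * are a hypothetical completion callback and its argument.
   */
  static int submit_memcpy_sg(struct dma_chan *chan, struct scatterlist *sg,
			      unsigned int sg_nents, dma_addr_t buf_dma,
			      bool to_sg, dma_async_tx_callback cb, void *ctx)
  {
	struct dma_async_tx_descriptor *tx;
	dma_cookie_t cookie;

	tx = dmaengine_prep_dma_memcpy_sg(chan, sg, sg_nents, buf_dma,
					  to_sg, DMA_PREP_INTERRUPT);
	if (!tx)
		return -ENXIO;		/* caller falls back to CPU copy */

	tx->callback = cb;
	tx->callback_param = ctx;
	cookie = dmaengine_submit(tx);
	if (dma_submit_error(cookie))
		return -ENXIO;

	dma_async_issue_pending(chan);
	return 0;
  }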

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 Documentation/dmaengine/provider.txt |    3 +++
 include/linux/dmaengine.h            |   19 +++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/Documentation/dmaengine/provider.txt b/Documentation/dmaengine/provider.txt
index a75f52f..6241e36 100644
--- a/Documentation/dmaengine/provider.txt
+++ b/Documentation/dmaengine/provider.txt
@@ -181,6 +181,9 @@ Currently, the types available are:
     - Used by the client drivers to register a callback that will be
       called on a regular basis through the DMA controller interrupt
 
+  * DMA_MEMCPY_SG
+    - The device supports scatterlist to/from memory.
+
   * DMA_PRIVATE
     - The devices only supports slave transfers, and as such isn't
       available for async transfers.
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 64fbd38..0c91411 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -67,6 +67,7 @@ enum dma_transaction_type {
 	DMA_PQ_VAL,
 	DMA_MEMSET,
 	DMA_MEMSET_SG,
+	DMA_MEMCPY_SG,
 	DMA_INTERRUPT,
 	DMA_PRIVATE,
 	DMA_ASYNC_TX,
@@ -692,6 +693,7 @@ struct dma_filter {
  * @device_prep_dma_pq_val: prepares a pqzero_sum operation
  * @device_prep_dma_memset: prepares a memset operation
  * @device_prep_dma_memset_sg: prepares a memset operation over a scatter list
+ * @device_prep_dma_memcpy_sg: prepares memcpy between scatterlist and buffer
  * @device_prep_dma_interrupt: prepares an end of chain interrupt operation
  * @device_prep_slave_sg: prepares a slave dma operation
  * @device_prep_dma_cyclic: prepare a cyclic dma operation suitable for audio.
@@ -768,6 +770,10 @@ struct dma_device {
 	struct dma_async_tx_descriptor *(*device_prep_dma_memset_sg)(
 		struct dma_chan *chan, struct scatterlist *sg,
 		unsigned int nents, int value, unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy_sg)(
+		struct dma_chan *chan,
+		struct scatterlist *sg, unsigned int sg_nents,
+		dma_addr_t buf, bool to_sg, unsigned long flags);
 	struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)(
 		struct dma_chan *chan, unsigned long flags);
 
@@ -899,6 +905,19 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy(
 						    len, flags);
 }
 
+static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy_sg(
+		struct dma_chan *chan, struct scatterlist *sg,
+		unsigned int sg_nents, dma_addr_t buf, bool to_sg,
+		unsigned long flags)
+{
+	if (!chan || !chan->device ||
+			!chan->device->device_prep_dma_memcpy_sg)
+		return NULL;
+
+	return chan->device->device_prep_dma_memcpy_sg(chan, sg, sg_nents,
+						       buf, to_sg, flags);
+}
+
 /**
  * dmaengine_terminate_all() - Terminate all active DMA transfers
  * @chan: The channel for which to terminate the transfers

* [PATCH v7 3/9] dmaengine: ioatdma: dma_prep_memcpy_sg support
From: Dave Jiang @ 2017-08-30 20:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

Add ioatdma support for copying between a physically contiguous buffer and
a provided scatterlist, in either direction. This is used to support
reading from and writing to persistent memory in the pmem driver.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/ioat/dma.h  |    4 +++
 drivers/dma/ioat/init.c |    2 ++
 drivers/dma/ioat/prep.c |   57 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/drivers/dma/ioat/dma.h b/drivers/dma/ioat/dma.h
index 56200ee..6c08b06 100644
--- a/drivers/dma/ioat/dma.h
+++ b/drivers/dma/ioat/dma.h
@@ -370,6 +370,10 @@ struct dma_async_tx_descriptor *
 ioat_dma_prep_memcpy_lock(struct dma_chan *c, dma_addr_t dma_dest,
 			   dma_addr_t dma_src, size_t len, unsigned long flags);
 struct dma_async_tx_descriptor *
+ioat_dma_prep_memcpy_sg_lock(struct dma_chan *c,
+		struct scatterlist *sg, unsigned int sg_nents,
+		dma_addr_t dma_addr, bool to_sg, unsigned long flags);
+struct dma_async_tx_descriptor *
 ioat_prep_interrupt_lock(struct dma_chan *c, unsigned long flags);
 struct dma_async_tx_descriptor *
 ioat_prep_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index 1b881fb..5c69ff6 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -1081,6 +1081,8 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca)
 
 	dma = &ioat_dma->dma_dev;
 	dma->device_prep_dma_memcpy = ioat_dma_prep_memcpy_lock;
+	dma_cap_set(DMA_MEMCPY_SG, dma->cap_mask);
+	dma->device_prep_dma_memcpy_sg = ioat_dma_prep_memcpy_sg_lock;
 	dma->device_issue_pending = ioat_issue_pending;
 	dma->device_alloc_chan_resources = ioat_alloc_chan_resources;
 	dma->device_free_chan_resources = ioat_free_chan_resources;
diff --git a/drivers/dma/ioat/prep.c b/drivers/dma/ioat/prep.c
index 243421a..d8219af 100644
--- a/drivers/dma/ioat/prep.c
+++ b/drivers/dma/ioat/prep.c
@@ -159,6 +159,63 @@ ioat_dma_prep_memcpy_lock(struct dma_chan *c, dma_addr_t dma_dest,
 	return &desc->txd;
 }
 
+struct dma_async_tx_descriptor *
+ioat_dma_prep_memcpy_sg_lock(struct dma_chan *c,
+		struct scatterlist *sg, unsigned int sg_nents,
+		dma_addr_t dma_addr, bool to_sg, unsigned long flags)
+{
+	struct ioatdma_chan *ioat_chan = to_ioat_chan(c);
+	struct ioat_dma_descriptor *hw = NULL;
+	struct ioat_ring_ent *desc = NULL;
+	dma_addr_t dma_off = dma_addr;
+	int num_descs, idx, i;
+	struct scatterlist *s;
+	size_t total_len = 0, len;
+
+
+	if (test_bit(IOAT_CHAN_DOWN, &ioat_chan->state))
+		return NULL;
+
+	/*
+	 * The upper layer will guarantee that each entry does not exceed
+	 * xfercap.
+	 */
+	num_descs = sg_nents;
+
+	if (likely(num_descs) &&
+	    ioat_check_space_lock(ioat_chan, num_descs) == 0)
+		idx = ioat_chan->head;
+	else
+		return NULL;
+
+	for_each_sg(sg, s, sg_nents, i) {
+		desc = ioat_get_ring_ent(ioat_chan, idx + i);
+		hw = desc->hw;
+		len = sg_dma_len(s);
+		hw->size = len;
+		hw->ctl = 0;
+		if (to_sg) {
+			hw->src_addr = dma_off;
+			hw->dst_addr = sg_dma_address(s);
+		} else {
+			hw->src_addr = sg_dma_address(s);
+			hw->dst_addr = dma_off;
+		}
+		dma_off += len;
+		total_len += len;
+		dump_desc_dbg(ioat_chan, desc);
+	}
+
+	desc->txd.flags = flags;
+	desc->len = total_len;
+	hw->ctl_f.int_en = !!(flags & DMA_PREP_INTERRUPT);
+	hw->ctl_f.fence = !!(flags & DMA_PREP_FENCE);
+	hw->ctl_f.compl_write = 1;
+	dump_desc_dbg(ioat_chan, desc);
+	/* we leave the channel locked to ensure in order submission */
+
+	return &desc->txd;
+}
 
 static struct dma_async_tx_descriptor *
 __ioat_prep_xor_lock(struct dma_chan *c, enum sum_check_flags *result,

* [PATCH v7 4/9] dmaengine: add function to provide per descriptor xfercap for dma engine
From: Dave Jiang @ 2017-08-30 20:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

Add a function that exports the per-descriptor transfer capability of a
DMA device through the dmaengine subsystem.
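
For illustration only (not part of this patch; variable names are made up),
a client can use the reported limit to size the scatterlist entries it
builds, matching the "each entry does not exceed xfercap" assumption made
by the ioatdma prep routine earlier in this series:

  /* Hedged sketch: bound every sg entry by the per-descriptor limit. */
  u64 xfercap = dma_get_desc_xfercap(chan);
  size_t remaining = total_len;

  while (remaining) {
	size_t chunk = min_t(size_t, remaining, xfercap);

	/* build one sg entry of at most 'chunk' bytes here ... */
	remaining -= chunk;
  }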

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/ioat/init.c   |    1 +
 include/linux/dmaengine.h |   10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index 5c69ff6..4f24c36 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -596,6 +596,7 @@ static int ioat_enumerate_channels(struct ioatdma_device *ioat_dma)
 	if (xfercap_log == 0)
 		return 0;
 	dev_dbg(dev, "%s: xfercap = %d\n", __func__, 1 << xfercap_log);
+	dma->xfercap = 1 << xfercap_log;
 
 	for (i = 0; i < dma->chancnt; i++) {
 		ioat_chan = devm_kzalloc(dev, sizeof(*ioat_chan), GFP_KERNEL);
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 0c91411..53356c4 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -743,6 +743,7 @@ struct dma_device {
 	u32 max_burst;
 	bool descriptor_reuse;
 	enum dma_residue_granularity residue_granularity;
+	u64 xfercap;	/* descriptor transfer capability limit */
 
 	int (*device_alloc_chan_resources)(struct dma_chan *chan);
 	void (*device_free_chan_resources)(struct dma_chan *chan);
@@ -1326,6 +1327,11 @@ struct dma_chan *dma_request_chan_by_mask(const dma_cap_mask_t *mask);
 
 void dma_release_channel(struct dma_chan *chan);
 int dma_get_slave_caps(struct dma_chan *chan, struct dma_slave_caps *caps);
+
+static inline u64 dma_get_desc_xfercap(struct dma_chan *chan)
+{
+	return chan->device->xfercap;
+}
 #else
 static inline struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
 {
@@ -1370,6 +1376,10 @@ static inline int dma_get_slave_caps(struct dma_chan *chan,
 {
 	return -ENXIO;
 }
+static inline u64 dma_get_desc_xfercap(struct dma_chan *chan)
+{
+	return -ENXIO;
+}
 #endif
 
 #define dma_request_slave_channel_reason(dev, name) dma_request_chan(dev, name)

* [PATCH v7 5/9] dmaengine: add SG support to dmaengine_unmap
From: Dave Jiang @ 2017-08-30 20:56 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

Add support for unmapping a scatterlist through dmaengine_unmap_data.
Only one scatterlist per direction is supported.
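
A hedged sketch of how a client would record such a mapping (illustrative
only; error handling is omitted, the mapping directions mirror those used
by dmaengine_unmap_sg() below, and 'dev', 'sg', 'nents', 'page' and 'len'
are assumed to be set up by the caller):

  struct dmaengine_unmap_data *unmap;

  unmap = dmaengine_get_unmap_data(dev, 2, GFP_NOWAIT);
  if (!unmap)
	return -ENOMEM;

  /* scatterlist mapped to-device, flat buffer mapped from-device */
  unmap->unmap_sg.sg = sg;
  unmap->sg_nents = dma_map_sg(dev, sg, nents, DMA_TO_DEVICE);
  unmap->unmap_sg.buf_phys = dma_map_page(dev, page, 0, len,
					  DMA_FROM_DEVICE);
  unmap->len = len;
  unmap->to_sg = 1;

  /* ... attach to the descriptor, submit, then drop the reference: */
  dmaengine_unmap_put(unmap);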

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/dmaengine.c   |   27 +++++++++++++++++++++++++++
 include/linux/dmaengine.h |   13 ++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 428b141..b4fd81a 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -1120,12 +1120,39 @@ static struct dmaengine_unmap_pool *__get_unmap_pool(int nr)
 	}
 }
 
+static void dmaengine_unmap_sg(struct dmaengine_unmap_data *unmap)
+{
+	struct device *dev = unmap->dev;
+
+	if (unmap->to_sg) {
+		dma_unmap_sg(dev, unmap->unmap_sg.sg,
+				unmap->sg_nents, DMA_TO_DEVICE);
+
+		dma_unmap_page(dev, unmap->unmap_sg.buf_phys, unmap->len,
+					DMA_FROM_DEVICE);
+	}
+
+	if (unmap->from_sg) {
+		dma_unmap_page(dev, unmap->unmap_sg.buf_phys, unmap->len,
+				DMA_TO_DEVICE);
+		dma_unmap_sg(dev, unmap->unmap_sg.sg,
+				unmap->sg_nents, DMA_FROM_DEVICE);
+	}
+
+	mempool_free(unmap, __get_unmap_pool(unmap->map_cnt)->pool);
+}
+
 static void dmaengine_unmap(struct kref *kref)
 {
 	struct dmaengine_unmap_data *unmap = container_of(kref, typeof(*unmap), kref);
 	struct device *dev = unmap->dev;
 	int cnt, i;
 
+	if (unmap->to_sg || unmap->from_sg) {
+		dmaengine_unmap_sg(unmap);
+		return;
+	}
+
 	cnt = unmap->to_cnt;
 	for (i = 0; i < cnt; i++)
 		dma_unmap_page(dev, unmap->addr[i], unmap->len,
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 53356c4..fc53854 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -464,15 +464,26 @@ struct dmaengine_result {
 typedef void (*dma_async_tx_callback_result)(void *dma_async_param,
 				const struct dmaengine_result *result);
 
+struct dmaengine_unmap_sg {
+	struct scatterlist *sg;
+	dma_addr_t buf_phys;
+};
+
 struct dmaengine_unmap_data {
 	u8 map_cnt;
 	u8 to_cnt;
+	u8 to_sg;
 	u8 from_cnt;
+	u8 from_sg;
 	u8 bidi_cnt;
+	int sg_nents;
 	struct device *dev;
 	struct kref kref;
 	size_t len;
-	dma_addr_t addr[0];
+	union {
+		struct dmaengine_unmap_sg unmap_sg;
+		dma_addr_t addr[0];
+	};
 };
 
 /**

* [PATCH v7 6/9] dmaengine: provide number of available channels
From: Dave Jiang @ 2017-08-30 20:56 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

Add a dmaengine helper that reports the number of available channels that
can be shared, with optional narrowing through a filter function.
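
A hedged sketch of the intended use (illustrative only; 'filter' and
'param' stand in for an optional caller-provided filter and its argument):

  dma_cap_mask_t mask;
  int nr_chans;

  dma_cap_zero(mask);
  dma_cap_set(DMA_MEMCPY_SG, mask);

  nr_chans = dma_get_channel_count(&mask, filter, param);
  if (!nr_chans)
	return -ENODEV;	/* no shareable DMA channels, fall back to CPU copy */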

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/dmaengine.c   |   45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dmaengine.h |    7 +++++++
 2 files changed, 52 insertions(+)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index b4fd81a..bd5000e 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -674,6 +674,51 @@ struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
 }
 EXPORT_SYMBOL_GPL(__dma_request_channel);
 
+static int get_candidate_count(const dma_cap_mask_t *mask,
+					  struct dma_device *dev,
+					  dma_filter_fn fn, void *fn_param)
+{
+	struct dma_chan *chan;
+	int count = 0;
+
+	if (mask && !__dma_device_satisfies_mask(dev, mask)) {
+		dev_dbg(dev->dev, "%s: wrong capabilities\n", __func__);
+		return 0;
+	}
+
+	list_for_each_entry(chan, &dev->channels, device_node) {
+		if (dma_has_cap(DMA_PRIVATE, dev->cap_mask)) {
+			dev_dbg(dev->dev, "%s: %s is marked for private\n",
+				 __func__, dma_chan_name(chan));
+			continue;
+		}
+		if (fn && !fn(chan, fn_param)) {
+			dev_dbg(dev->dev, "%s: %s filter said false\n",
+				 __func__, dma_chan_name(chan));
+			continue;
+		}
+		count++;
+	}
+
+	return count;
+}
+
+int dma_get_channel_count(const dma_cap_mask_t *mask,
+			    dma_filter_fn fn, void *fn_param)
+{
+	struct dma_device *device;
+	int total = 0;
+
+	/* Find a channel */
+	mutex_lock(&dma_list_mutex);
+	list_for_each_entry(device, &dma_device_list, global_node)
+		total += get_candidate_count(mask, device, fn, fn_param);
+	mutex_unlock(&dma_list_mutex);
+
+	return total;
+}
+EXPORT_SYMBOL_GPL(dma_get_channel_count);
+
 static const struct dma_slave_map *dma_filter_match(struct dma_device *device,
 						    const char *name,
 						    struct device *dev)
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index fc53854..ab5f8c1 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -1331,6 +1331,8 @@ enum dma_status dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx);
 void dma_issue_pending_all(void);
 struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
 					dma_filter_fn fn, void *fn_param);
+int dma_get_channel_count(const dma_cap_mask_t *mask,
+					dma_filter_fn fn, void *fn_param);
 struct dma_chan *dma_request_slave_channel(struct device *dev, const char *name);
 
 struct dma_chan *dma_request_chan(struct device *dev, const char *name);
@@ -1364,6 +1366,11 @@ static inline struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
 {
 	return NULL;
 }
+static inline int dma_get_channel_count(const dma_cap_mask_t *mask,
+					dma_filter_fn fn, void *fn_param)
+{
+	return 0;
+}
 static inline struct dma_chan *dma_request_slave_channel(struct device *dev,
 							 const char *name)
 {

* [PATCH v7 7/9] libnvdimm: remove definition of REQ_FLUSH
From: Dave Jiang @ 2017-08-30 20:56 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

According to the driver comment, REQ_FLUSH was renamed to REQ_PREFLUSH
after v4.8-rc1. Remove the compatibility definition and use REQ_PREFLUSH
directly.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/nvdimm/pmem.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f7099ada..b1460d1 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -144,11 +144,6 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 	return rc;
 }
 
-/* account for REQ_FLUSH rename, replace with REQ_PREFLUSH after v4.8-rc1 */
-#ifndef REQ_FLUSH
-#define REQ_FLUSH REQ_PREFLUSH
-#endif
-
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
 	blk_status_t rc = 0;
@@ -159,7 +154,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct pmem_device *pmem = q->queuedata;
 	struct nd_region *nd_region = to_region(pmem);
 
-	if (bio->bi_opf & REQ_FLUSH)
+	if (bio->bi_opf & REQ_PREFLUSH)
 		nvdimm_flush(nd_region);
 
 	do_acct = nd_iostat_start(bio, &start);

* [PATCH v7 8/9] libnvdimm: move common function for pmem to pmem_core
From: Dave Jiang @ 2017-08-30 20:56 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

In preparation for adding another pmem driver, move all the common
functions in pmem that will be used by the new driver into a shared
pmem_core module.
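
For orientation, the resulting driver-side flow looks roughly like the
refactored pmem_attach_disk() below (condensed sketch, error handling
abbreviated):

  pmem = pmem_core_setup_pmem(dev, ndns);	/* reserve region, alloc pmem */
  q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
  pmem->q = q;
  pmem_core_setup_queue(dev, pmem, ndns);	/* block/DAX queue flags */
  blk_queue_make_request(q, pmem_make_request);
  rc = pmem_core_remap_pages(dev, pmem, ndns);	/* memremap the namespace */
  rc = pmem_core_setup_disk(dev, pmem, ndns, &pmem_fops,
			    &pmem_dax_ops, pmem_attribute_groups);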

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/nvdimm/Kconfig     |   10 +
 drivers/nvdimm/Makefile    |    3 
 drivers/nvdimm/pmem.c      |  401 +--------------------------------------
 drivers/nvdimm/pmem.h      |   54 +++++
 drivers/nvdimm/pmem_core.c |  451 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 528 insertions(+), 391 deletions(-)
 create mode 100644 drivers/nvdimm/pmem_core.c

diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 5bdd499..01fe9e8 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -17,12 +17,16 @@ menuconfig LIBNVDIMM
 
 if LIBNVDIMM
 
-config BLK_DEV_PMEM
-	tristate "PMEM: Persistent memory block device support"
-	default LIBNVDIMM
+config BLK_DEV_PMEM_CORE
+	tristate
 	select DAX
 	select ND_BTT if BTT
 	select ND_PFN if NVDIMM_PFN
+
+config BLK_DEV_PMEM
+	tristate "PMEM: Persistent memory block device support"
+	default LIBNVDIMM
+	select BLK_DEV_PMEM_CORE
 	help
 	  Memory ranges for PMEM are described by either an NFIT
 	  (NVDIMM Firmware Interface Table, see CONFIG_NFIT_ACPI), a
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 909554c..0ce99cf 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -1,9 +1,12 @@
 obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
+obj-$(CONFIG_BLK_DEV_PMEM_CORE) += nd_pmem_core.o
 obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
 obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 
+nd_pmem_core-y := pmem_core.o
+
 nd_pmem-y := pmem.o
 
 nd_btt-y := btt.o
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index b1460d1..025d22d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -15,134 +15,9 @@
  * more details.
  */
 
-#include <asm/cacheflush.h>
-#include <linux/blkdev.h>
-#include <linux/hdreg.h>
 #include <linux/init.h>
-#include <linux/platform_device.h>
 #include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/badblocks.h>
-#include <linux/memremap.h>
-#include <linux/vmalloc.h>
-#include <linux/blk-mq.h>
-#include <linux/pfn_t.h>
-#include <linux/slab.h>
-#include <linux/uio.h>
-#include <linux/dax.h>
-#include <linux/nd.h>
 #include "pmem.h"
-#include "pfn.h"
-#include "nd.h"
-
-static struct device *to_dev(struct pmem_device *pmem)
-{
-	/*
-	 * nvdimm bus services need a 'dev' parameter, and we record the device
-	 * at init in bb.dev.
-	 */
-	return pmem->bb.dev;
-}
-
-static struct nd_region *to_region(struct pmem_device *pmem)
-{
-	return to_nd_region(to_dev(pmem)->parent);
-}
-
-static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
-		phys_addr_t offset, unsigned int len)
-{
-	struct device *dev = to_dev(pmem);
-	sector_t sector;
-	long cleared;
-	blk_status_t rc = BLK_STS_OK;
-
-	sector = (offset - pmem->data_offset) / 512;
-
-	cleared = nvdimm_clear_poison(dev, pmem->phys_addr + offset, len);
-	if (cleared < len)
-		rc = BLK_STS_IOERR;
-	if (cleared > 0 && cleared / 512) {
-		cleared /= 512;
-		dev_dbg(dev, "%s: %#llx clear %ld sector%s\n", __func__,
-				(unsigned long long) sector, cleared,
-				cleared > 1 ? "s" : "");
-		badblocks_clear(&pmem->bb, sector, cleared);
-		if (pmem->bb_state)
-			sysfs_notify_dirent(pmem->bb_state);
-	}
-
-	arch_invalidate_pmem(pmem->virt_addr + offset, len);
-
-	return rc;
-}
-
-static void write_pmem(void *pmem_addr, struct page *page,
-		unsigned int off, unsigned int len)
-{
-	void *mem = kmap_atomic(page);
-
-	memcpy_flushcache(pmem_addr, mem + off, len);
-	kunmap_atomic(mem);
-}
-
-static blk_status_t read_pmem(struct page *page, unsigned int off,
-		void *pmem_addr, unsigned int len)
-{
-	int rc;
-	void *mem = kmap_atomic(page);
-
-	rc = memcpy_mcsafe(mem + off, pmem_addr, len);
-	kunmap_atomic(mem);
-	if (rc)
-		return BLK_STS_IOERR;
-	return BLK_STS_OK;
-}
-
-static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
-			unsigned int len, unsigned int off, bool is_write,
-			sector_t sector)
-{
-	blk_status_t rc = BLK_STS_OK;
-	bool bad_pmem = false;
-	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
-	void *pmem_addr = pmem->virt_addr + pmem_off;
-
-	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
-		bad_pmem = true;
-
-	if (!is_write) {
-		if (unlikely(bad_pmem))
-			rc = BLK_STS_IOERR;
-		else {
-			rc = read_pmem(page, off, pmem_addr, len);
-			flush_dcache_page(page);
-		}
-	} else {
-		/*
-		 * Note that we write the data both before and after
-		 * clearing poison.  The write before clear poison
-		 * handles situations where the latest written data is
-		 * preserved and the clear poison operation simply marks
-		 * the address range as valid without changing the data.
-		 * In this case application software can assume that an
-		 * interrupted write will either return the new good
-		 * data or an error.
-		 *
-		 * However, if pmem_clear_poison() leaves the data in an
-		 * indeterminate state we need to perform the write
-		 * after clear poison.
-		 */
-		flush_dcache_page(page);
-		write_pmem(pmem_addr, page, off, len);
-		if (unlikely(bad_pmem)) {
-			rc = pmem_clear_poison(pmem, pmem_off, len);
-			write_pmem(pmem_addr, page, off, len);
-		}
-	}
-
-	return rc;
-}
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
@@ -177,73 +52,12 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	return BLK_QC_T_NONE;
 }
 
-static int pmem_rw_page(struct block_device *bdev, sector_t sector,
-		       struct page *page, bool is_write)
-{
-	struct pmem_device *pmem = bdev->bd_queue->queuedata;
-	blk_status_t rc;
-
-	rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, is_write, sector);
-
-	/*
-	 * The ->rw_page interface is subtle and tricky.  The core
-	 * retries on any error, so we can only invoke page_endio() in
-	 * the successful completion case.  Otherwise, we'll see crashes
-	 * caused by double completion.
-	 */
-	if (rc == 0)
-		page_endio(page, is_write, 0);
-
-	return blk_status_to_errno(rc);
-}
-
-/* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
-__weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn)
-{
-	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
-
-	if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
-					PFN_PHYS(nr_pages))))
-		return -EIO;
-	*kaddr = pmem->virt_addr + offset;
-	*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
-
-	/*
-	 * If badblocks are present, limit known good range to the
-	 * requested range.
-	 */
-	if (unlikely(pmem->bb.count))
-		return nr_pages;
-	return PHYS_PFN(pmem->size - pmem->pfn_pad - offset);
-}
-
 static const struct block_device_operations pmem_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		pmem_rw_page,
 	.revalidate_disk =	nvdimm_revalidate_disk,
 };
 
-static long pmem_dax_direct_access(struct dax_device *dax_dev,
-		pgoff_t pgoff, long nr_pages, void **kaddr, pfn_t *pfn)
-{
-	struct pmem_device *pmem = dax_get_private(dax_dev);
-
-	return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
-}
-
-static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	return copy_from_iter_flushcache(addr, bytes, i);
-}
-
-static void pmem_dax_flush(struct dax_device *dax_dev, pgoff_t pgoff,
-		void *addr, size_t size)
-{
-	arch_wb_cache_pmem(addr, size);
-}
-
 static const struct dax_operations pmem_dax_ops = {
 	.direct_access = pmem_dax_direct_access,
 	.copy_from_iter = pmem_copy_from_iter,
@@ -260,152 +74,37 @@ static void pmem_release_queue(void *q)
 	blk_cleanup_queue(q);
 }
 
-static void pmem_freeze_queue(void *q)
-{
-	blk_freeze_queue_start(q);
-}
-
-static void pmem_release_disk(void *__pmem)
-{
-	struct pmem_device *pmem = __pmem;
-
-	kill_dax(pmem->dax_dev);
-	put_dax(pmem->dax_dev);
-	del_gendisk(pmem->disk);
-	put_disk(pmem->disk);
-}
-
 static int pmem_attach_disk(struct device *dev,
 		struct nd_namespace_common *ndns)
 {
-	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
-	struct nd_region *nd_region = to_nd_region(dev->parent);
-	struct vmem_altmap __altmap, *altmap = NULL;
-	int nid = dev_to_node(dev), fua, wbc;
-	struct resource *res = &nsio->res;
-	struct nd_pfn *nd_pfn = NULL;
-	struct dax_device *dax_dev;
-	struct nd_pfn_sb *pfn_sb;
 	struct pmem_device *pmem;
-	struct resource pfn_res;
 	struct request_queue *q;
-	struct device *gendev;
-	struct gendisk *disk;
-	void *addr;
-
-	/* while nsio_rw_bytes is active, parse a pfn info block if present */
-	if (is_nd_pfn(dev)) {
-		nd_pfn = to_nd_pfn(dev);
-		altmap = nvdimm_setup_pfn(nd_pfn, &pfn_res, &__altmap);
-		if (IS_ERR(altmap))
-			return PTR_ERR(altmap);
-	}
-
-	/* we're attaching a block device, disable raw namespace access */
-	devm_nsio_disable(dev, nsio);
+	int rc;
 
-	pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
+	pmem = pmem_core_setup_pmem(dev, ndns);
 	if (!pmem)
-		return -ENOMEM;
-
-	dev_set_drvdata(dev, pmem);
-	pmem->phys_addr = res->start;
-	pmem->size = resource_size(res);
-	fua = nvdimm_has_flush(nd_region);
-	if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) || fua < 0) {
-		dev_warn(dev, "unable to guarantee persistence of writes\n");
-		fua = 0;
-	}
-	wbc = nvdimm_has_cache(nd_region);
-
-	if (!devm_request_mem_region(dev, res->start, resource_size(res),
-				dev_name(&ndns->dev))) {
-		dev_warn(dev, "could not reserve region %pR\n", res);
-		return -EBUSY;
-	}
+		return -ENXIO;
 
 	q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
 	if (!q)
 		return -ENOMEM;
 
-	if (devm_add_action_or_reset(dev, pmem_release_queue, q))
-		return -ENOMEM;
-
-	pmem->pfn_flags = PFN_DEV;
-	if (is_nd_pfn(dev)) {
-		addr = devm_memremap_pages(dev, &pfn_res, &q->q_usage_counter,
-				altmap);
-		pfn_sb = nd_pfn->pfn_sb;
-		pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
-		pmem->pfn_pad = resource_size(res) - resource_size(&pfn_res);
-		pmem->pfn_flags |= PFN_MAP;
-		res = &pfn_res; /* for badblocks populate */
-		res->start += pmem->data_offset;
-	} else if (pmem_should_map_pages(dev)) {
-		addr = devm_memremap_pages(dev, &nsio->res,
-				&q->q_usage_counter, NULL);
-		pmem->pfn_flags |= PFN_MAP;
-	} else
-		addr = devm_memremap(dev, pmem->phys_addr,
-				pmem->size, ARCH_MEMREMAP_PMEM);
-
-	/*
-	 * At release time the queue must be frozen before
-	 * devm_memremap_pages is unwound
-	 */
-	if (devm_add_action_or_reset(dev, pmem_freeze_queue, q))
-		return -ENOMEM;
-
-	if (IS_ERR(addr))
-		return PTR_ERR(addr);
-	pmem->virt_addr = addr;
-
-	blk_queue_write_cache(q, wbc, fua);
+	pmem->q = q;
+	pmem_core_setup_queue(dev, pmem, ndns);
 	blk_queue_make_request(q, pmem_make_request);
-	blk_queue_physical_block_size(q, PAGE_SIZE);
-	blk_queue_logical_block_size(q, pmem_sector_size(ndns));
 	blk_queue_max_hw_sectors(q, UINT_MAX);
-	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
-	queue_flag_set_unlocked(QUEUE_FLAG_DAX, q);
-	q->queuedata = pmem;
-
-	disk = alloc_disk_node(0, nid);
-	if (!disk)
-		return -ENOMEM;
-	pmem->disk = disk;
-
-	disk->fops		= &pmem_fops;
-	disk->queue		= q;
-	disk->flags		= GENHD_FL_EXT_DEVT;
-	nvdimm_namespace_disk_name(ndns, disk->disk_name);
-	set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
-			/ 512);
-	if (devm_init_badblocks(dev, &pmem->bb))
-		return -ENOMEM;
-	nvdimm_badblocks_populate(nd_region, &pmem->bb, res);
-	disk->bb = &pmem->bb;
-
-	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
-	if (!dax_dev) {
-		put_disk(disk);
-		return -ENOMEM;
-	}
-	dax_write_cache(dax_dev, wbc);
-	pmem->dax_dev = dax_dev;
 
-	gendev = disk_to_dev(disk);
-	gendev->groups = pmem_attribute_groups;
-
-	device_add_disk(dev, disk);
-	if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
+	if (devm_add_action_or_reset(dev, pmem_release_queue, q))
 		return -ENOMEM;
 
-	revalidate_disk(disk);
+	rc = pmem_core_remap_pages(dev, pmem, ndns);
+	if (rc < 0)
+		return rc;
 
-	pmem->bb_state = sysfs_get_dirent(disk_to_dev(disk)->kobj.sd,
-					  "badblocks");
-	if (!pmem->bb_state)
-		dev_warn(dev, "'badblocks' notification disabled\n");
+	rc = pmem_core_setup_disk(dev, pmem, ndns, &pmem_fops,
+			&pmem_dax_ops, pmem_attribute_groups);
+	if (rc < 0)
+		return rc;
 
 	return 0;
 }
@@ -436,80 +135,6 @@ static int nd_pmem_probe(struct device *dev)
 	return pmem_attach_disk(dev, ndns);
 }
 
-static int nd_pmem_remove(struct device *dev)
-{
-	struct pmem_device *pmem = dev_get_drvdata(dev);
-
-	if (is_nd_btt(dev))
-		nvdimm_namespace_detach_btt(to_nd_btt(dev));
-	else {
-		/*
-		 * Note, this assumes device_lock() context to not race
-		 * nd_pmem_notify()
-		 */
-		sysfs_put(pmem->bb_state);
-		pmem->bb_state = NULL;
-	}
-	nvdimm_flush(to_nd_region(dev->parent));
-
-	return 0;
-}
-
-static void nd_pmem_shutdown(struct device *dev)
-{
-	nvdimm_flush(to_nd_region(dev->parent));
-}
-
-static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
-{
-	struct nd_region *nd_region;
-	resource_size_t offset = 0, end_trunc = 0;
-	struct nd_namespace_common *ndns;
-	struct nd_namespace_io *nsio;
-	struct resource res;
-	struct badblocks *bb;
-	struct kernfs_node *bb_state;
-
-	if (event != NVDIMM_REVALIDATE_POISON)
-		return;
-
-	if (is_nd_btt(dev)) {
-		struct nd_btt *nd_btt = to_nd_btt(dev);
-
-		ndns = nd_btt->ndns;
-		nd_region = to_nd_region(ndns->dev.parent);
-		nsio = to_nd_namespace_io(&ndns->dev);
-		bb = &nsio->bb;
-		bb_state = NULL;
-	} else {
-		struct pmem_device *pmem = dev_get_drvdata(dev);
-
-		nd_region = to_region(pmem);
-		bb = &pmem->bb;
-		bb_state = pmem->bb_state;
-
-		if (is_nd_pfn(dev)) {
-			struct nd_pfn *nd_pfn = to_nd_pfn(dev);
-			struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
-
-			ndns = nd_pfn->ndns;
-			offset = pmem->data_offset +
-					__le32_to_cpu(pfn_sb->start_pad);
-			end_trunc = __le32_to_cpu(pfn_sb->end_trunc);
-		} else {
-			ndns = to_ndns(dev);
-		}
-
-		nsio = to_nd_namespace_io(&ndns->dev);
-	}
-
-	res.start = nsio->res.start + offset;
-	res.end = nsio->res.end - end_trunc;
-	nvdimm_badblocks_populate(nd_region, bb, &res);
-	if (bb_state)
-		sysfs_notify_dirent(bb_state);
-}
-
 MODULE_ALIAS("pmem");
 MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_IO);
 MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_PMEM);
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 5434321..6df833e 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -4,6 +4,9 @@
 #include <linux/types.h>
 #include <linux/pfn_t.h>
 #include <linux/fs.h>
+#include <linux/blk-mq.h>
+#include <linux/dax.h>
+#include "nd.h"
 
 #ifdef CONFIG_ARCH_HAS_PMEM_API
 #define ARCH_MEMREMAP_PMEM MEMREMAP_WB
@@ -35,8 +38,59 @@ struct pmem_device {
 	struct badblocks	bb;
 	struct dax_device	*dax_dev;
 	struct gendisk		*disk;
+	struct blk_mq_tag_set	tag_set;
+	struct request_queue	*q;
 };
 
+static inline struct device *to_dev(struct pmem_device *pmem)
+{
+	/*
+	 * nvdimm bus services need a 'dev' parameter, and we record the device
+	 * at init in bb.dev.
+	 */
+	return pmem->bb.dev;
+}
+
+static inline struct nd_region *to_region(struct pmem_device *pmem)
+{
+	return to_nd_region(to_dev(pmem)->parent);
+}
+
+struct device *to_dev(struct pmem_device *pmem);
+struct nd_region *to_region(struct pmem_device *pmem);
+blk_status_t pmem_clear_poison(struct pmem_device *pmem,
+		phys_addr_t offset, unsigned int len);
+void write_pmem(void *pmem_addr, struct page *page,
+		unsigned int off, unsigned int len);
+blk_status_t read_pmem(struct page *page, unsigned int off,
+		void *pmem_addr, unsigned int len);
+blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
+			unsigned int len, unsigned int off, bool is_write,
+			sector_t sector);
+int pmem_rw_page(struct block_device *bdev, sector_t sector,
+		       struct page *page, bool is_write);
+void nd_pmem_notify(struct device *dev, enum nvdimm_event event);
+long pmem_dax_direct_access(struct dax_device *dax_dev,
+		pgoff_t pgoff, long nr_pages, void **kaddr, pfn_t *pfn);
+size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i);
+void pmem_dax_flush(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t size);
 long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
 		long nr_pages, void **kaddr, pfn_t *pfn);
+int nd_pmem_remove(struct device *dev);
+void nd_pmem_shutdown(struct device *dev);
+int pmem_core_remap_pages(struct device *dev,
+		struct pmem_device *pmem, struct nd_namespace_common *ndns);
+int pmem_core_setup_disk(struct device *dev,
+		struct pmem_device *pmem,
+		struct nd_namespace_common *ndns,
+		const struct block_device_operations *block_ops,
+		const struct dax_operations *dax_ops,
+		const struct attribute_group **attrib);
+void pmem_core_setup_queue(struct device *dev, struct pmem_device *pmem,
+		struct nd_namespace_common *ndns);
+struct pmem_device *pmem_core_setup_pmem(struct device *dev,
+		struct nd_namespace_common *ndns);
+
 #endif /* __NVDIMM_PMEM_H__ */
diff --git a/drivers/nvdimm/pmem_core.c b/drivers/nvdimm/pmem_core.c
new file mode 100644
index 0000000..521b61b
--- /dev/null
+++ b/drivers/nvdimm/pmem_core.c
@@ -0,0 +1,451 @@
+/*
+ * Persistent Memory Block Driver shared code
+ * Copyright (c) 2014-2017, Intel Corporation.
+ * Copyright (c) 2015, Christoph Hellwig <hch@lst.de>.
+ * Copyright (c) 2015, Boaz Harrosh <boaz@plexistor.com>.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/blkdev.h>
+#include <linux/badblocks.h>
+#include <linux/memremap.h>
+#include <linux/pfn_t.h>
+#include <linux/uio.h>
+#include "pmem.h"
+#include "pfn.h"
+
+blk_status_t pmem_clear_poison(struct pmem_device *pmem,
+		phys_addr_t offset, unsigned int len)
+{
+	struct device *dev = to_dev(pmem);
+	sector_t sector;
+	long cleared;
+	blk_status_t rc = BLK_STS_OK;
+
+	sector = (offset - pmem->data_offset) / 512;
+
+	cleared = nvdimm_clear_poison(dev, pmem->phys_addr + offset, len);
+	if (cleared < len)
+		rc = BLK_STS_IOERR;
+	if (cleared > 0 && cleared / 512) {
+		cleared /= 512;
+		dev_dbg(dev, "%s: %#llx clear %ld sector%s\n", __func__,
+				(unsigned long long) sector, cleared,
+				cleared > 1 ? "s" : "");
+		badblocks_clear(&pmem->bb, sector, cleared);
+		if (pmem->bb_state)
+			sysfs_notify_dirent(pmem->bb_state);
+	}
+
+	arch_invalidate_pmem(pmem->virt_addr + offset, len);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(pmem_clear_poison);
+
+void write_pmem(void *pmem_addr, struct page *page,
+		unsigned int off, unsigned int len)
+{
+	void *mem = kmap_atomic(page);
+
+	memcpy_flushcache(pmem_addr, mem + off, len);
+	kunmap_atomic(mem);
+}
+EXPORT_SYMBOL_GPL(write_pmem);
+
+blk_status_t read_pmem(struct page *page, unsigned int off,
+		void *pmem_addr, unsigned int len)
+{
+	int rc;
+	void *mem = kmap_atomic(page);
+
+	rc = memcpy_mcsafe(mem + off, pmem_addr, len);
+	kunmap_atomic(mem);
+	if (rc)
+		return BLK_STS_IOERR;
+	return BLK_STS_OK;
+}
+EXPORT_SYMBOL_GPL(read_pmem);
+
+blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
+			unsigned int len, unsigned int off, bool is_write,
+			sector_t sector)
+{
+	blk_status_t rc = BLK_STS_OK;
+	bool bad_pmem = false;
+	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
+	void *pmem_addr = pmem->virt_addr + pmem_off;
+
+	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
+		bad_pmem = true;
+
+	if (!is_write) {
+		if (unlikely(bad_pmem))
+			rc = BLK_STS_IOERR;
+		else {
+			rc = read_pmem(page, off, pmem_addr, len);
+			flush_dcache_page(page);
+		}
+	} else {
+		/*
+		 * Note that we write the data both before and after
+		 * clearing poison.  The write before clear poison
+		 * handles situations where the latest written data is
+		 * preserved and the clear poison operation simply marks
+		 * the address range as valid without changing the data.
+		 * In this case application software can assume that an
+		 * interrupted write will either return the new good
+		 * data or an error.
+		 *
+		 * However, if pmem_clear_poison() leaves the data in an
+		 * indeterminate state we need to perform the write
+		 * after clear poison.
+		 */
+		flush_dcache_page(page);
+		write_pmem(pmem_addr, page, off, len);
+		if (unlikely(bad_pmem)) {
+			rc = pmem_clear_poison(pmem, pmem_off, len);
+			write_pmem(pmem_addr, page, off, len);
+		}
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(pmem_do_bvec);
+
+int pmem_rw_page(struct block_device *bdev, sector_t sector,
+		       struct page *page, bool is_write)
+{
+	struct pmem_device *pmem = bdev->bd_queue->queuedata;
+	blk_status_t rc;
+
+	rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, is_write, sector);
+
+	/*
+	 * The ->rw_page interface is subtle and tricky.  The core
+	 * retries on any error, so we can only invoke page_endio() in
+	 * the successful completion case.  Otherwise, we'll see crashes
+	 * caused by double completion.
+	 */
+	if (rc == 0)
+		page_endio(page, is_write, 0);
+
+	return blk_status_to_errno(rc);
+}
+EXPORT_SYMBOL_GPL(pmem_rw_page);
+
+/* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
+__weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
+		long nr_pages, void **kaddr, pfn_t *pfn)
+{
+	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
+
+	if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
+					PFN_PHYS(nr_pages))))
+		return -EIO;
+	*kaddr = pmem->virt_addr + offset;
+	*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+
+	/*
+	 * If badblocks are present, limit known good range to the
+	 * requested range.
+	 */
+	if (unlikely(pmem->bb.count))
+		return nr_pages;
+	return PHYS_PFN(pmem->size - pmem->pfn_pad - offset);
+}
+
+long pmem_dax_direct_access(struct dax_device *dax_dev,
+		pgoff_t pgoff, long nr_pages, void **kaddr, pfn_t *pfn)
+{
+	struct pmem_device *pmem = dax_get_private(dax_dev);
+
+	return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
+}
+EXPORT_SYMBOL_GPL(pmem_dax_direct_access);
+
+size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i)
+{
+	return copy_from_iter_flushcache(addr, bytes, i);
+}
+EXPORT_SYMBOL_GPL(pmem_copy_from_iter);
+
+void pmem_dax_flush(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t size)
+{
+	arch_wb_cache_pmem(addr, size);
+}
+EXPORT_SYMBOL_GPL(pmem_dax_flush);
+
+void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
+{
+	struct nd_region *nd_region;
+	resource_size_t offset = 0, end_trunc = 0;
+	struct nd_namespace_common *ndns;
+	struct nd_namespace_io *nsio;
+	struct resource res;
+	struct badblocks *bb;
+	struct kernfs_node *bb_state;
+
+	if (event != NVDIMM_REVALIDATE_POISON)
+		return;
+
+	if (is_nd_btt(dev)) {
+		struct nd_btt *nd_btt = to_nd_btt(dev);
+
+		ndns = nd_btt->ndns;
+		nd_region = to_nd_region(ndns->dev.parent);
+		nsio = to_nd_namespace_io(&ndns->dev);
+		bb = &nsio->bb;
+		bb_state = NULL;
+	} else {
+		struct pmem_device *pmem = dev_get_drvdata(dev);
+
+		nd_region = to_region(pmem);
+		bb = &pmem->bb;
+		bb_state = pmem->bb_state;
+
+		if (is_nd_pfn(dev)) {
+			struct nd_pfn *nd_pfn = to_nd_pfn(dev);
+			struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
+
+			ndns = nd_pfn->ndns;
+			offset = pmem->data_offset +
+					__le32_to_cpu(pfn_sb->start_pad);
+			end_trunc = __le32_to_cpu(pfn_sb->end_trunc);
+		} else {
+			ndns = to_ndns(dev);
+		}
+
+		nsio = to_nd_namespace_io(&ndns->dev);
+	}
+
+	res.start = nsio->res.start + offset;
+	res.end = nsio->res.end - end_trunc;
+	nvdimm_badblocks_populate(nd_region, bb, &res);
+	if (bb_state)
+		sysfs_notify_dirent(bb_state);
+}
+EXPORT_SYMBOL_GPL(nd_pmem_notify);
+
+static void pmem_freeze_queue(void *q)
+{
+	blk_freeze_queue_start(q);
+}
+
+int pmem_core_remap_pages(struct device *dev, struct pmem_device *pmem,
+		struct nd_namespace_common *ndns)
+{
+	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
+	void *addr;
+	struct nd_pfn_sb *pfn_sb;
+	struct nd_pfn *nd_pfn = NULL;
+	struct resource *res = &nsio->res;
+	struct resource pfn_res;
+	struct vmem_altmap __altmap, *altmap = NULL;
+
+	/* while nsio_rw_bytes is active, parse a pfn info block if present */
+	if (is_nd_pfn(dev)) {
+		nd_pfn = to_nd_pfn(dev);
+		altmap = nvdimm_setup_pfn(nd_pfn, &pfn_res, &__altmap);
+		if (IS_ERR(altmap))
+			return PTR_ERR(altmap);
+	}
+
+	pmem->pfn_flags = PFN_DEV;
+	if (is_nd_pfn(dev)) {
+		addr = devm_memremap_pages(dev, &pfn_res,
+				&pmem->q->q_usage_counter, altmap);
+		pfn_sb = nd_pfn->pfn_sb;
+		pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
+		pmem->pfn_pad = resource_size(res) - resource_size(&pfn_res);
+		pmem->pfn_flags |= PFN_MAP;
+		res = &pfn_res; /* for badblocks populate */
+		res->start += pmem->data_offset;
+	} else if (pmem_should_map_pages(dev)) {
+		addr = devm_memremap_pages(dev, &nsio->res,
+				&pmem->q->q_usage_counter, NULL);
+		pmem->pfn_flags |= PFN_MAP;
+	} else
+		addr = devm_memremap(dev, pmem->phys_addr,
+				pmem->size, ARCH_MEMREMAP_PMEM);
+
+	/*
+	 * At release time the queue must be frozen before
+	 * devm_memremap_pages is unwound
+	 */
+	if (devm_add_action_or_reset(dev, pmem_freeze_queue, pmem->q))
+		return -ENXIO;
+
+	if (IS_ERR(addr))
+		return -ENXIO;
+
+	pmem->virt_addr = addr;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pmem_core_remap_pages);
+
+static void pmem_release_disk(void *__pmem)
+{
+	struct pmem_device *pmem = __pmem;
+
+	kill_dax(pmem->dax_dev);
+	put_dax(pmem->dax_dev);
+	del_gendisk(pmem->disk);
+	put_disk(pmem->disk);
+}
+
+int pmem_core_setup_disk(struct device *dev,
+		struct pmem_device *pmem,
+		struct nd_namespace_common *ndns,
+		const struct block_device_operations *block_ops,
+		const struct dax_operations *dax_ops,
+		const struct attribute_group **attribs)
+{
+	struct device *gendev;
+	struct gendisk *disk;
+	struct dax_device *dax_dev;
+	struct nd_region *nd_region = to_nd_region(dev->parent);
+	int wbc, fua;
+	int nid = dev_to_node(dev);
+	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
+	struct resource *res = &nsio->res;
+
+	wbc = nvdimm_has_cache(nd_region);
+	fua = nvdimm_has_flush(nd_region);
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) || fua < 0) {
+		dev_warn(dev, "unable to guarantee persistence of writes\n");
+		fua = 0;
+	}
+
+	disk = alloc_disk_node(0, nid);
+	if (!disk)
+		return -ENOMEM;
+	pmem->disk = disk;
+
+	disk->fops = block_ops;
+	disk->queue = pmem->q;
+	disk->flags = GENHD_FL_EXT_DEVT;
+	nvdimm_namespace_disk_name(ndns, disk->disk_name);
+	set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
+			/ 512);
+
+	if (devm_init_badblocks(dev, &pmem->bb))
+		return -ENXIO;
+
+	nvdimm_badblocks_populate(nd_region, &pmem->bb, res);
+	disk->bb = &pmem->bb;
+
+	dax_dev = alloc_dax(pmem, disk->disk_name, dax_ops);
+	if (!dax_dev) {
+		put_disk(disk);
+		return -ENOMEM;
+	}
+
+	dax_write_cache(dax_dev, wbc);
+	pmem->dax_dev = dax_dev;
+	gendev = disk_to_dev(disk);
+	gendev->groups = attribs;
+
+	device_add_disk(dev, disk);
+	if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
+		return -ENOMEM;
+
+	revalidate_disk(disk);
+
+	pmem->bb_state = sysfs_get_dirent(disk_to_dev(disk)->kobj.sd,
+					  "badblocks");
+	if (!pmem->bb_state)
+		dev_warn(dev, "'badblocks' notification disabled\n");
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pmem_core_setup_disk);
+
+void pmem_core_setup_queue(struct device *dev, struct pmem_device *pmem,
+		struct nd_namespace_common *ndns)
+{
+	struct nd_region *nd_region = to_nd_region(dev->parent);
+	int fua, wbc;
+
+	wbc = nvdimm_has_cache(nd_region);
+	fua = nvdimm_has_flush(nd_region);
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) || fua < 0) {
+		dev_warn(dev, "unable to guarantee persistence of writes\n");
+		fua = 0;
+	}
+
+	blk_queue_write_cache(pmem->q, wbc, fua);
+	blk_queue_physical_block_size(pmem->q, PAGE_SIZE);
+	blk_queue_logical_block_size(pmem->q, pmem_sector_size(ndns));
+	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->q);
+	queue_flag_set_unlocked(QUEUE_FLAG_DAX, pmem->q);
+	pmem->q->queuedata = pmem;
+}
+EXPORT_SYMBOL_GPL(pmem_core_setup_queue);
+
+struct pmem_device *pmem_core_setup_pmem(struct device *dev,
+		struct nd_namespace_common *ndns)
+{
+	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
+	struct resource *res = &nsio->res;
+	struct pmem_device *pmem;
+
+	/* we're attaching a block device, disable raw namespace access */
+	devm_nsio_disable(dev, nsio);
+
+	pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
+	if (!pmem)
+		return NULL;
+
+	dev_set_drvdata(dev, pmem);
+	pmem->phys_addr = res->start;
+	pmem->size = resource_size(res);
+	if (!devm_request_mem_region(dev, res->start, resource_size(res),
+				dev_name(&ndns->dev))) {
+		dev_warn(dev, "could not reserve region %pR\n", res);
+		return NULL;
+	}
+
+
+	return pmem;
+}
+EXPORT_SYMBOL_GPL(pmem_core_setup_pmem);
+
+int nd_pmem_remove(struct device *dev)
+{
+	struct pmem_device *pmem = dev_get_drvdata(dev);
+
+	if (is_nd_btt(dev))
+		nvdimm_namespace_detach_btt(to_nd_btt(dev));
+	else {
+		/*
+		 * Note, this assumes device_lock() context to not race
+		 * nd_pmem_notify()
+		 */
+		sysfs_put(pmem->bb_state);
+		pmem->bb_state = NULL;
+	}
+	nvdimm_flush(to_nd_region(dev->parent));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nd_pmem_remove);
+
+void nd_pmem_shutdown(struct device *dev)
+{
+	nvdimm_flush(to_nd_region(dev->parent));
+}
+EXPORT_SYMBOL_GPL(nd_pmem_shutdown);
+
+MODULE_LICENSE("GPL v2");

* [PATCH v7 9/9] libnvdimm: Add DMA based blk-mq pmem driver
From: Dave Jiang @ 2017-08-30 20:56 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, hch, linux-nvdimm

Add a DMA-based blk-mq driver for pmem. This significantly reduces CPU
utilization at the cost of some added latency and, in some cases, reduced
bandwidth.  By default the current CPU-copy based pmem driver will load,
but this driver can be selected instead with a modprobe configuration
such as the sketch below. The driver uses blk-mq and performs the copies
via DMA through the dmaengine API.
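
For example, a minimal modprobe configuration sketch (the file name is
illustrative; the blacklist mechanism itself is standard and not part of
this series) that keeps the default driver from loading so nd_pmem_dma
binds instead:

  # /etc/modprobe.d/pmem-dma.conf
  blacklist nd_pmem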

Numbers below are measured against pmem simulated via DRAM using the
memmap=NN!SS kernel parameter.  The DMA engine used is ioatdma on an
Intel Skylake Xeon platform.  Keep in mind that performance on real
persistent memory will differ.  Fio 2.21 was used; a representative
invocation is sketched below.
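
A representative invocation for the 64k, single-task, queue depth 1 read
row might look like the following (the exact fio job files were not
posted, so the flags and device name are assumptions):

  fio --name=pmem-read --filename=/dev/pmem0 --ioengine=libaio \
      --direct=1 --rw=read --bs=64k --numjobs=1 --iodepth=1

with --rw, --bs, --numjobs, and --iodepth adjusted to match each row
below.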

64k: 1 task queuedepth=1
CPU Read:  7631 MB/s  99.7% CPU    DMA Read: 2415 MB/s  54% CPU
CPU Write: 3552 MB/s  100% CPU     DMA Write: 2173 MB/s 54% CPU

64k: 16 tasks queuedepth=16
CPU Read: 36800 MB/s  1593% CPU    DMA Read:  29100 MB/s  607% CPU
CPU Write: 20900 MB/s 1589% CPU    DMA Write: 23400 MB/s  585% CPU

2M: 1 task queuedepth=1
CPU Read:  6013 MB/s  99.3% CPU    DMA Read:  7986 MB/s  59.3% CPU
CPU Write: 3579 MB/s  100% CPU     DMA Write: 5211 MB/s  58.3% CPU

2M: 16 tasks queuedepth=16
CPU Read:  18100 MB/s 1588% CPU    DMA Read:  21300 MB/s 180.9% CPU
CPU Write: 14100 MB/s 1594% CPU    DMA Write: 20400 MB/s 446.9% CPU

Also, because a significant portion of the code is shared with the
existing pmem driver, the common code is broken out into a kernel module
called pmem_core that both drivers use.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 drivers/nvdimm/Kconfig    |   18 ++
 drivers/nvdimm/Makefile   |    3 
 drivers/nvdimm/pmem.h     |    1 
 drivers/nvdimm/pmem_dma.c |  475 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 497 insertions(+)
 create mode 100644 drivers/nvdimm/pmem_dma.c

diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 01fe9e8..e0a4589 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -40,6 +40,24 @@ config BLK_DEV_PMEM
 
 	  Say Y if you want to use an NVDIMM
 
+config BLK_DEV_PMEM_DMA
+	tristate "PMEM: Persistent memory block device with DMA support"
+	depends on DMA_ENGINE
+	depends on BLK_DEV_PMEM=m || !BLK_DEV_PMEM
+	default LIBNVDIMM
+	select BLK_DEV_PMEM_CORE
+	help
+	  This driver uses DMA engines provided by the platform to
+	  offload the data copying. The goal is to reduce CPU
+	  utilization at the cost of some latency and throughput.
+	  Initial benchmarks on DRAM showed CPU utilization as low as
+	  about 30% of the CPU-copy case, with some reduction in
+	  throughput. Be aware that DMA is only used on the normal
+	  block device path; when DAX is used, DMA is bypassed and the
+	  CPU performs the copy.
+
+	  Say Y if you want to use an NVDIMM
+
 config ND_BLK
 	tristate "BLK: Block data window (aperture) device support"
 	default LIBNVDIMM
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 0ce99cf..cecc280 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
 obj-$(CONFIG_BLK_DEV_PMEM_CORE) += nd_pmem_core.o
 obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
+obj-$(CONFIG_BLK_DEV_PMEM_DMA) += nd_pmem_dma.o
 obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
@@ -9,6 +10,8 @@ nd_pmem_core-y := pmem_core.o
 
 nd_pmem-y := pmem.o
 
+nd_pmem_dma-y := pmem_dma.o
+
 nd_btt-y := btt.o
 
 nd_blk-y := blk.o
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 6df833e..ed83967 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -40,6 +40,7 @@ struct pmem_device {
 	struct gendisk		*disk;
 	struct blk_mq_tag_set	tag_set;
 	struct request_queue	*q;
+	unsigned int		sg_allocated;
 };
 
 static inline struct device *to_dev(struct pmem_device *pmem)
diff --git a/drivers/nvdimm/pmem_dma.c b/drivers/nvdimm/pmem_dma.c
new file mode 100644
index 0000000..a9c6c14
--- /dev/null
+++ b/drivers/nvdimm/pmem_dma.c
@@ -0,0 +1,475 @@
+/*
+ * Persistent Memory Block DMA Driver
+ * Copyright (c) 2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/blk-mq.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <linux/nodemask.h>
+#include "pmem.h"
+#include "pfn.h"
+#include "nd.h"
+
+/*
+ * Measurements with various queue depths while running fio at 4k with 16
+ * processes showed that a queue depth of 128 provides the best
+ * performance. This can be adjusted later if new data says otherwise.
+ */
+static int queue_depth = 128;
+
+struct pmem_cmd {
+	struct request *rq;
+	struct dma_chan *chan;
+	int sg_nents;
+	struct scatterlist sg[];
+};
+
+static void pmem_release_queue(void *data)
+{
+	struct pmem_device *pmem = data;
+
+	blk_cleanup_queue(pmem->q);
+	blk_mq_free_tag_set(&pmem->tag_set);
+}
+
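+/*
+ * DMA completion callback: translate the dmaengine result into a
+ * blk_status_t, honor REQ_FUA on writes with an nvdimm_flush(), and
+ * complete the blk-mq request.
+ */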
+static void nd_pmem_dma_callback(void *data,
+		const struct dmaengine_result *res)
+{
+	struct pmem_cmd *cmd = data;
+	struct request *req = cmd->rq;
+	struct request_queue *q = req->q;
+	struct pmem_device *pmem = q->queuedata;
+	struct nd_region *nd_region = to_region(pmem);
+	struct device *dev = to_dev(pmem);
+	blk_status_t blk_status = BLK_STS_OK;
+
+	if (res) {
+		switch (res->result) {
+		case DMA_TRANS_READ_FAILED:
+		case DMA_TRANS_WRITE_FAILED:
+		case DMA_TRANS_ABORTED:
+			dev_dbg(dev, "bio failed\n");
+			blk_status = BLK_STS_IOERR;
+			break;
+		case DMA_TRANS_NOERROR:
+		default:
+			break;
+		}
+	}
+
+	if (req_op(req) == REQ_OP_WRITE && req->cmd_flags & REQ_FUA)
+		nvdimm_flush(nd_region);
+
+	blk_mq_end_request(cmd->rq, blk_status);
+}
+
+static int pmem_check_bad_pmem(struct pmem_cmd *cmd, bool is_write)
+{
+	struct request *req = cmd->rq;
+	struct request_queue *q = req->q;
+	struct pmem_device *pmem = q->queuedata;
+	struct bio_vec bvec;
+	struct req_iterator iter;
+
+	rq_for_each_segment(bvec, req, iter) {
+		sector_t sector = iter.iter.bi_sector;
+		unsigned int len = bvec.bv_len;
+		unsigned int off = bvec.bv_offset;
+
+		if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) {
+			if (is_write) {
+				struct page *page = bvec.bv_page;
+				phys_addr_t pmem_off = sector * 512 +
+					pmem->data_offset;
+				void *pmem_addr = pmem->virt_addr + pmem_off;
+
+		/*
+		 * Note that we write the data both before and after
+		 * clearing poison.  The write before clear poison
+		 * handles situations where the latest written data is
+		 * preserved and the clear poison operation simply marks
+		 * the address range as valid without changing the data.
+		 * In this case application software can assume that an
+		 * interrupted write will either return the new good
+		 * data or an error.
+		 *
+		 * However, if pmem_clear_poison() leaves the data in an
+		 * indeterminate state we need to perform the write
+		 * after clear poison.
+		 */
+				flush_dcache_page(page);
+				write_pmem(pmem_addr, page, off, len);
+				pmem_clear_poison(pmem, pmem_off, len);
+				write_pmem(pmem_addr, page, off, len);
+			} else
+				return -EIO;
+		}
+	}
+
+	return 0;
+}
+
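+/*
+ * DMA path: map the request's segments into a scatterlist and the target
+ * pmem range as a single contiguous buffer, then hand both to
+ * dmaengine_prep_dma_memcpy_sg().  Unmapping is handled through the unmap
+ * data attached to the descriptor, and the request is completed from
+ * nd_pmem_dma_callback().
+ */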
+static blk_status_t pmem_handle_cmd_dma(struct pmem_cmd *cmd, bool is_write)
+{
+	struct request *req = cmd->rq;
+	struct request_queue *q = req->q;
+	struct pmem_device *pmem = q->queuedata;
+	struct device *dev = to_dev(pmem);
+	phys_addr_t pmem_off = blk_rq_pos(req) * 512 + pmem->data_offset;
+	void *pmem_addr = pmem->virt_addr + pmem_off;
+	size_t len;
+	struct dma_device *dma = cmd->chan->device;
+	struct dmaengine_unmap_data *unmap;
+	dma_cookie_t cookie;
+	struct dma_async_tx_descriptor *txd;
+	struct page *page;
+	unsigned int off;
+	int rc;
+	blk_status_t blk_status = BLK_STS_OK;
+	enum dma_data_direction dir;
+	dma_addr_t dma_addr;
+
+	rc = pmem_check_bad_pmem(cmd, is_write);
+	if (rc < 0) {
+		blk_status = BLK_STS_IOERR;
+		goto err;
+	}
+
+	unmap = dmaengine_get_unmap_data(dma->dev, 2, GFP_NOWAIT);
+	if (!unmap) {
+		dev_dbg(dev, "failed to get dma unmap data\n");
+		blk_status = BLK_STS_IOERR;
+		goto err;
+	}
+
+	/*
+	 * A read from pmem writes to the scatterlist, and a write to pmem
+	 * reads from the scatterlist; map the sg in the matching direction.
+	 */
+	dir = is_write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+	cmd->sg_nents = blk_rq_map_sg(req->q, req, cmd->sg);
+	if (cmd->sg_nents < 1) {
+		blk_status = BLK_STS_IOERR;
+		goto err;
+	}
+
+	WARN_ON_ONCE(cmd->sg_nents > pmem->sg_allocated);
+
+	rc = dma_map_sg(dma->dev, cmd->sg, cmd->sg_nents, dir);
+	if (rc < 1) {
+		dev_dbg(dma->dev, "DMA scatterlist mapping error\n");
+		blk_status = BLK_STS_IOERR;
+		goto err;
+	}
+
+	unmap->unmap_sg.sg = cmd->sg;
+	unmap->sg_nents = cmd->sg_nents;
+	if (is_write)
+		unmap->from_sg = 1;
+	else
+		unmap->to_sg = 1;
+
+	len = blk_rq_payload_bytes(req);
+	page = virt_to_page(pmem_addr);
+	off = offset_in_page(pmem_addr);
+	dir = is_write ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+	dma_addr = dma_map_page(dma->dev, page, off, len, dir);
+	if (dma_mapping_error(dma->dev, dma_addr)) {
+		dev_dbg(dma->dev, "DMA buffer mapping error\n");
+		blk_status = BLK_STS_IOERR;
+		goto err_unmap;
+	}
+
+	unmap->unmap_sg.buf_phys = dma_addr;
+	unmap->len = len;
+	if (is_write)
+		unmap->to_cnt = 1;
+	else
+		unmap->from_cnt = 1;
+
+	txd = dmaengine_prep_dma_memcpy_sg(cmd->chan,
+				cmd->sg, cmd->sg_nents, dma_addr,
+				!is_write, DMA_PREP_INTERRUPT);
+	if (!txd) {
+		dev_dbg(dma->dev, "dma prep failed\n");
+		blk_status = BLK_STS_IOERR;
+		goto err_unmap;
+	}
+
+	txd->callback_result = nd_pmem_dma_callback;
+	txd->callback_param = cmd;
+	dma_set_unmap(txd, unmap);
+	dmaengine_unmap_put(unmap);
+	cookie = dmaengine_submit(txd);
+	if (dma_submit_error(cookie)) {
+		dev_dbg(dma->dev, "dma submit error\n");
+		blk_status = BLK_STS_IOERR;
+		goto err_set_unmap;
+	}
+
+	dma_async_issue_pending(cmd->chan);
+	return BLK_STS_OK;
+
+err_set_unmap:
+	dmaengine_unmap_put(unmap);
+err_unmap:
+	dmaengine_unmap_put(unmap);
+err:
+	blk_mq_end_request(cmd->rq, blk_status);
+	return blk_status;
+}
+
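+/*
+ * CPU fallback path: used when no DMA_MEMCPY_SG channel is available for
+ * the request; each segment is copied with pmem_do_bvec() and REQ_FUA
+ * writes are followed by an nvdimm_flush().
+ */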
+static blk_status_t pmem_handle_cmd(struct pmem_cmd *cmd, bool is_write)
+{
+	struct request *req = cmd->rq;
+	struct request_queue *q = req->q;
+	struct pmem_device *pmem = q->queuedata;
+	struct nd_region *nd_region = to_region(pmem);
+	struct bio_vec bvec;
+	struct req_iterator iter;
+	blk_status_t blk_status = BLK_STS_OK;
+
+	rq_for_each_segment(bvec, req, iter) {
+		blk_status = pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len,
+				bvec.bv_offset, is_write,
+				iter.iter.bi_sector);
+		if (blk_status != BLK_STS_OK)
+			break;
+	}
+
+	if (is_write && req->cmd_flags & REQ_FUA)
+		nvdimm_flush(nd_region);
+
+	blk_mq_end_request(cmd->rq, blk_status);
+
+	return blk_status;
+}
+
+typedef blk_status_t (*pmem_do_io)(struct pmem_cmd *cmd, bool is_write);
+
+static blk_status_t pmem_queue_rq(struct blk_mq_hw_ctx *hctx,
+		const struct blk_mq_queue_data *bd)
+{
+	struct pmem_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
+	struct request *req = cmd->rq = bd->rq;
+	struct request_queue *q = req->q;
+	struct pmem_device *pmem = q->queuedata;
+	struct nd_region *nd_region = to_region(pmem);
+	blk_status_t blk_status = BLK_STS_OK;
+	pmem_do_io do_io;
+
+	blk_mq_start_request(req);
+	/*
+	 * Only take the DMA path if the tag set was sized for scatterlists
+	 * at probe time; otherwise fall back to the CPU copy path.
+	 */
+	cmd->chan = pmem->sg_allocated ?
+		dma_find_channel(DMA_MEMCPY_SG) : NULL;
+	if (cmd->chan)
+		do_io = pmem_handle_cmd_dma;
+	else
+		do_io = pmem_handle_cmd;
+
+	switch (req_op(req)) {
+	case REQ_OP_FLUSH:
+		nvdimm_flush(nd_region);
+		blk_mq_end_request(cmd->rq, BLK_STS_OK);
+		break;
+	case REQ_OP_READ:
+		blk_status = do_io(cmd, false);
+		break;
+	case REQ_OP_WRITE:
+		blk_status = do_io(cmd, true);
+		break;
+	default:
+		blk_status = BLK_STS_NOTSUPP;
+		break;
+	}
+
+	if (blk_status != BLK_STS_OK)
+		blk_mq_end_request(cmd->rq, blk_status);
+
+	return blk_status;
+}
+
+static const struct blk_mq_ops pmem_mq_ops = {
+	.queue_rq	= pmem_queue_rq,
+};
+
+static const struct attribute_group *pmem_attribute_groups[] = {
+	&dax_attribute_group,
+	NULL,
+};
+
+static const struct block_device_operations pmem_fops = {
+	.owner =		THIS_MODULE,
+	.rw_page =		pmem_rw_page,
+	.revalidate_disk =	nvdimm_revalidate_disk,
+};
+
+static const struct dax_operations pmem_dax_ops = {
+	.direct_access = pmem_dax_direct_access,
+	.copy_from_iter = pmem_copy_from_iter,
+	.flush = pmem_dax_flush,
+};
+
+static bool pmem_dma_filter_fn(struct dma_chan *chan, void *node)
+{
+	return dev_to_node(&chan->dev->device) == (int)(unsigned long)node;
+}
+
+static int pmem_attach_disk(struct device *dev,
+		struct nd_namespace_common *ndns)
+{
+	struct pmem_device *pmem;
+	int rc;
+	struct dma_chan *chan = NULL;
+	int has_dma;
+
+	pmem = pmem_core_setup_pmem(dev, ndns);
+	if (!pmem)
+		return -ENXIO;
+
+	chan = dma_find_channel(DMA_MEMCPY_SG);
+	if (!chan)
+		dev_warn(dev, "Forced back to CPU, no DMA\n");
+
+	has_dma = chan ? 1 : 0;
+	pmem->tag_set.ops = &pmem_mq_ops;
+	if (has_dma) {
+		dma_cap_mask_t dma_mask;
+		int node = 0, count;
+
+		dma_cap_zero(dma_mask);
+		dma_cap_set(DMA_MEMCPY_SG, dma_mask);
+		count = dma_get_channel_count(&dma_mask, pmem_dma_filter_fn,
+				(void *)(unsigned long)node);
+		if (count)
+			pmem->tag_set.nr_hw_queues = count;
+		else {
+			has_dma = 0;
+			pmem->tag_set.nr_hw_queues = num_online_cpus();
+		}
+	} else
+		pmem->tag_set.nr_hw_queues = num_online_cpus();
+
+	dev_dbg(dev, "%d HW queues allocated\n", pmem->tag_set.nr_hw_queues);
+
+	pmem->tag_set.queue_depth = queue_depth;
+	pmem->tag_set.numa_node = dev_to_node(dev);
+
+	if (has_dma) {
+		pmem->sg_allocated = (SZ_4K - sizeof(struct pmem_cmd)) /
+			sizeof(struct scatterlist);
+		pmem->tag_set.cmd_size = sizeof(struct pmem_cmd) +
+			sizeof(struct scatterlist) * pmem->sg_allocated;
+	} else
+		pmem->tag_set.cmd_size = sizeof(struct pmem_cmd);
+
+	pmem->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+	pmem->tag_set.driver_data = pmem;
+
+	rc = blk_mq_alloc_tag_set(&pmem->tag_set);
+	if (rc < 0)
+		return rc;
+
+	pmem->q = blk_mq_init_queue(&pmem->tag_set);
+	if (IS_ERR(pmem->q)) {
+		blk_mq_free_tag_set(&pmem->tag_set);
+		return -ENOMEM;
+	}
+
+	pmem_core_setup_queue(dev, pmem, ndns);
+
+	if (has_dma) {
+		u64 xfercap = dma_get_desc_xfercap(chan);
+
+		/* set it to some sane size if DMA driver didn't export */
+		if (xfercap == 0)
+			xfercap = SZ_1M;
+
+		dev_dbg(dev, "xfercap: %#llx\n", xfercap);
+		/* max xfer size is per_descriptor_cap * num_of_sg */
+		blk_queue_max_hw_sectors(pmem->q,
+				pmem->sg_allocated * xfercap / 512);
+		blk_queue_max_segments(pmem->q, pmem->sg_allocated);
+	} else
+		blk_queue_max_hw_sectors(pmem->q, UINT_MAX);
+
+	/* devm_add_action_or_reset() runs the release action itself on failure */
+	if (devm_add_action_or_reset(dev, pmem_release_queue, pmem))
+		return -ENOMEM;
+
+	rc = pmem_core_remap_pages(dev, pmem, ndns);
+	if (rc < 0)
+		return rc;
+
+	rc = pmem_core_setup_disk(dev, pmem, ndns, &pmem_fops,
+			&pmem_dax_ops, pmem_attribute_groups);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+
+static int nd_pmem_probe(struct device *dev)
+{
+	struct nd_namespace_common *ndns;
+
+	ndns = nvdimm_namespace_common_probe(dev);
+	if (IS_ERR(ndns))
+		return PTR_ERR(ndns);
+
+	if (devm_nsio_enable(dev, to_nd_namespace_io(&ndns->dev)))
+		return -ENXIO;
+
+	if (is_nd_btt(dev))
+		return nvdimm_namespace_attach_btt(ndns);
+
+	if (is_nd_pfn(dev))
+		return pmem_attach_disk(dev, ndns);
+
+	/* if we find a valid info-block we'll come back as that personality */
+	if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
+			|| nd_dax_probe(dev, ndns) == 0)
+		return -ENXIO;
+
+	/* ...otherwise we're just a raw pmem device */
+	return pmem_attach_disk(dev, ndns);
+}
+
+static struct nd_device_driver nd_pmem_driver = {
+	.probe = nd_pmem_probe,
+	.remove = nd_pmem_remove,
+	.notify = nd_pmem_notify,
+	.shutdown = nd_pmem_shutdown,
+	.drv = {
+		.name = "nd_pmem",
+	},
+	.type = ND_DRIVER_NAMESPACE_IO | ND_DRIVER_NAMESPACE_PMEM,
+};
+
+static int __init pmem_init(void)
+{
+	dmaengine_get();
+	return nd_driver_register(&nd_pmem_driver);
+}
+module_init(pmem_init);
+
+static void pmem_exit(void)
+{
+	dmaengine_put();
+	driver_unregister(&nd_pmem_driver.drv);
+}
+module_exit(pmem_exit);
+
+MODULE_SOFTDEP("pre: dmaengine");
+MODULE_LICENSE("GPL v2");


* Re: [PATCH v7 9/9] libnvdimm: Add DMA based blk-mq pmem driver
  2017-08-30 20:56 ` [PATCH v7 9/9] libnvdimm: Add DMA based blk-mq pmem driver Dave Jiang
@ 2017-08-31 22:13   ` Kani, Toshimitsu
  2017-08-31 23:37     ` Dave Jiang
  0 siblings, 1 reply; 12+ messages in thread
From: Kani, Toshimitsu @ 2017-08-31 22:13 UTC (permalink / raw)
  To: dan.j.williams, dave.jiang, vinod.koul; +Cc: dmaengine, hch, linux-nvdimm

On Wed, 2017-08-30 at 13:56 -0700, Dave Jiang wrote:
 :
> +static int pmem_attach_disk(struct device *dev,
> +		struct nd_namespace_common *ndns)
> +{
> +	struct pmem_device *pmem;
> +	int rc;
> +	struct dma_chan *chan = NULL;
> +	int has_dma;
> +
> +	pmem = pmem_core_setup_pmem(dev, ndns);
> +	if (!pmem)
> +		return -ENXIO;
> +
> +	chan = dma_find_channel(DMA_MEMCPY_SG);
> +	if (!chan)
> +		dev_warn(dev, "Forced back to CPU, no DMA\n");

I just tested this series on a Haswell system.  The above warning
message was shown during boot-up (is this expected?), but the driver did
not appear to switch back to CPU.  Initial IO requests to pmem (mkfs) hit
the WARN_ON_ONCE() below.  pmem->sg_allocated is zero since it was not
initialized (has_dma was set to 0).

[  897.147527] WARNING: CPU: 23 PID: 707 at
drivers/nvdimm/pmem_dma.c:171 pmem_handle_cmd_dma+0x675/0x710

It then repeated the error messages below, seemingly forever.

 ioatdma 0000:80:04.7: ioat_timer_event: Channel halted (1)
 ioatdma 0000:80:04.7: Errors:
 ioatdma 0000:80:04.7: Err(0): DMA Transfer Source Address Error
 ioatdma 0000:80:04.7: Reset channel...
 ioatdma 0000:80:04.7: Restart channel...
 ioatdma 0000:80:04.7: ioat_timer_event: Channel halted (1000)
 ioatdma 0000:80:04.7: Errors:
 ioatdma 0000:80:04.7: Err(12): Completion Address Error
 ioatdma 0000:80:04.7: Reset channel...
 ioatdma 0000:80:04.7: Restart channel...
 ioatdma 0000:80:04.7: ioat_timer_event: Channel halted (1000)
   :

Any thoughts?  Also, how do I choose the mode between CPU and DMA?
Thanks!
-Toshi


* Re: [PATCH v7 9/9] libnvdimm: Add DMA based blk-mq pmem driver
  2017-08-31 22:13   ` Kani, Toshimitsu
@ 2017-08-31 23:37     ` Dave Jiang
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Jiang @ 2017-08-31 23:37 UTC (permalink / raw)
  To: Kani, Toshimitsu, Williams, Dan J, Koul, Vinod
  Cc: dmaengine, hch, linux-nvdimm



On 08/31/2017 03:13 PM, Kani, Toshimitsu wrote:
> On Wed, 2017-08-30 at 13:56 -0700, Dave Jiang wrote:
>  :
>> +static int pmem_attach_disk(struct device *dev,
>> +struct nd_namespace_common *ndns)
>> +{
>> +struct pmem_device *pmem;
>> +int rc;
>> +struct dma_chan *chan = NULL;
>> +int has_dma;
>> +
>> +pmem = pmem_core_setup_pmem(dev, ndns);
>> +if (!pmem)
>> +return -ENXIO;
>> +
>> +chan = dma_find_channel(DMA_MEMCPY_SG);
>> +if (!chan)
>> +dev_warn(dev, "Forced back to CPU, no DMA\n");
> 
> I just tested this series on a Haswell system.  The above warning
> messages was shown during boot-up (is this expected?), but it did not
> appear to switch back to CPU.  Initial IO requests to pmem (mkfs) hit
> the WARN_ON_ONCE() below.  pmem->sg_allocated is zero since it was not
> initialized (has_dma was set to 0).
> 
> [  897.147527] WARNING: CPU: 23 PID: 707 at
> drivers/nvdimm/pmem_dma.c:171 pmem_handle_cmd_dma+0x675/0x710
> 
> It then displayed seemed-forever-repeated error messages below.
> 
>  ioatdma 0000:80:04.7: ioat_timer_event: Channel halted (1)
>  ioatdma 0000:80:04.7: Errors:
>  ioatdma 0000:80:04.7: Err(0): DMA Transfer Source Address Error
>  ioatdma 0000:80:04.7: Reset channel...
>  ioatdma 0000:80:04.7: Restart channel...
>  ioatdma 0000:80:04.7: ioat_timer_event: Channel halted (1000)
>  ioatdma 0000:80:04.7: Errors:
>  ioatdma 0000:80:04.7: Err(12): Completion Address Error
>  ioatdma 0000:80:04.7: Reset channel...
>  ioatdma 0000:80:04.7: Restart channel...
>  ioatdma 0000:80:04.7: ioat_timer_event: Channel halted (1000)
>    :
> 
> Any thoughts?  Also, how do I choose the mode between CPU and DMA?
> Thanks!
> -Toshi
> 
Toshi,
I may have broken the no-DMA path when I was doing some reformatting in
the past revision. Let me do some debugging and I'll get back to you. To
run it without DMA, just make sure ioatdma is blacklisted or removed with
rmmod.
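
For instance (illustrative only):

  # rmmod ioatdma

or add "blacklist ioatdma" to an /etc/modprobe.d/ config file before the
next boot.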

Thread overview: 12 messages
2017-08-30 20:55 [PATCH v7 0/9] libnvdimm: add DMA supported blk-mq pmem driver Dave Jiang
2017-08-30 20:55 ` [PATCH v7 1/9] dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels Dave Jiang
2017-08-30 20:55 ` [PATCH v7 2/9] dmaengine: Add DMA_MEMCPY_SG transaction op Dave Jiang
2017-08-30 20:55 ` [PATCH v7 3/9] dmaengine: ioatdma: dma_prep_memcpy_sg support Dave Jiang
2017-08-30 20:55 ` [PATCH v7 4/9] dmaengine: add function to provide per descriptor xfercap for dma engine Dave Jiang
2017-08-30 20:56 ` [PATCH v7 5/9] dmaengine: add SG support to dmaengine_unmap Dave Jiang
2017-08-30 20:56 ` [PATCH v7 6/9] dmaengine: provide number of available channels Dave Jiang
2017-08-30 20:56 ` [PATCH v7 7/9] libnvdimm: remove definition of REQ_FLUSH Dave Jiang
2017-08-30 20:56 ` [PATCH v7 8/9] libnvdimm: move common function for pmem to pmem_core Dave Jiang
2017-08-30 20:56 ` [PATCH v7 9/9] libnvdimm: Add DMA based blk-mq pmem driver Dave Jiang
2017-08-31 22:13   ` Kani, Toshimitsu
2017-08-31 23:37     ` Dave Jiang
